-
Semi-supervised Regression Analysis with Model Misspecification and High-dimensional Data
Authors:
Ye Tian,
Peng Wu,
Zhiqiang Tan
Abstract:
The accessibility of vast volumes of unlabeled data has sparked growing interest in semi-supervised learning (SSL) and covariate shift transfer learning (CSTL). In this paper, we present an inference framework for estimating regression coefficients in conditional mean models within both SSL and CSTL settings, while allowing for the misspecification of conditional mean models. We develop an augmented inverse probability weighted (AIPW) method, employing regularized calibrated estimators for both propensity score (PS) and outcome regression (OR) nuisance models, with PS and OR models being sequentially dependent. We show that when the PS model is correctly specified, the proposed estimator achieves consistency, asymptotic normality, and valid confidence intervals, even with possible OR model misspecification and high-dimensional data. Moreover, by suppressing detailed technical choices, we demonstrate that previous methods can be unified within our AIPW framework. Our theoretical findings are verified through extensive simulation studies and a real-world data application.
Submitted 19 June, 2024;
originally announced June 2024.
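As a rough illustration of the AIPW idea named in the abstract above, the sketch below estimates a simple mean of the outcome rather than the regression coefficients the paper targets. The function name `aipw_mean`, the inputs, and the assumption that propensity scores and outcome predictions are already fitted are all hypothetical; the paper's regularized calibrated fitting of the nuisance models is not reproduced here.

```python
def aipw_mean(y, labeled, propensity, outcome_pred):
    """Augmented inverse probability weighted estimate of E[Y].

    y            : outcomes (ignored where the unit is unlabeled, may be None)
    labeled      : 1 if the outcome is observed for the unit, else 0
    propensity   : assumed fitted P(labeled = 1 | x) for each unit
    outcome_pred : assumed fitted outcome-regression prediction m(x)
    """
    n = len(labeled)
    total = 0.0
    for y_i, r_i, pi_i, m_i in zip(y, labeled, propensity, outcome_pred):
        # Outcome-regression term, available for every unit.
        total += m_i
        # Inverse-probability-weighted residual correction, labeled units only.
        if r_i:
            total += (y_i - m_i) / pi_i
    return total / n


# Two labeled and two unlabeled units, each labeled with probability 0.5.
estimate = aipw_mean(
    y=[1.0, None, 3.0, None],
    labeled=[1, 0, 1, 0],
    propensity=[0.5, 0.5, 0.5, 0.5],
    outcome_pred=[2.0, 2.0, 2.0, 2.0],
)
```

The residual correction is what lets the estimate stay consistent under a correct propensity model even when the outcome predictions are poor.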
-
Personalized Binomial DAGs Learning with Network Structured Covariates
Authors:
Boxin Zhao,
Weishi Wang,
Dingyuan Zhu,
Ziqi Liu,
Dong Wang,
Zhiqiang Zhang,
Jun Zhou,
Mladen Kolar
Abstract:
The causal dependence in data is often characterized by Directed Acyclic Graphical (DAG) models, widely used in many areas. Causal discovery aims to recover the DAG structure using observational data. This paper focuses on causal discovery with multi-variate count data. We are motivated by real-world web visit data, recording individual user visits to multiple websites. Building a causal diagram can help understand user behavior in transitioning between websites, inspiring operational strategy. A challenge in modeling is user heterogeneity, as users with different backgrounds exhibit varied behaviors. Additionally, social network connections can result in similar behaviors among friends. We introduce personalized Binomial DAG models to address heterogeneity and network dependency between observations, which are common in real-world applications. To learn the proposed DAG model, we develop an algorithm that embeds the network structure into a dimension-reduced covariate, learns each node's neighborhood to reduce the DAG search space, and explores the variance-mean relation to determine the ordering. Simulations show our algorithm outperforms state-of-the-art competitors in heterogeneous data. We demonstrate its practical usefulness on a real-world web visit dataset.
Submitted 10 June, 2024;
originally announced June 2024.
-
Implementing Bayesian inference on a stochastic CO2-based grey-box model for assessing indoor air quality in Canadian primary schools
Authors:
Shujie Yan,
Jiwei Zou,
Chang Shu,
Justin Berquist,
Vincent Brochu,
Marc Veillette,
Danlin Hou,
Caroline Duchaine,
Liang Zhou,
Zhiqiang Zhai,
Liangzhu Wang
Abstract:
The COVID-19 pandemic brought global attention to indoor air quality (IAQ), which is intrinsically linked to clean air change rates. Estimating the air change rate in indoor environments, however, remains challenging. This is primarily due to the uncertainties associated with the air change rate estimation, such as pollutant generation rates, dynamics including weather and occupancies, and the limitations of deterministic approaches to accommodate these factors. In this study, Bayesian inference was implemented on a stochastic CO2-based grey-box model to infer modeled parameters and quantify uncertainties. The accuracy and robustness of the ventilation rate and CO2 emission rate estimated by the model were confirmed with CO2 tracer gas experiments conducted in an airtight chamber. Both prior and posterior predictive checks (PPC) were performed to demonstrate the advantage of this approach. In addition, uncertainties in real-life contexts were quantified with an incremental variance σ for the Wiener process. This approach was later applied to evaluate the ventilation conditions within two primary school classrooms in Montreal. The Equivalent Clean Airflow Rate (ECAi) was calculated following ASHRAE 241, and an insufficient clean air supply within both classrooms was identified. A supplement of 800 cfm clean air delivery rate (CADR) from air-cleaning devices is recommended for a sufficient ECAi. Finally, steady-state CO2 thresholds (Climit, Ctarget, and Cideal) were derived to indicate when ECAi requirements could be achieved under various mitigation strategies, such as portable air cleaners and in-room ultraviolet light, with CADR values ranging from 200 to 1000 cfm.
Submitted 1 May, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
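For readers unfamiliar with steady-state CO2 levels, the sketch below evaluates the textbook single-zone, well-mixed mass balance. It is not the stochastic grey-box model from the entry above, and it does not compute the ASHRAE 241 thresholds. The function name, the units (litres per second), and the example numbers are assumptions chosen for illustration.

```python
def steady_state_co2_ppm(outdoor_ppm, generation_lps, airflow_lps):
    """Steady-state indoor CO2 for a well-mixed single zone.

    outdoor_ppm    : outdoor CO2 concentration in ppm
    generation_lps : total indoor CO2 generation rate in L/s (assumed known)
    airflow_lps    : outdoor airflow rate in L/s
    """
    if airflow_lps <= 0:
        raise ValueError("airflow must be positive")
    # Mass balance at steady state: generation = airflow * (C_in - C_out).
    # The factor 1e6 converts a volume fraction to ppm.
    return outdoor_ppm + 1e6 * generation_lps / airflow_lps


# Hypothetical classroom: 0.1 L/s of CO2 generated, 200 L/s of outdoor air.
indoor_ppm = steady_state_co2_ppm(420.0, 0.1, 200.0)
```

Doubling the airflow halves the excess over the outdoor level, which is why a measured steady-state concentration can serve as a proxy for ventilation adequacy.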
-
Overfitting Reduction in Convex Regression
Authors:
Zhiqiang Liao,
Sheng Dai,
Eunji Lim,
Timo Kuosmanen
Abstract:
Convex regression is a method for estimating an unknown function $f_0$ from a data set of $n$ noisy observations when $f_0$ is known to be convex. This method has played an important role in operations research, economics, machine learning, and many other areas. It has been empirically observed that the convex regression estimator produces inconsistent estimates of $f_0$ and extremely large subgradients near the boundary of the domain of $f_0$ as $n$ increases. In this paper, we provide theoretical evidence of this overfitting behaviour. We also prove that the penalised convex regression estimator, one of the variants of the convex regression estimator, exhibits overfitting behaviour. To eliminate this behaviour, we propose two new estimators by placing a bound on the subgradients of the estimated function. We further show that our proposed estimators do not exhibit the overfitting behaviour by proving that (a) they converge to $f_0$ and (b) their subgradients converge to the gradient of $f_0$, both uniformly over the domain of $f_0$ with probability one as $n \rightarrow \infty$. We apply the proposed methods to compute the cost frontier function for Finnish electricity distribution firms and confirm their superior performance in predictive power over some existing methods.
Submitted 15 April, 2024;
originally announced April 2024.
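For orientation, the standard least-squares convex regression problem referred to in the abstract above can be written as a quadratic program over fitted values $\theta_i$ and subgradients $\xi_i$. The last constraint is one possible way to bound the subgradients; the choice of norm and the bound $L$ are assumptions made here for illustration, and the paper's two estimators may be defined differently.

```latex
\begin{aligned}
\min_{\theta_1,\dots,\theta_n,\;\xi_1,\dots,\xi_n}\quad
  & \sum_{i=1}^{n} (y_i - \theta_i)^2 \\
\text{subject to}\quad
  & \theta_j \ge \theta_i + \xi_i^{\top}(x_j - x_i), \qquad i, j = 1,\dots,n, \\
  & \|\xi_i\| \le L, \qquad i = 1,\dots,n \quad \text{(assumed form of the bound)}.
\end{aligned}
```

Without the last constraint, nothing stops the $\xi_i$ at boundary points from growing without limit, which is the overfitting behaviour the abstract describes.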
-
On semi-supervised estimation using exponential tilt mixture models
Authors:
Ye Tian,
Xinwei Zhang,
Zhiqiang Tan
Abstract:
Consider a semi-supervised setting with a labeled dataset of binary responses and predictors and an unlabeled dataset with only the predictors. Logistic regression is equivalent to an exponential tilt model in the labeled population. For semi-supervised estimation, we develop further analysis and understanding of a statistical approach using exponential tilt mixture (ETM) models and maximum nonparametric likelihood estimation, while allowing that the class proportions may differ between the unlabeled and labeled data. We derive asymptotic properties of ETM-based estimation and demonstrate improved efficiency over supervised logistic regression in a random sampling setup and an outcome-stratified sampling setup previously used. Moreover, we reconcile such efficiency improvement with the existing semiparametric efficiency theory when the class proportions in the unlabeled and labeled data are restricted to be the same. We also provide a simulation study to numerically illustrate our theoretical findings.
Submitted 14 November, 2023;
originally announced November 2023.
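The equivalence between logistic regression and an exponential tilt model mentioned in the abstract above follows from Bayes' rule. The notation below ($\beta_0$, $\beta$, and $\rho = P(y = 1)$ in the labeled population) is introduced here for illustration and is not taken from the paper.

```latex
\begin{aligned}
P(y = 1 \mid x) &= \frac{\exp(\beta_0 + \beta^{\top} x)}{1 + \exp(\beta_0 + \beta^{\top} x)}, \\
\frac{p(x \mid y = 1)}{p(x \mid y = 0)}
  &= \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)} \cdot \frac{1 - \rho}{\rho}
   = \exp\!\Big(\beta_0 - \log\frac{\rho}{1 - \rho} + \beta^{\top} x\Big).
\end{aligned}
```

The density of the predictors in one class is therefore the density in the other class tilted by an exponential factor in $x$, with the same slope $\beta$ as the logistic model.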
-
Persistently Trained, Diffusion-assisted Energy-based Models
Authors:
Xinwei Zhang,
Zhiqiang Tan,
Zhijian Ou
Abstract:
Maximum likelihood (ML) learning for energy-based models (EBMs) is challenging, partly due to non-convergence of Markov chain Monte Carlo. Several variations of ML learning have been proposed, but existing methods all fail to achieve both post-training image generation and proper density estimation. We propose to introduce diffusion data and learn a joint EBM, called diffusion-assisted EBMs, through persistent training (i.e., using persistent contrastive divergence) with an enhanced sampling algorithm to properly sample from complex, multimodal distributions. We present results from a 2D illustrative experiment and image experiments and demonstrate that, for the first time for image data, persistently trained EBMs can simultaneously achieve long-run stability, post-training image generation, and superior out-of-distribution detection.
Submitted 20 April, 2023;
originally announced April 2023.
-
Understanding Accelerated Gradient Methods: Lyapunov Analyses and Hamiltonian Assisted Interpretations
Authors:
Penghui Fu,
Zhiqiang Tan
Abstract:
We formulate two classes of first-order algorithms more general than previously studied for minimizing smooth and strongly convex or, respectively, smooth and convex functions. We establish sufficient conditions, via new discrete Lyapunov analyses, for achieving accelerated convergence rates which match Nesterov's methods in the strongly and general convex settings. Next, we study the convergence of limiting ordinary differential equations (ODEs) and point out currently notable gaps between the convergence properties of the corresponding algorithms and ODEs. Finally, we propose a novel class of discrete algorithms, called the Hamiltonian assisted gradient method, directly based on a Hamiltonian function and several interpretable operations, and then demonstrate meaningful and unified interpretations of our acceleration conditions.
Submitted 19 April, 2023;
originally announced April 2023.
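As background for the accelerated rates mentioned in the abstract above, the sketch below implements the classical constant-momentum form of Nesterov's method for a smooth, strongly convex function. It is the baseline the paper's more general algorithm classes are compared against, not the Hamiltonian assisted gradient method itself. The function names and the quadratic test problem are assumptions for illustration.

```python
import math


def nesterov_strongly_convex(grad, x0, smoothness, strong_convexity, iters):
    """Nesterov's method with constant momentum.

    grad             : function returning the gradient as a list
    smoothness       : Lipschitz constant L of the gradient
    strong_convexity : strong convexity parameter mu
    """
    root_kappa = math.sqrt(smoothness / strong_convexity)
    beta = (root_kappa - 1.0) / (root_kappa + 1.0)  # momentum coefficient
    x_prev = list(x0)
    x = list(x0)
    for _ in range(iters):
        # Extrapolate along the previous displacement.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad(y)
        x_prev = x
        # Gradient step of size 1/L taken from the extrapolated point.
        x = [yi - gi / smoothness for yi, gi in zip(y, g)]
    return x


def quad_grad(v):
    # Gradient of f(v) = 0.5 * (1 * v[0]**2 + 10 * v[1]**2), so mu = 1, L = 10.
    return [1.0 * v[0], 10.0 * v[1]]


x_final = nesterov_strongly_convex(quad_grad, [5.0, -3.0], 10.0, 1.0, 200)
```

The momentum coefficient depends only on the square root of the condition number, which is the source of the accelerated rate relative to plain gradient descent.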
-
Using Overlap Weights to Address Extreme Propensity Scores in Estimating Restricted Mean Counterfactual Survival Times
Authors:
Zhiqiang Cao,
Lama Ghazi,
Claudia Mastrogiacomo,
Laura Forastiere,
F. Perry Wilson,
Fan Li
Abstract:
While the inverse probability of treatment weighting (IPTW) is a commonly used approach for treatment comparisons in observational data, the resulting estimates may be subject to bias and excessively large variance when there is a lack of overlap in the propensity score distributions. By smoothly down-weighting the units with extreme propensity scores, overlap weighting (OW) can help mitigate the bias and variance issues associated with IPTW. Although theoretical and simulation results have supported the use of OW with continuous and binary outcomes, its performance with right-censored survival outcomes remains to be further investigated, especially when the target estimand is defined based on the restricted mean survival time (RMST), a clinically meaningful summary measure free of the proportional hazards assumption. In this article, we combine propensity score weighting and inverse probability of censoring weighting to estimate the restricted mean counterfactual survival times, and propose computationally efficient variance estimators. We conduct simulations to compare the performance of IPTW, trimming, and OW in terms of bias, variance, and 95% confidence interval coverage, under various degrees of covariate overlap. Regardless of overlap, we demonstrate the advantage of OW over IPTW and trimming methods in bias, variance, and coverage when the estimand is defined based on RMST.
Submitted 10 February, 2024; v1 submitted 1 April, 2023;
originally announced April 2023.
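The contrast between IPTW and overlap weighting described in the abstract above comes down to two weight formulas, sketched below for a binary treatment with a known propensity score. The function names are hypothetical, and the censoring weights and RMST estimation from the paper are not included.

```python
def iptw_weight(treated, propensity):
    """Inverse probability of treatment weight; unbounded as e -> 0 or 1."""
    return 1.0 / propensity if treated else 1.0 / (1.0 - propensity)


def overlap_weight(treated, propensity):
    """Overlap weight: the probability of receiving the opposite treatment."""
    return 1.0 - propensity if treated else propensity


# A control unit with an extreme propensity score of 0.99.
extreme_iptw = iptw_weight(False, 0.99)
extreme_ow = overlap_weight(False, 0.99)
```

For this unit the IPTW weight is about 100 while the overlap weight stays below 1, which illustrates why a few extreme units can dominate an IPTW estimate but not an OW estimate.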
-
Block-wise Primal-dual Algorithms for Large-scale Doubly Penalized ANOVA Modeling
Authors:
Penghui Fu,
Zhiqiang Tan
Abstract:
For multivariate nonparametric regression, doubly penalized ANOVA modeling (DPAM) has recently been proposed, using hierarchical total variations (HTVs) and empirical norms as penalties on the component functions such as main effects and multi-way interactions in a functional ANOVA decomposition of the underlying regression function. The two penalties play complementary roles: the HTV penalty promotes sparsity in the selection of basis functions within each component function, whereas the empirical-norm penalty promotes sparsity in the selection of component functions. We adopt backfitting or block minimization for training DPAM, and develop two suitable primal-dual algorithms, including both batch and stochastic versions, for updating each component function in single-block optimization. Existing applications of primal-dual algorithms are intractable in our setting with both HTV and empirical-norm penalties. Through extensive numerical experiments, we demonstrate the validity and advantage of our stochastic primal-dual algorithms, compared with their batch versions and a previous active-set algorithm, in large-scale scenarios.
Submitted 19 October, 2022;
originally announced October 2022.
-
Convex Support Vector Regression
Authors:
Zhiqiang Liao,
Sheng Dai,
Timo Kuosmanen
Abstract:
Nonparametric regression subject to convexity or concavity constraints is increasingly popular in economics, finance, operations research, machine learning, and statistics. However, the conventional convex regression based on the least squares loss function often suffers from overfitting and outliers. This paper proposes to address these two issues by introducing the convex support vector regression (CSVR) method, which effectively combines the key elements of convex regression and support vector regression. Numerical experiments demonstrate the performance of CSVR in prediction accuracy and robustness that compares favorably with other state-of-the-art methods.
Submitted 26 September, 2022;
originally announced September 2022.
-
Model-assisted sensitivity analysis for treatment effects under unmeasured confounding via regularized calibrated estimation
Authors:
Zhiqiang Tan
Abstract:
Consider sensitivity analysis for estimating average treatment effects under unmeasured confounding, assumed to satisfy a marginal sensitivity model. At the population level, we provide new representations for the sharp population bounds and doubly robust estimating functions, recently derived by Dorn, Guo, and Kallus. We also derive new, relaxed population bounds, depending on weighted linear outcome quantile regression. At the sample level, we develop new methods and theory for obtaining not only doubly robust point estimators for the relaxed population bounds with respect to misspecification of a propensity score model or an outcome mean regression model, but also model-assisted confidence intervals which are valid if the propensity score model is correctly specified, but the outcome quantile and mean regression models may be misspecified. The relaxed population bounds reduce to the sharp bounds if outcome quantile regression is correctly specified. For a linear outcome mean regression model, the confidence intervals are also doubly robust. Our methods involve regularized calibrated estimation, with Lasso penalties but carefully chosen loss functions, for fitting propensity score and outcome mean and quantile regression models. We present a simulation study and an empirical application to an observational study on the effects of right heart catheterization.
Submitted 22 September, 2022;
originally announced September 2022.
-
Characterizing player's playing styles based on Player Vectors for each playing position in the Chinese Football Super League
Authors:
Yuesen Li,
Shouxin Zong,
Yanfei Shen,
Zhiqiang Pu,
Miguel-Ángel Gómez,
Yixiong Cui
Abstract:
Characterizing playing style is important for football clubs for scouting, monitoring and match preparation. Previous studies considered a player's style as a combination of technical performances, failing to consider the spatial information. Therefore, this study aimed to characterize the playing styles of each playing position in the Chinese Football Super League (CSL) matches, integrating a recently adopted Player Vectors framework. Data from 960 CSL matches from 2016-2019 were used. Match ratings and ten types of match events with the corresponding coordinates for all the lineup players whose on-pitch time exceeded 45 minutes were extracted. Players were first clustered into 8 positions. A player vector was constructed for each player in each match based on the Player Vectors using Nonnegative Matrix Factorization (NMF). Another NMF process was run on the player vectors to extract different types of playing styles. The resulting player vectors discovered 18 different playing styles in the CSL. Six performance indicators of each style were investigated to observe their contributions. In general, the playing styles of forwards and midfielders are in line with football performance evolution trends, while the styles of defenders should be reconsidered. Multifunctional playing styles were also found in highly rated CSL players.
Submitted 7 July, 2022; v1 submitted 5 May, 2022;
originally announced May 2022.
-
NetRCA: An Effective Network Fault Cause Localization Algorithm
Authors:
Chaoli Zhang,
Zhiqiang Zhou,
Yingying Zhang,
Linxiao Yang,
Kai He,
Qingsong Wen,
Liang Sun
Abstract:
Localizing the root cause of network faults is crucial to network operation and maintenance. However, due to the complicated network architectures and wireless environments, as well as limited labeled data, accurately localizing the true root cause is challenging. In this paper, we propose a novel algorithm named NetRCA to deal with this problem. Firstly, we extract effective derived features from the original raw data by considering temporal, directional, attribution, and interaction characteristics. Secondly, we adopt multivariate time series similarity and label propagation to generate new training data from both labeled and unlabeled data to overcome the lack of labeled samples. Thirdly, we design an ensemble model which combines XGBoost, rule set learning, attribution model, and graph algorithm, to fully utilize all data information and enhance performance. Finally, experiments and analysis are conducted on the real-world dataset from ICASSP 2022 AIOps Challenge to demonstrate the superiority and effectiveness of our approach.
Submitted 6 March, 2022; v1 submitted 22 February, 2022;
originally announced February 2022.
-
Imputation Maximization Stochastic Approximation with Application to Generalized Linear Mixed Models
Authors:
Zexi Song,
Zhiqiang Tan
Abstract:
Generalized linear mixed models are useful in studying hierarchical data with possibly non-Gaussian responses. However, the intractability of likelihood functions poses challenges for estimation. We develop a new method suitable for this problem, called imputation maximization stochastic approximation (IMSA). For each iteration, IMSA first imputes latent variables/random effects, then maximizes over the complete data likelihood, and finally moves the estimate towards the new maximizer while preserving a proportion of the previous value. The limiting point of IMSA satisfies a self-consistency property and can be less biased in finite samples than the maximum likelihood estimator solved by score-equation based stochastic approximation (ScoreSA). Numerically, IMSA can also be advantageous over ScoreSA in achieving more stable convergence and respecting the parameter ranges under various transformations such as nonnegative variance components. This is corroborated through our simulation studies where IMSA consistently outperforms ScoreSA.
Submitted 24 January, 2022;
originally announced January 2022.
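The three steps per iteration described in the abstract above (impute, maximize, then move partway towards the new maximizer) suggest an update of the following general shape. The symbols $\theta^{(t)}$, $z^{(t)}$, $\hat{\theta}^{(t)}$, and the step size $\gamma_t$ are notation assumed here for illustration; the paper's exact imputation scheme and step-size conditions are not reproduced.

```latex
\begin{aligned}
z^{(t)} &\sim p\big(z \mid y;\, \theta^{(t)}\big)
  && \text{(impute latent variables / random effects)} \\
\hat{\theta}^{(t)} &= \arg\max_{\theta}\; \log p\big(y, z^{(t)};\, \theta\big)
  && \text{(maximize the complete-data likelihood)} \\
\theta^{(t+1)} &= (1 - \gamma_t)\,\theta^{(t)} + \gamma_t\,\hat{\theta}^{(t)}
  && \text{(keep a proportion } 1 - \gamma_t \text{ of the previous value)}
\end{aligned}
```

Because each new iterate is a convex combination of the previous value and a maximizer, it stays inside any convex parameter range that contains both, which is consistent with the abstract's remark about respecting parameter ranges such as nonnegative variance components.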
-
High-dimensional model-assisted inference for treatment effects with multi-valued treatments
Authors:
Wenfu Xu,
Zhiqiang Tan
Abstract:
Consider estimation of average treatment effects with multi-valued treatments using augmented inverse probability weighted (IPW) estimators, depending on outcome regression and propensity score models in high-dimensional settings. These regression models are often fitted by regularized likelihood-based estimation, while ignoring how the fitted functions are used in the subsequent inference about the treatment parameters. Such separate estimation can be associated with known difficulties in existing methods. We develop regularized calibrated estimation for fitting propensity score and outcome regression models, where sparsity-inducing penalties are employed to facilitate variable selection but the loss functions are carefully chosen such that valid confidence intervals can be obtained under possible model misspecification. Unlike in the case of binary treatments, the usual augmented IPW estimator is generalized by allowing different copies of coefficient estimators in outcome regression to ensure just-identification. For propensity score estimation, the new loss function and estimating functions are directly tied to achieving covariate balance between weighted treatment groups. We develop practical numerical algorithms for computing the regularized calibrated estimators with group Lasso by innovatively exploiting Fisher scoring, and provide rigorous high-dimensional analysis for the resulting augmented IPW estimators under suitable sparsity conditions, while tackling technical issues absent or overlooked in previous analyses. We present simulation studies and an empirical application to estimate the effects of maternal smoking on birth weights. The proposed methods are implemented in the R package mRCAL.
Submitted 23 January, 2022;
originally announced January 2022.
-
Tractable and Near-Optimal Adversarial Algorithms for Robust Estimation in Contaminated Gaussian Models
Authors:
Ziyue Wang,
Zhiqiang Tan
Abstract:
Consider the problem of simultaneous estimation of location and variance matrix under Huber's contaminated Gaussian model. First, we study minimum $f$-divergence estimation at the population level, corresponding to a generative adversarial method with a nonparametric discriminator and establish conditions on $f$-divergences which lead to robust estimation, similarly to robustness of minimum distance estimation. More importantly, we develop tractable adversarial algorithms with simple spline discriminators, which can be implemented via nested optimization such that the discriminator parameters can be fully updated by maximizing a concave objective function given the current generator. The proposed methods are shown to achieve minimax optimal rates or near-optimal rates depending on the $f$-divergence and the penalty used. This is the first time such near-optimal error rates are established for adversarial algorithms with linear discriminators under Huber's contamination model. We present simulation studies to demonstrate advantages of the proposed methods over classic robust estimators, pairwise methods, and a generative adversarial method with neural network discriminators.
Submitted 6 August, 2022; v1 submitted 23 December, 2021;
originally announced December 2021.
-
On Irreversible Metropolis Sampling Related to Langevin Dynamics
Authors:
Zexi Song,
Zhiqiang Tan
Abstract:
There has been considerable interest in designing Markov chain Monte Carlo algorithms by exploiting numerical methods for Langevin dynamics, which includes Hamiltonian dynamics as a deterministic case. A prominent approach is Hamiltonian Monte Carlo (HMC), where a leapfrog discretization of Hamiltonian dynamics is employed. We investigate a recently proposed class of irreversible sampling algorithms, called Hamiltonian assisted Metropolis sampling (HAMS), which uses an augmented target density similarly as in HMC, but involves a flexible proposal scheme and a carefully formulated acceptance-rejection scheme to achieve generalized reversibility. We show that as the step size tends to 0, the HAMS proposal satisfies a class of stochastic differential equations including Langevin dynamics as a special case. We provide theoretical results for HAMS under the univariate Gaussian setting, including the stationary variance, the expected acceptance rate, and the spectral radius. From these results, we derive default choices of tuning parameters for HAMS, such that only the step size needs to be tuned in applications. Various relatively recent algorithms for Langevin dynamics are also shown to fall in the class of HAMS proposals up to negligible differences. Our numerical experiments on sampling high-dimensional latent variables confirm that the HAMS algorithms consistently achieve superior performance, compared with several Metropolis-adjusted algorithms based on popular integrators of Langevin dynamics.
Submitted 5 June, 2021;
originally announced June 2021.
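The leapfrog discretization that the abstract above attributes to HMC can be sketched in a few lines for a one-dimensional target. This shows only the standard HMC integrator, not the HAMS proposal or its acceptance-rejection scheme. The function names, the standard Gaussian potential, and the step size are assumptions for illustration.

```python
def leapfrog(q, p, grad_u, step_size, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with unit mass.

    q, p   : position and momentum (floats)
    grad_u : gradient of the potential energy U(q) = -log target density
    """
    # Initial half step for the momentum.
    p = p - 0.5 * step_size * grad_u(q)
    for i in range(n_steps):
        # Full step for the position.
        q = q + step_size * p
        # Full momentum step, except after the last position update.
        if i < n_steps - 1:
            p = p - step_size * grad_u(q)
    # Final half step for the momentum.
    p = p - 0.5 * step_size * grad_u(q)
    return q, p


def hamiltonian(q, p):
    # Standard Gaussian target: U(q) = q^2 / 2, kinetic energy p^2 / 2.
    return 0.5 * q * q + 0.5 * p * p


q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, lambda q: q, 0.01, 100)
# Reversibility check: flip the momentum and integrate again.
q2, p2 = leapfrog(q1, -p1, lambda q: q, 0.01, 100)
```

The integrator nearly conserves the Hamiltonian and is exactly time-reversible up to rounding, the two properties that make a Metropolis correction practical in HMC.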
-
Model-Assisted Inference for Covariate-Specific Treatment Effects with High-dimensional Data
Authors:
Peng Wu,
Zhiqiang Tan,
Wenjie Hu,
Xiao-Hua Zhou
Abstract:
Covariate-specific treatment effects (CSTEs) represent heterogeneous treatment effects across subpopulations defined by certain selected covariates. In this article, we consider marginal structural models where CSTEs are linearly represented using a set of basis functions of the selected covariates. We develop a new approach in high-dimensional settings to obtain not only doubly robust point estimators of CSTEs, but also model-assisted confidence intervals, which are valid when a propensity score model is correctly specified but an outcome regression model may be misspecified. With a linear outcome model and subpopulations defined by discrete covariates, both point estimators and confidence intervals are doubly robust for CSTEs. In contrast, confidence intervals from existing high-dimensional methods are valid only when both the propensity score and outcome models are correctly specified. We establish asymptotic properties of the proposed point estimators and the associated confidence intervals. We present simulation studies and empirical applications which demonstrate the advantages of the proposed method compared with competing ones.
Submitted 24 May, 2021;
originally announced May 2021.
-
Consistent and robust inference in hazard probability and odds models with discrete-time survival data
Authors:
Zhiqiang Tan
Abstract:
For discrete-time survival data, conditional likelihood inference in Cox's hazard odds model is theoretically desirable, but exact calculation is numerically intractable with a moderate to large number of tied events. Unconditional maximum likelihood estimation over both regression coefficients and baseline hazard probabilities can be problematic with a large number of time intervals. We develop new methods and theory using numerically simple estimating functions, along with model-based and model-robust variance estimation, in hazard probability and odds models. For the probability hazard model, we derive as a consistent estimator the Breslow-Peto estimator, previously known as an approximation to the conditional likelihood estimator in the hazard odds model. For the odds hazard model, we propose a weighted Mantel-Haenszel estimator, which satisfies conditional unbiasedness given the numbers of events in addition to the risk sets and covariates, similarly to the conditional likelihood estimator. Our methods are expected to perform satisfactorily in a broad range of settings, with small or large numbers of tied events corresponding to a large or small number of time intervals. The methods are implemented in the R package dSurvival.
Submitted 7 December, 2020;
originally announced December 2020.
-
Doubly Robust Semiparametric Inference Using Regularized Calibrated Estimation with High-dimensional Data
Authors:
Satyajit Ghosh,
Zhiqiang Tan
Abstract:
Consider semiparametric estimation where a doubly robust estimating function for a low-dimensional parameter is available, depending on two working models. With high-dimensional data, we develop regularized calibrated estimation as a general method for estimating the parameters in the two working models, such that valid Wald confidence intervals can be obtained for the parameter of interest under suitable sparsity conditions if either of the two working models is correctly specified. We propose a computationally tractable two-step algorithm and provide rigorous theoretical analysis which justifies sufficiently fast rates of convergence for the regularized calibrated estimators in spite of sequential construction and establishes a desired asymptotic expansion for the doubly robust estimator. As concrete examples, we discuss applications to partially linear, log-linear, and logistic models and estimation of average treatment effects. Numerical studies in the former three examples demonstrate superior performance of our method, compared with debiased Lasso.
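As a rough illustration of the doubly robust idea applied to average treatment effects, the sketch below evaluates the classical augmented inverse probability weighted (AIPW) estimator from nuisance values that are already fitted. It does not implement regularized calibrated estimation; the function name `aipw_ate` and the toy inputs are assumptions for illustration only.

```python
def aipw_ate(y, t, ps, m1, m0):
    """Classical AIPW estimate of the average treatment effect (illustrative sketch).

    y  : observed outcomes
    t  : treatment indicators (0 or 1)
    ps : fitted propensity scores P(T = 1 | X), strictly between 0 and 1
    m1 : fitted outcome regression values under treatment
    m0 : fitted outcome regression values under control
    """
    total = 0.0
    for yi, ti, pi, m1i, m0i in zip(y, t, ps, m1, m0):
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        treated = m1i + ti * (yi - m1i) / pi
        control = m0i + (1 - ti) * (yi - m0i) / (1.0 - pi)
        total += treated - control
    return total / len(y)


# Toy data (hypothetical): four units, two treated and two controls.
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
ps = [0.5, 0.5, 0.5, 0.5]

# With outcome predictions set to zero, the estimator reduces to pure inverse probability weighting.
ipw_only = aipw_ate(y, t, ps, [0.0] * 4, [0.0] * 4)

# With outcome predictions that fit the observed outcomes exactly, the residual terms vanish.
exact_or = aipw_ate(y, t, [0.3, 0.6, 0.8, 0.2], [3.0, 2.0, 4.0, 3.0], [2.0, 1.0, 3.0, 2.0])
```

The second call shows one side of double robustness in miniature: when the outcome regression fits exactly, the estimate does not depend on the propensity scores supplied.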
Submitted 25 September, 2020;
originally announced September 2020.
-
High-dimensional Model-assisted Inference for Local Average Treatment Effects with Instrumental Variables
Authors:
Baoluo Sun,
Zhiqiang Tan
Abstract:
Consider the problem of estimating the local average treatment effect with an instrument variable, where the instrument unconfoundedness holds after adjusting for a set of measured covariates. Several unknown functions of the covariates need to be estimated through regression models, such as instrument propensity score and treatment and outcome regression models. We develop a computationally tractable method in high-dimensional settings where the numbers of regression terms are close to or larger than the sample size. Our method exploits regularized calibrated estimation, which involves Lasso penalties but carefully chosen loss functions for estimating coefficient vectors in these regression models, and then employs a doubly robust estimator for the treatment parameter through augmented inverse probability weighting. We provide rigorous theoretical analysis to show that the resulting Wald confidence intervals are valid for the treatment parameter under suitable sparsity conditions if the instrument propensity score model is correctly specified, but the treatment and outcome regression models may be misspecified. For existing high-dimensional methods, valid confidence intervals are obtained for the treatment parameter if all three models are correctly specified. We evaluate the proposed methods via extensive simulation studies and an empirical application to estimate the returns to education.
Submitted 19 September, 2020;
originally announced September 2020.
-
Hierarchical Message-Passing Graph Neural Networks
Authors:
Zhiqiang Zhong,
Cheng-Te Li,
Jun Pang
Abstract:
Graph Neural Networks (GNNs) have become a prominent approach to machine learning with graphs and have been increasingly applied in a multitude of domains. Nevertheless, since most existing GNN models are based on flat message-passing mechanisms, two limitations need to be tackled: (i) they are costly in encoding long-range information spanning the graph structure; (ii) they are failing to encode features in the high-order neighbourhood in the graphs as they only perform information aggregation across the observed edges in the original graph. To deal with these two issues, we propose a novel Hierarchical Message-passing Graph Neural Networks framework. The key idea is generating a hierarchical structure that re-organises all nodes in a flat graph into multi-level super graphs, along with innovative intra- and inter-level propagation manners. The derived hierarchy creates shortcuts connecting far-away nodes so that informative long-range interactions can be efficiently accessed via message passing and incorporates meso- and macro-level semantics into the learned node representations. We present the first model to implement this framework, termed Hierarchical Community-aware Graph Neural Network (HC-GNN), with the assistance of a hierarchical community detection algorithm. The theoretical analysis illustrates HC-GNN's remarkable capacity in capturing long-range information without introducing heavy additional computation complexity. Empirical experiments conducted on 9 datasets under transductive, inductive, and few-shot settings exhibit that HC-GNN can outperform state-of-the-art GNN models in network analysis tasks, including node classification, link prediction, and community detection. Moreover, the model analysis further demonstrates HC-GNN's robustness facing graph sparsity and the flexibility in incorporating different GNN encoders.
Submitted 26 October, 2022; v1 submitted 8 September, 2020;
originally announced September 2020.
-
Correct Normalization Matters: Understanding the Effect of Normalization On Deep Neural Network Models For Click-Through Rate Prediction
Authors:
Zhiqiang Wang,
Qingyun She,
PengTao Zhang,
Junlin Zhang
Abstract:
Normalization has become one of the most fundamental components in many deep neural networks for machine learning tasks, while deep neural networks have also been widely used in the CTR estimation field. Among most of the proposed deep neural network models, few models utilize normalization approaches. Though some works such as Deep & Cross Network (DCN) and Neural Factorization Machine (NFM) use Batch Normalization in the MLP part of the structure, no work has thoroughly explored the effect of normalization on DNN ranking systems. In this paper, we conduct a systematic study on the effect of widely used normalization schemas by applying the various normalization approaches to both the feature embedding and the MLP part of the DNN model. Extensive experiments are conducted on three real-world datasets, and the experimental results demonstrate that the correct normalization significantly enhances the model's performance. We also propose a new and effective normalization approach based on LayerNorm, named variance-only LayerNorm (VO-LN), in this work. A normalization-enhanced DNN model named NormDNN is also proposed based on the above-mentioned observation. As for the reason why normalization works for DNN models in CTR estimation, we find that the variance of normalization plays the main role and give an explanation in this work.
Submitted 7 July, 2020; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Bandit Samplers for Training Graph Neural Networks
Authors:
Ziqi Liu,
Zhengwei Wu,
Zhiqiang Zhang,
Jun Zhou,
Shuang Yang,
Le Song,
Yuan Qi
Abstract:
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs). However, due to the intractable computation of the optimal sampling distribution, these sampling algorithms are suboptimal for GCNs and are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT). The fundamental reason is that the embeddings of the neighbors or learned weights involved in the optimal sampling distribution are changing during the training and not known a priori, but only partially observed when sampled, thus making the derivation of optimal variance-reduced samplers non-trivial. In this paper, we formulate the optimization of the sampling variance as an adversary bandit problem, where the rewards are related to the node embeddings and learned weights, and can vary constantly. Thus a good sampler needs to acquire variance information about more neighbors (exploration) while at the same time optimizing the immediate sampling variance (exploitation). We theoretically show that our algorithm asymptotically approaches the optimal variance within a factor of 3. We show the efficiency and effectiveness of our approach on multiple datasets.
Submitted 11 June, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Hamiltonian Assisted Metropolis Sampling
Authors:
Zexi Song,
Zhiqiang Tan
Abstract:
Various Markov chain Monte Carlo (MCMC) methods are studied to improve upon random walk Metropolis sampling, for simulation from complex distributions. Examples include Metropolis-adjusted Langevin algorithms, Hamiltonian Monte Carlo, and other recent algorithms related to underdamped Langevin dynamics. We propose a broad class of irreversible sampling algorithms, called Hamiltonian assisted Metropolis sampling (HAMS), and develop two specific algorithms with appropriate tuning and preconditioning strategies. Our HAMS algorithms are designed to achieve two distinctive properties, while using an augmented target density with momentum as an auxiliary variable. One is generalized detailed balance, which induces an irreversible exploration of the target. The other is a rejection-free property, which allows our algorithms to perform satisfactorily with relatively large step sizes. Furthermore, we formulate a framework of generalized Metropolis--Hastings sampling, which not only highlights our construction of HAMS at a more abstract level, but also facilitates possible further development of irreversible MCMC algorithms. We present several numerical experiments, where the proposed algorithms are found to consistently yield superior results among existing ones.
Submitted 16 May, 2020;
originally announced May 2020.
-
On loss functions and regret bounds for multi-category classification
Authors:
Zhiqiang Tan,
Xinwei Zhang
Abstract:
We develop new approaches in multi-class settings for constructing proper scoring rules and hinge-like losses and establishing corresponding regret bounds with respect to the zero-one or cost-weighted classification loss. Our construction of losses involves deriving new inverse mappings from a concave generalized entropy to a loss through the use of a convex dissimilarity function related to the multi-distribution $f$-divergence. Moreover, we identify new classes of multi-class proper scoring rules, which also recover and reveal interesting relationships between various composite losses currently in use. We establish new classification regret bounds in general for multi-class proper scoring rules by exploiting the Bregman divergences of the associated generalized entropies, and, as applications, provide simple meaningful regret bounds for two specific classes of proper scoring rules. Finally, we derive new hinge-like convex losses, which are tighter convex extensions than related hinge-like losses and geometrically simpler with fewer non-differentiable edges, while achieving similar regret bounds. We also establish a general classification regret bound for all losses which induce the same generalized entropy as the zero-one loss.
Submitted 15 May, 2021; v1 submitted 16 May, 2020;
originally announced May 2020.
-
A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging
Authors:
Zhaohan Xiong,
Qing Xia,
Zhiqiang Hu,
Ning Huang,
Cheng Bian,
Yefeng Zheng,
Sulaiman Vesal,
Nishant Ravikumar,
Andreas Maier,
Xin Yang,
Pheng-Ann Heng,
Dong Ni,
Caizi Li,
Qianqian Tong,
Weixin Si,
Elodie Puybareau,
Younes Khoudli,
Thierry Geraud,
Chen Chen,
Wenjia Bai,
Daniel Rueckert,
Lingchao Xu,
Xiahai Zhuang,
Xinzhe Luo,
Shuman Jia,
et al. (19 additional authors not shown)
Abstract:
Segmentation of cardiac images, particularly late gadolinium-enhanced magnetic resonance imaging (LGE-MRI) widely used for visualizing diseased cardiac structures, is a crucial first step for clinical diagnosis and treatment. However, direct segmentation of LGE-MRIs is challenging due to their attenuated contrast. Since most clinical studies have relied on manual and labor-intensive approaches, automatic methods are of high interest, particularly optimized machine learning approaches. To address this, we organized the "2018 Left Atrium Segmentation Challenge" using 154 3D LGE-MRIs, currently the world's largest cardiac LGE-MRI dataset, and associated labels of the left atrium segmented by three medical experts, ultimately attracting the participation of 27 international teams. In this paper, extensive analysis of the submitted algorithms using technical and biological metrics was performed through subgroup analysis and hyper-parameter analysis, offering an overall picture of the major design choices of convolutional neural networks (CNNs) and practical considerations for achieving state-of-the-art left atrium segmentation. Results show the top method achieved a Dice score of 93.2% and a mean surface-to-surface distance of 0.7 mm, significantly outperforming the prior state of the art. In particular, our analysis demonstrated that double, sequentially used CNNs, in which a first CNN is used for automatic region-of-interest localization and a subsequent CNN is used for refined regional segmentation, achieved far superior results to traditional methods and pipelines containing single CNNs. This large-scale benchmarking study makes a significant step towards much-improved segmentation methods for cardiac LGE-MRIs, and will serve as an important benchmark for evaluating and comparing future work in the field.
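The Dice score quoted in the abstract measures the overlap between a predicted and a reference segmentation. A minimal sketch on flattened binary masks follows; the function name `dice_score`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the challenge.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(pred, truth) if a and b)
    size = sum(1 for a in pred if a) + sum(1 for b in truth if b)
    if size == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / size


# Toy masks (hypothetical): the prediction marks two voxels, the reference marks one.
partial = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
perfect = dice_score([1, 1, 0], [1, 1, 0])
disjoint = dice_score([1, 0], [0, 1])
```

A real 3D volume would first be flattened to a one-dimensional sequence of voxel labels before being passed to such a function.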
Submitted 7 May, 2020; v1 submitted 26 April, 2020;
originally announced April 2020.
-
Learnable Subspace Clustering
Authors:
Jun Li,
Hongfu Liu,
Zhiqiang Tao,
Handong Zhao,
Yun Fu
Abstract:
This paper studies the large-scale subspace clustering (LSSC) problem with millions of data points. Many popular subspace clustering methods cannot directly handle the LSSC problem although they have been considered state-of-the-art methods for small-scale data. A basic reason is that these methods often choose all data points as a big dictionary to build huge coding models, which results in high time and space complexity. In this paper, we develop a learnable subspace clustering paradigm to efficiently solve the LSSC problem. The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces instead of incurring the expensive costs of the classical coding models. Moreover, we propose a unified robust predictive coding machine (RPCM) to learn the parametric function, which can be solved by an alternating minimization algorithm. In addition, we provide a bounded contraction analysis of the parametric function. To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods. Experiments on million-scale datasets verify that our paradigm outperforms the related state-of-the-art methods in both efficiency and effectiveness.
Submitted 9 April, 2020;
originally announced April 2020.
-
NetDP: An Industrial-Scale Distributed Network Representation Framework for Default Prediction in Ant Credit Pay
Authors:
Jianbin Lin,
Zhiqiang Zhang,
Jun Zhou,
Xiaolong Li,
Jingli Fang,
Yanming Fang,
Quan Yu,
Yuan Qi
Abstract:
Ant Credit Pay is a consumer credit service in Ant Financial Service Group. As with credit cards, loan default is one of the major risks of this credit product. Hence, an effective algorithm for default prediction is the key to reducing losses and increasing profits for the company. However, the challenges faced in our scenario are different from those in conventional credit card services. The first one is scalability. The huge volume of users and their behaviors in Ant Financial requires the ability to process industrial-scale data and perform model training efficiently. The second challenge is the cold-start problem. Unlike the manual review of credit card applications in conventional banks, the credit limit of Ant Credit Pay is automatically offered to users based on the knowledge learned from big data. However, default prediction for new users suffers from a lack of sufficient credit behaviors. This requires that the proposal leverage other new data sources to alleviate the cold-start problem. Considering the above challenges and the special scenario in Ant Financial, we try to incorporate network information into default prediction to alleviate the cold-start problem. In this paper, we propose an industrial-scale distributed network representation framework, termed NetDP, for default prediction in Ant Credit Pay. The proposal explores network information generated by various interactions between users, and blends unsupervised and supervised network representation in a unified framework for the default prediction problem. Moreover, we present a parameter-server-based distributed implementation of our proposal to handle the scalability challenge. Experimental results demonstrate the effectiveness of our proposal, especially for the cold-start problem, as well as its efficiency on industrial-scale datasets.
Submitted 31 March, 2020;
originally announced April 2020.
-
Graph Representation Learning for Merchant Incentive Optimization in Mobile Payment Marketing
Authors:
Ziqi Liu,
Dong Wang,
Qianyu Yu,
Zhiqiang Zhang,
Yue Shen,
Jian Ma,
Wenliang Zhong,
Jinjie Gu,
Jun Zhou,
Shuang Yang,
Yuan Qi
Abstract:
Mobile payment such as Alipay has been widely used in our daily lives. To further promote mobile payment activities, it is important to run marketing campaigns under a limited budget by providing incentives such as coupons and commissions to merchants. As a result, incentive optimization is the key to maximizing the commercial objective of the marketing campaign. From the analyses of online experiments, we found that the transaction network can subtly describe the similarity of merchants' responses to different incentives, which is of great use in the incentive optimization problem. In this paper, we present a graph representation learning method atop transaction networks for merchant incentive optimization in mobile payment marketing. With limited samples collected from online experiments, our end-to-end method first learns merchant representations based on an attributed transaction network, then effectively models the correlations between the commercial objectives each merchant may achieve and the incentives under varying treatments. Thus we are able to model the sensitivity to incentives for each merchant, and spend most of the budget on those merchants that show strong sensitivities in the marketing campaign. Extensive offline and online experimental results at Alipay demonstrate the effectiveness of our proposed approach.
Submitted 27 February, 2020;
originally announced March 2020.
-
Deep Residual Dense U-Net for Resolution Enhancement in Accelerated MRI Acquisition
Authors:
Pak Lun Kevin Ding,
Zhiqiang Li,
Yuxiang Zhou,
Baoxin Li
Abstract:
A typical Magnetic Resonance Imaging (MRI) scan may take 20 to 60 minutes. Reducing MRI scan time is beneficial for both patient experience and cost considerations. An accelerated MRI scan may be achieved by acquiring a smaller amount of k-space data (down-sampling in k-space). However, this leads to lower resolution and aliasing artifacts in the reconstructed images. There are many existing approaches for attempting to reconstruct high-quality images from down-sampled k-space data, with varying complexity and performance. In recent years, deep-learning approaches have been proposed for this task, and promising results have been reported. Still, the problem remains challenging, especially because of the high fidelity requirement in most medical applications employing reconstructed MRI images. In this work, we propose a deep-learning approach, aiming at reconstructing high-quality images from accelerated MRI acquisition. Specifically, we use a Convolutional Neural Network (CNN) to learn the differences between the aliased images and the original images, employing a U-Net-like architecture. Further, a micro-architecture termed Residual Dense Block (RDB) is introduced for learning a better feature representation than the plain U-Net. Considering the peculiarity of the down-sampled k-space data, we introduce a new term to the loss function in learning, which effectively employs the given k-space data during training to provide additional regularization on the update of the network weights. To evaluate the proposed approach, we compare it with other state-of-the-art methods. In both visual inspection and evaluation using standard metrics, the proposed approach is able to deliver improved performance, demonstrating its potential for providing an effective solution.
Submitted 13 January, 2020;
originally announced January 2020.
-
Automated Relational Meta-learning
Authors:
Huaxiu Yao,
Xian Wu,
Zhiqiang Tao,
Yaliang Li,
Bolin Ding,
Ruirui Li,
Zhenhui Li
Abstract:
In order to efficiently learn with a small amount of data on new tasks, meta-learning transfers knowledge learned from previous tasks to the new ones. However, a critical challenge in meta-learning is task heterogeneity, which cannot be well handled by traditional globally shared meta-learning methods. In addition, current task-specific meta-learning methods may either suffer from hand-crafted structure design or lack the capability to capture complex relations between tasks. In this paper, motivated by the way knowledge is organized in knowledge bases, we propose an automated relational meta-learning (ARML) framework that automatically extracts the cross-task relations and constructs the meta-knowledge graph. When a new task arrives, it can quickly find the most relevant structure and tailor the learned structure knowledge to the meta-learner. As a result, the proposed framework not only addresses the challenge of task heterogeneity by a learned meta-knowledge graph, but also increases the model interpretability. We conduct extensive experiments on 2D toy regression and few-shot image classification, and the results demonstrate the superiority of ARML over state-of-the-art baselines.
Submitted 3 January, 2020;
originally announced January 2020.
-
The limits of the sample spiked eigenvalues for a high-dimensional generalized Fisher matrix and its applications
Authors:
Dandan Jiang,
Jiang Hu,
Zhiqiang Hou
Abstract:
A generalized spiked Fisher matrix is considered in this paper. We establish a criterion for describing the support of the limiting spectral distribution of a high-dimensional generalized Fisher matrix and study the almost sure limits of the sample spiked eigenvalues where the population covariance matrices are arbitrary, which successfully removes an unrealistic condition posed in previous works, namely that the covariance matrices are assumed to be diagonal or to have a block-diagonal structure. In addition, we also give a consistent estimator of the population spiked eigenvalues. A series of simulations are conducted that support the theoretical results and illustrate the accuracy of our estimators.
Submitted 5 December, 2019;
originally announced December 2019.
-
Correlative Channel-Aware Fusion for Multi-View Time Series Classification
Authors:
Yue Bai,
Lichen Wang,
Zhiqiang Tao,
Sheng Li,
Yun Fu
Abstract:
Multi-view time series classification (MVTSC) aims to improve the performance by fusing the distinctive temporal information from multiple views. Existing methods mainly focus on fusing multi-view information at an early stage, e.g., by learning a common feature subspace among multiple views. However, these early fusion methods may not fully exploit the unique temporal patterns of each view in complicated time series. Moreover, the label correlations of multiple views, which are critical to boosting, are usually under-explored for the MVTSC problem. To address the aforementioned issues, we propose a Correlative Channel-Aware Fusion (C2AF) network. First, C2AF extracts comprehensive and robust temporal patterns by a two-stream structured encoder for each view, and captures the intra-view and inter-view label correlations with a graph-based correlation matrix. Second, a channel-aware learnable fusion mechanism is implemented through convolutional neural networks to further explore the global correlative patterns. These two steps are trained end-to-end in the proposed C2AF network. Extensive experimental results on three real-world datasets demonstrate the superiority of our approach over the state-of-the-art methods. A detailed ablation study is also provided to show the effectiveness of each model component.
Submitted 20 November, 2020; v1 submitted 24 November, 2019;
originally announced November 2019.
-
Analysis of odds, probability, and hazard ratios: From 2 by 2 tables to two-sample survival data
Authors:
Zhiqiang Tan
Abstract:
Analysis of 2 by 2 tables and two-sample survival data has been widely used. Exact calculation is computationally intractable for conditional likelihood inference in odds ratio models with large marginals in 2 by 2 tables, or partial likelihood inference in Cox's proportional hazards models with considerable tied event times. Approximate methods are often employed, but their statistical properties have not been formally studied while taking into account the approximation involved. We develop new methods and theory by constructing suitable estimating functions while leveraging knowledge from conditional or partial likelihood inference. We propose a weighted Mantel--Haenszel estimator in an odds ratio model such as Cox's discrete-time proportional hazards model. Moreover, we consider a probability ratio model, and derive as a consistent estimator the Breslow--Peto estimator, which has been regarded as an approximation to partial likelihood estimation in the odds ratio model. We study both model-based and model-robust variance estimation. For the Breslow--Peto estimator, our new model-based variance estimator is no greater than the commonly reported variance estimator. We present numerical studies which support the theoretical findings.
Submitted 24 November, 2019;
originally announced November 2019.
-
GRASPEL: Graph Spectral Learning at Scale
Authors:
Yongyu Wang,
Zhiqiang Zhao,
Zhuo Feng
Abstract:
Learning meaningful graphs from data plays important roles in many data mining and machine learning tasks, such as data representation and analysis, dimension reduction, data clustering, and visualization, etc. In this work, for the first time, we present a highly-scalable spectral approach (GRASPEL) for learning large graphs from data. By limiting the precision matrix to be a graph Laplacian, our approach aims to estimate ultra-sparse (tree-like) weighted undirected graphs and shows a clear connection with the prior graphical Lasso method. By interleaving the latest high-performance nearly-linear time spectral methods for graph sparsification, coarsening and embedding, ultra-sparse yet spectrally-robust graphs can be learned by identifying and including the most spectrally-critical edges into the graph. Compared with prior state-of-the-art graph learning approaches, GRASPEL is more scalable and allows substantially improving computing efficiency and solution quality of a variety of data mining and machine learning applications, such as spectral clustering (SC), and t-Distributed Stochastic Neighbor Embedding (t-SNE). For example, when comparing with graphs constructed using existing methods, GRASPEL achieved the best spectral clustering efficiency and accuracy.
Submitted 28 July, 2020; v1 submitted 23 November, 2019;
originally announced November 2019.
-
Progressive Feature Polishing Network for Salient Object Detection
Authors:
Bo Wang,
Quan Chen,
Min Zhou,
Zhiqiang Zhang,
Xiaogang Jin,
Kun Gai
Abstract:
Feature matters for salient object detection. Existing methods mainly focus on designing a sophisticated structure to incorporate multi-level features and filter out cluttered features. We present Progressive Feature Polishing Network (PFPN), a simple yet effective framework to progressively polish the multi-level features to be more accurate and representative. By employing multiple Feature Polishing Modules (FPMs) in a recurrent manner, our approach is able to detect salient objects with fine details without any post-processing. A FPM parallelly updates the features of each level by directly incorporating all higher level context information. Moreover, it can keep the dimensions and hierarchical structures of the feature maps, which makes it flexible to be integrated with any CNN-based models. Empirical experiments show that our results are monotonically getting better with increasing number of FPMs. Without bells and whistles, PFPN outperforms the state-of-the-art methods significantly on five benchmark datasets under various evaluation metrics.
Submitted 14 November, 2019;
originally announced November 2019.
-
Deep least-squares methods: an unsupervised learning-based numerical method for solving elliptic PDEs
Authors:
Zhiqiang Cai,
Jingshuang Chen,
Min Liu,
Xinyu Liu
Abstract:
This paper studies an unsupervised deep learning-based numerical approach for solving partial differential equations (PDEs). The approach makes use of the deep neural network to approximate solutions of PDEs through the compositional construction and employs least-squares functionals as loss functions to determine parameters of the deep neural network. There are various least-squares functionals for a partial differential equation. This paper focuses on the so-called first-order system least-squares (FOSLS) functional studied in [3], which is based on a first-order system of scalar second-order elliptic PDEs. Numerical results for second-order elliptic PDEs in one dimension are presented.
Submitted 12 July, 2020; v1 submitted 5 November, 2019;
originally announced November 2019.
-
GraphZoom: A multi-level spectral approach for accurate and scalable graph embedding
Authors:
Chenhui Deng,
Zhiqiang Zhao,
Yongyu Wang,
Zhiru Zhang,
Zhuo Feng
Abstract:
Graph embedding techniques have been increasingly deployed in a multitude of different applications that involve learning on non-Euclidean data. However, existing graph embedding models either fail to incorporate node attribute information during training or suffer from node attribute noise, which compromises the accuracy. Moreover, very few of them scale to large graphs due to their high computational complexity and memory usage. In this paper we propose GraphZoom, a multi-level framework for improving both accuracy and scalability of unsupervised graph embedding algorithms. GraphZoom first performs graph fusion to generate a new graph that effectively encodes the topology of the original graph and the node attribute information. This fused graph is then repeatedly coarsened into much smaller graphs by merging nodes with high spectral similarities. GraphZoom allows any existing embedding methods to be applied to the coarsened graph, before it progressively refines the embeddings obtained at the coarsest level to increasingly finer graphs. We have evaluated our approach on a number of popular graph datasets for both transductive and inductive tasks. Our experiments show that GraphZoom can substantially increase the classification accuracy and significantly accelerate the entire graph embedding process by up to 40.8x, when compared to the state-of-the-art unsupervised embedding methods.
Submitted 17 February, 2020; v1 submitted 6 October, 2019;
originally announced October 2019.
-
Empirical Study on Detecting Controversy in Social Media
Authors:
Azadeh Nematzadeh,
Grace Bang,
Xiaomo Liu,
Zhiqiang Ma
Abstract:
Companies and financial investors are paying increasing attention to social consciousness in developing their corporate strategies and making investment decisions to support a sustainable economy for the future. Public discussion on incidents and events -- controversies -- of companies can provide valuable insights on how well the company operates with regards to social consciousness and indicate the company's overall operational capability. However, there are challenges in evaluating the degree of a company's social consciousness and environmental sustainability due to the lack of systematic data. We introduce a system that utilizes Twitter data to detect and monitor controversial events and show their impact on market volatility. In our study, controversial events are identified from clustered tweets that share the same 5W terms and sentiment polarities of these clusters. Credible news links inside the event tweets are used to validate the truth of the event. A case study on the Starbucks Philadelphia arrests shows that this method can provide the desired functionality.
Submitted 25 August, 2019;
originally announced September 2019.
-
Constrained Bilinear Factorization Multi-view Subspace Clustering
Authors:
Qinghai Zheng,
Jihua Zhu,
Zhiqiang Tian,
Zhongyu Li,
Shanmin Pang,
Xiuyi Jia
Abstract:
Multi-view clustering is an important and fundamental problem. Many multi-view subspace clustering methods have been proposed, and most of them assume that all views share a same coefficient matrix. However, the underlying information of multi-view data are not fully exploited under this assumption, since the coefficient matrices of different views should have the same clustering properties rather than be uniform among multiple views. To this end, this paper proposes a novel Constrained Bilinear Factorization Multi-view Subspace Clustering (CBF-MSC) method. Specifically, the bilinear factorization with an orthonormality constraint and a low-rank constraint is imposed for all coefficient matrices to make them have the same trace-norm instead of being equivalent, so as to explore the consensus information of multi-view data more fully. Finally, an Augmented Lagrangian Multiplier (ALM) based algorithm is designed to optimize the objective function. Comprehensive experiments tested on nine benchmark datasets validate the effectiveness and competitiveness of the proposed approach compared with several state-of-the-arts.
Submitted 24 March, 2021; v1 submitted 19 June, 2019;
originally announced June 2019.
-
Semi-supervised Logistic Learning Based on Exponential Tilt Mixture Models
Authors:
Xinwei Zhang,
Zhiqiang Tan
Abstract:
Consider semi-supervised learning for classification, where both labeled and unlabeled data are available for training. The goal is to exploit both datasets to achieve higher prediction accuracy than just using labeled data alone. We develop a semi-supervised logistic learning method based on exponential tilt mixture models, by extending a statistical equivalence between logistic regression and exponential tilt modeling. We study maximum nonparametric likelihood estimation and derive novel objective functions which are shown to be Fisher consistent. We also propose regularized estimation and construct simple and highly interpretable EM algorithms. Finally, we present numerical results which demonstrate the advantage of the proposed methods compared with existing methods.
Submitted 18 June, 2019;
originally announced June 2019.
-
Hierarchical Total Variations and Doubly Penalized ANOVA Modeling for Multivariate Nonparametric Regression
Authors:
Ting Yang,
Zhiqiang Tan
Abstract:
For multivariate nonparametric regression, functional analysis-of-variance (ANOVA) modeling aims to capture the relationship between a response and covariates by decomposing the unknown function into various components, representing main effects, two-way interactions, etc. Such an approach has been pursued explicitly in smoothing spline ANOVA modeling and implicitly in various greedy methods such as MARS. We develop a new method for functional ANOVA modeling, based on doubly penalized estimation using total-variation and empirical-norm penalties, to achieve sparse selection of component functions and their knots. For this purpose, we formulate a new class of hierarchical total variations, which measures total variations at different levels including main effects and multi-way interactions, possibly after some order of differentiation. Furthermore, we derive suitable basis functions for multivariate splines such that the hierarchical total variation can be represented as a regular Lasso penalty, and hence we extend a previous backfitting algorithm to handle doubly penalized estimation for ANOVA modeling. We present extensive numerical experiments on simulations and real data to compare our method with existing methods including MARS, tree boosting, and random forest. The results are very encouraging and demonstrate considerable gains from our method in both prediction or classification accuracy and simplicity of the fitted functions.
Submitted 16 June, 2019;
originally announced June 2019.
-
Community Detection Based on the $L_\infty$ convergence of eigenvectors in DCBM
Authors:
Yan Liu,
Zhiqiang Hou,
Zhigang Yao,
Zhidong Bai,
Jiang Hu,
Shurong Zheng
Abstract:
Spectral clustering is one of the most popular algorithms for community detection in network analysis. Based on this rationale, in this paper we give the convergence rate of eigenvectors for the adjacency matrix in the $l_\infty$ norm, under the stochastic block model (BM) and degree corrected stochastic block model (DCBM), adding some mild and rational conditions. We also extend this result to a more general model, presented based on the DCBM such that the value of random variables in the adjacency matrix is not 0 or 1, but an arbitrary real number. During the process of proving the above conclusion, we obtain the relationship of the eigenvalues in the adjacency matrix and the corresponding `population' matrix, which vary in dimension from the community-wise edge probability matrix. Using that result, we can give an estimate of the number of the communities in a known set of network data. Meanwhile we proved the consistency of the estimator. Furthermore, according to the derivation of proof for the convergence of eigenvectors, we propose a new approach to community detection -- Spectral Clustering based on Difference of Ratios of Eigenvectors (SCDRE). Our simulation experiments demonstrate the superiority of our method in community detection.
Submitted 16 June, 2019;
originally announced June 2019.
-
Consensus Clustering: An Embedding Perspective, Extension and Beyond
Authors:
Hongfu Liu,
Zhiqiang Tao,
Zhengming Ding
Abstract:
Consensus clustering fuses diverse basic partitions (i.e., clustering results obtained from conventional clustering methods) into an integrated one, which has attracted increasing attention in both academic and industrial areas due to its robust and effective performance. Tremendous research efforts have been made to advance this domain in terms of algorithms and applications. Although there are some survey papers that summarize the existing literature, they neglect to explore the underlying connection among different categories. In contrast, in this paper we aim to provide an embedding perspective to illustrate the consensus mechanism, which transfers categorical basic partitions to other representations (e.g., binary coding, spectral embedding, etc.) for the clustering purpose. To this end, we not only unify two major categories of consensus clustering, but also build an intuitive connection between consensus clustering and graph embedding. Moreover, we elaborate several extensions of classical consensus clustering from different settings and problems. Beyond this, we demonstrate how to leverage consensus clustering to address other tasks, such as constrained clustering, domain adaptation, feature selection, and outlier detection. Finally, we conclude this survey with future work in terms of interpretability, learnability and theoretical analysis.
Submitted 31 May, 2019;
originally announced June 2019.
-
An Adaptive Remote Stochastic Gradient Method for Training Neural Networks
Authors:
Yushu Chen,
Hao Jing,
Wenlai Zhao,
Zhiqiang Liu,
Ouyi Li,
Liang Qiao,
Wei Xue,
Guangwen Yang
Abstract:
We present the remote stochastic gradient (RSG) method, which computes the gradients at configurable remote observation points, in order to improve the convergence rate and suppress gradient noise at the same time for different curvatures. RSG is further combined with adaptive methods to construct ARSG for acceleration. The method is efficient in computation and memory, and is straightforward to implement. We analyze the convergence properties by modeling the training process as a dynamic system, which provides a guideline to select the configurable observation factor without grid search. ARSG yields $O(1/\sqrt{T})$ convergence rate in non-convex settings, that can be further improved to $O(\log(T)/T)$ in strongly convex settings. Numerical experiments demonstrate that ARSG achieves both faster convergence and better generalization, compared with popular adaptive methods, such as ADAM, NADAM, AMSGRAD, and RANGER for the tested problems. In particular, for training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed and meanwhile it surpasses SGD in generalization.
Submitted 6 September, 2020; v1 submitted 3 May, 2019;
originally announced May 2019.
-
Improve Diverse Text Generation by Self Labeling Conditional Variational Auto Encoder
Authors:
Yuchi Zhang,
Yongliang Wang,
Liping Zhang,
Zhiqiang Zhang,
Kun Gai
Abstract:
Diversity plays a vital role in many text generating applications. In recent years, Conditional Variational Auto Encoders (CVAE) have shown promising performances for this task. However, they often encounter the so called KL-Vanishing problem. Previous works mitigated such problem by heuristic methods such as strengthening the encoder or weakening the decoder while optimizing the CVAE objective function. Nevertheless, the optimizing direction of these methods are implicit and it is hard to find an appropriate degree to which these methods should be applied. In this paper, we propose an explicit optimizing objective to complement the CVAE to directly pull away from KL-vanishing. In fact, this objective term guides the encoder towards the "best encoder" of the decoder to enhance the expressiveness. A labeling network is introduced to estimate the "best encoder". It provides a continuous label in the latent space of CVAE to help build a close connection between latent variables and targets. The whole proposed method is named Self Labeling CVAE~(SLCVAE). To accelerate the research of diverse text generation, we also propose a large native one-to-many dataset. Extensive experiments are conducted on two tasks, which show that our method largely improves the generating diversity while achieving comparable accuracy compared with state-of-art algorithms.
Submitted 26 March, 2019;
originally announced March 2019.
-
A Novel Efficient Approach with Data-Adaptive Capability for OMP-based Sparse Subspace Clustering
Authors:
Jiaqiyu Zhan,
Zhiqiang Bai,
Yuesheng Zhu
Abstract:
Orthogonal Matching Pursuit (OMP) plays an important role in data science and its applications such as sparse subspace clustering and image processing. However, the existing OMP-based approaches lack data adaptiveness, so the data cannot be represented well enough and accuracy may be lost. This paper proposes a novel approach to enhance the data-adaptive capability for OMP-based sparse subspace clustering. In our method a parameter selection process is developed to adjust the parameters based on the data distribution for information representation. Our theoretical analysis indicates that the parameter selection process can efficiently coordinate with any OMP-based methods to improve the clustering performance. Also a new Self-Expressive-Affinity (SEA) ratio metric is defined to measure the sparse representation conversion efficiency for spectral clustering to obtain data segmentations. Our experiments show that the proposed approach can achieve better performance compared with other OMP-based sparse subspace clustering algorithms in terms of clustering accuracy, SEA ratio and representation quality, while also keeping the time efficiency and anti-noise ability.
Submitted 30 August, 2019; v1 submitted 5 March, 2019;
originally announced March 2019.
-
On doubly robust estimation for logistic partially linear models
Authors:
Zhiqiang Tan
Abstract:
Consider a logistic partially linear model, in which the logit of the mean of a binary response is related to a linear function of some covariates and a nonparametric function of other covariates. We derive simple, doubly robust estimators of coefficient for the covariates in the linear component of the partially linear model. Such estimators remain consistent if either a nuisance model is correctly specified for the nonparametric component, or another nuisance model is correctly specified for the means of the covariates of interest given other covariates and the response at a fixed value. In previous works, conditional density models are needed for the latter purposes unless a scalar, binary covariate is handled. We also propose two specific doubly robust estimators: one is locally-efficient like in our class of doubly robust estimators and the other is numerically and statistically simpler and can achieve reasonable efficiency especially when the true coefficients are close to 0.
Submitted 25 January, 2019;
originally announced January 2019.
-
Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation
Authors:
Liwei Wang,
Lunjia Hu,
Jiayuan Gu,
Yue Wu,
Zhiqiang Hu,
Kun He,
John Hopcroft
Abstract:
It is widely believed that learning good representations is one of the main reasons for the success of deep neural networks. Although highly intuitive, there is a lack of theory and systematic approach quantitatively characterizing what representations do deep neural networks learn. In this work, we move a tiny step towards a theory and better understanding of the representations. Specifically, we study a simpler problem: How similar are the representations learned by two networks with identical architecture but trained from different initializations. We develop a rigorous theory based on the neuron activation subspace match model. The theory gives a complete characterization of the structure of neuron activation subspace matches, where the core concepts are maximum match and simple match which describe the overall and the finest similarity between sets of neurons in two networks respectively. We also propose efficient algorithms to find the maximum match and simple matches. Finally, we conduct extensive experiments using our algorithms. Experimental results suggest that, surprisingly, representations learned by the same convolutional layers of networks trained from different initializations are not as similar as prevalently expected, at least in terms of subspace match.
Submitted 28 November, 2018; v1 submitted 27 October, 2018;
originally announced October 2018.