-
Interpret the estimand framework from a causal inference perspective
Authors:
Jinghong Zeng
Abstract:
The estimand framework proposed by ICH in 2017 has brought fundamental changes in the pharmaceutical industry. It clearly describes how a treatment effect in a clinical question should be precisely defined and estimated, through attributes including treatments, endpoints and intercurrent events. However, ideas around the estimand framework are commonly expressed in text, and different interpretations of this framework may exist. This article aims to interpret the estimand framework through its underlying theory, the causal inference framework based on potential outcomes. The statistical origin and formula of an estimand are given through the causal inference framework, with all attributes translated into statistical terms. How the five strategies proposed by ICH to analyze intercurrent events are incorporated in the statistical formula of an estimand is described, and a new strategy to analyze intercurrent events is also suggested. The roles of target populations and analysis sets in the estimand framework are compared and discussed based on the statistical formula of an estimand. This article recommends continuing study of the causal inference theories behind the estimand framework and improving the estimand framework with greater methodological comprehensibility and availability.
Submitted 28 June, 2024;
originally announced July 2024.
-
Nonlinear regression models to forecast PM$_{2.5}$ concentration in Wuhan, China
Authors:
Jinghong Zeng
Abstract:
Forecasting PM$_{2.5}$ concentration is important to solving air pollution problems in Wuhan. This paper proposes a PM$_{2.5}$ concentration forecast model based on nonlinear regression, including a single-value forecast model and an interval forecast model. The single-value forecast model can precisely forecast PM$_{2.5}$ concentration for the next day, with a forecast bias of about 6 $\mu g/m^3$ in goodness-of-fit analysis. The interval forecast model can efficiently forecast high-concentration and low-concentration days, covering 60%-80% of observed samples in model validation. Moreover, this paper combines the PM$_{2.5}$ concentration forecast model with NCEP Climate Forecast System Version 2 to realize its forecast application, then develops NCEP CFS2's PM$_{2.5}$ concentration forecast model to enhance forecast accuracy. The results indicate that the PM$_{2.5}$ concentration forecast model has good capacity for independent forecasting.
Submitted 28 February, 2023;
originally announced February 2023.
-
Bayesian inference on average treatment effects in the PreventS trial data in the presence of unmeasured confounding
Authors:
Jinghong Zeng
Abstract:
Using the PreventS trial data, our objective is to estimate average effects of a Health Wellness Coaching (HWC) intervention on improvement of cardiovascular health at 9 months post randomization and in three consecutive 3-month periods over 9 months post randomization. Conventional approaches, including instrumental variable models, are not applicable in the presence of multiple correlated multivalued exposures and unmeasured confounding. We propose a causal framework and its Bayesian modelling procedures to identify and estimate average effects of one or multiple multivalued exposures on one outcome in the presence of unmeasured confounding, noncompliance and missing data, in a two-arm randomized trial. We also propose estimation methods of unmeasured confounders, where the exposure and outcome distributions are conditional on unmeasured confounders and then unmeasured confounders are imputed as completely missing variables. Several types of model non-identifiability and possible solutions are described. There is a risk that estimation methods of unmeasured confounders can fail when multiple contradictory posterior solutions are produced. The random intercept outcome models that only adjust for unmeasured confounding in the outcome distribution are proposed as a good surrogate causal model in this case, and they need further development.
There is evidence that the HWC intervention is beneficial to cardiovascular health at 9 months post randomization. On average, completing one HWC session improves the Life's Simple Seven total score by 0.16 (0.09, 0.22) and reduces systolic blood pressure by 0.54 (0.19, 0.90) mm Hg. There is also evidence that the HWC intervention has a larger beneficial effect on cardiovascular health during 3 months post randomization. There is no clear evidence that the HWC intervention benefits or harms mental health. The complete abstract is in the article.
Submitted 28 February, 2023;
originally announced February 2023.
-
Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles
Authors:
Yuxuan Han,
Jialin Zeng,
Yang Wang,
Yang Xiang,
Jiheng Zhang
Abstract:
We study the stochastic contextual bandit with knapsacks (CBwK) problem, where each action, taken upon a context, not only leads to a random reward but also costs a random resource consumption in a vector form. The challenge is to maximize the total reward without violating the budget for each resource. We study this problem under a general realizability setting where the expected reward and expected cost are functions of contexts and actions in some given general function classes $\mathcal{F}$ and $\mathcal{G}$, respectively. Existing works on CBwK are restricted to the linear function class since they use UCB-type algorithms, which heavily rely on the linear form and thus are difficult to extend to general function classes. Motivated by online regression oracles that have been successfully applied to contextual bandits, we propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression. We also establish the lower regret bound to show the optimality of our algorithm for a variety of function classes.
Submitted 22 February, 2023; v1 submitted 21 October, 2022;
originally announced October 2022.
-
A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game
Authors:
Ke Ma,
Qianqian Xu,
Jinshan Zeng,
Guorong Li,
Xiaochun Cao,
Qingming Huang
Abstract:
Rank aggregation with pairwise comparisons has shown promising results in elections, sports competitions, recommendations, and information retrieval. However, little attention has been paid to the security issue of such algorithms, in contrast to numerous research work on the computational and statistical characteristics. Driven by huge profits, the potential adversary has strong motivation and incentives to manipulate the ranking list. Meanwhile, the intrinsic vulnerability of the rank aggregation methods is not well studied in the literature. To fully understand the possible risks, we focus on the purposeful adversary who desires to designate the aggregated results by modifying the pairwise data in this paper. From the perspective of the dynamical system, the attack behavior with a target ranking list is a fixed point belonging to the composition of the adversary and the victim. To perform the targeted attack, we formulate the interaction between the adversary and the victim as a game-theoretic framework consisting of two continuous operators while Nash equilibrium is established. Then two procedures against HodgeRank and RankCentrality are constructed to produce the modification of the original data. Furthermore, we prove that the victims will produce the target ranking list once the adversary masters the complete information. It is noteworthy that the proposed methods allow the adversary only to hold incomplete information or imperfect feedback and perform the purposeful attack. The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments. These experimental results show that the proposed methods could achieve the attacker's goal in the sense that the leading candidate of the perturbed ranking list is the designated one by the adversary.
Submitted 13 September, 2022;
originally announced September 2022.
-
A Survey of Causal Inference Frameworks
Authors:
Jingying Zeng,
Run Wang
Abstract:
Causal inference is a science with multi-disciplinary evolution and applications. On the one hand, it measures effects of treatments in observational data based on experimental designs and rigorous statistical inference to draw causal statements. One of the most influential frameworks in quantifying causal effects is the potential outcomes framework. On the other hand, causal graphical models utilize directed edges to represent causalities and encode conditional independence relationships among variables in the graphs. A series of studies has been done both in reading off conditional independencies from graphs and in reconstructing causal structures. In recent years, state-of-the-art research in causal inference has started to unify the different causal inference frameworks. This survey aims to provide a review of the past work on causal inference, focusing mainly on the potential outcomes framework and causal graphical models. We hope that this survey will help accelerate the understanding of causal inference in different domains.
Submitted 2 September, 2022;
originally announced September 2022.
-
Bayesian Causal Inference in Sequentially Randomized Experiments with Noncompliance
Authors:
Jingying Zeng
Abstract:
Scientific researchers utilize randomized experiments to draw causal statements. Most early studies, as well as current work on experiments with sequential intervention decisions, have focused on estimating the causal effects among sequential treatments, ignoring the non-compliance issue that experimental units might not comply with the treatment assignments they were originally allocated. A series of methodologies have been developed to address non-compliance in randomized experiments with time-fixed treatment. However, to the best of our knowledge, there is little literature on non-compliance in sequential experiment settings. In this paper, we go beyond the traditional methods using per-protocol, as-treated, or intention-to-treat analysis and propose a latent mixture Bayesian framework to estimate the sample-average treatment effect in sequential experiments with non-compliance.
Submitted 25 July, 2022;
originally announced July 2022.
-
Semiparametric Estimation on Multi-treatment Causal Effects via Cross-Fitting
Authors:
Jingying Zeng
Abstract:
Causal inference is a critical research area with multi-disciplinary origins and applications, ranging from statistics, computer science, economics, and psychology to public health. In much scientific research, randomized experiments have provided a gold standard for estimation of causal effects for decades. However, in many situations, randomized experiments are not feasible in practice, so practitioners need to rely on empirical investigation for causal reasoning. Causal inference via observational data is a challenging task since the knowledge of the treatment assignment mechanism is missing, which typically requires non-testable assumptions to make the inference possible. For several years, great effort has been devoted to the research of causal inference for binary treatments. In practice, it is also common to use observational data on multiple treatment comparisons. Within the potential outcomes framework, we propose a generalized cross-fitting estimator (GCF), which generalizes the doubly robust estimator with cross-fitting for binary treatment to multiple treatment comparisons, and provide rigorous proofs of its statistical properties. This estimator permits the use of more flexible machine learning methods to model the nuisance parts and is based on relatively weak assumptions, while still offering a theoretical guarantee for valid statistical inference. We show the asymptotic properties of the GCF estimators, and provide asymptotic simultaneous confidence intervals that achieve the semiparametric efficiency bound for the average treatment effect. The performance of the estimator is assessed through a simulation study based on the common evaluation metrics generally considered in the causal inference literature.
Submitted 25 July, 2022;
originally announced July 2022.
-
Boost-S: Gradient Boosted Trees for Spatial Data and Its Application to FDG-PET Imaging Data
Authors:
Reza Iranzad,
Xiao Liu,
W. Art Chaovalitwongse,
Daniel S. Hippe,
Shouyi Wang,
Jie Han,
Phawis Thammasorn,
Chunyan Duan,
Jing Zeng,
Stephen R. Bowen
Abstract:
Boosting Trees are one of the most successful statistical learning approaches that involve sequentially growing an ensemble of simple regression trees (i.e., "weak learners"). However, gradient boosted trees are not yet available for spatially correlated data. This paper proposes a new gradient Boosted Trees algorithm for Spatial Data (Boost-S) with covariate information. Boost-S integrates the spatial correlation structure into the classical framework of gradient boosted trees. Each tree is grown by solving a regularized optimization problem, where the objective function involves two penalty terms on tree complexity and takes into account the underlying spatial correlation. A computationally-efficient algorithm is proposed to obtain the ensemble trees. The proposed Boost-S is applied to the spatially-correlated FDG-PET (fluorodeoxyglucose-positron emission tomography) imaging data collected during cancer chemoradiotherapy. Our numerical investigations successfully demonstrate the advantages of the proposed Boost-S over existing approaches for this particular application.
Submitted 3 February, 2021; v1 submitted 26 January, 2021;
originally announced January 2021.
-
Contrastive Self-supervised Learning for Graph Classification
Authors:
Jiaqi Zeng,
Pengtao Xie
Abstract:
Graph classification is a widely studied problem and has broad applications. In many real-world problems, the number of labeled graphs available for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose two approaches based on contrastive self-supervised learning (CSSL) to alleviate overfitting. In the first approach, we use CSSL to pretrain graph encoders on widely-available unlabeled graphs without relying on human-provided labels, then finetune the pretrained encoders on labeled graphs. In the second approach, we develop a regularizer based on CSSL, and solve the supervised classification task and the unsupervised CSSL task simultaneously. To perform CSSL on graphs, given a collection of original graphs, we perform data augmentation to create augmented graphs out of the original graphs. An augmented graph is created by consecutively applying a sequence of graph alteration operations. A contrastive loss is defined to learn graph encoders by judging whether two augmented graphs are from the same original graph. Experiments on various graph classification datasets demonstrate the effectiveness of our proposed methods.
Submitted 13 September, 2020;
originally announced September 2020.
-
Structured Sparsity Modeling for Improved Multivariate Statistical Analysis based Fault Isolation
Authors:
Wei Chen,
Jiusun Zeng,
Xiaobin Xu,
Shihua Luo,
Chuanhou Gao
Abstract:
In order to improve the fault diagnosis capability of multivariate statistical methods, this article introduces a fault isolation framework based on structured sparsity modeling. The developed method relies on the reconstruction based contribution analysis and the process structure information can be incorporated into the reconstruction objective function in the form of structured sparsity regularization terms. The structured sparsity terms allow selection of fault variables over structures like blocks or networks of process variables, hence more accurate fault isolation can be achieved. Four structured sparsity terms corresponding to different kinds of process information are considered, namely, partially known sparse support, block sparsity, clustered sparsity and tree-structured sparsity. The optimization problems involving the structured sparsity terms can be solved using the Alternating Direction Method of Multipliers (ADMM) algorithm, which is fast and efficient. Through a simulation example and an application study to a coal-fired power plant, it is verified that the proposed method can better isolate faulty variables by incorporating process structure information.
Submitted 21 December, 2020; v1 submitted 5 September, 2020;
originally announced September 2020.
-
Uncertainty modelling and computational aspects of data association
Authors:
Jeremie Houssineau,
Jiajie Zeng,
Ajay Jasra
Abstract:
A novel solution to the smoothing problem for multi-object dynamical systems is proposed and evaluated. The systems of interest contain an unknown and varying number of dynamical objects that are partially observed under noisy and corrupted observations. An alternative representation of uncertainty is considered in order to account for the lack of information about the different aspects of this type of complex system. The corresponding statistical model can be formulated as a hierarchical model consisting of conditionally-independent hidden Markov models. This particular structure is leveraged to propose an efficient method in the context of Markov chain Monte Carlo (MCMC) by relying on an approximate solution to the corresponding filtering problem, in a similar fashion to particle MCMC. This approach is shown to outperform existing algorithms in a range of scenarios.
Submitted 5 September, 2020;
originally announced September 2020.
-
Generalized Liquid Association Analysis for Multimodal Data Integration
Authors:
Lexin Li,
Jing Zeng,
Xin Zhang
Abstract:
Multimodal data are now prevalent in scientific research. A central question in multimodal integrative analysis is to understand how two data modalities associate and interact with each other given another modality or demographic variables. The problem can be formulated as studying the associations among three sets of random variables, a question that has received relatively little attention in the literature. In this article, we propose a novel generalized liquid association analysis method, which offers a new and unique angle on this important class of problems of studying three-way associations. We extend the notion of liquid association of Li (2002) from the univariate setting to the sparse, multivariate, and high-dimensional setting. We establish a population dimension reduction model, transform the problem to sparse Tucker decomposition of a three-way tensor, and develop a higher-order orthogonal iteration algorithm for parameter estimation. We derive the non-asymptotic error bound and asymptotic consistency of the proposed estimator, while allowing the variable dimensions to be larger than and diverge with the sample size. We demonstrate the efficacy of the method through both simulations and a multimodal neuroimaging application for Alzheimer's disease research.
Submitted 24 April, 2021; v1 submitted 9 August, 2020;
originally announced August 2020.
-
DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths
Authors:
Yanwei Fu,
Chen Liu,
Donghao Li,
Xinwei Sun,
Jinshan Zeng,
Yuan Yao
Abstract:
Over-parameterization is ubiquitous nowadays in training neural networks to benefit both optimization in seeking global optima and generalization in reducing prediction error. However, compressive networks are desired in many real-world applications, and direct training of small networks may be trapped in local optima. In this paper, instead of pruning or distilling over-parameterized models to compressive ones, we propose a new approach based on differential inclusions of inverse scale spaces. Specifically, it generates a family of models from simple to complex ones that couples a pair of parameters to simultaneously train over-parameterized deep models and structural sparsity on weights of fully connected and convolutional layers. Such a differential inclusion scheme has a simple discretization, proposed as Deep structurally splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in deep learning is established: from any initialization, the algorithmic iterations converge to a critical point of empirical risks. Experimental evidence shows that DessiLBI achieves comparable and even better performance than the competitive optimizers in exploring the structural sparsity of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, DessiLBI unveils "winning tickets" in early epochs: the effective sparse structure with comparable test accuracy to fully trained over-parameterized models.
Submitted 4 July, 2020;
originally announced July 2020.
-
Deep-learning of Parametric Partial Differential Equations from Sparse and Noisy Data
Authors:
Hao Xu,
Dongxiao Zhang,
Junsheng Zeng
Abstract:
Data-driven methods have recently made great progress in the discovery of partial differential equations (PDEs) from spatial-temporal data. However, several challenges remain to be solved, including sparse noisy data, incomplete candidate library, and spatially- or temporally-varying coefficients. In this work, a new framework, which combines neural network, genetic algorithm and adaptive methods, is put forward to address all of these challenges simultaneously. In the framework, a trained neural network is utilized to calculate derivatives and generate a large amount of meta-data, which solves the problem of sparse noisy data. Next, genetic algorithm is utilized to discover the form of PDEs and corresponding coefficients with an incomplete candidate library. Finally, a two-step adaptive method is introduced to discover parametric PDEs with spatially- or temporally-varying coefficients. In this method, the structure of a parametric PDE is first discovered, and then the general form of varying coefficients is identified. The proposed algorithm is tested on the Burgers equation, the convection-diffusion equation, the wave equation, and the KdV equation. The results demonstrate that this method is robust to sparse and noisy data, and is able to discover parametric PDEs with an incomplete candidate library.
Submitted 16 May, 2020;
originally announced May 2020.
-
MedDialog: Two Large-scale Medical Dialogue Datasets
Authors:
Xuehai He,
Shu Chen,
Zeqian Ju,
Xiangyu Dong,
Hongchao Fang,
Sicheng Wang,
Yue Yang,
Jiaqi Zeng,
Ruisi Zhang,
Ruoyu Zhang,
Meng Zhou,
Penghui Zhu,
Pengtao Xie
Abstract:
Medical dialogue systems are promising in assisting in telemedicine to increase access to healthcare services, improve the quality of patient care, and reduce medical costs. To facilitate the research and development of medical dialogue systems, we build two large-scale medical dialogue datasets: MedDialog-EN and MedDialog-CN. MedDialog-EN is an English dataset containing 0.3 million conversations between patients and doctors and 0.5 million utterances. MedDialog-CN is a Chinese dataset containing 1.1 million conversations and 4 million utterances. To the best of our knowledge, MedDialog-(EN,CN) are the largest medical dialogue datasets to date. The datasets are available at https://github.com/UCSD-AI4H/Medical-Dialogue-System
Submitted 7 July, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning Rates and Early Stopping
Authors:
Jinshan Zeng,
Min Zhang,
Shao-Bo Lin
Abstract:
Boosting is a well-known method for improving the accuracy of weak learners in machine learning. However, its theoretical generalization guarantee is missing in literature. In this paper, we propose an efficient boosting method with theoretical generalization guarantees for binary classification. Three key ingredients of the proposed boosting method are: a) the \textit{fully-corrective greedy} (FC…
▽ More
Boosting is a well-known method for improving the accuracy of weak learners in machine learning. However, its theoretical generalization guarantee is missing in literature. In this paper, we propose an efficient boosting method with theoretical generalization guarantees for binary classification. Three key ingredients of the proposed boosting method are: a) the \textit{fully-corrective greedy} (FCG) update in the boosting procedure, b) a differentiable \textit{squared hinge} (also called \textit{truncated quadratic}) function as the loss function, and c) an efficient alternating direction method of multipliers (ADMM) algorithm for the associated FCG optimization. The used squared hinge loss not only inherits the robustness of the well-known hinge loss for classification with outliers, but also brings some benefits for computational implementation and theoretical justification. Under some sparseness assumption, we derive a fast learning rate of the order ${\cal O}((m/\log m)^{-1/4})$ for the proposed boosting method, which can be further improved to ${\cal O}((m/\log m)^{-1/2})$ if certain additional noise assumption is imposed, where $m$ is the size of sample set. Both derived learning rates are the best ones among the existing generalization results of boosting-type methods for classification. Moreover, an efficient early stop** scheme is provided for the proposed method. A series of toy simulations and real data experiments are conducted to verify the developed theories and demonstrate the effectiveness of the proposed method.
Submitted 31 March, 2020;
originally announced April 2020.
-
Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting
Authors:
Qiquan Shi,
Jiaming Yin,
Jiajun Cai,
Andrzej Cichocki,
Tatsuya Yokota,
Lei Chen,
Mingxuan Yuan,
Jia Zeng
Abstract:
This work proposes a novel approach for multiple time series forecasting. First, the multi-way delay embedding transform (MDT) is employed to represent time series as low-rank block Hankel tensors (BHT). Then, the higher-order tensors are projected to compressed core tensors by applying Tucker decomposition. At the same time, the generalized tensor Autoregressive Integrated Moving Average (ARIMA) is explicitly used on consecutive core tensors to predict future samples. In this manner, the proposed approach tactically incorporates the unique advantages of MDT tensorization (to exploit mutual correlations) and tensor ARIMA coupled with low-rank Tucker decomposition into a unified framework. This framework exploits the low-rank structure of block Hankel tensors in the embedded space and captures the intrinsic correlations among multiple time series, which can improve the forecasting results, especially for multiple short time series. Experiments conducted on three public datasets and two industrial datasets verify that the proposed BHT-ARIMA effectively improves forecasting accuracy and reduces computational cost compared with the state-of-the-art methods.
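The delay embedding step can be pictured with a small sketch. It assumes the embedding is applied only along the time axis with a window length `tau`; the function name `hankelize` is hypothetical, and the Tucker decomposition and tensor ARIMA stages are not shown.

```python
import numpy as np

def hankelize(series, tau):
    """Delay-embed the last (time) axis into a Hankel block.

    A series of length T becomes a (tau, T - tau + 1) block whose
    entry (i, j) is series[i + j]; leading axes are kept unchanged.
    """
    series = np.asarray(series)
    T = series.shape[-1]
    cols = T - tau + 1
    # idx[i, j] = i + j gives constant anti-diagonals (Hankel structure).
    idx = np.arange(tau)[:, None] + np.arange(cols)[None, :]
    return series[..., idx]

# One short series of length 5, window length 3.
single = hankelize(np.arange(5), tau=3)

# Two series stacked as rows give a 3-way tensor of shape (2, 3, 3).
multi = hankelize(np.arange(10).reshape(2, 5), tau=3)
```

Each anti-diagonal of a block repeats one sample of the original series, which is the redundancy that a low-rank model can exploit.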
Submitted 25 February, 2020;
originally announced February 2020.
-
Transfer Learning-Based Outdoor Position Recovery with Telco Data
Authors:
Yige Zhang,
Aaron Yi Ding,
Jorg Ott,
Mingxuan Yuan,
Jia Zeng,
Kun Zhang,
Weixiong Rao
Abstract:
Telecommunication (Telco) outdoor position recovery aims to localize outdoor mobile devices by leveraging measurement report (MR) data. Unfortunately, Telco position recovery requires a sufficient amount of MR samples across different areas and suffers from high data collection cost. For an area with scarce MR samples, it is hard to achieve good accuracy. In this paper, by leveraging recently developed transfer learning techniques, we design a novel Telco position recovery framework, called TLoc, to transfer good models from carefully selected source domains (fine-grained small subareas) to a target domain that originally suffers from poor localization accuracy. Specifically, TLoc introduces three dedicated components: 1) a new coordinate space to divide an area of interest into smaller domains, 2) a similarity measurement to select the best source domains, and 3) an adaptation of an existing transfer learning approach. To the best of our knowledge, TLoc is the first framework that demonstrates the efficacy of applying transfer learning to Telco outdoor position recovery. For example, on the 2G GSM and 4G LTE MR datasets in Shanghai, TLoc achieves 27.58% and 26.12% lower median errors than a nontransfer approach, and 47.77% and 49.22% lower median errors than a recent fingerprinting approach, NBL.
Submitted 10 December, 2019;
originally announced December 2019.
-
Fast Stochastic Ordinal Embedding with Variance Reduction and Adaptive Step Size
Authors:
Ke Ma,
Jinshan Zeng,
Qianqian Xu,
Xiaochun Cao,
Wei Liu,
Yuan Yao
Abstract:
Learning representations from relative similarity comparisons, often called ordinal embedding, has gained rising attention in recent years. Most of the existing methods are based on semi-definite programming (\textit{SDP}), which is generally time-consuming and scales poorly, especially when confronting large-scale data. To overcome this challenge, we propose a stochastic algorithm called \textit{SVRG-SBB}, which has the following features: i) good scalability, achieved by dropping positive semi-definite (\textit{PSD}) constraints and using a fast algorithm, i.e., the stochastic variance reduced gradient (\textit{SVRG}) method, and ii) adaptive learning, achieved by introducing a new adaptive step size called the stabilized Barzilai-Borwein (\textit{SBB}) step size. Theoretically, under some natural assumptions, we show an $\boldsymbol{O}(\frac{1}{T})$ rate of convergence to a stationary point for the proposed algorithm, where $T$ is the total number of iterations. Under the further Polyak-Łojasiewicz assumption, we can show the global linear convergence (i.e., exponentially fast convergence to a global optimum) of the proposed algorithm. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm in comparison with state-of-the-art methods, notably its much lower computational cost with good prediction performance.
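For readers unfamiliar with SVRG, the following is a minimal sketch of the generic variance-reduced update on a toy least-squares problem. It is not the SVRG-SBB algorithm itself: the objective is convex rather than an ordinal embedding loss, the step size is fixed rather than the stabilized Barzilai-Borwein step, and the function name and all parameter values are assumptions made for illustration.

```python
import numpy as np

def svrg_least_squares(A, b, step, epochs, inner_steps, seed=0):
    """Generic SVRG for f(w) = (1/2n) * ||A w - b||^2 with a fixed step."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w_ref = np.zeros(d)
    for _ in range(epochs):
        # Full gradient at the reference point, computed once per epoch.
        full_grad = A.T @ (A @ w_ref - b) / n
        w = w_ref.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            grad_w = A[i] * (A[i] @ w - b[i])
            grad_ref = A[i] * (A[i] @ w_ref - b[i])
            # Variance-reduced stochastic gradient.
            w -= step * (grad_w - grad_ref + full_grad)
        w_ref = w
    return w_ref

# Toy noise-free problem with unit-norm rows (all values are assumptions).
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
A /= np.linalg.norm(A, axis=1, keepdims=True)
w_true = np.array([1.0, -2.0, 0.5])
b = A @ w_true
w_hat = svrg_least_squares(A, b, step=0.1, epochs=30, inner_steps=500)
error = np.linalg.norm(w_hat - w_true)
```

The correction term `grad_w - grad_ref + full_grad` remains an unbiased gradient estimate while its variance shrinks as `w` and `w_ref` approach the solution, which is why a constant step can be used here.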
Submitted 1 December, 2019;
originally announced December 2019.
-
Fast Polynomial Kernel Classification for Massive Data
Authors:
Jinshan Zeng,
Minrun Wu,
Shao-Bo Lin,
Ding-Xuan Zhou
Abstract:
In the era of big data, it is desirable to develop efficient machine learning algorithms to tackle massive data challenges such as storage bottlenecks, algorithmic scalability, and interpretability. In this paper, we develop a novel efficient classification algorithm, called fast polynomial kernel classification (FPC), to conquer the scalability and storage challenges. Our main tools are a suitably selected feature mapping based on polynomial kernels and an alternating direction method of multipliers (ADMM) algorithm for a related non-smooth convex optimization problem. Fast learning rates as well as feasibility verifications, including the efficiency of an ADMM solver with convergence guarantees and the selection of center points, are established to justify the theoretical behavior of FPC. Our theoretical assertions are verified by a series of simulations and real data applications. Numerical results demonstrate that FPC significantly reduces the computational burden and storage memory of existing learning schemes such as support vector machines, Nyström and random feature methods, without sacrificing much of their generalization ability.
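One way to picture a feature mapping built from a polynomial kernel and a set of center points is sketched below. The kernel form $(1 + x \cdot c)^d$, the choice of centers, and the function name are assumptions for illustration; the abstract does not specify how FPC selects its centers or degree.

```python
import numpy as np

def poly_kernel_features(X, centers, degree):
    """Map each row x of X to the vector ((1 + x . c_j) ** degree) over centers c_j."""
    return (1.0 + X @ centers.T) ** degree

# Two samples and two hypothetical center points in the plane.
X = np.array([[1.0, 0.0],
              [0.0, 2.0]])
centers = np.array([[1.0, 1.0],
                    [0.0, 1.0]])
features = poly_kernel_features(X, centers, degree=2)
```

With a fixed number of centers, the feature matrix has as many columns as centers, so storage depends on the number of centers rather than on the square of the sample size.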
Submitted 11 November, 2022; v1 submitted 24 November, 2019;
originally announced November 2019.
-
Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces
Authors:
Yanwei Fu,
Chen Liu,
Donghao Li,
Zuyuan Zhong,
Xinwei Sun,
Jinshan Zeng,
Yuan Yao
Abstract:
The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge number of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation have been developed, they are expensive because, as backward selection methods, they require fully training a dense network, and there is still a void in systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks can be established under the Kurdyka-Łojasiewicz framework. Experimental evidence shows that our method achieves comparable or even better performance than competitive optimizers in exploring the sparse structure of several widely used backbones on benchmark datasets. Remarkably, with early stopping, our method unveils `winning tickets' in early epochs: effective sparse network structures with test accuracy comparable to fully trained over-parameterized models, which are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating good performance with much less computational cost. Codes and models can be downloaded at https://github.com/DessiLBI2020/DessiLBI.
Submitted 21 April, 2022; v1 submitted 22 May, 2019;
originally announced May 2019.
-
On ADMM in Deep Learning: Convergence and Saturation-Avoidance
Authors:
Jinshan Zeng,
Shao-Bo Lin,
Yuan Yao,
Ding-Xuan Zhou
Abstract:
In this paper, we develop an alternating direction method of multipliers (ADMM) for training deep neural networks with sigmoid-type activation functions (called the \textit{sigmoid-ADMM pair}), mainly motivated by the gradient-free nature of ADMM in avoiding the saturation of sigmoid-type activations and the advantages of deep neural networks with sigmoid-type activations (called deep sigmoid nets) over their rectified linear unit (ReLU) counterparts (called deep ReLU nets) in terms of approximation. In particular, we prove that the approximation capability of deep sigmoid nets is not worse than that of deep ReLU nets by showing that the ReLU activation function can be well approximated by deep sigmoid nets with two hidden layers and finitely many free parameters, but not vice versa. We also establish the global convergence of the proposed ADMM for the nonlinearly constrained formulation of deep sigmoid net training, from arbitrary initial points to a Karush-Kuhn-Tucker (KKT) point at a rate of order ${\cal O}(1/k)$. Besides the sigmoid activation, such a convergence theorem holds for a general class of smooth activations. Compared with the widely used stochastic gradient descent (SGD) algorithm for deep ReLU net training (called the ReLU-SGD pair), the proposed sigmoid-ADMM pair is practically stable with respect to the algorithmic hyperparameters, including the learning rate, the initialization scheme and the pre-processing of the input data. Moreover, we find that, to approximate and learn simple but important functions, the proposed sigmoid-ADMM pair numerically outperforms the ReLU-SGD pair.
Submitted 15 September, 2021; v1 submitted 6 February, 2019;
originally announced February 2019.
-
The Relevance of Bayesian Layer Positioning to Model Uncertainty in Deep Bayesian Active Learning
Authors:
Jiaming Zeng,
Adam Lesnikowski,
Jose M. Alvarez
Abstract:
One of the main challenges of deep learning tools is their inability to capture model uncertainty. While Bayesian deep learning can be used to tackle the problem, Bayesian neural networks often require more time and computational power to train than deterministic networks. Our work explores whether fully Bayesian networks are needed to successfully capture model uncertainty. We vary the number and position of Bayesian layers in a network and compare their performance on active learning with the MNIST dataset. We found that we can fully capture the model uncertainty by using only a few Bayesian layers near the output of the network, combining the advantages of deterministic and Bayesian networks.
Submitted 29 November, 2018;
originally announced November 2018.
-
Learning and Inference on Generative Adversarial Quantum Circuits
Authors:
Jinfeng Zeng,
Yufeng Wu,
Jin-Guo Liu,
Lei Wang,
Jiangping Hu
Abstract:
Quantum mechanics is inherently probabilistic in light of Born's rule. Using quantum circuits as probabilistic generative models for classical data exploits their superior expressibility and efficient direct sampling ability. However, training quantum circuits can be more challenging than training classical neural networks due to the lack of an efficient differentiable learning algorithm. We devise an adversarial quantum-classical hybrid training scheme that couples a quantum circuit generator with a classical neural network discriminator. After training, the quantum circuit generative model can infer missing data with a quadratic speedup via amplitude amplification. We numerically simulate the learning and inference of a generative adversarial quantum circuit using the prototypical Bars-and-Stripes dataset. Generative adversarial quantum circuits are a fresh approach to machine learning which may enjoy a practically useful quantum advantage on near-term quantum devices.
Submitted 10 August, 2018;
originally announced August 2018.
-
A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training
Authors:
Tim Tsz-Kit Lau,
Jinshan Zeng,
Baoyuan Wu,
Yuan Yao
Abstract:
Training deep neural networks (DNNs) efficiently is a challenge due to the associated highly nonconvex optimization. The backpropagation (backprop) algorithm has long been the most widely used algorithm for gradient computation of parameters of DNNs and is used along with gradient descent-type algorithms for this optimization task. Recent work has empirically shown the efficiency of block coordinate descent (BCD) type methods for training DNNs. In view of this, we propose a novel algorithm based on the BCD method for training DNNs and provide its global convergence results built upon the powerful framework of the Kurdyka-Łojasiewicz (KL) property. Numerical experiments on standard datasets demonstrate its competitive efficiency against standard optimizers with backprop.
Submitted 24 March, 2018;
originally announced March 2018.
-
Global Convergence of Block Coordinate Descent in Deep Learning
Authors:
Jinshan Zeng,
Tim Tsz-Kit Lau,
Shaobo Lin,
Yuan Yao
Abstract:
Deep learning has aroused extensive attention due to its great empirical success. The efficiency of block coordinate descent (BCD) methods has been recently demonstrated in deep neural network (DNN) training. However, theoretical studies on their convergence properties are limited due to the highly nonconvex nature of DNN training. In this paper, we aim to provide a general methodology for provable convergence guarantees for this type of method. In particular, for most of the commonly used DNN training models involving both two- and three-splitting schemes, we establish global convergence to a critical point at a rate of ${\cal O}(1/k)$, where $k$ is the number of iterations. The results extend to general loss functions which have Lipschitz continuous gradients and to deep residual networks (ResNets). Our key development adds several new elements to the Kurdyka-Łojasiewicz inequality framework that enable us to carry out the global convergence analysis of BCD in the general scenario of deep learning.
Submitted 12 May, 2019; v1 submitted 1 March, 2018;
originally announced March 2018.
-
Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size
Authors:
Ke Ma,
Jinshan Zeng,
Jiechao Xiong,
Qianqian Xu,
Xiaochun Cao,
Wei Liu,
Yuan Yao
Abstract:
Learning representations from relative similarity comparisons, often called ordinal embedding, has gained rising attention in recent years. Most of the existing methods are batch methods designed mainly on the basis of convex optimization, for example the projected gradient descent method. However, they are generally time-consuming because the singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) it is SVD-free by dropping convexity, with good scalability through the use of a stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG), and (b) it chooses the step size adaptively by introducing a new stabilized Barzilai-Borwein (SBB) method, since the original version for convex problems might fail for the considered stochastic \textit{non-convex} optimization problem. Moreover, we show that the proposed algorithm converges to a stationary point at a rate of $\mathcal{O}(\frac{1}{T})$ in our setting, where $T$ is the total number of iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm in comparison with state-of-the-art methods, particularly its much lower computational cost with good prediction performance.
Submitted 30 January, 2018; v1 submitted 17 November, 2017;
originally announced November 2017.
-
Learning rates for classification with Gaussian kernels
Authors:
Shao-Bo Lin,
Jinshan Zeng,
Xiangyu Chang
Abstract:
This paper aims at refined error analysis for binary classification using support vector machine (SVM) with Gaussian kernel and convex loss. Our first result shows that for some loss functions such as the truncated quadratic loss and quadratic loss, SVM with Gaussian kernel can reach the almost optimal learning rate, provided the regression function is smooth. Our second result shows that, for a large number of loss functions, under some Tsybakov noise assumption, if the regression function is infinitely smooth, then SVM with Gaussian kernel can achieve the learning rate of order $m^{-1}$, where $m$ is the number of samples.
Submitted 5 October, 2017; v1 submitted 28 February, 2017;
originally announced February 2017.
-
Interpretable Classification Models for Recidivism Prediction
Authors:
Jiaming Zeng,
Berk Ustun,
Cynthia Rudin
Abstract:
We investigate a long-debated question, which is how to create predictive models of recidivism that are sufficiently accurate, transparent, and interpretable to use for decision-making. This question is complicated as these models are used to support different decisions, from sentencing, to determining release on probation, to allocating preventative social services. Each use case might have an objective other than classification accuracy, such as a desired true positive rate (TPR) or false positive rate (FPR). Each (TPR, FPR) pair is a point on the receiver operator characteristic (ROC) curve. We use popular machine learning methods to create models along the full ROC curve on a wide range of recidivism prediction problems. We show that many methods (SVM, Ridge Regression) produce equally accurate models along the full ROC curve. However, methods that are designed for interpretability (CART, C5.0) cannot be tuned to produce models that are accurate and/or interpretable. To handle this shortcoming, we use a new method known as SLIM (Supersparse Linear Integer Models) to produce accurate, transparent, and interpretable models along the full ROC curve. These models can be used for decision-making for many different use cases, since they are just as accurate as the most powerful black-box machine learning models, but completely transparent and highly interpretable.
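To make the statement that each (TPR, FPR) pair is a point on the ROC curve concrete, the sketch below computes one such point from scores and a decision threshold. The labels, scores, thresholds and the function name are made-up illustrations, not data or code from the paper.

```python
import numpy as np

def tpr_fpr(y_true, scores, threshold):
    """Return the (TPR, FPR) pair obtained by predicting positive when score >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    pred = np.asarray(scores) >= threshold
    tp = np.sum(pred & y_true)
    fn = np.sum(~pred & y_true)
    fp = np.sum(pred & ~y_true)
    tn = np.sum(~pred & ~y_true)
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical labels (1 = positive) and model scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.4, 0.1]

point_strict = tpr_fpr(y_true, scores, threshold=0.5)
point_lenient = tpr_fpr(y_true, scores, threshold=0.3)
```

Lowering the threshold moves along the curve: more positives are caught, and the false positive rate can only stay the same or rise.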
Submitted 7 July, 2016; v1 submitted 26 March, 2015;
originally announced March 2015.
-
Learning rates of $l^q$ coefficient regularization learning with Gaussian kernel
Authors:
Shaobo Lin,
Jinshan Zeng,
Jian Fang,
Zongben Xu
Abstract:
Regularization is a well-recognized and powerful strategy for improving the performance of a learning machine, and $l^q$ regularization schemes with $0<q<\infty$ are in central use. It is known that different $q$ lead to different properties of the deduced estimators: for example, $l^2$ regularization leads to smooth estimators while $l^1$ regularization leads to sparse estimators. How, then, do the generalization capabilities of $l^q$ regularization learning vary with $q$? In this paper, we study this problem in the framework of statistical learning theory and show that implementing $l^q$ coefficient regularization schemes in the sample dependent hypothesis space associated with the Gaussian kernel can attain the same almost optimal learning rates for all $0<q<\infty$. That is, the upper and lower bounds of learning rates for $l^q$ regularization learning are asymptotically identical for all $0<q<\infty$. Our finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be arbitrarily specified, or specified merely by other non-generalization criteria such as smoothness, computational complexity, sparsity, etc.
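The contrast between sparse $l^1$ estimators and smooth $l^2$ estimators can be seen in the standard one-dimensional proximal maps of the two penalties, sketched below. This is textbook material rather than the kernel-based scheme analysed in the paper, and the function names and the value of `lam` are assumptions.

```python
import numpy as np

def prox_l1(x, lam):
    # argmin_z 0.5 * (z - x)**2 + lam * |z|  (soft thresholding):
    # coefficients with |x| <= lam are set exactly to zero.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_l2_squared(x, lam):
    # argmin_z 0.5 * (z - x)**2 + 0.5 * lam * z**2:
    # every coefficient is shrunk by the same factor, none becomes zero.
    return x / (1.0 + lam)

coefficients = np.array([3.0, -0.5, 1.0])
sparse_estimate = prox_l1(coefficients, lam=1.0)
smooth_estimate = prox_l2_squared(coefficients, lam=1.0)
```

The $l^1$ map zeroes out the two small coefficients, while the $l^2$ map keeps all three and only rescales them.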
Submitted 24 September, 2014; v1 submitted 19 December, 2013;
originally announced December 2013.
-
Towards Big Topic Modeling
Authors:
Jian-Feng Yan,
Jia Zeng,
Zhi-Qiang Liu,
Yang Gao
Abstract:
To solve the big topic modeling problem, we need to reduce both the time and space complexities of batch latent Dirichlet allocation (LDA) algorithms. Although parallel LDA algorithms on the multi-processor architecture have low time and space complexities, their communication costs among processors often scale linearly with the vocabulary size and the number of topics, leading to a serious scalability problem. To reduce the communication complexity among processors for better scalability, we propose a novel communication-efficient parallel topic modeling architecture based on the power law, which consumes orders of magnitude less communication time when the number of topics is large. We combine the proposed communication-efficient parallel architecture with the online belief propagation (OBP) algorithm, and refer to the result as POBP, for big topic modeling tasks. Extensive empirical results confirm that POBP has the following advantages for solving the big topic modeling problem when compared with recent state-of-the-art parallel LDA algorithms on the multi-processor architecture: 1) high accuracy, 2) communication efficiency, 3) fast speed, and 4) constant memory usage.
Submitted 17 November, 2013;
originally announced November 2013.
-
Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example
Authors:
Shaobo Lin,
Chen Xu,
Jingshan Zeng,
Jian Fang
Abstract:
$l^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) through appropriately shrinking its coefficients. The shape of an $l^q$ estimator differs with the choice of the regularization order $q$. In particular, $l^1$ leads to the LASSO estimate, while $l^{2}$ corresponds to the smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To facilitate the use of $l^{q}$-regularization, we seek a modeling strategy in which an elaborate selection of $q$ is avoidable. In this spirit, we place our investigation within a general framework of $l^{q}$-regularized kernel learning under a sample dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all $l^{q}$ estimators for $0< q < \infty$ attain similar generalization error bounds. These estimated bounds are almost optimal in the sense that, up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be arbitrarily specified, or specified merely by other non-generalization criteria such as smoothness, computational complexity, sparsity, etc.
Submitted 13 June, 2023; v1 submitted 24 July, 2013;
originally announced July 2013.