Search | arXiv e-print repository

arXiv:2302.11971

Efficiently handling constraints with Metropolis-adjusted Langevin algorithm

Authors: Jinyuan Chang, Cheng Yong Tang, Yuanzheng Zhu

Abstract: In this study, we investigate the performance of the Metropolis-adjusted Langevin algorithm in a setting with constraints on the support of the target distribution. We provide a rigorous analysis of the resulting Markov chain, establishing its convergence and deriving an upper bound for its mixing time. Our results demonstrate that the Metropolis-adjusted Langevin algorithm is highly effective in handling this challenging situation: the mixing time bound we obtain is superior to the best known bounds for competing algorithms without an accept-reject step. Our numerical experiments support these theoretical findings, indicating that the Metropolis-adjusted Langevin algorithm shows promising performance when dealing with constraints on the support of the target distribution.

Submitted 14 May, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: We find some error in the proof of Theorem 2 and the associated result may not be correct

arXiv:2203.02328 [pdf, ps, other]

Multijoints and Factorisation

Authors: Michael Chi Yung Tang

Abstract: We solve the dual multijoint problem and prove the existence of so-called "factorisations" for arbitrary fields and multijoints of $k_j$-planes. More generally, we deduce a discrete analogue of a theorem due in essence to Bourgain and Guth. Our result is a universal statement which describes a property of the discrete wedge product without any explicit reference to multijoints and is stated as follows: Suppose that $k_1 + \ldots + k_d = n$. There is a constant $C=C(n)$ so that for any field $\mathbb{F}$ and for any finitely supported function $S : \mathbb{F}^n \rightarrow \mathbb{R}_{\geq 0}$, there are factorising functions $s_{k_j} : \mathbb{F}^n\times \mathrm{Gr}(k_j, \mathbb{F}^n)\rightarrow \mathbb{R}_{\geq 0}$ such that $$(V_1 \wedge\cdots\wedge V_d)S(p)^d \leq C\prod_{j=1}^d s_{k_j}(p, V_j),$$ for every $p\in \mathbb{F}^n$ and every tuple of planes $V_j\in \mathrm{Gr}(k_j, \mathbb{F}^n)$, and $$\sum_{p\in \pi_j} s(p, e(\pi_j)) =||S||_d$$ for every $k_j$-plane $\pi_j\subset \mathbb{F}^n$, where $e(\pi_j)\in \mathrm{Gr}(k_j,\mathbb{F}^n)$ denotes the translate of $\pi_j$ that contains the origin and $\wedge$ denotes the discrete wedge product.

Submitted 8 April, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: 18 pages, references updated

MSC Class: 46A20; 46G25; 47H60; 52C35; 52C99

arXiv:2203.02320 [pdf, ps, other]

Non-Transversal Multilinear Duality and Joints

Authors: Anthony Carbery, Michael Chi Yung Tang

Abstract: We develop a framework for a duality theory for general multilinear operators which extends that for transversal multilinear operators which has been established in arXiv:1809.02449. We apply it to the setting of joints and multijoints, and obtain a "factorisation" theorem which provides an analogue in the discrete setting of results of Bourgain and Guth (arXiv:0811.2251 and arXiv:1012.3760) from the Euclidean setting.

Submitted 8 April, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: 19 pages, references updated

MSC Class: 46A20; 47H60; 47N99; 42Bxx; 52C99

arXiv:2110.12913 [pdf]

Antinodal kink in the band dispersion of electron-doped cuprate ${\rm La}_{2-x}{\rm Ce}_x{\rm CuO}_{4\pm\delta}$

Authors: C. Y. Tang, Z. F. Lin, J. X. Zhang, X. C. Guo, J. Y. Guan, S. Y. Gao, Z. C. Rao, J. Zhao, Y. B. Huang, T. Qian, Z. Y. Weng, K. Jin, Y. J. Sun, H. Ding

Abstract: Angle-resolved photoemission spectroscopy (ARPES) measurements have established the phenomenon of kink in band dispersion of high-$T_{\rm c}$ cuprate superconductors. However, systematic studies of the kink in electron-doped cuprates are still lacking experimentally. We performed $in$-$situ$ ARPES measurements on ${\rm La}_{2-x}{\rm Ce}_x{\rm CuO}_{4\pm\delta}$ (LCCO) thin films over a wide electron doping ($n$) range from 0.05 to 0.23. While the nodal kink is nearly invisible, an antinodal kink around 45 meV, surviving above 200 K, is observed for $n\sim0.05-0.19$, whose position is roughly independent of doping. The fact that the antinodal kink observed at high temperatures and in the highly overdoped region favors the phonon mechanism with contributions from the Cu-O bond-stretching mode and the out-of-plane oxygen buckling mode. Our results also suggest that the antinodal kink of LCCO is only weakly coupled to its superconductivity.

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2109.05861 [pdf, other]

Applied Regression Analysis of Correlations for Correlated Data

Authors: Jie Hu, Yu Chen, Chenlei Leng, Cheng Yong Tang

Abstract: Correlated data are ubiquitous in today's data-driven society. While regression models for analyzing means and variances of responses of interest are relatively well-developed, the development of these models for analyzing the correlations is largely confined to longitudinal data, a special form of sequentially correlated data. This paper proposes a new method for the analysis of correlations to fully exploit the use of covariates for general correlated data. In a renewed analysis of the Classroom data, a highly unbalanced multilevel clustered data with within-class and within-school correlations, our method reveals informative insights on these structures not previously known. In another analysis of the malaria immune response data in Benin, a longitudinal study with time-dependent covariates where the exact times of the observations are not available, our approach again provides promising new results. At the heart of our approach is a new generalized z-transformation that converts correlation matrices constrained to be positive definite to vectors with unrestricted support, and is order-invariant. These two properties enable us to develop regression analysis incorporating covariates for the modelling of correlations via the use of maximum likelihood.

Submitted 11 June, 2023; v1 submitted 13 September, 2021; originally announced September 2021.

arXiv:2106.09071 [pdf, ps, other]

Pre-processing with Orthogonal Decompositions for High-dimensional Explanatory Variables

Authors: Xu Han, Ethan X Fang, Cheng Yong Tang

Abstract: Strong correlations between explanatory variables are problematic for high-dimensional regularized regression methods. Due to the violation of the Irrepresentable Condition, the popular LASSO method may suffer from false inclusions of inactive variables. In this paper, we propose pre-processing with orthogonal decompositions (PROD) for the explanatory variables in high-dimensional regressions. The PROD procedure is constructed based upon a generic orthogonal decomposition of the design matrix. We demonstrate by two concrete cases that the PROD approach can be effectively constructed for improving the performance of high-dimensional penalized regression. Our theoretical analysis reveals their properties and benefits for high-dimensional penalized linear regression with LASSO. Extensive numerical studies with simulations and data analysis show the promising performance of the PROD.

Submitted 16 June, 2021; originally announced June 2021.

MSC Class: 62J05

arXiv:2106.05551 [pdf]

doi 10.1103/PhysRevB.104.155125

Suppression of Antiferromagnetism in Electron-Doped Cuprate $T'$-${\rm La}_{2-x}{\rm Ce}_x\rm {CuO}_{4\pm\delta}$

Authors: C. Y. Tang, Z. F. Lin, J. X. Zhang, X. C. Guo, J. Y. Guan, S. Y. Gao, Z. C. Rao, J. Zhao, Y. B. Huang, T. Qian, Z. Y. Weng, K. Jin, Y. J. Sun, H. Ding

Abstract: We performed systematic angle-resolved photoemission spectroscopy measurements $in$-$situ$ on $T'$-${\rm La}_{2-x}{\rm Ce}_x\rm {CuO}_{4\pm\delta}$ (LCCO) thin films over the extended doping range prepared by the refined ozone/vacuum annealing method. Electron doping level ($n$), estimated from the measured Fermi surface volume, varies from 0.05 to 0.23, which covers the whole superconducting dome. We observed an absence of the insulating behavior around $n \sim$ 0.05 and the Fermi surface reconstruction shifted to $n \sim$ 0.11 in LCCO compared to that of other electron-doped cuprates at around 0.15, suggesting that antiferromagnetism is strongly suppressed in this material. The possible explanation may lie in the enhanced -$t'$ /$t$ in LCCO for the largest $\rm{La^{3+}}$ ionic radius among all the Lanthanide elements.

Submitted 10 June, 2021; originally announced June 2021.

arXiv:2102.09080 [pdf, other]

Adjusting the Benjamini-Hochberg method for controlling the false discovery rate in knockoff assisted variable selection

Authors: Sanat K. Sarkar, Cheng Yong Tang

Abstract: The knockoff-based multiple testing setup of Barber & Candes (2015) for variable selection in multiple regression where sample size is as large as the number of explanatory variables is considered. The method of Benjamini & Hochberg (1995) based on ordinary least squares estimates of the regression coefficients is adjusted to the setup, transforming it to a valid p-value based false discovery rate controlling method not relying on any specific correlation structure of the explanatory variables. Simulations and real data applications show that our proposed method that is agnostic to π0, the proportion of unimportant explanatory variables, and a data-adaptive version of it that uses an estimate of π0 are powerful competitors of the false discovery rate controlling method in Barber & Candes (2015).

Submitted 18 August, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

arXiv:2003.11181 [pdf, other]

Missing at Random or Not: A Semiparametric Testing Approach

Authors: Rui Duan, C. Jason Liang, Pamela Shaw, Cheng Yong Tang, Yong Chen

Abstract: Practical problems with missing data are common, and statistical methods have been developed concerning the validity and/or efficiency of statistical procedures. On a central focus, there have been longstanding interests on the mechanism governing data missingness, and correctly deciding the appropriate mechanism is crucially relevant for conducting proper practical investigations. The conventional notions include the three common potential classes -- missing completely at random, missing at random, and missing not at random. In this paper, we present a new hypothesis testing approach for deciding between missing at random and missing not at random. Since the potential alternatives of missing at random are broad, we focus our investigation on a general class of models with instrumental variables for data missing not at random. Our setting is broadly applicable, thanks to that the model concerning the missing data is nonparametric, requiring no explicit model specification for the data missingness. The foundational idea is to develop appropriate discrepancy measures between estimators whose properties significantly differ only when missing at random does not hold. We show that our new hypothesis testing approach achieves an objective data oriented choice between missing at random or not. We demonstrate the feasibility, validity, and efficacy of the new test by theoretical analysis, simulation studies, and a real data analysis.

Submitted 24 March, 2020; originally announced March 2020.

arXiv:1901.07970 [pdf, ps, other]

High-dimensional Interactions Detection with Sparse Principal Hessian Matrix

Authors: Cheng Yong Tang, Ethan X. Fang, Yuexiao Dong

Abstract: In statistical learning framework with regressions, interactions are the contributions to the response variable from the products of the explanatory variables. In high-dimensional problems, detecting interactions is challenging due to combinatorial complexity and limited data information. We consider detecting interactions by exploring their connections with the principal Hessian matrix. Specifically, we propose a one-step synthetic approach for estimating the principal Hessian matrix by a penalized M-estimator. An alternating direction method of multipliers (ADMM) is proposed to efficiently solve the encountered regularized optimization problem. Based on the sparse estimator, we detect the interactions by identifying its nonzero components. Our method directly targets at the interactions, and it requires no structural assumption on the hierarchy of the interaction effects. We show that our estimator is theoretically valid, computationally efficient, and practically useful for detecting the interactions in a broad spectrum of scenarios.

Submitted 27 September, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

Comments: 25 pages

arXiv:1901.00765 [pdf, other]

Analysis and Control of a Continuous-Time Bi-Virus Model

Authors: Ji Liu, Philip E. Pare, Angelia Nedich, Choon Yik Tang, Carolyn L. Beck, Tamer Basar

Abstract: This paper studies a distributed continuous-time bi-virus model in which two competing viruses spread over a network consisting of multiple groups of individuals. Limiting behaviors of the network are characterized by analyzing the equilibria of the system and their stability. Specifically, when the two viruses spread over possibly different directed infection graphs, the system may have (1) a unique equilibrium, the healthy state, which is globally stable, implying that both viruses will eventually be eradicated, (2) two equilibria including the healthy state and a dominant virus state, which is almost globally stable, implying that one virus will pervade the entire network causing a single-virus epidemic while the other virus will be eradicated, or (3) at least three equilibria including the healthy state and two dominant virus states, depending on certain conditions on the healing and infection rates. When the two viruses spread over the same directed infection graph, the system may have zero or infinitely many coexisting epidemic equilibria, which represents the pervasion of the two viruses. Sensitivity properties of some nontrivial equilibria are investigated in the context of a decentralized control technique, and an impossibility result is given for a certain type of distributed feedback controller.

Submitted 1 January, 2019; originally announced January 2019.

Comments: arXiv admin note: text overlap with arXiv:1603.04098

arXiv:1812.08217 [pdf, ps, other]

doi 10.1016/j.jeconom.2022.06.010

Optimal covariance matrix estimation for high-dimensional noise in high-frequency data

Authors: Jinyuan Chang, Qiao Hu, Cheng Liu, Cheng Yong Tang

Abstract: We consider high-dimensional measurement errors with high-frequency data. Our objective is on recovering the high-dimensional cross-sectional covariance matrix of the random errors with optimality. In this problem, not all components of the random vector are observed at the same time and the measurement errors are latent variables, leading to major challenges besides high data dimensionality. We propose a new covariance matrix estimator in this context with appropriate localization and thresholding, and then conduct a series of comprehensive theoretical investigations of the proposed estimator. By developing a new technical device integrating the high-frequency data feature with the conventional notion of $\alpha$-mixing, our analysis successfully accommodates the challenging serial dependence in the measurement errors. Our theoretical analysis establishes the minimax optimal convergence rates associated with two commonly used loss functions; and we demonstrate with concrete cases when the proposed localized estimator with thresholding achieves the minimax optimal convergence rates. Considering that the variances and covariances can be small in reality, we conduct a second-order theoretical analysis that further disentangles the dominating bias in the estimator. A bias-corrected estimator is then proposed to ensure its practical finite sample performance. We also extensively analyze our estimator in the setting with jumps, and show that its performance is reasonably robust. We illustrate the promising empirical performance of the proposed estimator with extensive simulation studies and a real data analysis.

Submitted 10 September, 2022; v1 submitted 19 December, 2018; originally announced December 2018.

Journal ref: Journal of Econometrics 2024, Vol. 239, 105329

arXiv:1805.10742 [pdf, ps, other]

doi 10.1093/biomet/asaa051

High-dimensional empirical likelihood inference

Authors: Jinyuan Chang, Song Xi Chen, Cheng Yong Tang, Tong Tong Wu

Abstract: High-dimensional statistical inference with general estimating equations are challenging and remain less explored. In this paper, we study two problems in the area: confidence set estimation for multiple components of the model parameters, and model specifications test. For the first one, we propose to construct a new set of estimating equations such that the impact from estimating the high-dimensional nuisance parameters becomes asymptotically negligible. The new construction enables us to estimate a valid confidence region by empirical likelihood ratio. For the second one, we propose a test statistic as the maximum of the marginal empirical likelihood ratios to quantify data evidence against the model specification. Our theory establishes the validity of the proposed empirical likelihood approaches, accommodating over-identification and exponentially growing data dimensionality. The numerical studies demonstrate promising performance and potential practical benefits of the new methods.

Submitted 6 November, 2019; v1 submitted 27 May, 2018; originally announced May 2018.

Comments: The original title of this paper is "High-dimensional statistical inferences with over-identification: confidence set estimation and specification test"

Journal ref: Biometrika 2021, Vol. 108, No. 1, 127-147

arXiv:1805.06450 [pdf]

doi 10.1103/PhysRevB.98.140507

Continuous doping of a cuprate surface: new insights from in-situ ARPES

Authors: Y. G. Zhong, J. Y. Guan, X. Shi, J. Zhao, Z. C. Rao, C. Y. Tang, H. J. Liu, G. D. Gu, Z. Y. Weng, Z. Q. Wang, T. Qian, Y. J. Sun, H. Ding

Abstract: The cuprate superconductors distinguish themselves from the conventional superconductors in that a small variation in the carrier doping can significantly change the superconducting transition temperature (T_c), giving rise to a superconducting dome where a pseudogap (ref. 1,2) emerges in the underdoped region and a Fermi liquid appears in the overdoped region. Thus a systematic study of the properties over a wide doping range is critical for understanding the superconducting mechanism. Here, we report a new technique to continuously dope the surface of Bi2Sr2CaCu2O8+x through an ozone/vacuum annealing method. Using in-situ ARPES, we obtain precise quantities of energy gaps and the coherent spectral weight over a wide range of doping. We discover that the d-wave component of the quasiparticle gap is linearly proportional to the Nernst temperature that is the onset of superconducting vortices (ref. 3), strongly suggesting that the emergence of superconducting pairing is concomitant with the onset of free vortices, with direct implications for the onset of superconducting phase coherence at T_c and the nature of the pseudogap phenomena.

Submitted 2 June, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

Comments: 23 pages, 10 figures, 1 table, supplementary materials included

Journal ref: Phys. Rev. B 98, 140507 (2018)

arXiv:1804.09302 [pdf, ps, other]

Disentangling and Assessing Uncertainties in Multiperiod Corporate Default Risk Predictions

Authors: Miao Yuan, Cheng Yong Tang, Yili Hong, Jian Yang

Abstract: Measuring the corporate default risk is broadly important in economics and finance. Quantitative methods have been developed to predictively assess future corporate default probabilities. However, as a more difficult yet crucial problem, evaluating the uncertainties associated with the default predictions remains little explored. In this paper, we attempt to fill this blank by developing a procedure for quantifying the level of associated uncertainties upon carefully disentangling multiple contributing sources. Our framework effectively incorporates broad information from historical default data, corporates' financial records, and macroeconomic conditions by a) characterizing the default mechanism, and b) capturing the future dynamics of various features contributing to the default mechanism. Our procedure overcomes the major challenges in this large scale statistical inference problem and makes it practically feasible by using parsimonious models, innovative methods, and modern computational facilities. By predicting the marketwide total number of defaults and assessing the associated uncertainties, our method can also be applied for evaluating the aggregated market credit risk level. Upon analyzing a US market data set, we demonstrate that the level of uncertainties associated with default risk assessments is indeed substantial. More informatively, we also find that the level of uncertainties associated with the default risk predictions is correlated with the level of default risks, indicating potential for new scopes in practical applications including improving the accuracy of default risk assessments.

Submitted 24 April, 2018; originally announced April 2018.

Comments: 34 pages

MSC Class: 62P25 62N99

arXiv:1801.06669 [pdf, ps, other]

doi 10.1093/biomet/asy006

A frequency domain analysis of the error distribution from noisy high-frequency data

Authors: Jinyuan Chang, Aurore Delaigle, Peter Hall, Cheng Yong Tang

Abstract: Data observed at high sampling frequency are typically assumed to be an additive composite of a relatively slow-varying continuous-time component, a latent stochastic process or a smooth random function, and measurement error. Supposing that the latent component is an Itô diffusion process, we propose to estimate the measurement error density function by applying a deconvolution technique with appropriate localization. Our estimator, which does not require equally-spaced observed times, is consistent and minimax rate optimal. We also investigate estimators of the moments of the error distribution and their properties, propose a frequency domain estimator for the integrated volatility of the underlying stochastic process, and show that it achieves the optimal convergence rate. Simulations and a real data analysis validate our analysis.

Submitted 20 January, 2018; originally announced January 2018.

Journal ref: Biometrika 2018, Vol. 105, No. 2, 353-369

arXiv:1704.00566 [pdf, ps, other]

doi 10.1214/17-AOS1655

A new scope of penalized empirical likelihood with high-dimensional estimating equations

Authors: Jinyuan Chang, Cheng Yong Tang, Tong Tong Wu

Abstract: Statistical methods with empirical likelihood (EL) are appealing and effective especially in conjunction with estimating equations through which useful data information can be adaptively and flexibly incorporated. It is also known in the literature that EL approaches encounter difficulties when dealing with problems having high-dimensional model parameters and estimating equations. To overcome the challenges, we begin our study with a careful investigation on high-dimensional EL from a new scope targeting at estimating a high-dimensional sparse model parameters. We show that the new scope provides an opportunity for relaxing the stringent requirement on the dimensionality of the model parameter. Motivated by the new scope, we then propose a new penalized EL by applying two penalty functions respectively regularizing the model parameters and the associated Lagrange multipliers in the optimizations of EL. By penalizing the Lagrange multiplier to encourage its sparsity, we show that drastic dimension reduction in the number of estimating equations can be effectively achieved without compromising the validity and consistency of the resulting estimators. Most attractively, such a reduction in dimensionality of estimating equations is actually equivalent to a selection among those high-dimensional estimating equations, resulting in a highly parsimonious and effective device for high-dimensional sparse model parameters. Allowing both the dimensionalities of model parameters and estimating equations growing exponentially with the sample size, our theory demonstrates that the estimator from our new penalized EL is sparse and consistent with asymptotically normally distributed nonzero components. Numerical simulations and a real data analysis show that the proposed penalized EL works promisingly.

Submitted 27 May, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

Journal ref: Annals of Statistics 2018, Vol. 46, No. 6B, 3185-3216

arXiv:1605.03321 [pdf, ps, other]

doi 10.1111/rssb.12001

Tuning parameter selection in high dimensional penalized likelihood

Authors: Yingying Fan, Cheng Yong Tang

Abstract: Determining how to appropriately select the tuning parameter is essential in penalized likelihood methods for high-dimensional data analysis. We examine this problem in the setting of penalized likelihood methods for generalized linear models, where the dimensionality of covariates p is allowed to increase exponentially with the sample size n. We propose to select the tuning parameter by optimizing the generalized information criterion (GIC) with an appropriate model complexity penalty. To ensure that we consistently identify the true model, a range for the model complexity penalty is identified in GIC. We find that this model complexity penalty should diverge at the rate of some power of $\log p$ depending on the tail probability behavior of the response variables. This reveals that using the AIC or BIC to select the tuning parameter may not be adequate for consistently identifying the true model. Based on our theoretical study, we propose a uniform choice of the model complexity penalty and show that the proposed approach consistently identifies the true model among candidate models with asymptotic probability one. We justify the performance of the proposed procedure by numerical simulations and a gene expression data analysis.

Submitted 11 May, 2016; originally announced May 2016.

Comments: 38 pages

MSC Class: 62J12(Primary) 62J07(Secondary)

Journal ref: Journal of the Royal Statistical Society Series B 75, 531-552 (2013)

arXiv:1603.04098 [pdf, ps, other]

On the Analysis of a Continuous-Time Bi-Virus Model

Authors: Ji Liu, Philip E. Paré, Angelia Nedić, Choon Yik Tang, Carolyn L. Beck, Tamer Başar

Abstract: Motivated by the spread of opinions on different social networks, we study a distributed continuous-time bi-virus model for a system of groups of individuals. An in-depth stability analysis is performed for more general models than have been previously considered, for the healthy and epidemic states. In addition, we investigate sensitivity properties of some nontrivial equilibria and obtain an impossibility result for distributed feedback control.

Submitted 18 March, 2016; v1 submitted 13 March, 2016; originally announced March 2016.

arXiv:1503.08192 [pdf, ps, other]

Distributed Estimation of Graph Spectrum

Authors: Mu Yang, Choon Yik Tang

Abstract: In this paper, we develop a two-stage distributed algorithm that enables nodes in a graph to cooperatively estimate the spectrum of a matrix $W$ associated with the graph, which includes the adjacency and Laplacian matrices as special cases. In the first stage, the algorithm uses a discrete-time linear iteration and the Cayley-Hamilton theorem to convert the problem into one of solving a set of linear equations, where each equation is known to a node. In the second stage, if the nodes happen to know that $W$ is cyclic, the algorithm uses a Lyapunov approach to asymptotically solve the equations with an exponential rate of convergence. If they do not know whether $W$ is cyclic, the algorithm uses a random perturbation approach and a structural controllability result to approximately solve the equations with an error that can be made small. Finally, we provide simulation results that illustrate the algorithm.

Submitted 27 March, 2015; originally announced March 2015.

Comments: 15 pages, 2 figures

arXiv:1502.07061 [pdf, ps, other]

doi 10.1214/15-AOS1374

Local independence feature screening for nonparametric and semiparametric models by marginal empirical likelihood

Authors: Jinyuan Chang, Cheng Yong Tang, Yichao Wu

Abstract: We consider an independence feature screening technique for identifying explanatory variables that locally contribute to the response variable in high-dimensional regression analysis. Without requiring a specific parametric form of the underlying data model, our approach accommodates a wide spectrum of nonparametric and semiparametric model families. To detect the local contributions of explanatory variables, our approach constructs empirical likelihood locally in conjunction with marginal nonparametric regressions. Since our approach actually requires no estimation, it is advantageous in scenarios such as the single-index models where even specification and identification of a marginal model is an issue. By automatically incorporating the level of variation of the nonparametric regression and directly assessing the strength of data evidence supporting local contribution from each explanatory variable, our approach provides a unique perspective for solving feature screening problems. Theoretical analysis shows that our approach can handle data dimensionality growing exponentially with the sample size. With extensive theoretical illustrations and numerical examples, we show that the local independence screening approach performs promisingly.

Submitted 30 March, 2016; v1 submitted 25 February, 2015; originally announced February 2015.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1374 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1374

Journal ref: Annals of Statistics 2016, Vol. 44, No. 2, 515-539

arXiv:1306.4408 [pdf, ps, other]

doi 10.1214/13-AOS1139

Marginal empirical likelihood and sure independence feature screening

Authors: Jinyuan Chang, Cheng Yong Tang, Yichao Wu

Abstract: We study a marginal empirical likelihood approach in scenarios when the number of variables grows exponentially with the sample size. The marginal empirical likelihood ratios as functions of the parameters of interest are systematically examined, and we find that the marginal empirical likelihood ratio evaluated at zero can be used to differentiate whether an explanatory variable is contributing to a response variable or not. Based on this finding, we propose a unified feature screening procedure for linear models and the generalized linear models. Different from most existing feature screening approaches that rely on the magnitudes of some marginal estimators to identify true signals, the proposed screening approach is capable of further incorporating the level of uncertainties of such estimators. Such a merit inherits the self-studentization property of the empirical likelihood approach, and extends the insights of existing feature screening methods. Moreover, we show that our screening approach is less restrictive to distributional assumptions, and can be conveniently adapted to be applied in a broad range of scenarios such as models specified using general moment conditions. Our theoretical results and extensive numerical examples by simulations and data analysis demonstrate the merits of the marginal empirical likelihood approach.

Submitted 6 November, 2013; v1 submitted 18 June, 2013; originally announced June 2013.

Comments: Published at http://dx.doi.org/10.1214/13-AOS1139 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1139

Journal ref: Annals of Statistics 2013, Vol. 41, No. 4, 2123-2148

arXiv:1306.0260 [pdf, ps, other]

A Distributed Algorithm for Solving Positive Definite Linear Equations over Networks with Membership Dynamics

Authors: Jie Lu, Choon Yik Tang

Abstract: This paper considers the problem of solving a symmetric positive definite system of linear equations over a network of agents with arbitrary asynchronous interactions and membership dynamics. The latter implies that each agent is allowed to join and leave the network at any time, for infinitely many times, and lose all its memory upon leaving. We develop Subset Equalizing (SE), a distributed asynchronous algorithm for solving such a problem. To design and analyze SE, we introduce a novel time-varying Lyapunov-like function, defined on a state space with changing dimension, and a generalized concept of network connectivity, capable of handling such interactions and membership dynamics. Based on them, we establish the boundedness, asymptotic convergence, and exponential convergence of SE, along with a bound on its convergence rate. Finally, through extensive simulation, we show that SE is effective in a volatile agent network and that a special case of SE, termed Groupwise Equalizing, is significantly more bandwidth/energy efficient than two existing algorithms in multi-hop wireless networks.

Submitted 12 June, 2016; v1 submitted 2 June, 2013; originally announced June 2013.

Comments: 12 pages, 3 figures

arXiv:1112.3544

RNASEQR - A streamlined and accurate RNA-seq sequence analysis program

Authors: Abner C. -Y. Huang, Leslie Y Chen, Kuo-Chen Wei, Kai Wang, Chiung-Yin Huang, Danielle Yi, Chuan Yi Tang, David J. Galas, Leroy E. Hood

Abstract: The paper has been withdrawn by the authors.

Submitted 29 December, 2011; v1 submitted 15 December, 2011; originally announced December 2011.

Comments: The paper has been withdrawn by the authors. Further work and review is necessary before making the paper available

arXiv:1104.5422 [pdf, ps, other]

Zero-Gradient-Sum Algorithms for Distributed Convex Optimization: The Continuous-Time Case

Authors: Jie Lu, Choon Yik Tang

Abstract: This paper presents a set of continuous-time distributed algorithms that solve unconstrained, separable, convex optimization problems over undirected networks with fixed topologies. The algorithms are developed using a Lyapunov function candidate that exploits convexity, and are called Zero-Gradient-Sum (ZGS) algorithms as they yield nonlinear networked dynamical systems that evolve invariantly on a zero-gradient-sum manifold and converge asymptotically to the unknown optimizer. We also describe a systematic way to construct ZGS algorithms, show that a subset of them actually converge exponentially, and obtain lower and upper bounds on their convergence rates in terms of the network topologies, problem characteristics, and algorithm parameters, including the algebraic connectivity, Laplacian spectral radius, and function curvatures. The findings of this paper may be regarded as a natural generalization of several well-known algorithms and results for distributed consensus, to distributed convex optimization.

Submitted 26 September, 2011; v1 submitted 28 April, 2011; originally announced April 2011.

Comments: 15 pages

arXiv:1007.3152 [pdf]

doi 10.1103/PhysRevB.82.054510

Evidence for a temperature dependent anisotropy of the superconducting state parameters in underdoped SmBa2Cu3Ox

Authors: A. Kortyka, R. Puzniak, A. Wisniewski, M. Zehetmayer, H. W. Weber, C. Y. Tang, X. Yao, K. Conder

Abstract: The temperature dependence of the anisotropy of the superconducting state parameters, γ, was studied by torque magnetometry for the high temperature superconductor SmBa2Cu3Ox in magnetic fields of up to 9 T. The measurements were performed on four underdoped single crystals with oxygen contents corresponding to Tc's varying from 42.8 to 63.6 K. The anisotropy was found to be strongly temperature dependent, while only a weak dependence on the magnetic field was observed. No evidence for a field dependent superfluid density was found. Possible origins of the temperature dependence of the anisotropy are discussed.

Submitted 19 July, 2010; originally announced July 2010.

Comments: 21 pages, 4 figures, accepted for publication in Phys. Rev. B

Journal ref: Phys. Rev. B 82, 054510 (2010)

arXiv:1006.5277 [pdf, ps, other]

Voltage/Pitch Control for Maximization and Regulation of Active/Reactive Powers in Wind Turbines with Uncertainties

Authors: Yi Guo, S. Hossein Hosseini, John N. Jiang, Choon Yik Tang, Rama G. Ramakumar

Abstract: This paper addresses the problem of controlling a variable-speed wind turbine with a Doubly Fed Induction Generator (DFIG), modeled as an electromechanically-coupled nonlinear system with rotor voltages and blade pitch angle as its inputs, active and reactive powers as its outputs, and most of the aerodynamic and mechanical parameters as its uncertainties. Using a blend of linear and nonlinear control strategies (including feedback linearization, pole placement, uncertainty estimation, and gradient-based potential function minimization) as well as time-scale separation in the dynamics, we develop a controller that is capable of maximizing the active power in the Maximum Power Tracking (MPT) mode, regulating the active power in the Power Regulation (PR) mode, seamlessly switching between the two modes, and simultaneously adjusting the reactive power to achieve a desired power factor. The controller consists of four cascaded components, uses realistic feedback signals, and operates without knowledge of the C_p-surface, air density, friction coefficient, and wind speed. Finally, we show the effectiveness of the controller via simulation with a realistic wind profile.

Submitted 7 August, 2010; v1 submitted 28 June, 2010; originally announced June 2010.

Comments: 10 pages, 4 figures

arXiv:1005.2967 [pdf, ps, other]

Controlled Hopwise Averaging: Bandwidth/Energy-Efficient Asynchronous Distributed Averaging for Wireless Networks

Authors: Choon Yik Tang, Jie Lu

Abstract: This paper addresses the problem of averaging numbers across a wireless network from an important, but largely neglected, viewpoint: bandwidth/energy efficiency. We show that existing distributed averaging schemes have several drawbacks and are inefficient, producing networked dynamical systems that evolve with wasteful communications. Motivated by this, we develop Controlled Hopwise Averaging (CHA), a distributed asynchronous algorithm that attempts to "make the most" out of each iteration by fully exploiting the broadcast nature of wireless medium and enabling control of when to initiate an iteration. We show that CHA admits a common quadratic Lyapunov function for analysis, derive bounds on its exponential convergence rate, and show that they outperform the convergence rate of Pairwise Averaging for some common graphs. We also introduce a new way to apply Lyapunov stability theory, using the Lyapunov function to perform greedy, decentralized, feedback iteration control. Finally, through extensive simulation on random geometric graphs, we show that CHA is substantially more efficient than several existing schemes, requiring far fewer transmissions to complete an averaging task.

Submitted 26 December, 2010; v1 submitted 17 May, 2010; originally announced May 2010.

Comments: 33 pages, 4 figures

arXiv:1004.0487 [pdf, ps, other]

doi 10.1109/TCST.2010.2053931

Nonlinear Dual-Mode Control of Variable-Speed Wind Turbines with Doubly Fed Induction Generators

Authors: Choon Yik Tang, Yi Guo, John N. Jiang

Abstract: This paper presents a feedback/feedforward nonlinear controller for variable-speed wind turbines with doubly fed induction generators. By appropriately adjusting the rotor voltages and the blade pitch angle, the controller simultaneously enables: (a) control of the active power in both the maximum power tracking and power regulation modes, (b) seamless switching between the two modes, and (c) control of the reactive power so that a desirable power factor is maintained. Unlike many existing designs, the controller is developed based on original, nonlinear, electromechanically-coupled models of wind turbines, without attempting approximate linearization. Its development consists of three steps: (i) employ feedback linearization to exactly cancel some of the nonlinearities and perform arbitrary pole placement, (ii) design a speed controller that makes the rotor angular velocity track a desired reference whenever possible, and (iii) introduce a Lyapunov-like function and present a gradient-based approach for minimizing this function. The effectiveness of the controller is demonstrated through simulation of a wind turbine operating under several scenarios.

Submitted 21 June, 2010; v1 submitted 4 April, 2010; originally announced April 2010.

Comments: 14 pages, 9 figures, accepted for publication in IEEE Transactions on Control Systems Technology

arXiv:1002.2283 [pdf, ps, other]

Gossip Algorithms for Convex Consensus Optimization over Networks

Authors: Jie Lu, Choon Yik Tang, Paul R. Regier, Travis D. Bow

Abstract: In many applications, nodes in a network desire not only a consensus, but an optimal one. To date, a family of subgradient algorithms have been proposed to solve this problem under general convexity assumptions. This paper shows that, for the scalar case and by assuming a bit more, novel non-gradient-based algorithms with appealing features can be constructed. Specifically, we develop Pairwise Equalizing (PE) and Pairwise Bisectioning (PB), two gossip algorithms that solve unconstrained, separable, convex consensus optimization problems over undirected networks with time-varying topologies, where each local function is strictly convex, continuously differentiable, and has a minimizer. We show that PE and PB are easy to implement, bypass limitations of the subgradient algorithms, and produce switched, nonlinear, networked dynamical systems that admit a common Lyapunov function and asymptotically converge. Moreover, PE generalizes the well-known Pairwise Averaging and Randomized Gossip Algorithm, while PB relaxes a requirement of PE, allowing nodes to never share their local functions.

Submitted 10 February, 2011; v1 submitted 11 February, 2010; originally announced February 2010.

Comments: 15 pages

MSC Class: 93A14

arXiv:math/0702123 [pdf, ps, other]

doi 10.1214/009053607000000659

A test for model specification of diffusion processes

Authors: Song Xi Chen, Jiti Gao, Cheng Yong Tang

Abstract: We propose a test for model specification of a parametric diffusion process based on a kernel estimation of the transitional density of the process. The empirical likelihood is used to formulate a statistic, for each kernel smoothing bandwidth, which is effectively a Studentized $L_2$-distance between the kernel transitional density estimator and the parametric transitional density implied by the parametric process. To reduce the sensitivity of the test on smoothing bandwidth choice, the final test statistic is constructed by combining the empirical likelihood statistics over a set of smoothing bandwidths. To better capture the finite sample distribution of the test statistic and data dependence, the critical value of the test is obtained by a parametric bootstrap procedure. Properties of the test are evaluated asymptotically and numerically by simulation and by a real data example.

Submitted 12 March, 2008; v1 submitted 6 February, 2007; originally announced February 2007.

Comments: Published at http://dx.doi.org/10.1214/009053607000000659 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS0288

MSC Class: 62G05 (Primary); 62J02 (Secondary)

Journal ref: Annals of Statistics 2008, Vol. 36, No. 1, 167-198

Showing 1–31 of 31 results for author: Tang, C Y