-
Pivotal Estimation of Linear Discriminant Analysis in High Dimensions
Authors:
Ethan X. Fang,
Yajun Mei,
Yuyang Shi,
Qunzhi Xu,
Tuo Zhao
Abstract:
We consider the linear discriminant analysis problem in high-dimensional settings. In this work, we propose PANDA (PivotAl liNear Discriminant Analysis), a tuning-insensitive method in the sense that it requires very little effort to tune the parameters. Moreover, we prove that PANDA achieves the optimal convergence rate in terms of both the estimation error and the misclassification rate. Our theoretical results are backed up by thorough numerical studies using both simulated and real datasets. In comparison with existing methods, we observe that PANDA yields equal or better performance and requires substantially less effort in parameter tuning.
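For orientation, the classical Fisher rule that high-dimensional LDA estimators aim to approximate assigns $x$ to class 1 when $(x-\bar\mu)^\top\beta > 0$, where $\bar\mu=(\mu_1+\mu_2)/2$ and $\beta=\Sigma^{-1}(\mu_1-\mu_2)$ is the discriminant direction. The sketch below assumes known means and a diagonal covariance (hypothetical simplifications for illustration); it does not implement PANDA's pivotal estimator of $\beta$.

```python
from typing import List, Sequence


def discriminant_direction(mu1: Sequence[float], mu2: Sequence[float],
                           sigma_diag: Sequence[float]) -> List[float]:
    """beta = Sigma^{-1} (mu1 - mu2), assuming a diagonal covariance Sigma."""
    return [(a - b) / s for a, b, s in zip(mu1, mu2, sigma_diag)]


def fisher_rule(x: Sequence[float], mu1: Sequence[float], mu2: Sequence[float],
                beta: Sequence[float]) -> int:
    """Return 1 if (x - (mu1 + mu2) / 2)^T beta > 0, otherwise 2."""
    score = sum((xi - (a + b) / 2.0) * bi
                for xi, a, b, bi in zip(x, mu1, mu2, beta))
    return 1 if score > 0 else 2


mu1, mu2 = [1.0, 0.0], [-1.0, 0.0]
beta = discriminant_direction(mu1, mu2, [1.0, 1.0])  # [2.0, 0.0]
print(fisher_rule([0.5, 3.0], mu1, mu2, beta))   # 1
print(fisher_rule([-0.2, -1.0], mu1, mu2, beta))  # 2
```

In high dimensions $\Sigma^{-1}$ cannot be estimated reliably from the sample covariance, which is why sparse LDA methods estimate $\beta$ directly instead of plugging in.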
Submitted 18 September, 2023;
originally announced September 2023.
-
Creation of super-high-flux photo-neutrons and gamma-rays > 8 MeV using a petawatt laser to irradiate high-Z solid targets
Authors:
E. Liang,
W. Lo,
B. Cage,
E. Fang,
S. Arora,
K. Q. Zheng,
H. Quvedo,
S. A. Bruce,
M. Spinks,
E. Medina,
A. Helal,
T. Ditmire
Abstract:
We report the creation of super-high-flux gamma-rays with energy >8 MeV and photo-neutrons via the (γ,n) reaction near giant dipole resonance energies (8-20 MeV), using the ~130 J Texas Petawatt laser to irradiate high-Z (Au, Pt, Re, W) targets of mm-cm thickness, at laser intensities up to ~$5\times10^{21}$ W/cm$^2$. We detected up to ~several $\times10^{12}$ gamma-rays >8 MeV (~3% of incident laser energy) and ~$10^{10}$ photo-neutrons per shot. Due to the short pulse and the narrow gamma-ray cone (~17° half-width) around the laser forward direction, the peak emergent gamma-ray flux >8 MeV reached ~$10^{27}$ gammas/cm$^2$/s, and the peak emergent neutron flux reached ~$10^{20}$ neutrons/cm$^2$/s. Such intense gamma-ray and neutron fluxes are among the highest achieved in short-pulse laser experiments. They will facilitate the study of nuclear reactions requiring super-high fluxes of gamma-rays or neutrons, such as the creation of r-process elements. These results may also have far-reaching applications for nuclear energy, such as the transmutation of nuclear waste.
Submitted 15 February, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
PASTA: Pessimistic Assortment Optimization
Authors:
Juncheng Dong,
Weibin Mo,
Zhengling Qi,
Cong Shi,
Ethan X. Fang,
Vahid Tarokh
Abstract:
We consider a class of assortment optimization problems in an offline data-driven setting. A firm does not know the underlying customer choice model but has access to an offline dataset consisting of the historically offered assortment set, customer choice, and revenue. The objective is to use the offline dataset to find an optimal assortment. Due to the combinatorial nature of assortment optimization, the problem of insufficient data coverage is likely to occur in the offline dataset. Therefore, designing a provably efficient offline learning algorithm becomes a significant challenge. To this end, we propose an algorithm referred to as Pessimistic ASsortment opTimizAtion (PASTA for short), designed based on the principle of pessimism, which can correctly identify the optimal assortment while only requiring the offline data to cover the optimal assortment under general settings. In particular, we establish a regret bound for the offline assortment optimization problem under the celebrated multinomial logit model. We also propose an efficient computational procedure to solve our pessimistic assortment optimization problem. Numerical studies demonstrate the superiority of the proposed method over the existing baseline method.
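Under the multinomial logit (MNL) model with an outside option of weight 1, the expected revenue of an assortment $S$ is $R(S)=\sum_{i\in S} r_i v_i / (1+\sum_{j\in S} v_j)$. The sketch below illustrates the pessimism principle by brute force: each assortment is scored by its estimated revenue minus a penalty that grows when its items were rarely offered in the data. The penalty form $c\sum_{i\in S} 1/\sqrt{n_i}$ and all function names are hypothetical; this is not PASTA's uncertainty set or its computational procedure, and exhaustive search is only viable for a handful of products.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence, Tuple


def mnl_revenue(assortment: Sequence[int], r: Sequence[float],
                v: Sequence[float]) -> float:
    """Expected revenue under MNL; the 1.0 is the no-purchase option's weight."""
    denom = 1.0 + sum(v[i] for i in assortment)
    return sum(r[i] * v[i] for i in assortment) / denom


def pessimistic_choice(r: Sequence[float], v_hat: Sequence[float],
                       counts: Sequence[int], c: float) -> Tuple[int, ...]:
    """Maximize estimated revenue minus a hypothetical coverage penalty.

    counts[i] = how often item i appears in the offline data (must be > 0)."""
    n = len(r)
    best, best_val = (), float("-inf")
    for k in range(1, n + 1):
        for s in combinations(range(n), k):
            penalty = c * sum(1.0 / sqrt(counts[i]) for i in s)
            val = mnl_revenue(s, r, v_hat) - penalty
            if val > best_val:
                best, best_val = s, val
    return best


r, v_hat, counts = [1.0, 0.8], [1.0, 1.0], [100, 1]
print(pessimistic_choice(r, v_hat, counts, c=0.0))  # (0, 1): ignores coverage
print(pessimistic_choice(r, v_hat, counts, c=0.5))  # (0,): avoids the rarely seen item
```

With no penalty the plug-in optimizer includes item 1, whose preference estimate rests on a single observation; the pessimistic score prefers the well-covered assortment.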
Submitted 7 February, 2023;
originally announced February 2023.
-
Combinatorial Inference on the Optimal Assortment in Multinomial Logit Models
Authors:
Shuting Shen,
Xi Chen,
Ethan X. Fang,
Junwei Lu
Abstract:
Assortment optimization has been actively explored over the past few decades due to its practical importance. Despite the extensive literature on optimization algorithms and latent score estimation, uncertainty quantification for the optimal assortment remains underexplored and is of great practical significance. Instead of estimating and recovering the complete optimal offer set, decision-makers may only be interested in testing whether a given property holds for the optimal assortment, such as whether several products of interest should be included in the optimal set, or how many categories of products the optimal set should include. This paper proposes a novel inferential framework for testing such properties. We consider the widely adopted multinomial logit (MNL) model, where we assume that each customer purchases an item among the offered products with a probability proportional to the underlying preference score associated with the product. We reduce inferring a general optimal assortment property to quantifying the uncertainty associated with the sign change point detection of the marginal revenue gaps. We show the asymptotic normality of the marginal revenue gap estimator, and construct a maximum statistic via the gap estimators to detect the sign change point. By approximating the distribution of the maximum statistic with multiplier bootstrap techniques, we propose a valid testing procedure. We also conduct numerical experiments to assess the performance of our method.
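The MNL purchase probabilities and a plug-in answer to a property question can be sketched as follows. For the unconstrained MNL problem, a known result is that some revenue-ordered set (the $k$ highest-revenue products) is optimal, so checking the nested sets suffices. The sketch assumes known preference scores and uses hypothetical names; it gives a point answer only, without the uncertainty quantification (marginal revenue gaps, multiplier bootstrap) that the paper's test provides.

```python
from typing import Dict, List, Sequence, Tuple


def purchase_probs(assortment: Sequence[int], v: Sequence[float]) -> Dict[int, float]:
    """P(buy i | S) = v_i / (1 + sum_{j in S} v_j); the 1 is the no-purchase option."""
    denom = 1.0 + sum(v[i] for i in assortment)
    return {i: v[i] / denom for i in assortment}


def best_revenue_ordered(r: Sequence[float], v: Sequence[float]) -> Tuple[List[int], float]:
    """Search the nested revenue-ordered sets and return the best one."""
    order = sorted(range(len(r)), key=lambda i: r[i], reverse=True)
    best_set: List[int] = []
    best_rev = 0.0
    for k in range(1, len(order) + 1):
        s = order[:k]
        rev = sum(r[i] * p for i, p in purchase_probs(s, v).items())
        if rev > best_rev:
            best_set, best_rev = sorted(s), rev
    return best_set, best_rev


r, v = [10.0, 8.0, 1.0], [1.0, 1.0, 1.0]
S, rev = best_revenue_ordered(r, v)
print(S, round(rev, 6))          # [0, 1] 6.0
print("product 2 in optimal set:", 2 in S)
```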
Submitted 3 May, 2023; v1 submitted 28 January, 2023;
originally announced January 2023.
-
Stochastic Compositional Optimization with Compositional Constraints
Authors:
Shuoguang Yang,
Zhe Zhang,
Ethan X. Fang
Abstract:
Stochastic compositional optimization (SCO) has attracted considerable attention because of its broad applicability to important real-world problems. However, existing works on SCO assume that the projection within a solution update is simple, which fails to hold for problem instances where the constraints are in the form of expectations, such as empirical conditional value-at-risk constraints. We study a novel model that incorporates single-level expected value and two-level compositional constraints into the current SCO framework. Our model can be applied widely to data-driven optimization and risk management, including risk-averse optimization and high-moment portfolio selection, and can handle multiple constraints. We further propose a class of primal-dual algorithms that generates sequences converging to the optimal solution at the rate of $\mathcal{O}(\frac{1}{\sqrt{N}})$ under both single-level expected value and two-level compositional constraints, where $N$ is the iteration counter, establishing the benchmarks in expected value constrained SCO.
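To make the primal-dual idea concrete, the sketch below runs stochastic gradient descent-ascent on the Lagrangian $L(x,\lambda)=x^2+\lambda(1-x)$ of the toy problem $\min x^2$ subject to the expectation constraint $\mathbb{E}[1-x+\xi]\le 0$, with noisy gradients standing in for sampling. The KKT point is $x^\star=1$, $\lambda^\star=2$. This covers only a single-level expected-value constraint on a hypothetical example; it does not model the compositional structure or reproduce the paper's algorithm or step sizes.

```python
import random
from typing import Tuple


def stochastic_primal_dual(steps: int = 5000, eta: float = 0.05,
                           noise: float = 0.1, seed: int = 0) -> Tuple[float, float]:
    """Stochastic gradient descent-ascent on L(x, lam) = x^2 + lam * (1 - x)."""
    rng = random.Random(seed)
    x, lam = 0.0, 0.0
    x_sum = lam_sum = 0.0
    for k in range(steps):
        gx = 2.0 * x - lam + rng.gauss(0.0, noise)   # noisy dL/dx
        glam = 1.0 - x + rng.gauss(0.0, noise)       # noisy constraint value dL/dlam
        x -= eta * gx                                 # primal descent step
        lam = max(0.0, lam + eta * glam)              # projected dual ascent (lam >= 0)
        if k >= steps // 2:                           # average the second half of iterates
            x_sum += x
            lam_sum += lam
    n_avg = steps - steps // 2
    return x_sum / n_avg, lam_sum / n_avg


x_bar, lam_bar = stochastic_primal_dual()
print(round(x_bar, 2), round(lam_bar, 2))  # close to the KKT point (1, 2)
```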
Submitted 8 September, 2022;
originally announced September 2022.
-
Superconductivity in TlBi$_2$ with a large Kadowaki-Woods ratio
Authors:
Zhihua Yang,
Zhen Yang,
Qi** Su,
Jianhua Du,
Enda Fang,
Chuxiang Wu,
**hu Yang,
Bin Chen,
Hangdong Wang,
Minghu Fang
Abstract:
In this article, the superconducting and normal-state properties of TlBi$_2$ with the AlB$_2$-type structure were studied by resistivity, magnetization and specific heat measurements. It was found that bulk superconductivity with $T_{C}$ = 6.2 K emerges in TlBi$_2$, which is a phonon-mediated $s$-wave superconductor with strong electron-phonon coupling ($\lambda_{ep}$ = 1.38) and a large superconducting gap ($\Delta_{0}/k_{B}T_{C}$ = 2.25). We found that $\rho(T)$ exhibits an unusual $T$-linear dependence above 50 K and can be well described by Fermi-liquid theory below 20 K. Interestingly, its Kadowaki-Woods ratio $A/\gamma^{2}$ [$9.2\times10^{-5}$ $\mu\Omega$ cm (mol K/mJ)$^{2}$] is unexpectedly one order of magnitude larger than that obtained in many heavy-fermion compounds, although the electron correlation is not so strong.
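A quick arithmetic check of the two comparisons above. The reference values are standard textbook figures, not taken from the paper: the Kadowaki-Woods value of about $1.0\times10^{-5}$ $\mu\Omega$ cm (mol K/mJ)$^2$ commonly quoted for heavy-fermion compounds, and the weak-coupling BCS gap ratio $\Delta_0/k_BT_C \approx 1.764$.

```python
# Reference values (standard figures, assumed here; not from the paper).
KW_HEAVY_FERMION = 1.0e-5   # micro-ohm cm (mol K / mJ)^2
BCS_WEAK_COUPLING = 1.764   # Delta_0 / (k_B T_C) in weak-coupling BCS theory

# Values reported in the abstract for TlBi2.
KW_TLBI2 = 9.2e-5
GAP_RATIO_TLBI2 = 2.25

kw_enhancement = KW_TLBI2 / KW_HEAVY_FERMION
gap_enhancement = GAP_RATIO_TLBI2 / BCS_WEAK_COUPLING
print(f"KW ratio: {kw_enhancement:.1f}x the heavy-fermion value")
print(f"Gap ratio: {gap_enhancement:.2f}x the weak-coupling BCS value")
```

A factor of about 9 supports "one order of magnitude larger", and a gap ratio about 28% above the BCS value is consistent with the strong-coupling description.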
Submitted 11 April, 2022;
originally announced April 2022.
-
Robust Weakly Supervised Learning for COVID-19 Recognition Using Multi-Center CT Images
Authors:
Qinghao Ye,
Yuan Gao,
Wei** Ding,
Zhangming Niu,
Chengjia Wang,
Yinghui Jiang,
Minhao Wang,
Evandro Fei Fang,
Wade Menpes-Smith,
Jun Xia,
Guang Yang
Abstract:
The world is currently experiencing an ongoing pandemic of an infectious disease named coronavirus disease 2019 (i.e., COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Computed Tomography (CT) plays an important role in assessing the severity of the infection and can also be used to identify symptomatic and asymptomatic COVID-19 carriers. With the surge in the cumulative number of COVID-19 patients, radiologists are under increasing pressure to examine CT scans manually. An automated 3D CT scan recognition tool is therefore in high demand, since manual analysis is time-consuming for radiologists and their fatigue can cause misjudgment. However, due to the differing technical specifications of CT scanners located in different hospitals, the appearance of CT images can differ significantly, leading to the failure of many automated image recognition approaches. The multi-domain shift problem in multi-center and multi-scanner studies is therefore nontrivial, and addressing it is crucial for dependable recognition and critical for reproducible and objective diagnosis and prognosis. In this paper, we propose a COVID-19 CT scan recognition model, namely the coronavirus information fusion and diagnosis network (CIFD-Net), that can efficiently handle the multi-domain shift problem via a new robust weakly supervised learning paradigm. Our model can reliably and efficiently resolve the problem of differing appearance in CT scan images while attaining higher accuracy than other state-of-the-art methods.
Submitted 9 December, 2021;
originally announced December 2021.
-
Lagrangian Inference for Ranking Problems
Authors:
Yue Liu,
Ethan X. Fang,
Junwei Lu
Abstract:
We propose a novel combinatorial inference framework to conduct general uncertainty quantification in ranking problems. We consider the widely adopted Bradley-Terry-Luce (BTL) model, where each item is assigned a positive preference score that determines the Bernoulli distributions of pairwise comparisons' outcomes. Our proposed method aims to infer general ranking properties of the BTL model. The general ranking properties include "local" properties, such as whether an item is preferred over another, and "global" properties, such as whether an item is among the top $K$-ranked items. We further generalize our inferential framework to multiple testing problems where we control the false discovery rate (FDR), and apply the method to infer the top-$K$ ranked items. We also derive the information-theoretic lower bound to justify the minimax optimality of the proposed method. We conduct extensive numerical studies using both synthetic and real datasets to back up our theory.
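Under the BTL model, item $i$ beats item $j$ with probability $w_i/(w_i+w_j)$. The sketch below estimates the scores by maximum likelihood using the classical MM (Zermelo) iteration and then reads off a ranking and a top-$K$ set. It is point estimation only, on hypothetical win counts; the paper's Lagrangian inference and FDR-controlled multiple testing are not reproduced.

```python
from typing import List, Sequence


def btl_mm(wins: Sequence[Sequence[int]], iters: int = 500) -> List[float]:
    """BTL maximum-likelihood scores via the MM (Zermelo) iteration.

    wins[i][j] = number of comparisons in which item i beat item j.
    Assumes every item wins at least once and the comparison graph is connected.
    Scores are normalized to sum to 1 after every update."""
    n = len(wins)
    w = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (w[i] + w[j])
                        for j in range(n) if j != i)
            new.append(total_wins / denom)
        s = sum(new)
        w = [x / s for x in new]
    return w


def prob_beats(w: Sequence[float], i: int, j: int) -> float:
    """BTL probability that item i is preferred over item j."""
    return w[i] / (w[i] + w[j])


wins = [[0, 8, 9], [2, 0, 7], [1, 3, 0]]   # 10 comparisons per pair
w = btl_mm(wins)
ranking = sorted(range(len(w)), key=lambda i: w[i], reverse=True)
print(ranking)           # [0, 1, 2]
print(set(ranking[:2]))  # estimated top-2 set {0, 1}
```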
Submitted 30 September, 2021;
originally announced October 2021.
-
Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data
Authors:
Yan Li,
Caleb Ju,
Ethan X. Fang,
Tuo Zhao
Abstract:
Bregman proximal point algorithm (BPPA) has seen emerging machine learning applications, yet its theoretical understanding remains largely unexplored. We study the computational properties of BPPA through learning linear classifiers with separable data, and demonstrate provable algorithmic regularization of BPPA. For any BPPA instantiated with a fixed Bregman divergence, we provide a lower bound of the margin obtained by BPPA with respect to an arbitrarily chosen norm. The obtained margin lower bound differs from the maximal margin by a multiplicative factor, which inversely depends on the condition number of the distance-generating function measured in the dual norm. We show that the dependence on the condition number is tight, thus demonstrating the importance of the choice of divergence in affecting the quality of the learned classifiers. We then extend our findings to mirror descent, for which we establish similar connections between the margin and Bregman divergence, together with a non-asymptotic analysis. Numerical experiments on both synthetic and real-world datasets are provided to support our theoretical findings. To the best of our knowledge, the aforementioned findings appear to be new in the literature of algorithmic regularization.
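A minimal mirror descent run on separable data, assuming the distance-generating function $\psi(w)=\tfrac12\|w\|_p^2$ (one common choice, picked here for illustration). The algorithm keeps a dual variable $\theta$, takes gradient steps on the logistic loss in the dual, and maps back with $\nabla\psi^*(\theta)$, the gradient of $\tfrac12\|\theta\|_q^2$ with $1/p+1/q=1$. The data, step size, and names are hypothetical; BPPA would replace each gradient step with a Bregman proximal step, which is not shown.

```python
import math
from typing import List, Sequence, Tuple

Data = Sequence[Tuple[Sequence[float], int]]


def grad_dual_potential(theta: Sequence[float], q: float) -> List[float]:
    """Gradient of psi*(theta) = 0.5 * ||theta||_q^2 (maps dual -> primal)."""
    norm = sum(abs(t) ** q for t in theta) ** (1.0 / q)
    if norm == 0.0:
        return [0.0] * len(theta)
    return [math.copysign(abs(t) ** (q - 1), t) / norm ** (q - 2) for t in theta]


def logistic_grad(w: Sequence[float], data: Data) -> List[float]:
    """Gradient of the mean logistic loss log(1 + exp(-y * w.x))."""
    g = [0.0] * len(w)
    for x, y in data:
        m = y * sum(wi * xi for wi, xi in zip(w, x))
        s = 1.0 / (1.0 + math.exp(min(m, 50.0)))  # sigmoid(-m), clipped to avoid overflow
        for k, xk in enumerate(x):
            g[k] -= y * xk * s / len(data)
    return g


def mirror_descent(data: Data, p: float = 1.5, eta: float = 0.5,
                   steps: int = 2000) -> List[float]:
    q = p / (p - 1.0)                     # dual exponent
    theta = [0.0] * len(data[0][0])       # dual (mirror) variable
    for _ in range(steps):
        w = grad_dual_potential(theta, q)
        theta = [t - eta * gk for t, gk in zip(theta, logistic_grad(w, data))]
    return grad_dual_potential(theta, q)


def normalized_margin(w: Sequence[float], data: Data, p: float) -> float:
    """Smallest margin y * w.x divided by ||w||_p."""
    norm = sum(abs(wi) ** p for wi in w) ** (1.0 / p)
    return min(y * sum(wi * xi for wi, xi in zip(w, x)) for x, y in data) / norm


data = [((2.0, 1.0), 1), ((1.0, 2.0), 1), ((-2.0, -1.0), -1), ((-1.0, -2.0), -1)]
w = mirror_descent(data)
print(w, normalized_margin(w, data, 1.5))
```

Changing `p` changes the geometry of the mirror map, which is the lever the abstract ties to the margin the method converges to.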
Submitted 24 August, 2023; v1 submitted 15 August, 2021;
originally announced August 2021.
-
Pre-processing with Orthogonal Decompositions for High-dimensional Explanatory Variables
Authors:
Xu Han,
Ethan X Fang,
Cheng Yong Tang
Abstract:
Strong correlations between explanatory variables are problematic for high-dimensional regularized regression methods. Due to the violation of the Irrepresentable Condition, the popular LASSO method may suffer from false inclusions of inactive variables. In this paper, we propose pre-processing with orthogonal decompositions (PROD) for the explanatory variables in high-dimensional regressions. The PROD procedure is constructed based upon a generic orthogonal decomposition of the design matrix. We demonstrate with two concrete cases that the PROD approach can be effectively constructed to improve the performance of high-dimensional penalized regression. Our theoretical analysis reveals their properties and benefits for high-dimensional penalized linear regression with LASSO. Extensive numerical studies with simulations and data analysis show the promising performance of PROD.
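One concrete orthogonal decomposition of a design matrix is $X = P_fX + (I-P_f)X$ with $P_f = ff^\top/(f^\top f)$, the projection onto the span of a single vector $f$ (for example, a hypothetical estimate of a common factor driving the correlations). The sketch only shows that the split is exact and the two parts are orthogonal; how PROD chooses the decomposition and feeds the parts into the LASSO is not reproduced here.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def project_onto(f: Sequence[float], X: Matrix) -> Matrix:
    """P_f X with P_f = f f^T / (f^T f); X is n x p, stored as a list of rows."""
    ff = sum(fi * fi for fi in f)
    p = len(X[0])
    # Regression coefficient of each column of X on f.
    coef = [sum(f[i] * X[i][j] for i in range(len(f))) / ff for j in range(p)]
    return [[f[i] * coef[j] for j in range(p)] for i in range(len(f))]


def prod_split(f: Sequence[float], X: Matrix) -> Tuple[Matrix, Matrix]:
    """Split X into P_f X and the orthogonal residual (I - P_f) X."""
    PX = project_onto(f, X)
    R = [[X[i][j] - PX[i][j] for j in range(len(X[0]))] for i in range(len(X))]
    return PX, R


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
f = [1.0, 1.0, 1.0]          # hypothetical factor direction
PX, R = prod_split(f, X)
print(PX)
print(R)
```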
Submitted 16 June, 2021;
originally announced June 2021.
-
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy
Authors:
Han Zhong,
Xun Deng,
Ethan X. Fang,
Zhuoran Yang,
Zhaoran Wang,
Runze Li
Abstract:
While deep reinforcement learning has achieved tremendous successes in various applications, most existing works only focus on maximizing the expected value of total return and thus ignore its inherent stochasticity. Such stochasticity is also known as the aleatoric uncertainty and is closely related to the notion of risk. In this work, we make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criterion. In particular, we focus on a variance-constrained policy optimization problem where the goal is to find a policy that maximizes the expected value of the long-run average reward, subject to a constraint that the long-run variance of the average reward is upper bounded by a threshold. Utilizing Lagrangian and Fenchel dualities, we transform the original problem into an unconstrained saddle-point policy optimization problem, and propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable. When both the value and policy functions are represented by multi-layer overparameterized neural networks, we prove that our actor-critic algorithm generates a sequence of policies that finds a globally optimal policy at a sublinear rate. Further, we provide numerical studies of the proposed method using two real datasets to back up the theoretical results.
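The Fenchel step rests on the identity $(\mathbb{E}R)^2=\max_y\{2y\,\mathbb{E}R-y^2\}$, which gives $\mathrm{Var}(R)=\mathbb{E}[R^2]-\max_y\{2y\,\mathbb{E}R-y^2\}=\min_y \mathbb{E}[(R-y)^2]$, with the minimum at $y=\mathbb{E}R$. For fixed $y$ the objective is linear in expectations, which is what makes an extra dual variable convenient. The sketch checks the identity numerically on a hypothetical reward sample; it does not model the long-run average-reward setting of the paper.

```python
from typing import Sequence


def variance(samples: Sequence[float]) -> float:
    """Population variance of the samples."""
    m = sum(samples) / len(samples)
    return sum((r - m) ** 2 for r in samples) / len(samples)


def fenchel_objective(samples: Sequence[float], y: float) -> float:
    """E[R^2] - (2 y E[R] - y^2) = E[(R - y)^2]; its minimum over y is Var(R)."""
    second = sum(r * r for r in samples) / len(samples)
    first = sum(samples) / len(samples)
    return second - (2.0 * y * first - y * y)


rewards = [1.0, 2.0, 4.0, 5.0]
mean = sum(rewards) / len(rewards)
print(variance(rewards), fenchel_objective(rewards, mean))  # 2.5 2.5
print(fenchel_objective(rewards, 2.0))                      # 3.5, larger away from the mean
```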
Submitted 16 September, 2023; v1 submitted 28 December, 2020;
originally announced December 2020.
-
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Authors:
Yining Wang,
Yi Chen,
Ethan X. Fang,
Zhaoran Wang,
Runze Li
Abstract:
We consider the stochastic contextual bandit problem under the high-dimensional linear model. We focus on the case where the action space is finite and random, with each action associated with a randomly generated contextual covariate. This setting has important applications such as personalized recommendation, online advertisement, and personalized medicine. However, it is very challenging because we need to balance exploration and exploitation. We propose doubly growing epochs and estimating the parameter using the best subset selection method, which is easy to implement in practice. This approach achieves $ \tilde{\mathcal{O}}(s\sqrt{T})$ regret with high probability, which is nearly independent of the "ambient" regression model dimension $d$. We further attain a sharper $\tilde{\mathcal{O}}(\sqrt{sT})$ regret by using the \textsc{SupLinUCB} framework and match the minimax lower bound of low-dimensional linear stochastic bandit problems. Finally, we conduct extensive numerical experiments to demonstrate the applicability and robustness of our algorithms empirically.
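Best subset selection in its plain form fits least squares on every support of size $s$ and keeps the one with the smallest residual sum of squares; the cost grows as $\binom{d}{s}$, so it is only practical for small $d$ or $s$. The sketch below shows that estimation step on hypothetical noise-free data. The epoch schedule, the exploration strategy, and the SupLinUCB refinement from the abstract are not shown.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def best_subset(X: Sequence[Sequence[float]], y: Sequence[float],
                s: int) -> Tuple[Tuple[int, ...], List[float]]:
    """Least squares on every size-s support; return the lowest-RSS support and fit."""
    best_S, best_coef, best_rss = (), [], float("inf")
    for S in combinations(range(len(X[0])), s):
        A = [[sum(row[i] * row[j] for row in X) for j in S] for i in S]  # X_S^T X_S
        b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in S]       # X_S^T y
        coef = solve(A, b)
        rss = sum((yi - sum(c * row[j] for c, j in zip(coef, S))) ** 2
                  for row, yi in zip(X, y))
        if rss < best_rss:
            best_S, best_coef, best_rss = S, coef, rss
    return best_S, best_coef


rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(30)]
y = [3.0 * row[0] - 2.0 * row[2] for row in X]   # true support {0, 2}
S, coef = best_subset(X, y, 2)
print(S, [round(c, 6) for c in coef])             # (0, 2) [3.0, -2.0]
```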
Submitted 4 September, 2020;
originally announced September 2020.
-
Weakly Supervised Deep Learning for COVID-19 Infection Detection and Classification from CT Images
Authors:
Shao** Hu,
Yuan Gao,
Zhangming Niu,
Yinghui Jiang,
Lao Li,
Xianglu Xiao,
Minhao Wang,
Evandro Fei Fang,
Wade Menpes-Smith,
Jun Xia,
Hui Ye,
Guang Yang
Abstract:
An outbreak of a novel coronavirus disease (i.e., COVID-19) was recorded in Wuhan, China in late December 2019 and subsequently became a pandemic around the world. Although COVID-19 is an acutely treated disease, it can also be fatal, with a fatality rate of 4.03% in China and the highest rates of 13.04% in Algeria and 12.67% in Italy (as of 8th April 2020). The onset of serious illness may result in death as a consequence of substantial alveolar damage and progressive respiratory failure. Although laboratory testing, e.g., using reverse transcription polymerase chain reaction (RT-PCR), is the gold standard for clinical diagnosis, the tests may produce false negatives. Moreover, under the pandemic situation, a shortage of RT-PCR testing resources may also delay subsequent clinical decisions and treatment. Under such circumstances, chest CT imaging has become a valuable tool for both diagnosis and prognosis of COVID-19 patients. In this study, we propose a weakly supervised deep learning strategy for detecting and classifying COVID-19 infection from CT images. The proposed method can minimise the requirements for manual labelling of CT images while still obtaining accurate infection detection and distinguishing COVID-19 from non-COVID-19 cases. Based on the promising results obtained qualitatively and quantitatively, we envisage a wide deployment of our developed technique in large-scale clinical studies.
Submitted 14 April, 2020;
originally announced April 2020.
-
Central Limit Theorems for Compound Paths on the 2-Dimensional Lattice
Authors:
Evan Fang,
Jonathan Jenkins,
Zack Lee,
Daniel Li,
Ethan Lu,
Steven J. Miller,
Dilhan Salgado,
Joshua M. Siktar
Abstract:
Zeckendorf proved that every positive integer can be written uniquely as a sum of non-consecutive Fibonacci numbers $\{F_n\}$, and later researchers showed that the distribution of the number of summands needed for such decompositions of integers in $[F_n, F_{n+1})$ converges to a Gaussian as $n\to\infty$. Decomposition problems have been studied extensively for a variety of different sequences and notions of legal decomposition; for the Fibonacci numbers, a legal decomposition is one for which each summand is used at most once and no two consecutive summands may be chosen. Recently, Chen et al. [CCGJMSY] generalized earlier work to $d$-dimensional lattices of positive integers; there, a legal decomposition is a path such that every point chosen has each component strictly less than the corresponding component of the previously chosen point in the path. They were able to prove Gaussianity results despite the lack of uniqueness of the decompositions; however, their results should hold in the more general case where some components are identical. The strictly decreasing assumption was needed in that work to obtain simple, closed-form combinatorial expressions, which could then be well approximated and led to the limiting behavior. In this work we remove that assumption through inclusion-exclusion arguments. These lead to more involved combinatorial sums; using generating functions and recurrence relations, we obtain tractable forms in $2$ dimensions and prove Gaussianity again; a more involved analysis should work in higher dimensions.
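The Zeckendorf decomposition can be computed greedily: repeatedly subtract the largest Fibonacci number not exceeding the remainder, and the chosen summands are automatically non-consecutive. The sketch uses the convention $F_1=1$, $F_2=2$ common in this literature, which makes the decomposition unique. It covers only the one-dimensional Fibonacci case, not the lattice paths studied in the paper.

```python
from typing import List


def fibs_up_to(n: int) -> List[int]:
    """Fibonacci numbers with F_1 = 1, F_2 = 2, up to (at least) n."""
    f = [1, 2]
    while f[-1] + f[-2] <= n:
        f.append(f[-1] + f[-2])
    return f


def zeckendorf(n: int) -> List[int]:
    """Greedy decomposition of n >= 1 into non-consecutive Fibonacci numbers."""
    parts = []
    for fib in reversed(fibs_up_to(n)):
        if fib <= n:
            parts.append(fib)
            n -= fib
    return parts


print(zeckendorf(100))  # [89, 8, 3]
print(zeckendorf(4))    # [3, 1]
```

Counting `len(zeckendorf(m))` for all `m` in $[F_n, F_{n+1})$ gives the summand distribution whose Gaussian limit the abstract refers to.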
Submitted 21 May, 2020; v1 submitted 25 June, 2019;
originally announced June 2019.
-
Inductive Bias of Gradient Descent based Adversarial Training on Separable Data
Authors:
Yan Li,
Ethan X. Fang,
Huan Xu,
Tuo Zhao
Abstract:
Adversarial training is a principled approach for training robust neural networks. Despite tremendous successes in practice, its theoretical properties still remain largely unexplored. In this paper, we provide new theoretical insights into gradient descent based adversarial training by studying its computational properties, specifically its inductive bias. We take the binary classification task on linearly separable data as an illustrative example, where the loss asymptotically attains its infimum as the parameter diverges to infinity along certain directions. Specifically, we show that when the adversarial perturbation during training has bounded $\ell_2$-norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum $\ell_2$-norm margin classifier at the rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$, significantly faster than the rate $\mathcal{O}(1/\log T)$ of training with clean data. In addition, when the adversarial perturbation during training has bounded $\ell_q$-norm for some $q\ge 1$, the resulting classifier converges in direction to a maximum mixed-norm margin classifier, which has a natural interpretation of robustness, as being the maximum $\ell_2$-norm margin classifier under worst-case $\ell_q$-norm perturbation to the data. Our findings provide theoretical support for the claim that adversarial training indeed promotes robustness against adversarial perturbation.
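For a linear classifier the inner maximization has a closed form: over $\|\delta\|_2\le\varepsilon$, the worst case is $\delta^\star=-\varepsilon y\,w/\|w\|_2$, which lowers the margin $y\,w^\top x$ by exactly $\varepsilon\|w\|_2$ (for $\ell_q$ perturbations the dual norm of $w$ replaces $\|w\|_2$). The sketch runs gradient descent adversarial training with the logistic loss, an assumed choice, on hypothetical separable data and checks the closed form against an explicit perturbation.

```python
import math
from typing import List, Sequence, Tuple

Data = Sequence[Tuple[Sequence[float], int]]


def l2(w: Sequence[float]) -> float:
    """Euclidean norm."""
    return math.sqrt(sum(wi * wi for wi in w))


def robust_margin(w: Sequence[float], x: Sequence[float], y: int, eps: float) -> float:
    """min over ||delta||_2 <= eps of y * w.(x + delta) = y * w.x - eps * ||w||_2."""
    return y * sum(wi * xi for wi, xi in zip(w, x)) - eps * l2(w)


def adversarial_gd(data: Data, eps: float, lr: float = 0.5,
                   steps: int = 1000) -> List[float]:
    """Gradient descent on the mean of log(1 + exp(-robust_margin))."""
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        g = [0.0] * len(w)
        nw = l2(w)
        for x, y in data:
            m = robust_margin(w, x, y, eps)
            s = 1.0 / (1.0 + math.exp(min(m, 50.0)))  # sigmoid(-m), clipped
            for k in range(len(w)):
                # d m / d w_k = y x_k - eps * w_k / ||w||; use 0 for the second term at w = 0
                dm = y * x[k] - (eps * w[k] / nw if nw > 0 else 0.0)
                g[k] -= s * dm / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


data = [((2.0, 1.0), 1), ((1.0, 2.0), 1), ((-2.0, -1.0), -1), ((-1.0, -2.0), -1)]
w = adversarial_gd(data, eps=0.5)
print(w, [round(robust_margin(w, x, y, 0.5), 3) for x, y in data])
```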
Submitted 26 July, 2019; v1 submitted 7 June, 2019;
originally announced June 2019.
-
High-dimensional Interactions Detection with Sparse Principal Hessian Matrix
Authors:
Cheng Yong Tang,
Ethan X. Fang,
Yuexiao Dong
Abstract:
In the statistical learning framework with regressions, interactions are the contributions to the response variable from the products of the explanatory variables. In high-dimensional problems, detecting interactions is challenging due to combinatorial complexity and limited data information. We consider detecting interactions by exploring their connections with the principal Hessian matrix. Specifically, we propose a one-step synthetic approach for estimating the principal Hessian matrix by a penalized M-estimator. An alternating direction method of multipliers (ADMM) is proposed to efficiently solve the encountered regularized optimization problem. Based on the sparse estimator, we detect the interactions by identifying its nonzero components. Our method directly targets the interactions, and it requires no structural assumption on the hierarchy of the interaction effects. We show that our estimator is theoretically valid, computationally efficient, and practically useful for detecting the interactions in a broad spectrum of scenarios.
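Why the principal Hessian matrix reveals interactions: for $x\sim N(0,I)$, Stein's lemma gives $\mathbb{E}[(y-\mathbb{E}y)\,xx^\top]=\mathbb{E}[\nabla^2 f(x)]$ when $y=f(x)$, so a term $x_jx_k$ appears as a nonzero off-diagonal entry $(j,k)$. The sketch computes the plain moment estimate on a hypothetical simulation with $y=x_1x_2$. The paper's contribution, a sparse penalized estimator solved by ADMM, is not shown.

```python
import random
from typing import List


def phd_matrix(X: List[List[float]], y: List[float]) -> List[List[float]]:
    """Moment estimate of E[(y - E y)(x - E x)(x - E x)^T]."""
    n, d = len(X), len(X[0])
    ybar = sum(y) / n
    xbar = [sum(row[j] for row in X) / n for j in range(d)]
    H = [[0.0] * d for _ in range(d)]
    for row, yi in zip(X, y):
        c = [row[j] - xbar[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                H[a][b] += (yi - ybar) * c[a] * c[b] / n
    return H


rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20000)]
y = [row[0] * row[1] for row in X]   # a pure interaction between x_1 and x_2
H = phd_matrix(X, y)
for row in H:
    print([round(h, 2) for h in row])  # approximately [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```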
Submitted 27 September, 2019; v1 submitted 23 January, 2019;
originally announced January 2019.
-
Multi-Level Stochastic Gradient Methods for Nested Composition Optimization
Authors:
Shuoguang Yang,
Mengdi Wang,
Ethan X. Fang
Abstract:
Stochastic gradient methods are scalable for solving large-scale optimization problems that involve empirical expectations of loss functions. Existing results mainly apply to optimization problems where the objectives are one- or two-level expectations. In this paper, we consider the multi-level compositional optimization problem that involves compositions of multi-level component functions and nested expectations over a random path. It finds applications in risk-averse optimization and sequential planning. We propose a class of multi-level stochastic gradient methods that are motivated by the method of multi-timescale stochastic approximation. First, we propose a basic $T$-level stochastic compositional gradient algorithm, establish its almost sure convergence and obtain an $n$-iteration error bound $O (n^{-1/2^T})$. Then we develop accelerated multi-level stochastic gradient methods by using an extrapolation-interpolation scheme to take advantage of the smoothness of individual component functions. When all component functions are smooth, we show that the convergence rate improves to $O(n^{-4/(7+T)})$ for general objectives and $O (n^{-4/(3+T)})$ for strongly convex objectives. We also provide almost sure convergence and rate of convergence results for nonconvex problems. The proposed methods and theoretical results are validated using numerical experiments.
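The multi-timescale idea in its simplest ($T=2$) form: keep a running estimate $u$ of the inner expectation, updated on a faster timescale than $x$, and take the outer step using $f'(u)$ in place of the unavailable $f'(\mathbb{E}[g(x;w)])$. The toy problem, step-size exponents, and names below are assumptions for illustration; a $T$-level version would keep one such tracker per inner level, and the paper's accelerated extrapolation-interpolation scheme is not shown.

```python
import random


def scgd(iters: int = 20000, seed: int = 0) -> float:
    """Two-timescale stochastic compositional gradient, T = 2 case.

    Toy problem: minimize f(E[g(x; w)]) with g(x; w) = x + w, w ~ N(0, 1),
    and f(u) = (u - 2)^2, whose minimizer is x = 2."""
    rng = random.Random(seed)
    x, u = 0.0, 0.0                    # u tracks the inner expectation E[g(x; w)]
    for k in range(1, iters + 1):
        alpha = 0.1 / k ** 0.75        # slow step size for x
        beta = 1.0 / k ** 0.5          # faster step size for the tracker
        g_sample = x + rng.gauss(0.0, 1.0)
        u = (1.0 - beta) * u + beta * g_sample
        grad_g = 1.0                   # dg/dx for this toy inner function
        grad_f = 2.0 * (u - 2.0)       # f'(u) at the tracked inner value
        x -= alpha * grad_g * grad_f   # chain-rule step
    return x


print(round(scgd(), 3))  # close to 2
```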
Submitted 12 January, 2018; v1 submitted 10 January, 2018;
originally announced January 2018.
-
Misspecified Nonconvex Statistical Optimization for Phase Retrieval
Authors:
Zhuoran Yang,
Lin F. Yang,
Ethan X. Fang,
Tuo Zhao,
Zhaoran Wang,
Matey Neykov
Abstract:
Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models. To address this issue, we take a first step towards taming model misspecification by studying the high-dimensional sparse phase retrieval problem with misspecified link functions. In particular, we propose a simple variant of the thresholded Wirtinger flow algorithm that, given a proper initialization, linearly converges to an estimator with optimal statistical accuracy for a broad family of unknown link functions. We further provide extensive numerical experiments to support our theoretical findings.
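For readers unfamiliar with thresholded Wirtinger flow, a minimal sketch of the underlying iteration in the real-valued, correctly specified case may help: gradient descent on the quartic loss $\frac{1}{4n}\sum_i ((x_i^\top b)^2 - y_i)^2$, followed by a sparsifying threshold. This is not the paper's variant for misspecified links. We assume the quadratic link $y_i = (x_i^\top \beta)^2$, use hard thresholding to $s$ entries (a common alternative to soft thresholding), and replace the "proper initialization" with a small perturbation of the truth; all names are hypothetical.

```python
import numpy as np


def hard_threshold(v: np.ndarray, s: int) -> np.ndarray:
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out


def thresholded_gradient_pr(X, y, beta0, s, step=0.1, n_iter=300):
    """Gradient descent on (1/4n) * sum_i ((x_i^T b)^2 - y_i)^2,
    hard-thresholding the iterate to s entries after every step."""
    n = X.shape[0]
    b = hard_threshold(beta0, s)
    for _ in range(n_iter):
        u = X @ b
        grad = X.T @ ((u ** 2 - y) * u) / n  # gradient of the quartic loss
        b = hard_threshold(b - step * grad, s)
    return b


rng = np.random.default_rng(0)
n, p, s = 500, 50, 3
beta_true = np.zeros(p)
beta_true[:s] = 1 / np.sqrt(s)
X = rng.standard_normal((n, p))
y = (X @ beta_true) ** 2                           # quadratic link, no noise (assumption)
beta0 = beta_true + 0.05 * rng.standard_normal(p)  # stand-in for a proper initialization
beta_hat = thresholded_gradient_pr(X, y, beta0, s)
# The sign of beta is not identifiable from y, so measure the error up to sign.
err = min(np.linalg.norm(beta_hat - beta_true), np.linalg.norm(beta_hat + beta_true))
print(f"error up to sign: {err:.2e}")
```

In the noise-free, well-initialized case the truth is a fixed point of the thresholded update, and the iterates contract toward it at a linear rate, which mirrors the "linearly converges" claim in the abstract for this simpler setting.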
Submitted 17 December, 2017;
originally announced December 2017.
-
Versatile Robust Clustering of Ad Hoc Cognitive Radio Network
Authors:
Di Li,
Erwin Fang,
James Gross
Abstract:
Cluster structure in cognitive radio networks facilitates cooperative spectrum sensing, routing and other functionalities. The unlicensed channels that are available to every member of a group of cognitive radio users consolidate the group into a cluster, and the availability of unlicensed channels determines the robustness of that cluster against the licensed users' influence. This paper analyses how to form robust clusters in cognitive radio networks, so that more cognitive radio users can benefit from the cluster structure even when the primary users' operations are intense. We provide a formal description of the robust clustering problem, prove it to be NP-hard and propose a centralized solution; in addition, we propose a distributed solution to suit the dynamics of ad hoc cognitive radio networks. A congestion game model is adopted to analyze the process of cluster formation; it not only contributes directly to the design of the distributed clustering scheme, but also guarantees convergence to a Nash equilibrium and characterizes the convergence speed. Our proposed clustering solution is versatile enough to fulfill other requirements, such as faster convergence and cluster size control. The proposed distributed clustering scheme outperforms related work in terms of cluster robustness, convergence speed and overhead. Extensive simulations support our claims.
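The convergence guarantee rests on a general property of congestion games: they admit a potential function, so a sequence of strict unilateral improvements must stop at a pure Nash equilibrium. The sketch below shows generic best-response dynamics in a singleton congestion game. It is not the paper's clustering scheme or cost model; the players, the per-player option sets `allowed`, and the load-based `cost` are hypothetical stand-ins (for example, users choosing among locally available options).

```python
from collections import Counter


def best_response_dynamics(allowed, cost, max_rounds=1000):
    """Sequential best-response dynamics in a singleton congestion game.

    allowed[i] is the set of resources player i may choose (hypothetical);
    cost(r, load) is the cost each player on r pays when `load` players share
    it, assumed non-decreasing in load.
    """
    choice = [min(options) for options in allowed]  # arbitrary initial profile
    load = Counter(choice)
    for _ in range(max_rounds):
        moved = False
        for i, options in enumerate(allowed):
            current = choice[i]
            best, best_cost = current, cost(current, load[current])
            for r in sorted(options):
                if r == current:
                    continue
                c = cost(r, load[r] + 1)  # load on r after player i joins it
                if c < best_cost:
                    best, best_cost = r, c
            if best != current:  # move only on strict improvement
                load[current] -= 1
                load[best] += 1
                choice[i] = best
                moved = True
        if not moved:
            return choice  # nobody can improve unilaterally: pure Nash equilibrium
    raise RuntimeError("no equilibrium reached within max_rounds")


def is_nash(choice, allowed, cost):
    """Check that no player can lower its cost by switching alone."""
    load = Counter(choice)
    return all(
        cost(r, load[r] + 1) >= cost(choice[i], load[choice[i]])
        for i, options in enumerate(allowed)
        for r in options
        if r != choice[i]
    )


def linear_cost(resource, load):
    return load  # cost grows with the number of players sharing the resource


allowed = [{0, 1}, {0, 1}, {1, 2}, {0, 2}, {0, 1, 2}]
eq = best_response_dynamics(allowed, linear_cost)
print(eq, is_nash(eq, allowed, linear_cost))
```

Termination follows because each strict improvement lowers Rosenthal's potential, a finite sum over resources of cost(r, 1) + ... + cost(r, load[r]); how quickly that happens is the kind of convergence-speed question the abstract refers to.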
Submitted 16 April, 2017;
originally announced April 2017.
-
Max-Norm Optimization for Robust Matrix Recovery
Authors:
Ethan X. Fang,
Han Liu,
Kim-Chuan Toh,
Wen-Xin Zhou
Abstract:
This paper studies the matrix completion problem under arbitrary sampling schemes. We propose a new estimator incorporating both max-norm and nuclear-norm regularization, based on which we can conduct efficient low-rank matrix recovery using a random subset of entries observed with additive noise under general non-uniform and unknown sampling distributions. This method significantly relaxes the uniform sampling assumption imposed for the widely used nuclear-norm penalized approach, and makes low-rank matrix recovery feasible in more practical settings. Theoretically, we prove that the proposed estimator achieves fast rates of convergence under different settings. Computationally, we propose an alternating direction method of multipliers algorithm to efficiently compute the estimator, which bridges a gap between theory and practice of machine learning methods with max-norm regularization. Further, we provide thorough numerical studies to evaluate the proposed method using both simulated and real datasets.
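For orientation, one natural way to write a hybrid max-norm and nuclear-norm estimator of the kind described above is shown below. This is our own sketch, assuming a squared loss over observed entries $Y_t$ at positions $(i_t, j_t)$, $t = 1, \dots, n$, tuning parameters $\lambda, \mu \ge 0$, and an entrywise bound $\alpha$ (with $\|M\|_\infty$ the largest absolute entry); the paper's exact constraints and constants may differ. Here $\|U\|_{2,\infty}$ is the largest row $\ell_2$ norm and $\|M\|_*$ the nuclear norm. The second line is the standard semidefinite characterization of the max-norm, which is why ADMM-type splitting is a natural computational route.

```latex
\[
\hat{M} \in \operatorname*{arg\,min}_{\|M\|_{\infty} \le \alpha}
  \frac{1}{n} \sum_{t=1}^{n} \bigl(Y_{t} - M_{i_t j_t}\bigr)^2
  + \lambda \|M\|_{\max} + \mu \|M\|_{*},
\]
\[
\|M\|_{\max} = \min_{M = UV^{\top}} \|U\|_{2,\infty}\, \|V\|_{2,\infty}
  = \min\Bigl\{ \max\bigl(\max_i (W_1)_{ii},\ \max_j (W_2)_{jj}\bigr) :
    \begin{pmatrix} W_1 & M \\ M^{\top} & W_2 \end{pmatrix} \succeq 0 \Bigr\}.
\]
```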
Submitted 24 September, 2016;
originally announced September 2016.
-
Accelerating Stochastic Composition Optimization
Authors:
Mengdi Wang,
Ji Liu,
Ethan X. Fang
Abstract:
Consider the stochastic composition optimization problem where the objective is a composition of two expected-value functions. We propose a new stochastic first-order method, namely the accelerated stochastic compositional proximal gradient (ASC-PG) method, which updates based on queries to the sampling oracle using two different timescales. ASC-PG is the first proximal gradient method for the stochastic composition problem that can handle a nonsmooth regularization penalty. We show that ASC-PG exhibits faster convergence than the best known algorithms, and that it achieves the optimal sample-error complexity in several important special cases. We further demonstrate the application of ASC-PG to reinforcement learning and conduct numerical experiments.
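The two-timescale proximal structure can be sketched as follows: a proximal gradient step on `x` using the current estimate `y` of the inner expectation, an extrapolated point `z` past the new iterate, and a fast averaging update of `y` at `z`. This reflects our reading of the ASC-PG update pattern; the step sizes, the toy oracles $g_w(x) = Ax + w$ and $\nabla f_v(y) = (y - b) + v$, and the $\ell_1$ penalty are illustrative assumptions rather than the paper's setup.

```python
import numpy as np


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def asc_pg_sketch(A, b, lam, n_iter=20_000, sigma=0.1, seed=0):
    """Two-timescale proximal sketch for min_x f(E_w[g_w(x)]) + lam * ||x||_1,
    with hypothetical oracles g_w(x) = A x + w and grad f_v(y) = (y - b) + v."""
    rng = np.random.default_rng(seed)
    m, d = A.shape
    x = np.zeros(d)
    y = A @ x + sigma * rng.standard_normal(m)  # auxiliary estimate of E_w[g_w(x)]
    for k in range(n_iter):
        alpha = 0.5 / (k + 1) ** 0.75  # slow step for x (illustrative)
        beta = 1.0 / (k + 1) ** 0.5    # faster step for y (illustrative), beta <= 1
        grad_f = (y - b) + sigma * rng.standard_normal(m)
        x_new = soft_threshold(x - alpha * (A.T @ grad_f), alpha * lam)
        # Extrapolate past x_new, then query the inner oracle at that point.
        z = (1 - 1 / beta) * x + (1 / beta) * x_new
        y = (1 - beta) * y + beta * (A @ z + sigma * rng.standard_normal(m))
        x = x_new
    return x


# Toy instance: with f(y) = 0.5 * ||y - b||^2 and A = I the problem is a separable
# lasso whose solution is soft_threshold(b, lam) = [2.5, 0.0].
A = np.eye(2)
b = np.array([3.0, 0.2])
x_hat = asc_pg_sketch(A, b, lam=0.5)
print(np.round(x_hat, 2))
```

The extrapolation makes `y` absorb each full increment of `x`, so the tracking error obeys a simple averaging recursion; the nonsmooth penalty enters only through the proximal map.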
Submitted 25 July, 2016;
originally announced July 2016.
-
Testing and Confidence Intervals for High Dimensional Proportional Hazards Model
Authors:
Ethan X. Fang,
Yang Ning,
Han Liu
Abstract:
This paper proposes a decorrelation-based approach to test hypotheses and construct confidence intervals for the low dimensional component of high dimensional proportional hazards models. Motivated by the geometric projection principle, we propose new decorrelated score, Wald and partial likelihood ratio statistics. Without assuming model selection consistency, we prove the asymptotic normality of these test statistics and establish their semiparametric optimality. We also develop new procedures for constructing pointwise confidence intervals for the baseline hazard function and baseline survival function. Thorough numerical results are provided to back up our theory.
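The projection idea can be summarized by the generic decorrelated score construction, written here in our own notation rather than the paper's: $\ell(\theta, \gamma)$ is the normalized log partial likelihood, $\theta$ the low-dimensional parameter of interest, $\gamma$ the high-dimensional nuisance, and $I$ the information matrix partitioned accordingly. Removing the projection of the $\theta$-score onto the nuisance score leaves a statistic uncorrelated with $\nabla_\gamma \ell$, which is, roughly, why a regularized estimate $\hat\gamma$ can be plugged in without model selection consistency. How the paper estimates $w^*$ and scales the statistic may differ.

```latex
\[
w^{*} = I_{\gamma\gamma}^{-1} I_{\gamma\theta}, \qquad
S(\theta, \gamma) = \nabla_{\theta} \ell(\theta, \gamma) - w^{*\top} \nabla_{\gamma} \ell(\theta, \gamma),
\]
\[
\operatorname{Cov}\bigl(S, \nabla_{\gamma} \ell\bigr)
  \propto I_{\theta\gamma} - w^{*\top} I_{\gamma\gamma}
  = I_{\theta\gamma} - I_{\theta\gamma} I_{\gamma\gamma}^{-1} I_{\gamma\gamma} = 0,
\]
\[
\text{under } H_0\colon \theta = \theta_0:\quad
\frac{\sqrt{n}\, \hat{S}(\theta_0, \hat{\gamma})}{\hat{I}_{\theta\mid\gamma}^{1/2}}
  \xrightarrow{d} N(0, 1), \qquad
I_{\theta\mid\gamma} = I_{\theta\theta} - I_{\theta\gamma} I_{\gamma\gamma}^{-1} I_{\gamma\theta}.
\]
```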
Submitted 16 December, 2014;
originally announced December 2014.
-
Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions
Authors:
Mengdi Wang,
Ethan X. Fang,
Han Liu
Abstract:
Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values or a composition of two expected-value functions, i.e., problems of the form $\min_x \mathbf{E}_v [f_v\big(\mathbf{E}_w [g_w(x)]\big)]$. In order to solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. The SCGD algorithms update the solutions based on noisy sample gradients of $f_v$ and $g_w$ and use an auxiliary variable to track the unknown quantity $\mathbf{E}_w[g_w(x)]$. We prove that the SCGD algorithms converge almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, the SCGD algorithms achieve a convergence rate of $O(k^{-1/4})$ in the general case and $O(k^{-2/3})$ in the strongly convex case, after taking $k$ samples. For smooth convex problems, the SCGD algorithms can be accelerated to converge at a rate of $O(k^{-2/7})$ in the general case and $O(k^{-4/5})$ in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, and we also provide a convergence rate analysis. Indeed, the stochastic setting where one wants to optimize compositions of expected-value functions is very common in practice. The proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.
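Why track $\mathbf{E}_w[g_w(x)]$ with an auxiliary variable instead of plugging a single sample into $\nabla f_v$? A toy illustration of our own (not from the paper): take $g_w(x) = x + w$ with $w \sim N(0, \sigma^2)$ and $f(y) = y^3/3$, so $\nabla f(y) = y^2$ is nonlinear and the target at $x = 1$ is $\nabla f(\mathbf{E}_w[g_w(x)]) = 1$. Plugging in one sample gives $\mathbf{E}[(x + w)^2] = x^2 + \sigma^2$, a bias of $\sigma^2$; an exponentially averaged tracker with step $\beta$ has variance $\beta\sigma^2/(2 - \beta)$, so its bias shrinks with $\beta$. The function name and parameters are hypothetical.

```python
import random


def compare_gradient_estimates(x=1.0, sigma=1.0, beta=0.01,
                               n_samples=100_000, burn_in=1_000, seed=0):
    """Estimate grad f(E_w[g_w(x)]) for grad f(y) = y**2 and g_w(x) = x + w.

    Returns (naive, tracked) sample averages; the true target is x**2.
    """
    rng = random.Random(seed)
    naive_sum = tracked_sum = 0.0
    count = 0
    y = x + rng.gauss(0.0, sigma)  # auxiliary variable tracking E_w[g_w(x)]
    for k in range(n_samples):
        g_sample = x + rng.gauss(0.0, sigma)
        # Naive plug-in: evaluate grad f at a single noisy inner sample.
        naive = g_sample ** 2
        # SCGD-style: average inner samples first, then evaluate grad f.
        y = (1 - beta) * y + beta * g_sample
        if k >= burn_in:
            naive_sum += naive
            tracked_sum += y ** 2
            count += 1
    return naive_sum / count, tracked_sum / count


naive, tracked = compare_gradient_estimates()
print(f"naive ~ {naive:.3f}, tracked ~ {tracked:.3f}, target = 1.0")
```

With these settings the naive average should sit near $1 + \sigma^2 = 2$, while the tracked one stays within about $\beta\sigma^2/(2-\beta) \approx 0.005$ of the target. Shrinking $\beta$ reduces this bias but slows tracking, which is the trade-off that the two time scales in SCGD balance.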
Submitted 14 November, 2014;
originally announced November 2014.