Search | arXiv e-print repository

KOO approach for scalable variable selection problem in large-dimensional regression

Authors: Zhidong Bai, Kwok Pui Choi, Yasunori Fujikoshi, Jiang Hu

Abstract: An important issue in many multivariate regression problems is to eliminate candidate predictors with null predictor vectors. In large-dimensional (LD) setting where the numbers of responses and predictors are large, model selection encounters the scalability challenge. Knock-one-out (KOO) statistics hold promise to meet this challenge. In this paper, the almost sure limits and the central limit theorem of the KOO statistics are derived under the LD setting and mild distributional assumptions (finite fourth moments) of the errors. These theoretical results guarantee the strong consistency of a subset selection rule based on the KOO statistics with a general threshold. For enhancing the robustness of the selection rule, we also propose a bootstrap threshold for the KOO approach. Simulation results support our conclusions and demonstrate the selection probabilities by the KOO approach with the bootstrap threshold outperform the methods using Akaike information threshold, Bayesian information threshold and Mallow's C$_p$ threshold. We compare the proposed KOO approach with those based on information threshold to a chemometrics dataset and a yeast cell-cycle dataset, which suggests our proposed method identifies useful models.

Submitted 25 April, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

arXiv:2303.05715 [pdf, other]

Context-Based Trit-Plane Coding for Progressive Image Compression

Authors: Seungmin Jeon, Kwang Pyo Choi, Youngo Park, Chang-Su Kim

Abstract: Trit-plane coding enables deep progressive image compression, but it cannot use autoregressive context models. In this paper, we propose the context-based trit-plane coding (CTC) algorithm to achieve progressive compression more compactly. First, we develop the context-based rate reduction module to estimate trit probabilities of latent elements accurately and thus encode the trit-planes compactly. Second, we develop the context-based distortion reduction module to refine partial latent tensors from the trit-planes and improve the reconstructed image quality. Third, we propose a retraining scheme for the decoder to attain better rate-distortion tradeoffs. Extensive experiments show that CTC outperforms the baseline trit-plane codec significantly in BD-rate on the Kodak lossless dataset, while increasing the time complexity only marginally. Our codes are available at https://github.com/seungminjeon-github/CTC.

Submitted 13 March, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023

arXiv:2207.01831 [pdf, other]

Learning Local Implicit Fourier Representation for Image Warping

Authors: Jaewon Lee, Kwang Pyo Choi, Kyong Hwan Jin

Abstract: Image warping aims to reshape images defined on rectangular grids into arbitrary shapes. Recently, implicit neural functions have shown remarkable performances in representing images in a continuous manner. However, a standalone multi-layer perceptron suffers from learning high-frequency Fourier coefficients. In this paper, we propose a local texture estimator for image warping (LTEW) followed by an implicit neural representation to deform images into continuous shapes. Local textures estimated from a deep super-resolution (SR) backbone are multiplied by locally-varying Jacobian matrices of a coordinate transformation to predict Fourier responses of a warped image. Our LTEW-based neural function outperforms existing warping methods for asymmetric-scale SR and homography transform. Furthermore, our algorithm well generalizes arbitrary coordinate transformations, such as homography transform with a large magnification factor and equirectangular projection (ERP) perspective transform, which are not provided in training.

Submitted 5 July, 2022; originally announced July 2022.

Comments: ECCV 2022 camera-ready version (https://ipl.dgist.ac.kr/LTEW.pdf)

arXiv:2112.06334 [pdf, other]

DPICT: Deep Progressive Image Compression Using Trit-Planes

Authors: Jae-Han Lee, Seungmin Jeon, Kwang Pyo Choi, Youngo Park, Chang-Su Kim

Abstract: We propose the deep progressive image compression using trit-planes (DPICT) algorithm, which is the first learning-based codec supporting fine granular scalability (FGS). First, we transform an image into a latent tensor using an analysis network. Then, we represent the latent tensor in ternary digits (trits) and encode it into a compressed bitstream trit-plane by trit-plane in the decreasing order of significance. Moreover, within each trit-plane, we sort the trits according to their rate-distortion priorities and transmit more important information first. Since the compression network is less optimized for the cases of using fewer trit-planes, we develop a postprocessing network for refining reconstructed images at low rates. Experimental results show that DPICT outperforms conventional progressive codecs significantly, while enabling FGS transmission. Codes are available at https://github.com/jaehanlee-mcl/DPICT.

Submitted 6 May, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

Comments: Accepted to CVPR 2022 (Oral presentation)

MSC Class: 94A08 (Primary) 68T07; 68P30; 68U10 (Secondary) ACM Class: I.4.2; I.4.9
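
The DPICT abstract above describes writing latent values as ternary digits and sending them plane by plane, most significant first. The following is only a toy sketch of that idea for non-negative integers; it is not the DPICT codec. The function names `to_trit_planes` and `from_trit_planes`, and the choice to fill unreceived trits with the midpoint of the remaining range, are assumptions made for illustration.

```python
def to_trit_planes(values, num_planes):
    """Split non-negative integers into ternary digit planes, most significant first."""
    if any(v < 0 or v >= 3 ** num_planes for v in values):
        raise ValueError("values must lie in [0, 3**num_planes)")
    return [
        [(v // 3 ** p) % 3 for v in values]
        for p in range(num_planes - 1, -1, -1)
    ]


def from_trit_planes(planes, num_planes, length):
    """Reconstruct values from the first len(planes) planes.

    Trits that have not been received yet are replaced by the midpoint of
    the range they could still cover (an illustrative assumption).
    """
    missing = num_planes - len(planes)
    fill = (3 ** missing - 1) // 2
    approx = [fill] * length
    for i, plane in enumerate(planes):
        weight = 3 ** (num_planes - 1 - i)
        approx = [a + t * weight for a, t in zip(approx, plane)]
    return approx


if __name__ == "__main__":
    data = [0, 5, 13, 26, 80]
    planes = to_trit_planes(data, num_planes=4)
    for k in range(5):
        # Each additional plane refines the reconstruction.
        print(k, from_trit_planes(planes[:k], 4, len(data)))
```

In this sketch, decoding with all four planes recovers the input exactly, and truncating the plane list earlier gives a coarser approximation, which is the progressive behaviour the abstract refers to.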

arXiv:2110.02850 [pdf, other]

Distributions of cherries and pitchforks for the Ford model

Authors: Gursharn Kaur, Kwok Pui Choi, Taoyang Wu

Abstract: We study two fringe subtree counting statistics, the number of cherries and that of pitchforks for Ford's $α$ model, a one-parameter family of random phylogenetic tree models that includes the uniform and the Yule models, two tree models commonly used in phylogenetics. Based on a nonuniform version of the extended Pólya urn models in which negative entries are permitted for their replacement matrices, we obtain the strong law of large numbers and the central limit theorem for the joint distribution of these two count statistics for the Ford model. Furthermore, we derive a recursive formula for computing the exact joint distribution of these two statistics. This leads to exact formulas for their means and higher order asymptotic expansions of their second moments, which allows us to identify a critical parameter value for the correlation between these two statistics. That is, when $n$ is sufficiently large, they are negatively correlated for $0\le α\le 1/2$ and positively correlated for $1/2<α<1$.

Submitted 4 November, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: 23 pages, 2 figures

arXiv:2103.11113 [pdf, other]

Exploration Enhancement of Nature-Inspired Swarm-based Optimization Algorithms

Authors: Kwok Pui Choi, Enzio Hai Hong Kam, Tze Leung Lai, Xin T. Tong, Weng Kee Wong

Abstract: Nature-inspired swarm-based algorithms have been widely applied to tackle high-dimensional and complex optimization problems across many disciplines. They are general purpose optimization algorithms, easy to use and implement, flexible and assumption-free. A common drawback of these algorithms is premature convergence and the solution found is not a global optimum. We provide sufficient conditions for an algorithm to converge almost surely (a.s.) to a global optimum. We then propose a general, simple and effective strategy, called Perturbation-Projection (PP), to enhance an algorithm's exploration capability so that our convergence conditions are guaranteed to hold. We illustrate this approach using three widely used nature-inspired swarm-based optimization algorithms: particle swarm optimization (PSO), bat algorithm (BAT) and competitive swarm optimizer (CSO). Extensive numerical experiments show that each of the three algorithms with the enhanced PP strategy outperforms the original version in a number of notable ways.

Submitted 20 March, 2021; originally announced March 2021.

Comments: 20 pages, 9 figures

arXiv:2101.07488 [pdf, other]

On asymptotic joint distributions of cherries and pitchforks for random phylogenetic trees

Authors: Kwok Pui Choi, Gursharn Kaur, Taoyang Wu

Abstract: Tree shape statistics provide valuable quantitative insights into evolutionary mechanisms underpinning phylogenetic trees, a commonly used graph representation of evolution systems ranging from viruses to species. By developing limit theorems for a version of extended Pólya urn models in which negative entries are permitted for their replacement matrices, we present strong laws of large numbers and central limit theorems for asymptotic joint distributions of two subtree counting statistics, the number of cherries and that of pitchforks, for random phylogenetic trees generated by two widely used null tree models: the proportional to distinguishable arrangements (PDA) and the Yule-Harding-Kingman (YHK) models. Our results indicate that the limiting behaviour of these two statistics, when appropriately scaled, are independent of the initial trees used in the tree generating process.

Submitted 19 January, 2021; originally announced January 2021.
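
The urn-style view of cherry counts mentioned in the abstract above can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: it grows a tree under the YHK (Yule) model by splitting a uniformly chosen leaf, and tracks only the number of cherries. The update rule used here is the standard observation that splitting a leaf that already sits in a cherry leaves the count unchanged, while splitting any other leaf adds one cherry. The function name `simulate_cherries_yhk` is hypothetical.

```python
import random


def simulate_cherries_yhk(n_leaves, rng):
    """Number of cherries in one random YHK tree with n_leaves leaves."""
    if n_leaves < 2:
        raise ValueError("need at least two leaves")
    cherries = 1  # the two-leaf tree is a single cherry
    for n in range(2, n_leaves):
        # A uniformly chosen leaf lies in a cherry with probability 2c/n.
        # Splitting such a leaf destroys one cherry and creates one,
        # so the count changes only when the leaf is outside every cherry.
        if rng.random() >= 2 * cherries / n:
            cherries += 1
    return cherries


if __name__ == "__main__":
    rng = random.Random(2021)
    n = 60
    samples = [simulate_cherries_yhk(n, rng) for _ in range(5000)]
    print("sample mean:", sum(samples) / len(samples))
```

For comparison, the expected number of cherries under the YHK model is known to be n/3 for n ≥ 3 (a classical result that is not stated in this listing), so the printed sample mean for n = 60 should land close to 20.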

arXiv:2002.12643 [pdf, other]

On cherry and pitchfork distributions of random rooted and unrooted phylogenetic trees

Authors: Kwok Pui Choi, Ariadne Thompson, Taoyang Wu

Abstract: Tree shape statistics are important for investigating evolutionary mechanisms mediating phylogenetic trees. As a step towards bridging shape statistics between rooted and unrooted trees, we present a comparison study on two subtree statistics known as numbers of cherries and pitchforks for the proportional to distinguishable arrangements (PDA) and the Yule-Harding-Kingman (YHK) models. Based on recursive formulas on the joint distribution of the number of cherries and that of pitchforks, it is shown that cherry distributions are log-concave for both rooted and unrooted trees under these two models. Furthermore, the mean number of cherries and that of pitchforks for unrooted trees converge respectively to those for rooted trees under the YHK model while there exists a limiting gap of 1/4 for the PDA model. Finally, the total variation distances between the cherry distributions of rooted and those of unrooted trees converge for both models. Our results indicate that caution is required for conducting statistical analysis for tree shapes involving both rooted and unrooted trees.

Submitted 28 February, 2020; originally announced February 2020.

Comments: 26 pages

arXiv:1508.03139 [pdf, other]

On joint subtree distributions under two evolutionary models

Authors: Taoyang Wu, Kwok Pui Choi

Abstract: In population and evolutionary biology, hypotheses about micro-evolutionary and macro-evolutionary processes are commonly tested by comparing the shape indices of empirical evolutionary trees with those predicted by neutral models. A key ingredient in this approach is the ability to compute and quantify distributions of various tree shape indices under random models of interest. As a step to meet this challenge, in this paper we investigate the joint distribution of cherries and pitchforks (that is, subtrees with two and three leaves) under two widely used null models: the Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model. Based on two novel recursive formulae, we propose a dynamic approach to numerically compute the exact joint distribution (and hence the marginal distributions) for trees of any size. We also obtained insights into the statistical properties of trees generated under these two models, including a constant correlation between the cherry and the pitchfork distributions under the YHK model, the log-concavity and unimodality of cherry distributions under both models. In particular, we show the existence of a unique change point for cherry distribution between the two models, that is, there exists a critical value $τ_n$ for each $n\geq 4$ such that the probability that a random tree with $n$ leaves generated under the YHK model contains $k$ cherries is lower than that under the PDA model if $1<k< τ_n$, and higher if $τ_n<k\le n/2$.

Submitted 13 August, 2015; originally announced August 2015.

Comments: 22 pages, 4 figures

arXiv:1306.4253 [pdf, ps, other]

Systematic assessment of the expected length, variance and distribution of Longest Common Subsequences

Authors: Kang Ning, Kwok Pui Choi

Abstract: The Longest Common Subsequence (LCS) problem is a very important problem in mathematics, which has broad applications in scheduling problems, physics and bioinformatics. It is known that, given two random sequences of infinite length, the expected length of the LCS will be a constant. However, the value of this constant is not yet known. Moreover, the variance and distribution of the LCS length are also not fully understood. The problem becomes more difficult when there are (a) multiple sequences, (b) sequences with non-even distribution of alphabets and (c) large alphabets. This work focuses on these more complicated issues. We have systematically analyzed the expected length, variance and distribution of LCS based on extensive Monte Carlo simulation. The results on expected length are consistent with currently proved theoretical results, and the analysis on variance and distribution provides further insights into the problem.

Submitted 18 June, 2013; originally announced June 2013.
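
The kind of Monte Carlo experiment described in the LCS abstract above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: it covers only the two-sequence, uniform-alphabet case, uses the textbook dynamic program for the LCS length, and the names `lcs_length` and `estimate_lcs_ratio` are hypothetical.

```python
import random


def lcs_length(a, b):
    """Length of a longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_lcs_ratio(length, alphabet_size, trials, rng):
    """Average of LCS length divided by sequence length over random pairs."""
    total = 0
    for _ in range(trials):
        a = [rng.randrange(alphabet_size) for _ in range(length)]
        b = [rng.randrange(alphabet_size) for _ in range(length)]
        total += lcs_length(a, b)
    return total / (trials * length)


if __name__ == "__main__":
    rng = random.Random(2013)
    for k in (2, 4, 20):
        ratio = estimate_lcs_ratio(length=200, alphabet_size=k, trials=20, rng=rng)
        print(f"alphabet size {k}: mean LCS/length ~ {ratio:.3f}")
```

Finite-length estimates like these sit below the limiting constant, so a study of the constant itself would need much longer sequences and more trials than this sketch uses.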

arXiv:1203.2430 [pdf, other]

Reconstruction of Network Evolutionary History from Extant Network Topology and Duplication History

Authors: Si Li, Kwok Pui Choi, Taoyang Wu, Louxin Zhang

Abstract: Genome-wide protein-protein interaction (PPI) data are readily available thanks to recent breakthroughs in biotechnology. However, PPI networks of extant organisms are only snapshots of the network evolution. How to infer the whole evolution history becomes a challenging problem in computational biology. In this paper, we present a likelihood-based approach to inferring network evolution history from the topology of PPI networks and the duplication relationship among the paralogs. Simulations show that our approach outperforms the existing ones in terms of the accuracy of reconstruction. Moreover, the growth parameters of several real PPI networks estimated by our method are more consistent with the ones predicted in literature.

Submitted 12 March, 2012; originally announced March 2012.

Comments: 15 pages, 5 figures, submitted to ISBRA 2012

arXiv:1104.4396 [pdf, ps, other]

doi 10.3150/10-BEJ287

Limit theorems for functions of marginal quantiles

Authors: G. Jogesh Babu, Zhidong Bai, Kwok Pui Choi, Vasudevan Mangalam

Abstract: Multivariate distributions are explored using the joint distributions of marginal sample quantiles. Limit theory for the mean of a function of order statistics is presented. The results include a multivariate central limit theorem and a strong law of large numbers. A result similar to Bahadur's representation of quantiles is established for the mean of a function of the marginal quantiles. In particular, it is shown that \[\sqrt{n}\Biggl(\frac{1}{n}\sum_{i=1}^nφ\bigl(X_{n:i}^{(1)},\ldots,X_{n:i}^{(d)}\bigr)-\bar{γ}\Biggr)=\frac{1}{\sqrt{n}}\sum_{i=1}^nZ_{n,i}+\mathrm{o}_P(1)\] as $n\rightarrow\infty$, where $\bar{γ}$ is a constant and $Z_{n,i}$ are i.i.d. random variables for each $n$. This leads to the central limit theorem. Weak convergence to a Gaussian process using equicontinuity of functions is indicated. The results are established under very general conditions. These conditions are shown to be satisfied in many commonly occurring situations.

Submitted 22 April, 2011; originally announced April 2011.

Comments: Published at http://dx.doi.org/10.3150/10-BEJ287 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

Report number: IMS-BEJ-BEJ287

Journal ref: Bernoulli 2011, Vol. 17, No. 2, 671-686

Showing 1–12 of 12 results for author: Choi, K P