-
Cumulant-based approximation for fast and efficient prediction for species distribution
Authors:
Osamu Komori,
Yusuke Saigusa,
Shinto Eguchi,
Yasuhiro Kubota
Abstract:
Species distribution modeling plays an important role in estimating the habitat suitability of species using environmental variables. For this purpose, Maxent and the Poisson point process are popular and powerful methods extensively employed across various ecological and biological sciences. However, computation becomes prohibitively slow when using huge background datasets, which is often the case with fine-resolution data or global-scale estimations. To address this problem, we propose a computationally efficient species distribution model using a cumulant-based approximation (CBA) applied to the loss function of $γ$-divergence. Additionally, we introduce a sequential estimating algorithm with an $L_1$ penalty to select important environmental variables closely associated with species distribution. The regularized geometric-mean method, derived from the CBA, demonstrates high computational efficiency and estimation accuracy. Moreover, by applying CBA to Maxent, we establish that Maxent and Fisher linear discriminant analysis are equivalent under a normality assumption. This equivalence leads to a highly efficient computational method for estimating species distribution. The effectiveness of our proposed methods is illustrated through simulation studies and by analyzing data on 226 species from the National Centre for Ecological Analysis and Synthesis and 709 Japanese vascular plant species. The computational efficiency of the proposed methods is significantly improved compared to Maxent, while maintaining comparable estimation accuracy. An R package {\tt CBA} is also provided, containing all the code used in the simulation studies and real data analysis.
Submitted 23 May, 2024;
originally announced May 2024.
-
Robust minimum divergence estimation in a spatial Poisson point process
Authors:
Yusuke Saigusa,
Shinto Eguchi,
Osamu Komori
Abstract:
Species distribution modeling (SDM) plays a crucial role in investigating habitat suitability and addressing various ecological issues. While likelihood analysis is commonly used to draw ecological conclusions, it has been observed that its statistical performance is not robust when faced with slight deviations due to misspecification in SDM. We propose a new robust estimation method based on a novel divergence for the Poisson point process model. The proposed method is characterized by weighting the log-likelihood equation to mitigate the impact of heterogeneous observations in the presence-only data, which can result from model misspecification. We demonstrate that the proposed method improves the predictive performance of the maximum likelihood estimation in our simulation studies and in the analysis of vascular plant data in Japan.
Submitted 29 June, 2023;
originally announced June 2023.
-
Statistical learning for species distribution models in ecological studies
Authors:
Osamu Komori,
Yusuke Saigusa,
Shinto Eguchi
Abstract:
We discuss species distribution models (SDM) for biodiversity studies in ecology. SDM plays an important role in estimating the abundance of a species based on environmental variables that are closely related to the habitat of the species. The resultant habitat map indicates areas where the species is likely to live; hence it is essential for conservation planning and reserve selection. We focus especially on the Poisson point process and clarify its relations with other statistical methods. We then discuss the Poisson point process from the viewpoint of information divergence, showing that the Kullback-Leibler divergence of density functions reduces to the extended Kullback-Leibler divergence of intensity functions. This property enables us to extend the Poisson point process to models derived from other divergences, such as the $β$- and $γ$-divergences. Finally, we discuss integrated SDM and evaluate the estimation performance based on the Fisher information matrices.
Submitted 27 April, 2023;
originally announced April 2023.
-
Gene-Gene association for Imaging Genetics Data using Robust Kernel Canonical Correlation Analysis
Authors:
Md Ashad Alam,
Osamu Komori,
Yu-Ping Wang
Abstract:
In genome-wide interaction studies, methods for detecting gene-gene interactions fall into two categories: single nucleotide polymorphism (SNP) based methods and gene-based methods. In general, gene-based methods are more effective than methods based on a single SNP. Recently, a U statistic based on kernel canonical correlation analysis (classical kernel CCA), called KCCU, has been proposed to detect nonlinear relationships between genes. To estimate the variance of KCCU, resampling-based methods have been used, which are highly computationally intensive. In addition, classical kernel CCA is not robust to contaminated data. We therefore first discuss the robust kernel mean element and the robust kernel covariance and cross-covariance operators. Second, we propose a method based on the influence function to estimate the variance of the KCCU. Third, we propose a nonparametric robust KCCU method based on robust kernel CCA, which is designed for contaminated data and is less sensitive to noise than classical kernel CCA. Finally, we apply the proposed methods to synthesized data and an imaging genetics data set. Based on gene ontology and pathway analysis, the synthesized-data and imaging genetics analyses demonstrate that the proposed robust method outperforms state-of-the-art methods.
Submitted 1 June, 2016;
originally announced June 2016.
-
Spontaneous Clustering via Minimum γ-divergence
Authors:
Akifumi Notsu,
Osamu Komori,
Shinto Eguchi
Abstract:
We propose a new method for clustering based on the local minimization of the γ-divergence, which we call spontaneous clustering. The greatest advantage of the proposed method is that it automatically detects the number of clusters that adequately reflects the data structure. In contrast, existing methods such as K-means, fuzzy c-means, and model-based clustering require the number of clusters to be specified in advance. We detect all the local minimum points of the γ-divergence, which are defined as the centers of clusters. A necessary and sufficient condition for the γ-divergence to have local minimum points is also derived in a simple setting. A simulation study and a real data analysis are performed to compare our proposal with existing methods.
Submitted 30 April, 2013;
originally announced April 2013.
-
Robust Independent Component Analysis via Minimum Divergence Estimation
Authors:
Peng-Wen Chen,
Hung Hung,
Osamu Komori,
Su-Yun Huang,
Shinto Eguchi
Abstract:
Independent component analysis (ICA) has been shown to be useful in many applications. However, most ICA methods are sensitive to data contamination and outliers. In this article we introduce a general minimum U-divergence framework for ICA, which covers some standard ICA methods as special cases. Within the U-family we further focus on the gamma-divergence due to its desirable property of super robustness, which yields the proposed method, gamma-ICA. Statistical properties and technical conditions for the consistency of gamma-ICA are rigorously studied. In the limiting case, it leads to a necessary and sufficient condition for the consistency of MLE-ICA. This necessary and sufficient condition is weaker than the condition known in the literature. Since the parameter of interest in ICA is an orthogonal matrix, a geometrical algorithm based on gradient flows on the special orthogonal group is introduced to implement gamma-ICA. Furthermore, a data-driven selection of the gamma value, which is critical to the performance of gamma-ICA, is developed. The performance, especially the robustness, of gamma-ICA in comparison with standard ICA methods is demonstrated through experimental studies using simulated data and image data.
Submitted 20 October, 2012;
originally announced October 2012.