-
Matrix norm shrinkage estimators and priors
Authors:
Xiao Li,
Takeru Matsuda,
Fumiyasu Komaki
Abstract:
We develop a class of minimax estimators for a normal mean matrix under the Frobenius loss, which generalizes the James--Stein and Efron--Morris estimators. It shrinks the Schatten norm towards zero and works well for low-rank matrices. We also propose a class of superharmonic priors based on the Schatten norm, which generalizes Stein's prior and the singular value shrinkage prior. The generalized Bayes estimators and Bayesian predictive densities with respect to these priors are minimax. We examine the performance of the proposed estimators and priors in simulation.
Submitted 10 June, 2024;
originally announced June 2024.
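For orientation, the two classical estimators that the abstract above says are generalized can be sketched as follows. This is a minimal sketch of the textbook James--Stein and Efron--Morris forms under identity covariance, not the Schatten-norm estimator proposed in the paper; the function names are illustrative assumptions.

```python
import numpy as np


def james_stein(x):
    """Classical James-Stein estimate of a normal mean vector (identity covariance)."""
    x = np.asarray(x, dtype=float).ravel()
    d = x.size
    if d < 3:
        raise ValueError("James-Stein shrinkage needs dimension d >= 3")
    # Shrink the whole vector towards zero by a factor based on its squared norm.
    return (1.0 - (d - 2) / np.sum(x ** 2)) * x


def efron_morris(x):
    """Classical Efron-Morris estimate of an n x p normal mean matrix (needs n - p - 1 > 0)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    c = n - p - 1
    if c <= 0:
        raise ValueError("Efron-Morris shrinkage needs n - p - 1 > 0")
    # Computes X (I_p - c (X^T X)^{-1}) without forming the inverse explicitly.
    return x - c * np.linalg.solve(x.T @ x, x.T).T
```

In terms of the singular value decomposition, the second function maps each singular value s of X to s - (n - p - 1) / s, and for p = 1 it reduces to the James--Stein form.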
-
BHGNN-RT: Network embedding for directed heterogeneous graphs
Authors:
Xiyang Sun,
Fumiyasu Komaki
Abstract:
Networks are one of the most valuable data structures for modeling problems in the real world. However, the most recent node embedding strategies have focused on undirected graphs, with limited attention to directed graphs, especially directed heterogeneous graphs. In this study, we first investigated the network properties of directed heterogeneous graphs. Based on network analysis, we proposed an embedding method for directed heterogeneous graphs, a bidirectional heterogeneous graph neural network with random teleport (BHGNN-RT), that leverages a bidirectional message-passing process and network heterogeneity. By optimizing the teleport proportion, BHGNN-RT helps overcome the over-smoothing problem. Extensive experiments on various datasets were conducted to verify the efficacy and efficiency of BHGNN-RT. Furthermore, we investigated the effects of message components, model layer, and teleport proportion on model performance. The performance comparison with all other baselines illustrates that BHGNN-RT achieves state-of-the-art performance, outperforming the benchmark methods in both node classification and unsupervised clustering tasks.
Submitted 24 November, 2023;
originally announced November 2023.
-
Double shrinkage priors for a normal mean matrix
Authors:
Takeru Matsuda,
Fumiyasu Komaki,
William E. Strawderman
Abstract:
We consider estimation of a normal mean matrix under the Frobenius loss. Motivated by the Efron--Morris estimator, a generalization of Stein's prior has been recently developed, which is superharmonic and shrinks the singular values towards zero. The generalized Bayes estimator with respect to this prior is minimax and dominates the maximum likelihood estimator. However, here we show that it is inadmissible by using Brown's condition. Then, we develop two types of priors that provide improved generalized Bayes estimators and examine their performance numerically. The proposed priors attain risk reduction by adding scalar shrinkage or column-wise shrinkage to singular value shrinkage. Parallel results for Bayesian predictive densities are also given.
Submitted 17 April, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
-
On High-Dimensional Asymptotic Properties of Model Averaging Estimators
Authors:
Ryo Ando,
Fumiyasu Komaki
Abstract:
When multiple models are considered in regression problems, the model averaging method can be used to weigh and integrate the models. In the present study, we examined how the goodness-of-prediction of the estimator depends on the dimensionality of explanatory variables when using a generalization of the model averaging method in a linear model. We specifically considered the case of high-dimensional explanatory variables, with multiple linear models deployed for subsets of these variables. Consequently, we derived the optimal weights that yield the best predictions. We also observed that the double-descent phenomenon occurs in the model averaging estimator. Furthermore, we obtained theoretical results by adapting methods such as the random forest to linear regression models. Finally, we conducted a practical verification through numerical experiments.
Submitted 18 August, 2023;
originally announced August 2023.
-
Predictive densities for multivariate normal models based on extended models and shrinkage Bayes methods
Authors:
Michiko Okudo,
Fumiyasu Komaki
Abstract:
We investigate predictive densities for multivariate normal models with unknown mean vectors and known covariance matrices. Bayesian predictive densities based on shrinkage priors often have complex representations, although they are effective in various problems. We consider extended normal models with mean vectors and covariance matrices as parameters, and adopt predictive densities that belong to the extended models including the original normal model. We adopt predictive densities that are optimal with respect to the posterior Bayes risk in the extended models. The proposed predictive density based on a superharmonic shrinkage prior is shown to dominate the Bayesian predictive density based on the uniform prior under a loss function based on the Kullback-Leibler divergence. Our method provides an alternative to the empirical Bayes method, which is widely used to construct tractable predictive densities.
Submitted 6 December, 2022;
originally announced December 2022.
-
Improved nearly minimax prediction for independent Poisson processes under Kullback-Leibler loss
Authors:
Xiao Li,
Fumiyasu Komaki
Abstract:
The problem of predicting independent Poisson random variables is commonly encountered in real-life practice. Simultaneous predictive distributions for independent Poisson observables are investigated, and the performance of predictive distributions is evaluated using the Kullback-Leibler (K-L) loss. This study introduces intuitive sufficient conditions, based on superharmonicity of priors, to improve the Bayesian predictive distribution based on the Jeffreys prior. The sufficient conditions exhibit a certain analogy with those known for the multivariate normal distribution. Additionally, this study examines the case where the observed data and target variables to be predicted are independent Poisson processes with different durations. Examples that satisfy the sufficient conditions are provided, including point and subspace shrinkage priors. The K-L risk of the improved predictions is demonstrated to be less than 1.04 times a minimax lower bound.
Submitted 4 December, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
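The baseline that the abstract above sets out to improve, the Bayesian predictive distribution under the Jeffreys prior, has a closed form for a single Poisson coordinate by standard Gamma-Poisson conjugacy. The sketch below assumes a count x observed over duration r and a target y over duration s, both with the same rate; the function name and argument names are illustrative assumptions, and the shrinkage priors proposed in the paper are not implemented here.

```python
import math


def jeffreys_predictive_pmf(y, x, r, s):
    """P(y | x) when x ~ Poisson(r * lam), y ~ Poisson(s * lam), prior lam^(-1/2).

    The posterior of lam is Gamma(shape = x + 1/2, rate = r), so the predictive
    distribution is negative binomial with success probability r / (r + s).
    """
    a = x + 0.5  # posterior shape under the Jeffreys prior
    log_pmf = (
        math.lgamma(a + y) - math.lgamma(a) - math.lgamma(y + 1)
        + a * math.log(r / (r + s))
        + y * math.log(s / (r + s))
    )
    return math.exp(log_pmf)
```

For several independent coordinates, this baseline is the product of such terms, one per coordinate.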
-
Asymptotic analysis of parameter estimation for the Ewens--Pitman partition
Authors:
Takuya Koriyama,
Takeru Matsuda,
Fumiyasu Komaki
Abstract:
We derive the exact asymptotic distribution of the maximum likelihood estimator $(\hat{\alpha}_n, \hat{\theta}_n)$ of $(\alpha, \theta)$ for the Ewens--Pitman partition in the regime of $0<\alpha<1$ and $\theta>-\alpha$: we show that $\hat{\alpha}_n$ is $n^{\alpha/2}$-consistent and converges to a variance mixture of normal distributions, i.e., $\hat{\alpha}_n$ is asymptotically mixed normal, while $\hat{\theta}_n$ is not consistent and converges to a transformation of the generalized Mittag-Leffler distribution. As an application, we derive a confidence interval for $\alpha$ and propose a hypothesis test of sparsity for network data. In our proof, we define an empirical measure induced by the Ewens--Pitman partition and prove a suitable convergence of the measure in some test functions, in order to derive the asymptotic behavior of the log-likelihood.
Submitted 14 May, 2023; v1 submitted 5 July, 2022;
originally announced July 2022.
-
Structured regularization based velocity structure estimation in local earthquake tomography for the adaptation to velocity discontinuities
Authors:
Yohta Yamanaka,
Sumito Kurata,
Keisuke Yano,
Fumiyasu Komaki,
Takahiro Shiina,
Aitaro Kato
Abstract:
We propose a local earthquake tomography method that applies a structured regularization technique to determine sharp changes in Earth's seismic velocity structure using arrival time data of direct waves. Our approach focuses on the ability to better image two common features that are observed in Earth's seismic velocity structure: sharp changes in velocities that correspond to material boundaries, such as the Conrad and Moho discontinuities; and gradual changes in velocity that are associated with pressure and temperature distributions in the crust and mantle. We employ different penalty terms in the vertical and horizontal directions to refine the earthquake tomography. We utilize a vertical-direction (depth) penalty that takes the form of the l1-sum of the l2-norms of the second-order differences of the horizontal units in the vertical direction. This penalty is intended to represent sharp velocity changes caused by discontinuities by creating a piecewise linear depth profile of seismic velocity. We set a horizontal-direction penalty term on the basis of the l2-norm to express gradual velocity tendencies in the horizontal direction, as has often been done in conventional tomography methods. We use a synthetic data set to demonstrate that our method provides significant improvements over velocity structures estimated using conventional methods by obtaining stable estimates of both steep and gradual changes in velocity. Furthermore, we apply our proposed method to real seismic data in central Japan and present the potential of our method for detecting velocity discontinuities using the observed arrival times from a small number of local earthquakes.
Submitted 24 March, 2022; v1 submitted 19 April, 2021;
originally announced April 2021.
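One possible reading of the vertical-direction penalty described in the abstract above can be sketched as follows. The array layout (depth along the first axis, horizontal units along the second), the grouping of horizontal units at each depth, and the function name are assumptions made for illustration; the abstract does not fix these details, and the horizontal penalty is not shown.

```python
import numpy as np


def vertical_penalty(v):
    """l1-sum over depth of the l2-norms of second-order depth differences.

    v: velocity array of shape (n_depth, n_horizontal); this layout is assumed.
    """
    v = np.asarray(v, dtype=float)
    # Second-order differences along the depth axis: shape (n_depth - 2, n_horizontal).
    second_diff = np.diff(v, n=2, axis=0)
    # l2-norm across horizontal units at each depth, then a plain (l1) sum over depths.
    return float(np.sum(np.linalg.norm(second_diff, axis=1)))
```

Under this reading, a depth profile that is exactly linear incurs zero penalty and each kink contributes its l2-norm once, which is why such a penalty favors piecewise linear profiles with a small number of kinks.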
-
Enriched standard conjugate priors and the right invariant prior for Wishart distributions
Authors:
Hidemasa Oda,
Fumiyasu Komaki
Abstract:
The prediction of the variance-covariance matrix of the multivariate normal distribution is important in multivariate analysis. We investigated Bayesian predictive distributions for Wishart distributions under the Kullback-Leibler divergence. The conditional reducibility of the family of Wishart distributions enables us to decompose the risk of a Bayesian predictive distribution. We considered a recently introduced class of prior distributions, which is called the family of enriched standard conjugate prior distributions, and compared the Bayesian predictive distributions based on these prior distributions. Furthermore, we studied the performance of the Bayesian predictive distribution based on the reference prior distribution in the family and showed that there exists a prior distribution in the family that dominates the reference prior distribution. Our study provides new insight into multivariate analysis when there exists an ordered inferential importance for the independent variables.
Submitted 22 September, 2022; v1 submitted 13 January, 2021;
originally announced January 2021.
-
Shrinkage priors for nonparametric Bayesian prediction of nonhomogeneous Poisson processes
Authors:
Fumiyasu Komaki
Abstract:
We consider nonparametric Bayesian estimation and prediction for nonhomogeneous Poisson process models with unknown intensity functions. We propose a class of improper priors for intensity functions. Nonparametric Bayesian inference with kernel mixture based on the class of improper priors is shown to be useful, although improper priors have not been widely used for nonparametric Bayes problems. Several theorems corresponding to those for finite-dimensional independent Poisson models hold for nonhomogeneous Poisson process models with infinite-dimensional parameter spaces. Bayesian estimation and prediction based on the improper priors are shown to be admissible under the Kullback--Leibler loss. Numerical methods for Bayesian inference based on the priors are investigated.
Submitted 8 May, 2021; v1 submitted 7 June, 2020;
originally announced June 2020.
-
Shrinkage priors on complex-valued circular-symmetric autoregressive processes
Authors:
Hidemasa Oda,
Fumiyasu Komaki
Abstract:
We investigate shrinkage priors on power spectral densities for complex-valued circular-symmetric autoregressive processes. We construct shrinkage predictive power spectral densities, which asymptotically dominate (i) the Bayesian predictive power spectral density based on the Jeffreys prior and (ii) the estimative power spectral density with the maximum likelihood estimator, where the Kullback-Leibler divergence from the true power spectral density to a predictive power spectral density is adopted as a risk. Furthermore, we propose general constructions of objective priors for Kähler parameter spaces, utilizing a positive continuous eigenfunction of the Laplace-Beltrami operator with a negative eigenvalue. We present numerical experiments on a complex-valued stationary autoregressive model of order $1$.
Submitted 4 February, 2021; v1 submitted 5 April, 2020;
originally announced April 2020.
-
Bayes Extended Estimators for Curved Exponential Families
Authors:
Michiko Okudo,
Fumiyasu Komaki
Abstract:
The Bayesian predictive density has a complex representation and does not belong to any finite-dimensional statistical model except in limited situations. In this paper, we introduce a simple approximate representation of it, employing its projection onto a finite-dimensional exponential family. Its theoretical properties are established in parallel with those of the Bayesian predictive density when the model belongs to curved exponential families. It is also demonstrated that the projection asymptotically coincides with the plug-in density with the posterior mean of the expectation parameter of the exponential family, which we refer to as the Bayes extended estimator. Information-geometric correspondence indicates that the Bayesian predictive density can be represented as the posterior mean of the infinite-dimensional exponential family. The Kullback--Leibler risk performance of the approximation is demonstrated by numerical simulations, which indicate that the posterior mean of the expectation parameter approaches the Bayesian predictive density as the dimension of the exponential family increases. The simulations also suggest that approximation by projection onto an exponential family of reasonable size is practically advantageous with respect to risk performance and computational cost.
Submitted 29 October, 2020; v1 submitted 18 June, 2019;
originally announced June 2019.
-
Learning partially ranked data based on graph regularization
Authors:
Kento Nakamura,
Keisuke Yano,
Fumiyasu Komaki
Abstract:
Ranked data appear in many different applications, including voting and consumer surveys. In many situations, the data are only partially ranked, and partially ranked data can be regarded as missing data. This paper addresses parameter estimation for partially ranked data under a (possibly) non-ignorable missing mechanism. We propose estimators for both complete rankings and missing mechanisms together with a simple estimation procedure. Our estimation procedure leverages a graph regularization in conjunction with the Expectation-Maximization algorithm and is theoretically guaranteed to have convergence properties. We reduce modeling bias by allowing a non-ignorable missing mechanism. In addition, we avoid the inherent complexity of a non-ignorable missing mechanism by introducing a graph regularization. The experimental results demonstrate that the proposed estimators work well under non-ignorable missing mechanisms.
Submitted 28 February, 2019;
originally announced February 2019.
-
Minimax Predictive Density for Sparse Count Data
Authors:
Keisuke Yano,
Ryoya Kaneko,
Fumiyasu Komaki
Abstract:
This paper discusses predictive densities under the Kullback--Leibler loss for high-dimensional Poisson sequence models under sparsity constraints. Sparsity in count data implies zero-inflation. We present a class of Bayes predictive densities that attain asymptotic minimaxity in sparse Poisson sequence models. We also show that our class, with a plugged-in estimator of the unknown sparsity level, is adaptive in the asymptotically minimax sense. For application, we extend our results to settings with quasi-sparsity and with missing-completely-at-random observations. Simulation studies and an application to real data illustrate the efficiency of the proposed Bayes predictive densities.
Submitted 5 September, 2020; v1 submitted 14 December, 2018;
originally announced December 2018.
-
Analysis of Noise Contrastive Estimation from the Perspective of Asymptotic Variance
Authors:
Masatoshi Uehara,
Takeru Matsuda,
Fumiyasu Komaki
Abstract:
There are many models, often called unnormalized models, whose normalizing constants are not available in closed form. Maximum likelihood estimation is not directly applicable to unnormalized models. Score matching, the contrastive divergence method, pseudo-likelihood, Monte Carlo maximum likelihood, and noise contrastive estimation (NCE) are popular methods for estimating parameters of such models. In this paper, we focus on NCE. The estimator derived from NCE is consistent and asymptotically normal because it is an M-estimator. NCE characteristically uses an auxiliary distribution to calculate the normalizing constant, in the same spirit as importance sampling. In addition, there are several candidates for the objective function of NCE.
We focus on how to reduce asymptotic variance. First, we propose a method for reducing asymptotic variance by estimating the parameters of the auxiliary distribution. Then, we determine the form of the objective functions, where the asymptotic variance takes the smallest values in the original estimator class and the proposed estimator classes. We further analyze the robustness of the estimator.
Submitted 23 August, 2018;
originally announced August 2018.
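As background for the abstract above, the standard logistic form of the NCE objective with a fixed auxiliary (noise) distribution can be sketched as follows. This is the original estimator class, not the variance-reduced variants proposed in the paper; the function and argument names are illustrative assumptions, and `log_model` is expected to return the unnormalized log-density including any learned log-normalizer term.

```python
import numpy as np


def nce_objective(log_model, log_noise, data, noise):
    """Logistic NCE objective (to be maximized over the parameters inside log_model).

    log_model: callable returning the unnormalized model log-density at each point.
    log_noise: callable returning the log-density of the auxiliary distribution.
    data, noise: 1-D arrays of observed samples and auxiliary samples.
    """
    nu = len(noise) / len(data)  # ratio of noise samples to data samples
    g_data = log_model(data) - log_noise(data) - np.log(nu)
    g_noise = log_model(noise) - log_noise(noise) - np.log(nu)
    # log h(u) = log sigmoid(g) and log(1 - h(u)) = log sigmoid(-g), computed stably.
    log_h_data = -np.logaddexp(0.0, -g_data)
    log_one_minus_h_noise = -np.logaddexp(0.0, g_noise)
    return (np.sum(log_h_data) + np.sum(log_one_minus_h_noise)) / len(data)
```

Maximizing this objective amounts to training a logistic classifier to tell data from auxiliary samples, which is how the normalizing constant can be estimated as an ordinary parameter.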
-
On $\varepsilon$-Admissibility in High Dimension and Nonparametrics
Authors:
Keisuke Yano,
Fumiyasu Komaki
Abstract:
In this paper, we discuss the use of $\varepsilon$-admissibility for estimation in high-dimensional and nonparametric statistical models. The minimax rate of convergence is widely used to compare the performance of estimators in high-dimensional and nonparametric models. However, it often works poorly as a criterion of comparison. In such cases, the addition of comparison by $\varepsilon$-admissibility provides a better outcome. We demonstrate the usefulness of $\varepsilon$-admissibility through a high-dimensional Poisson model and a Gaussian infinite sequence model, and present novel results.
Submitted 12 August, 2017;
originally announced August 2017.
-
Empirical Bayes Matrix Completion
Authors:
Takeru Matsuda,
Fumiyasu Komaki
Abstract:
We develop an empirical Bayes (EB) algorithm for matrix completion problems. The EB algorithm is motivated by the singular value shrinkage estimator for matrix means by Efron and Morris (1972). Since the EB algorithm is essentially the EM algorithm applied to a simple model, it does not require heuristic parameter tuning other than tolerance. Numerical results demonstrate that the EB algorithm achieves a good trade-off between accuracy and efficiency compared to existing algorithms and that it works particularly well when the difference between the number of rows and columns is large. Application to real data also shows the practical utility of the EB algorithm.
Submitted 6 June, 2017; v1 submitted 5 June, 2017;
originally announced June 2017.
-
Non-asymptotic Bayesian Minimax Adaptation
Authors:
Keisuke Yano,
Fumiyasu Komaki
Abstract:
This paper studies a Bayesian approach to non-asymptotic minimax adaptation in nonparametric estimation. Estimating an input function on the basis of output functions in a Gaussian white-noise model is discussed. The input function is assumed to be in a Sobolev ellipsoid with an unknown smoothness and an unknown radius. Our purpose in this paper is to present a Bayesian approach attaining minimaxity up to a universal constant without any knowledge regarding the smoothness and the radius. Our Bayesian approach provides not only a rate-exact minimax adaptive estimator in large sample asymptotics but also a risk bound for the Bayes estimator quantifying the effects of both the smoothness and the ratio of the squared radius to the noise variance, where the smoothness and the ratio are the key parameters to describe the minimax risk in this model. Application to non-parametric regression models is also discussed.
Submitted 29 August, 2018; v1 submitted 4 September, 2016;
originally announced September 2016.
-
Asymptotically Minimax Prediction in Infinite Sequence Models
Authors:
Keisuke Yano,
Fumiyasu Komaki
Abstract:
We study asymptotically minimax predictive distributions in an infinite sequence model. First, we discuss the connection between the prediction in the infinite sequence model and the prediction in the function model. Second, we construct an asymptotically minimax predictive distribution when the parameter space is a known ellipsoid. We show that the Bayesian predictive distribution based on the Gaussian prior distribution is asymptotically minimax in the ellipsoid. Third, we construct an asymptotically minimax predictive distribution for any Sobolev ellipsoid. We show that the Bayesian predictive distribution based on the product of Stein's priors is asymptotically minimax for any Sobolev ellipsoid. Finally, we present an efficient sampling method from the proposed Bayesian predictive distribution.
Submitted 19 July, 2017; v1 submitted 25 June, 2016;
originally announced June 2016.
-
Asymptotic Properties of Bayesian Predictive Densities When the Distributions of Data and Target Variables are Different
Authors:
Fumiyasu Komaki
Abstract:
Bayesian predictive densities when the observed data $x$ and the target variable $y$ to be predicted have different distributions are investigated by using the framework of information geometry. The performance of predictive densities is evaluated by the Kullback--Leibler divergence. The parametric models are formulated as Riemannian manifolds. In the conventional setting in which $x$ and $y$ have the same distribution, the Fisher--Rao metric and the Jeffreys prior play essential roles. In the present setting in which $x$ and $y$ have different distributions, a new metric, which we call the predictive metric, constructed by using the Fisher information matrices of $x$ and $y$, and the volume element based on the predictive metric play the corresponding roles. It is shown that Bayesian predictive densities based on priors constructed by using non-constant positive superharmonic functions with respect to the predictive metric asymptotically dominate those based on the volume element prior of the predictive metric.
Submitted 26 March, 2015;
originally announced March 2015.
-
Information criteria for multistep ahead predictions
Authors:
Keisuke Yano,
Fumiyasu Komaki
Abstract:
We propose an information criterion for multistep ahead predictions. It can also be used for extrapolations. For the derivation, we consider multistep ahead predictions under local misspecification. In the prediction, we show that Bayesian predictive distributions asymptotically have smaller Kullback--Leibler risks than plug-in predictive distributions. From the results, we construct an information criterion for multistep ahead predictions by using an asymptotically unbiased estimator of the Kullback--Leibler risk of Bayesian predictive distributions. We show the effectiveness of the proposed information criterion through numerical experiments.
Submitted 9 March, 2015;
originally announced March 2015.
-
Relations Between the Conditional Normalized Maximum Likelihood Distributions and the Latent Information Priors
Authors:
Mutsuki Kojima,
Fumiyasu Komaki
Abstract:
We reveal the relations between the conditional normalized maximum likelihood (CNML) distributions and Bayesian predictive densities based on the latent information priors (LIPs). In particular, CNML3, which is one type of CNML distributions, is investigated. The Bayes projection of a predictive density, which is an information projection of the predictive density on a set of Bayesian predictive densities, is considered. We prove that the sum of the Bayes projection divergence of CNML3 and the conditional mutual information is asymptotically constant. This result implies that the Bayes projection of CNML3 (BPCNML3) is asymptotically identical to the Bayesian predictive density based on LIP. In addition, under some stronger assumptions, we show that BPCNML3 exactly coincides with the Bayesian predictive density based on LIP.
Submitted 25 December, 2014;
originally announced December 2014.
-
Singular Value Shrinkage Priors for Bayesian Prediction
Authors:
Takeru Matsuda,
Fumiyasu Komaki
Abstract:
We develop singular value shrinkage priors for the mean matrix parameters in the matrix-variate normal model with known covariance matrices. Our priors are superharmonic and put more weight on matrices with smaller singular values. They are a natural generalization of the Stein prior. Bayes estimators and Bayesian predictive densities based on our priors are minimax and dominate those based on the uniform prior in finite samples. In particular, our priors work well when the true value of the parameter has low rank.
Submitted 2 April, 2021; v1 submitted 13 August, 2014;
originally announced August 2014.
-
Determinantal Point Process Priors for Bayesian Variable Selection in Linear Regression
Authors:
Mutsuki Kojima,
Fumiyasu Komaki
Abstract:
We propose discrete determinantal point processes (DPPs) for priors on the model parameter in Bayesian variable selection. By our variable selection method, collinear predictors are less likely to be selected simultaneously because of the repulsion property of discrete DPPs. Three types of DPP priors are proposed. We show the efficiency of the proposed priors through numerical experiments and applications to collinear datasets.
Submitted 9 June, 2014;
originally announced June 2014.
-
Simultaneous prediction for independent Poisson processes with different durations
Authors:
Fumiyasu Komaki
Abstract:
Simultaneous predictive densities for independent Poisson observables are investigated. The observed data and the target variables to be predicted are independently distributed according to different Poisson distributions parametrized by the same parameter. The performance of predictive densities is evaluated by the Kullback-Leibler divergence. A class of prior distributions depending on the objective of prediction is introduced. A Bayesian predictive density based on a prior in this class dominates the Bayesian predictive density based on the Jeffreys prior.
Submitted 2 February, 2014; v1 submitted 31 January, 2014;
originally announced January 2014.
-
Asymptotically minimax Bayesian predictive densities for multinomial models
Authors:
Fumiyasu Komaki
Abstract:
One-step ahead prediction for the multinomial model is considered. The performance of a predictive density is evaluated by the average Kullback-Leibler divergence from the true density to the predictive density. Asymptotic approximations of risk functions of Bayesian predictive densities based on Dirichlet priors are obtained. It is shown that a Bayesian predictive density based on a specific Dirichlet prior is asymptotically minimax. The asymptotically minimax prior is different from known objective priors such as the Jeffreys prior or the uniform prior.
Submitted 4 December, 2011;
originally announced December 2011.
-
Bayesian Predictive Densities Based on Latent Information Priors
Authors:
Fumiyasu Komaki
Abstract:
Construction methods for prior densities are investigated from a predictive viewpoint. Predictive densities for future observables are constructed by using observed data. The simultaneous distribution of future observables and observed data is assumed to belong to a parametric submodel of a multinomial model. Future observables and data are possibly dependent. The discrepancy of a predictive density to the true conditional density of future observables given observed data is evaluated by the Kullback-Leibler divergence. It is proved that limits of Bayesian predictive densities form an essentially complete class. Latent information priors are defined as priors maximizing the conditional mutual information between the parameter and the future observables given the observed data. Minimax predictive densities are constructed as limits of Bayesian predictive densities based on prior sequences converging to the latent information priors.
Submitted 26 September, 2010;
originally announced September 2010.
-
Bayesian shrinkage prediction for the regression problem
Authors:
Kei Kobayashi,
Fumiyasu Komaki
Abstract:
We consider Bayesian shrinkage predictions for the Normal regression problem under the frequentist Kullback-Leibler risk function.
Firstly, we consider the multivariate Normal model with an unknown mean and a known covariance. While the unknown mean is fixed, the covariance of future samples can be different from training samples. We show that the Bayesian predictive distribution based on the uniform prior is dominated by that based on a class of priors if the prior distributions for the covariance and future covariance matrices are rotation invariant.
Then, we consider a class of priors for the mean parameters depending on the future covariance matrix. With such a prior, we can construct a Bayesian predictive distribution dominating that based on the uniform prior.
Lastly, applying this result to the prediction of response variables in the Normal linear regression model, we show that there exists a Bayesian predictive distribution dominating that based on the uniform prior. Minimaxity of these Bayesian predictions follows from these results.
Submitted 20 January, 2007;
originally announced January 2007.
-
Shrinkage priors for Bayesian prediction
Authors:
Fumiyasu Komaki
Abstract:
We investigate shrinkage priors for constructing Bayesian predictive distributions. It is shown that there exist shrinkage predictive distributions asymptotically dominating Bayesian predictive distributions based on the Jeffreys prior or other vague priors if the model manifold satisfies some differential geometric conditions. Kullback--Leibler divergence from the true distribution to a predictive distribution is adopted as a loss function. Conformal transformations of model manifolds corresponding to vague priors are introduced. We show several examples where shrinkage predictive distributions dominate Bayesian predictive distributions based on vague priors.
Submitted 1 July, 2006;
originally announced July 2006.
-
Bayesian prediction of the Gaussian states from n sample
Authors:
F. Tanaka,
F. Komaki
Abstract:
Recently quantum prediction problem was proposed in the Bayesian framework. It is shown that Bayesian predictive density operators are the best predictive density operators when we evaluate them by using the average relative entropy based on a prior. As an illustrative example, we treat the Gaussian states family adopting the Gaussian distribution as a prior and give the Bayesian predictive density operator with the heterodyne measurement fixed. We show that it is better than the plug-in predictive density operator based on the maximum likelihood estimate by calculating each average relative entropy.
Submitted 12 May, 2006; v1 submitted 23 October, 2005;
originally announced October 2005.
-
Asymptotic Expansion of the Risk Difference of the Bayesian Spectral Density in the ARMA model
Authors:
Fuyuhiko Tanaka,
Fumiyasu Komaki
Abstract:
The autoregressive moving average (ARMA) model is one of the most important models in time series analysis. We consider the Bayesian estimation of an unknown spectral density in the ARMA model. In the i.i.d. cases, Komaki showed that Bayesian predictive densities based on a superharmonic prior asymptotically dominate those based on the Jeffreys prior. It is shown by using the asymptotic expansion of the risk difference. We obtain the corresponding result in the ARMA model.
Submitted 26 October, 2005;
originally announced October 2005.
-
Simultaneous prediction of independent Poisson observables
Authors:
Fumiyasu Komaki
Abstract:
Simultaneous predictive distributions for independent Poisson observables are investigated. A class of improper prior distributions for Poisson means is introduced. The Bayesian predictive distributions based on priors from the introduced class are shown to be admissible under the Kullback-Leibler loss. A Bayesian predictive distribution based on a prior in this class dominates the Bayesian predictive distribution based on the Jeffreys prior.
Submitted 5 October, 2004;
originally announced October 2004.