Search | arXiv e-print repository

Matrix norm shrinkage estimators and priors

Authors: Xiao Li, Takeru Matsuda, Fumiyasu Komaki

Abstract: We develop a class of minimax estimators for a normal mean matrix under the Frobenius loss, which generalizes the James--Stein and Efron--Morris estimators. It shrinks the Schatten norm towards zero and works well for low-rank matrices. We also propose a class of superharmonic priors based on the Schatten norm, which generalizes Stein's prior and the singular value shrinkage prior. The generalized Bayes estimators and Bayesian predictive densities with respect to these priors are minimax. We examine the performance of the proposed estimators and priors in simulation.

Submitted 10 June, 2024; originally announced June 2024.
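
For readers unfamiliar with the two classical estimators named in the abstract above, the following is a minimal sketch of their textbook forms, assuming unit noise variance by default. It is not the estimator proposed in the listed paper, and the function names `james_stein` and `efron_morris_shrink` are hypothetical. For an n x p observation X with n - p - 1 > 0, the Efron--Morris estimator X(I - (n-p-1)(X^T X)^{-1}) keeps the singular vectors of X and replaces each singular value s by s(1 - (n-p-1)/s^2), which is what the second function computes.

```python
# Illustrative sketch only; textbook forms, not code from the listed paper.

def james_stein(x, sigma2=1.0):
    """James-Stein estimate of a d-dimensional normal mean, d >= 3."""
    d = len(x)
    if d < 3:
        raise ValueError("James-Stein shrinkage needs dimension d >= 3")
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        raise ValueError("shrinkage factor is undefined at x = 0")
    # Shrink the whole vector towards the origin by a common factor.
    factor = 1.0 - (d - 2) * sigma2 / norm2
    return [factor * v for v in x]


def efron_morris_shrink(sigmas, n, p):
    """Apply Efron-Morris shrinkage to the singular values of an n x p matrix.

    Each singular value s becomes s * (1 - (n - p - 1) / s**2). The result
    can be negative when s**2 < n - p - 1, since no positive part is taken.
    """
    c = n - p - 1
    if c <= 0:
        raise ValueError("Efron-Morris shrinkage needs n - p - 1 > 0")
    if any(s <= 0.0 for s in sigmas):
        raise ValueError("singular values must be strictly positive")
    return [s * (1.0 - c / (s * s)) for s in sigmas]
```

The James--Stein form applies one scalar factor to the whole vector, while the Efron--Morris form shrinks each singular value separately, with small singular values shrunk proportionally more.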

arXiv:2311.14404 [pdf, other]

BHGNN-RT: Network embedding for directed heterogeneous graphs

Authors: Xiyang Sun, Fumiyasu Komaki

Abstract: Networks are one of the most valuable data structures for modeling problems in the real world. However, the most recent node embedding strategies have focused on undirected graphs, with limited attention to directed graphs, especially directed heterogeneous graphs. In this study, we first investigated the network properties of directed heterogeneous graphs. Based on network analysis, we proposed an embedding method, a bidirectional heterogeneous graph neural network with random teleport (BHGNN-RT), for directed heterogeneous graphs, that leverages a bidirectional message-passing process and network heterogeneity. With the optimization of teleport proportion, BHGNN-RT helps overcome the over-smoothing problem. Extensive experiments on various datasets were conducted to verify the efficacy and efficiency of BHGNN-RT. Furthermore, we investigated the effects of message components, model layer, and teleport proportion on model performance. The performance comparison with all other baselines illustrates that BHGNN-RT achieves state-of-the-art performance, outperforming the benchmark methods in both node classification and unsupervised clustering tasks.

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.13137 [pdf, ps, other]

Double shrinkage priors for a normal mean matrix

Authors: Takeru Matsuda, Fumiyasu Komaki, William E. Strawderman

Abstract: We consider estimation of a normal mean matrix under the Frobenius loss. Motivated by the Efron--Morris estimator, a generalization of Stein's prior has been recently developed, which is superharmonic and shrinks the singular values towards zero. The generalized Bayes estimator with respect to this prior is minimax and dominates the maximum likelihood estimator. However, here we show that it is inadmissible by using Brown's condition. Then, we develop two types of priors that provide improved generalized Bayes estimators and examine their performance numerically. The proposed priors attain risk reduction by adding scalar shrinkage or column-wise shrinkage to singular value shrinkage. Parallel results for Bayesian predictive densities are also given.

Submitted 17 April, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2308.09476 [pdf, other]

On High-Dimensional Asymptotic Properties of Model Averaging Estimators

Authors: Ryo Ando, Fumiyasu Komaki

Abstract: When multiple models are considered in regression problems, the model averaging method can be used to weigh and integrate the models. In the present study, we examined how the goodness-of-prediction of the estimator depends on the dimensionality of explanatory variables when using a generalization of the model averaging method in a linear model. We specifically considered the case of high-dimensional explanatory variables, with multiple linear models deployed for subsets of these variables. Consequently, we derived the optimal weights that yield the best predictions. We also observe that the double-descent phenomenon occurs in the model averaging estimator. Furthermore, we obtained theoretical results by adapting methods such as the random forest to linear regression models. Finally, we conducted a practical verification through numerical experiments.

Submitted 18 August, 2023; originally announced August 2023.

Comments: 28 pages, 8 figures

ACM Class: G.3

arXiv:2212.03444 [pdf, other]

Predictive densities for multivariate normal models based on extended models and shrinkage Bayes methods

Authors: Michiko Okudo, Fumiyasu Komaki

Abstract: We investigate predictive densities for multivariate normal models with unknown mean vectors and known covariance matrices. Bayesian predictive densities based on shrinkage priors often have complex representations, although they are effective in various problems. We consider extended normal models with mean vectors and covariance matrices as parameters, and adopt predictive densities that belong to the extended models including the original normal model. We adopt predictive densities that are optimal with respect to the posterior Bayes risk in the extended models. The proposed predictive density based on a superharmonic shrinkage prior is shown to dominate the Bayesian predictive density based on the uniform prior under a loss function based on the Kullback-Leibler divergence. Our method provides an alternative to the empirical Bayes method, which is widely used to construct tractable predictive densities.

Submitted 6 December, 2022; originally announced December 2022.

arXiv:2209.14618 [pdf, other]

Improved nearly minimax prediction for independent Poisson processes under Kullback-Leibler loss

Authors: Xiao Li, Fumiyasu Komaki

Abstract: The problem of predicting independent Poisson random variables is commonly encountered in real-life practice. Simultaneous predictive distributions for independent Poisson observables are investigated, and the performance of predictive distributions is evaluated using the Kullback-Leibler (K-L) loss. This study introduces intuitive sufficient conditions, based on superharmonicity of priors, to improve the Bayesian predictive distribution based on the Jeffreys prior. The sufficient conditions exhibit a certain analogy with those known for the multivariate normal distribution. Additionally, this study examines the case where the observed data and target variables to be predicted are independent Poisson processes with different durations. Examples that satisfy the sufficient conditions are provided, including point and subspace shrinkage priors. The K-L risk of the improved predictions is demonstrated to be less than 1.04 times a minimax lower bound.

Submitted 4 December, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

arXiv:2207.01949 [pdf, other]

Asymptotic analysis of parameter estimation for the Ewens--Pitman partition

Authors: Takuya Koriyama, Takeru Matsuda, Fumiyasu Komaki

Abstract: We derive the exact asymptotic distribution of the maximum likelihood estimator $(\hat{α}_n, \hat{θ}_n)$ of $(α, θ)$ for the Ewens--Pitman partition in the regime of $0<α<1$ and $θ>-α$: we show that $\hat{α}_n$ is $n^{α/2}$-consistent and converges to a variance mixture of normal distributions, i.e., $\hat{α}_n$ is asymptotically mixed normal, while $\hat{θ}_n$ is not consistent and converges to a transformation of the generalized Mittag-Leffler distribution. As an application, we derive a confidence interval for $α$ and propose a hypothesis test of sparsity for network data. In our proof, we define an empirical measure induced by the Ewens--Pitman partition and prove a suitable convergence of the measure in some test functions, aiming to derive asymptotic behavior of the log likelihood.

Submitted 14 May, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: 58 pages

arXiv:2104.09067 [pdf]

doi 10.1186/s40623-022-01600-x

Structured regularization based velocity structure estimation in local earthquake tomography for the adaptation to velocity discontinuities

Authors: Yohta Yamanaka, Sumito Kurata, Keisuke Yano, Fumiyasu Komaki, Takahiro Shiina, Aitaro Kato

Abstract: We propose a local earthquake tomography method that applies a structured regularization technique to determine sharp changes in Earth's seismic velocity structure using arrival time data of direct waves. Our approach focuses on the ability to better image two common features that are observed in Earth's seismic velocity structure: sharp changes in velocities that correspond to material boundaries, such as the Conrad and Moho discontinuities; and gradual changes in velocity that are associated with pressure and temperature distributions in the crust and mantle. We employ different penalty terms in the vertical and horizontal directions to refine the earthquake tomography. We utilize a vertical-direction (depth) penalty that takes the form of the l1-sum of the l2-norms of the second-order differences of the horizontal units in the vertical direction. This penalty is intended to represent sharp velocity changes caused by discontinuities by creating a piecewise linear depth profile of seismic velocity. We set a horizontal-direction penalty term on the basis of the l2-norm to express gradual velocity tendencies in the horizontal direction, which has often been used in conventional tomography methods. We use a synthetic data set to demonstrate that our method provides significant improvements over velocity structures estimated using conventional methods by obtaining stable estimates of both steep and gradual changes in velocity. Furthermore, we apply our proposed method to real seismic data in central Japan and present the potential of our method for detecting velocity discontinuities using the observed arrival times from a small number of local earthquakes.

Submitted 24 March, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

Comments: 18 pages, 11 figures

Journal ref: Earth, Planets and Space volume 74, Article number: 43 (2022)
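
The depth penalty described in the abstract above (an l1-sum of l2-norms of second-order vertical differences) can be sketched as follows. This is one possible reading of that sentence, not the authors' implementation: it assumes the velocity model is stored as a hypothetical grid `v[k][j]` with depth layer `k` and flattened horizontal unit `j`, takes the second difference across depth for every horizontal unit, groups those differences by depth, and sums the l2-norms of the groups.

```python
import math

# Illustrative sketch only; the grid layout and grouping are assumptions.

def depth_penalty(v):
    """Sum over interior depths k of || v[k+1] - 2 v[k] + v[k-1] ||_2.

    v is a list of depth layers, each a list of velocities for the same
    horizontal units. A depth profile that is linear in k contributes zero,
    so the penalty favours piecewise linear profiles with few kinks.
    """
    total = 0.0
    for k in range(1, len(v) - 1):
        squared = 0.0
        for above, here, below in zip(v[k - 1], v[k], v[k + 1]):
            second_diff = below - 2.0 * here + above
            squared += second_diff * second_diff
        total += math.sqrt(squared)
    return total
```

Because the l2-norm is taken per depth and the norms are then added without squaring, the penalty acts like a group lasso on the kinks: it tends to set the second difference to zero at most depths while allowing a few depths to carry a sharp change shared by all horizontal units.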

arXiv:2101.04919 [pdf, other]

Enriched standard conjugate priors and the right invariant prior for Wishart distributions

Authors: Hidemasa Oda, Fumiyasu Komaki

Abstract: The prediction of the variance-covariance matrix of the multivariate normal distribution is important in multivariate analysis. We investigated Bayesian predictive distributions for Wishart distributions under the Kullback-Leibler divergence. The conditional reducibility of the family of Wishart distributions enables us to decompose the risk of a Bayesian predictive distribution. We considered a recently introduced class of prior distributions, which is called the family of enriched standard conjugate prior distributions, and compared the Bayesian predictive distributions based on these prior distributions. Furthermore, we studied the performance of the Bayesian predictive distribution based on the reference prior distribution in the family and showed that there exists a prior distribution in the family that dominates the reference prior distribution. Our study provides new insight into multivariate analysis when there exists an ordered inferential importance for the independent variables.

Submitted 22 September, 2022; v1 submitted 13 January, 2021; originally announced January 2021.

MSC Class: 62C10 (Primary) 62H12 (Secondary)

arXiv:2006.04052 [pdf, ps, other]

doi 10.1109/TIT.2021.3084062

Shrinkage priors for nonparametric Bayesian prediction of nonhomogeneous Poisson processes

Authors: Fumiyasu Komaki

Abstract: We consider nonparametric Bayesian estimation and prediction for nonhomogeneous Poisson process models with unknown intensity functions. We propose a class of improper priors for intensity functions. Nonparametric Bayesian inference with kernel mixture based on the class of improper priors is shown to be useful, although improper priors have not been widely used for nonparametric Bayes problems. Several theorems corresponding to those for finite-dimensional independent Poisson models hold for nonhomogeneous Poisson process models with infinite-dimensional parameter spaces. Bayesian estimation and prediction based on the improper priors are shown to be admissible under the Kullback--Leibler loss. Numerical methods for Bayesian inference based on the priors are investigated.

Submitted 8 May, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

Journal ref: IEEE Transactions on Information Theory, Volume 67, Issue 8, 2021, Pages 5305 - 5317

arXiv:2004.02389 [pdf, other]

Shrinkage priors on complex-valued circular-symmetric autoregressive processes

Authors: Hidemasa Oda, Fumiyasu Komaki

Abstract: We investigate shrinkage priors on power spectral densities for complex-valued circular-symmetric autoregressive processes. We construct shrinkage predictive power spectral densities, which asymptotically dominate (i) the Bayesian predictive power spectral density based on the Jeffreys prior and (ii) the estimative power spectral density with the maximum likelihood estimator, where the Kullback-Leibler divergence from the true power spectral density to a predictive power spectral density is adopted as a risk. Furthermore, we propose general constructions of objective priors for Kähler parameter spaces, utilizing a positive continuous eigenfunction of the Laplace-Beltrami operator with a negative eigenvalue. We present numerical experiments on a complex-valued stationary autoregressive model of order $1$.

Submitted 4 February, 2021; v1 submitted 5 April, 2020; originally announced April 2020.

Comments: revised; Figures are modified

MSC Class: 62F15 (Primary) 62C15 (Secondary)

arXiv:1906.07514 [pdf, other]

Bayes Extended Estimators for Curved Exponential Families

Authors: Michiko Okudo, Fumiyasu Komaki

Abstract: The Bayesian predictive density has a complex representation and does not belong to any finite-dimensional statistical model except in limited situations. In this paper, we introduce its simple approximate representation employing its projection onto a finite-dimensional exponential family. Its theoretical properties are established in parallel to those of the Bayesian predictive density when the model belongs to curved exponential families. It is also demonstrated that the projection asymptotically coincides with the plug-in density with the posterior mean of the expectation parameter of the exponential family, which we refer to as the Bayes extended estimator. Information-geometric correspondence indicates that the Bayesian predictive density can be represented as the posterior mean of the infinite-dimensional exponential family. The Kullback--Leibler risk performance of the approximation is demonstrated by numerical simulations and it indicates that the posterior mean of the expectation parameter approaches the Bayesian predictive density as the dimension of the exponential family increases. It also suggests that approximation by projection onto an exponential family of reasonable size is practically advantageous with respect to risk performance and computational cost.

Submitted 29 October, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

arXiv:1902.10963 [pdf, other]

Learning partially ranked data based on graph regularization

Authors: Kento Nakamura, Keisuke Yano, Fumiyasu Komaki

Abstract: Ranked data appear in many different applications, including voting and consumer surveys. Data are often only partially ranked, and partially ranked data can be regarded as missing data. This paper addresses parameter estimation for partially ranked data under a (possibly) non-ignorable missing mechanism. We propose estimators for both complete rankings and missing mechanisms together with a simple estimation procedure. Our estimation procedure leverages a graph regularization in conjunction with the Expectation-Maximization algorithm. Our estimation procedure has theoretical convergence guarantees. We reduce a modeling bias by allowing a non-ignorable missing mechanism. In addition, we avoid the inherent complexity within a non-ignorable missing mechanism by introducing a graph regularization. The experimental results demonstrate that the proposed estimators work well under non-ignorable missing mechanisms.

Submitted 28 February, 2019; originally announced February 2019.

arXiv:1812.06037 [pdf, other]

Minimax Predictive Density for Sparse Count Data

Authors: Keisuke Yano, Ryoya Kaneko, Fumiyasu Komaki

Abstract: This paper discusses predictive densities under the Kullback--Leibler loss for high-dimensional Poisson sequence models under sparsity constraints. Sparsity in count data implies zero-inflation. We present a class of Bayes predictive densities that attain asymptotic minimaxity in sparse Poisson sequence models. We also show that our class with an estimator of unknown sparsity level plugged-in is adaptive in the asymptotically minimax sense. For application, we extend our results to settings with quasi-sparsity and with missing-completely-at-random observations. The simulation studies as well as application to real data illustrate the efficiency of the proposed Bayes predictive densities.

Submitted 5 September, 2020; v1 submitted 14 December, 2018; originally announced December 2018.

Comments: 49 pages; the supplement is included in pp. 32-49. Accepted for publication in Bernoulli

arXiv:1808.07983 [pdf, other]

Analysis of Noise Contrastive Estimation from the Perspective of Asymptotic Variance

Authors: Masatoshi Uehara, Takeru Matsuda, Fumiyasu Komaki

Abstract: There are many models, often called unnormalized models, whose normalizing constants are not calculated in closed form. Maximum likelihood estimation is not directly applicable to unnormalized models. Score matching, contrastive divergence method, pseudo-likelihood, Monte Carlo maximum likelihood, and noise contrastive estimation (NCE) are popular methods for estimating parameters of such models. In this paper, we focus on NCE. The estimator derived from NCE is consistent and asymptotically normal because it is an M-estimator. NCE characteristically uses an auxiliary distribution to calculate the normalizing constant in the same spirit as importance sampling. In addition, there are several candidates as objective functions of NCE. We focus on how to reduce asymptotic variance. First, we propose a method for reducing asymptotic variance by estimating the parameters of the auxiliary distribution. Then, we determine the form of the objective functions, where the asymptotic variance takes the smallest values in the original estimator class and the proposed estimator classes. We further analyze the robustness of the estimator.

Submitted 23 August, 2018; originally announced August 2018.

arXiv:1708.03751 [pdf, ps, other]

On $\varepsilon$-Admissibility in High Dimension and Nonparametrics

Authors: Keisuke Yano, Fumiyasu Komaki

Abstract: In this paper, we discuss the use of $\varepsilon$-admissibility for estimation in high-dimensional and nonparametric statistical models. The minimax rate of convergence is widely used to compare the performance of estimators in high-dimensional and nonparametric models. However, it often works poorly as a criterion of comparison. In such cases, the addition of comparison by $\varepsilon$-admissibility provides a better outcome. We demonstrate the usefulness of $\varepsilon$-admissibility through a high-dimensional Poisson model and a Gaussian infinite sequence model, and present novel results.

Submitted 12 August, 2017; originally announced August 2017.

Comments: 22 pages

arXiv:1706.01252 [pdf, ps, other]

doi 10.1016/j.csda.2019.02.006

Empirical Bayes Matrix Completion

Authors: Takeru Matsuda, Fumiyasu Komaki

Abstract: We develop an empirical Bayes (EB) algorithm for matrix completion problems. The EB algorithm is motivated by the singular value shrinkage estimator for matrix means by Efron and Morris (1972). Since the EB algorithm is essentially the EM algorithm applied to a simple model, it does not require heuristic parameter tuning other than tolerance. Numerical results demonstrated that the EB algorithm achieves a good trade-off between accuracy and efficiency compared to existing algorithms and that it works particularly well when the difference between the number of rows and columns is large. Application to real data also shows the practical utility of the EB algorithm.

Submitted 6 June, 2017; v1 submitted 5 June, 2017; originally announced June 2017.

Comments: 15 pages

Journal ref: Computational Statistics & Data Analysis, Vol. 137, pp. 195--210, 2019

arXiv:1609.00940 [pdf, ps, other]

Non-asymptotic Bayesian Minimax Adaptation

Authors: Keisuke Yano, Fumiyasu Komaki

Abstract: This paper studies a Bayesian approach to non-asymptotic minimax adaptation in nonparametric estimation. Estimating an input function on the basis of output functions in a Gaussian white-noise model is discussed. The input function is assumed to be in a Sobolev ellipsoid with an unknown smoothness and an unknown radius. Our purpose in this paper is to present a Bayesian approach attaining minimaxity up to a universal constant without any knowledge regarding the smoothness and the radius. Our Bayesian approach provides not only a rate-exact minimax adaptive estimator in large sample asymptotics but also a risk bound for the Bayes estimator quantifying the effects of both the smoothness and the ratio of the squared radius to the noise variance, where the smoothness and the ratio are the key parameters to describe the minimax risk in this model. Application to non-parametric regression models is also discussed.

Submitted 29 August, 2018; v1 submitted 4 September, 2016; originally announced September 2016.

Comments: 30 pages

arXiv:1606.07896 [pdf, ps, other]

doi 10.1214/17-EJS1312

Asymptotically Minimax Prediction in Infinite Sequence Models

Authors: Keisuke Yano, Fumiyasu Komaki

Abstract: We study asymptotically minimax predictive distributions in an infinite sequence model. First, we discuss the connection between the prediction in the infinite sequence model and the prediction in the function model. Second, we construct an asymptotically minimax predictive distribution when the parameter space is a known ellipsoid. We show that the Bayesian predictive distribution based on the Gaussian prior distribution is asymptotically minimax in the ellipsoid. Third, we construct an asymptotically minimax predictive distribution for any Sobolev ellipsoid. We show that the Bayesian predictive distribution based on the product of Stein's priors is asymptotically minimax for any Sobolev ellipsoid. Finally, we present an efficient sampling method from the proposed Bayesian predictive distribution.

Submitted 19 July, 2017; v1 submitted 25 June, 2016; originally announced June 2016.

Comments: Accepted for publication in Electronic Journal of Statistics

Journal ref: Electronic Journal of Statistics, Volume 11, Number 2 (2017), 3165-3195

arXiv:1503.07643 [pdf, ps, other]

doi 10.1214/14-BA886

Asymptotic Properties of Bayesian Predictive Densities When the Distributions of Data and Target Variables are Different

Authors: Fumiyasu Komaki

Abstract: Bayesian predictive densities when the observed data $x$ and the target variable $y$ to be predicted have different distributions are investigated by using the framework of information geometry. The performance of predictive densities is evaluated by the Kullback--Leibler divergence. The parametric models are formulated as Riemannian manifolds. In the conventional setting in which $x$ and $y$ have the same distribution, the Fisher--Rao metric and the Jeffreys prior play essential roles. In the present setting in which $x$ and $y$ have different distributions, a new metric, which we call the predictive metric, constructed by using the Fisher information matrices of $x$ and $y$, and the volume element based on the predictive metric play the corresponding roles. It is shown that Bayesian predictive densities based on priors constructed by using non-constant positive superharmonic functions with respect to the predictive metric asymptotically dominate those based on the volume element prior of the predictive metric.

Submitted 26 March, 2015; originally announced March 2015.

Comments: Published at http://dx.doi.org/10.1214/14-BA886 in the Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society of Bayesian Analysis (http://bayesian.org/)

Report number: VTeX-BA-BA886

Journal ref: Bayesian Analysis 2015, Vol. 10, No. 1, 31-51

arXiv:1503.02390 [pdf, ps, other]

doi 10.5705/ss.202015.0380

Information criteria for multistep ahead predictions

Authors: Keisuke Yano, Fumiyasu Komaki

Abstract: We propose an information criterion for multistep ahead predictions. It is also used for extrapolations. For the derivation, we consider multistep ahead predictions under local misspecification. In the prediction, we show that Bayesian predictive distributions asymptotically have smaller Kullback--Leibler risks than plug-in predictive distributions. From the results, we construct an information criterion for multistep ahead predictions by using an asymptotically unbiased estimator of the Kullback--Leibler risk of Bayesian predictive distributions. We show the effectiveness of the proposed information criterion throughout the numerical experiments.

Submitted 9 March, 2015; originally announced March 2015.

Journal ref: Statistica Sinica 27 (2017), 1205-1223

arXiv:1412.7794 [pdf, ps, other]

doi 10.1109/TIT.2015.2496581

Relations Between the Conditional Normalized Maximum Likelihood Distributions and the Latent Information Priors

Authors: Mutsuki Kojima, Fumiyasu Komaki

Abstract: We reveal the relations between the conditional normalized maximum likelihood (CNML) distributions and Bayesian predictive densities based on the latent information priors (LIPs). In particular, CNML3, which is one type of CNML distributions, is investigated. The Bayes projection of a predictive density, which is an information projection of the predictive density on a set of Bayesian predictive densities, is considered. We prove that the sum of the Bayes projection divergence of CNML3 and the conditional mutual information is asymptotically constant. This result implies that the Bayes projection of CNML3 (BPCNML3) is asymptotically identical to the Bayesian predictive density based on LIP. In addition, under some stronger assumptions, we show that BPCNML3 exactly coincides with the Bayesian predictive density based on LIP.

Submitted 25 December, 2014; originally announced December 2014.

Journal ref: IEEE Transactions on Information Theory, Volume: 62, Issue 1, 2016

arXiv:1408.2951 [pdf, ps, other]

doi 10.1093/biomet/asv036

Singular Value Shrinkage Priors for Bayesian Prediction

Authors: Takeru Matsuda, Fumiyasu Komaki

Abstract: We develop singular value shrinkage priors for the mean matrix parameters in the matrix-variate normal model with known covariance matrices. Our priors are superharmonic and put more weight on matrices with smaller singular values. They are a natural generalization of the Stein prior. Bayes estimators and Bayesian predictive densities based on our priors are minimax and dominate those based on the uniform prior in finite samples. In particular, our priors work well when the true value of the parameter has low rank.

Submitted 2 April, 2021; v1 submitted 13 August, 2014; originally announced August 2014.

Journal ref: Biometrika, Volume 102, Issue 4, Pages 843--854, 2015

arXiv:1406.2100 [pdf, ps, other]

doi 10.5705/ss.202014.0161

Determinantal Point Process Priors for Bayesian Variable Selection in Linear Regression

Authors: Mutsuki Kojima, Fumiyasu Komaki

Abstract: We propose discrete determinantal point processes (DPPs) for priors on the model parameter in Bayesian variable selection. By our variable selection method, collinear predictors are less likely to be selected simultaneously because of the repulsion property of discrete DPPs. Three types of DPP priors are proposed. We show the efficiency of the proposed priors through numerical experiments and applications to collinear datasets.

Submitted 9 June, 2014; originally announced June 2014.

Journal ref: Statistica Sinica 26 (2016), 97-117

arXiv:1401.8080 [pdf, ps, other]

doi 10.1016/j.jmva.2015.06.008

Simultaneous prediction for independent Poisson processes with different durations

Authors: Fumiyasu Komaki

Abstract: Simultaneous predictive densities for independent Poisson observables are investigated. The observed data and the target variables to be predicted are independently distributed according to different Poisson distributions parametrized by the same parameter. The performance of predictive densities is evaluated by the Kullback-Leibler divergence. A class of prior distributions depending on the objective of prediction is introduced. A Bayesian predictive density based on a prior in this class dominates the Bayesian predictive density based on the Jeffreys prior.

Submitted 2 February, 2014; v1 submitted 31 January, 2014; originally announced January 2014.

Comments: 19 pages

Journal ref: Journal of Multivariate Analysis, Volume 141, 2015, Pages 35-48

arXiv:1112.0818 [pdf, ps, other]

doi 10.1214/12-EJS700

Asymptotically minimax Bayesian predictive densities for multinomial models

Authors: Fumiyasu Komaki

Abstract: One-step ahead prediction for the multinomial model is considered. The performance of a predictive density is evaluated by the average Kullback-Leibler divergence from the true density to the predictive density. Asymptotic approximations of risk functions of Bayesian predictive densities based on Dirichlet priors are obtained. It is shown that a Bayesian predictive density based on a specific Dirichlet prior is asymptotically minimax. The asymptotically minimax prior is different from known objective priors such as the Jeffreys prior or the uniform prior.

Submitted 4 December, 2011; originally announced December 2011.

Journal ref: Electron. J. Statist. 6: 934-957 (2012)

arXiv:1009.5072 [pdf, ps, other]

doi 10.1016/j.jspi.2011.06.009

Bayesian Predictive Densities Based on Latent Information Priors

Authors: Fumiyasu Komaki

Abstract: Construction methods for prior densities are investigated from a predictive viewpoint. Predictive densities for future observables are constructed by using observed data. The simultaneous distribution of future observables and observed data is assumed to belong to a parametric submodel of a multinomial model. Future observables and data are possibly dependent. The discrepancy of a predictive density to the true conditional density of future observables given observed data is evaluated by the Kullback-Leibler divergence. It is proved that limits of Bayesian predictive densities form an essentially complete class. Latent information priors are defined as priors maximizing the conditional mutual information between the parameter and the future observables given the observed data. Minimax predictive densities are constructed as limits of Bayesian predictive densities based on prior sequences converging to the latent information priors.

Submitted 26 September, 2010; originally announced September 2010.

Journal ref: Journal of Statistical Planning and Inference, Volume 141, Issue 12, 2011, Pages 3705-3715

arXiv:math/0701583 [pdf, ps, other]

Bayesian shrinkage prediction for the regression problem

Authors: Kei Kobayashi, Fumiyasu Komaki

Abstract: We consider Bayesian shrinkage predictions for the Normal regression problem under the frequentist Kullback-Leibler risk function. Firstly, we consider the multivariate Normal model with an unknown mean and a known covariance. While the unknown mean is fixed, the covariance of future samples can be different from training samples. We show that the Bayesian predictive distribution based on the uniform prior is dominated by that based on a class of priors if the prior distributions for the covariance and future covariance matrices are rotation invariant. Then, we consider a class of priors for the mean parameters depending on the future covariance matrix. With such a prior, we can construct a Bayesian predictive distribution dominating that based on the uniform prior. Lastly, applying this result to the prediction of response variables in the Normal linear regression model, we show that there exists a Bayesian predictive distribution dominating that based on the uniform prior. Minimaxity of these Bayesian predictions follows from these results.

Submitted 20 January, 2007; originally announced January 2007.

arXiv:math/0607021 [pdf, ps, other]

doi 10.1214/009053606000000010

Shrinkage priors for Bayesian prediction

Authors: Fumiyasu Komaki

Abstract: We investigate shrinkage priors for constructing Bayesian predictive distributions. It is shown that there exist shrinkage predictive distributions asymptotically dominating Bayesian predictive distributions based on the Jeffreys prior or other vague priors if the model manifold satisfies some differential geometric conditions. Kullback--Leibler divergence from the true distribution to a predictive distribution is adopted as a loss function. Conformal transformations of model manifolds corresponding to vague priors are introduced. We show several examples where shrinkage predictive distributions dominate Bayesian predictive distributions based on vague priors.

Submitted 1 July, 2006; originally announced July 2006.

Comments: Published at http://dx.doi.org/10.1214/009053606000000010 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS0078

MSC Class: 62F15; 62C15 (Primary)

Journal ref: Annals of Statistics 2006, Vol. 34, No. 2, 808-819

arXiv:quant-ph/0510176 [pdf, ps, other]

Bayesian prediction of the Gaussian states from n sample

Authors: F. Tanaka, F. Komaki

Abstract: Recently, a quantum prediction problem was proposed in the Bayesian framework. It is shown that Bayesian predictive density operators are the best predictive density operators when we evaluate them by using the average relative entropy based on a prior. As an illustrative example, we treat the Gaussian states family adopting the Gaussian distribution as a prior and give the Bayesian predictive density operator with the heterodyne measurement fixed. We show that it is better than the plug-in predictive density operator based on the maximum likelihood estimate by calculating each average relative entropy.

Submitted 12 May, 2006; v1 submitted 23 October, 2005; originally announced October 2005.

Comments: 5 pages, no figures. Presented in QIT13; Added appendix

arXiv:math/0510558 [pdf, ps, other]

doi 10.1007/s13171-011-0005-1

Asymptotic Expansion of the Risk Difference of the Bayesian Spectral Density in the ARMA model

Authors: Fuyuhiko Tanaka, Fumiyasu Komaki

Abstract: The autoregressive moving average (ARMA) model is one of the most important models in time series analysis. We consider the Bayesian estimation of an unknown spectral density in the ARMA model. In the i.i.d. cases, Komaki showed that Bayesian predictive densities based on a superharmonic prior asymptotically dominate those based on the Jeffreys prior. It is shown by using the asymptotic expansion of the risk difference. We obtain the corresponding result in the ARMA model.

Submitted 26 October, 2005; originally announced October 2005.

Comments: 23 pages

Report number: METR 2005-31

Journal ref: Sankhya A 73, 162-184 (2011)

arXiv:math/0410094 [pdf, ps, other]

doi 10.1214/009053604000000445

Simultaneous prediction of independent Poisson observables

Authors: Fumiyasu Komaki

Abstract: Simultaneous predictive distributions for independent Poisson observables are investigated. A class of improper prior distributions for Poisson means is introduced. The Bayesian predictive distributions based on priors from the introduced class are shown to be admissible under the Kullback-Leibler loss. A Bayesian predictive distribution based on a prior in this class dominates the Bayesian predictive distribution based on the Jeffreys prior.

Submitted 5 October, 2004; originally announced October 2004.

Comments: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/009053604000000445

Report number: IMS-AOS-AOS229

MSC Class: 62F15; 62C15 (Primary)

Journal ref: Annals of Statistics 2004, Vol. 32, No. 4, 1744-1769

Showing 1–32 of 32 results for author: Komaki, F