Search | arXiv e-print repository

Jeffreys-prior penalty for high-dimensional logistic regression: A conjecture about aggregate bias

Authors: Ioannis Kosmidis, Patrick Zietkiewicz

Abstract: Firth (1993, Biometrika) shows that the maximum Jeffreys' prior penalized likelihood estimator in logistic regression has asymptotic bias decreasing with the square of the number of observations when the number of parameters is fixed, which is an order faster than the typical rate from maximum likelihood. The widespread use of that estimator in applied work is supported by the results in Kosmidis and Firth (2021, Biometrika), who show that it takes finite values, even in cases where the maximum likelihood estimate does not exist. Kosmidis and Firth (2021, Biometrika) also provide empirical evidence that the estimator has good bias properties in high-dimensional settings where the number of parameters grows asymptotically linearly but slower than the number of observations. We design and carry out a large-scale computer experiment covering a wide range of such high-dimensional settings and produce strong empirical evidence for a simple rescaling of the maximum Jeffreys' prior penalized likelihood estimator that delivers high accuracy in signal recovery in the presence of an intercept parameter. The rescaled estimator is effective even in cases where estimates from maximum likelihood and other recently proposed corrective methods based on approximate message passing do not exist.

Submitted 19 November, 2023; originally announced November 2023.

MSC Class: 62J12; 62F10; 62F12

arXiv:2311.06076 [pdf, ps, other]

Bayesian Tensor Factorisations for Time Series of Counts

Authors: Zhongzhen Wang, Petros Dellaportas, Ioannis Kosmidis

Abstract: We propose a flexible nonparametric Bayesian modelling framework for multivariate time series of count data based on tensor factorisations. Our models can be viewed as infinite state space Markov chains of known maximal order with non-linear serial dependence through the introduction of appropriate latent variables. Alternatively, our models can be viewed as Bayesian hierarchical models with conditionally independent Poisson distributed observations. Inference about the important lags and their complex interactions is achieved via MCMC. When the observed counts are large, we deal with the resulting computational complexity of Bayesian inference via a two-step inferential strategy based on an initial analysis of a training set of the data. Our methodology is illustrated using simulation experiments and analysis of real-world data.

Submitted 10 November, 2023; originally announced November 2023.

MSC Class: 62F15; 62M10; 62G05; 62P10

arXiv:2307.07342 [pdf, other]

Bounded-memory adjusted scores estimation in generalized linear models with large data sets

Authors: Patrick Zietkiewicz, Ioannis Kosmidis

Abstract: The widespread use of maximum Jeffreys'-prior penalized likelihood in binomial-response generalized linear models, and in logistic regression, in particular, is supported by the results of Kosmidis and Firth (2021, Biometrika), who show that the resulting estimates are always finite-valued, even in cases where the maximum likelihood estimates are not, which is a practical issue regardless of the size of the data set. In logistic regression, the implied adjusted score equations are formally bias-reducing in asymptotic frameworks with a fixed number of parameters and appear to deliver a substantial reduction in the persistent bias of the maximum likelihood estimator in high-dimensional settings where the number of parameters grows asymptotically as a proportion of the number of observations. In this work, we develop and present two new variants of iteratively reweighted least squares for estimating generalized linear models with adjusted score equations for mean bias reduction and maximization of the likelihood penalized by a positive power of the Jeffreys-prior penalty, which eliminate the requirement of storing $O(n)$ quantities in memory, and can operate with data sets that exceed computer memory or even hard drive capacity. We achieve that through incremental QR decompositions, which enable IWLS iterations to have access only to data chunks of predetermined size. Both procedures can also be readily adapted to fit generalized linear models when distinct parts of the data are stored across different sites and, due to privacy concerns, cannot be fully transferred across sites. We assess the procedures through a real-data application with millions of observations.

Submitted 3 June, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

MSC Class: 62J12; 62F10; 62F12
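
The incremental QR idea mentioned in the abstract above can be illustrated on ordinary least squares. The following is a minimal sketch, not the authors' procedure: it assumes NumPy, the function name `chunked_least_squares` and the simulated data are hypothetical, and it covers only a single least-squares solve rather than the adjusted-score IWLS iterations the paper develops. Each chunk is folded into a small triangular factor, so only $O(p^2)$ quantities are kept between chunks.

```python
import numpy as np

def chunked_least_squares(chunks, p):
    """Least squares from an iterable of (X_chunk, y_chunk) pairs.

    Only the current R factor and the matching rotated response
    are kept between chunks. Hypothetical helper for illustration.
    """
    R = np.zeros((0, p))   # running triangular factor, empty before the first chunk
    qty = np.zeros(0)      # running Q^T y
    for X_chunk, y_chunk in chunks:
        # Stack the previous factor on top of the new rows and re-factorise.
        stacked_X = np.vstack([R, X_chunk])
        stacked_y = np.concatenate([qty, y_chunk])
        Q, R = np.linalg.qr(stacked_X)  # reduced QR
        qty = Q.T @ stacked_y
    # Assumes at least p rows were seen in total, so that R is p x p.
    return np.linalg.solve(R, qty)

# Simulated data (assumption, for illustration only).
rng = np.random.default_rng(0)
X_full = rng.normal(size=(1000, 3))
y_full = X_full @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=1000)

# Feed the data in chunks of 100 rows.
chunks = ((X_full[i:i + 100], y_full[i:i + 100]) for i in range(0, 1000, 100))
beta_chunked = chunked_least_squares(chunks, p=3)

# Reference fit using all rows at once.
beta_full = np.linalg.lstsq(X_full, y_full, rcond=None)[0]
```

In this sketch `beta_chunked` and `beta_full` should agree up to floating-point error, because the chunked update preserves the residual sum of squares up to a constant that does not depend on the coefficients.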

arXiv:2206.02561 [pdf, other]

Maximum softly-penalized likelihood for mixed effects logistic regression

Authors: Philipp Sterzinger, Ioannis Kosmidis

Abstract: Maximum likelihood estimation in logistic regression with mixed effects is known to often result in estimates on the boundary of the parameter space. Such estimates, which include infinite values for fixed effects and singular or infinite variance components, can cause havoc to numerical estimation procedures and inference. We introduce an appropriately scaled additive penalty to the log-likelihood function, or an approximation thereof, which penalizes the fixed effects by the Jeffreys' invariant prior for the model with no random effects and the variance components by a composition of negative Huber loss functions. The resulting maximum penalized likelihood estimates are shown to lie in the interior of the parameter space. Appropriate scaling of the penalty guarantees that the penalization is soft enough to preserve the optimal asymptotic properties expected by the maximum likelihood estimator, namely consistency, asymptotic normality, and Cramér-Rao efficiency. Our choice of penalties and scaling factor preserves equivariance of the fixed effects estimates under linear transformation of the model parameters, such as contrasts. Maximum softly-penalized likelihood is compared to competing approaches on two real-data examples, and through comprehensive simulation studies that illustrate its superior finite sample performance.

Submitted 2 February, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: 28 pages, 5 figures

MSC Class: 62J05; 62J12; 62F10; 62F12; 62F03

arXiv:2112.02621 [pdf, other]

Mean and median bias reduction: A concise review and application to adjacent-categories logit models

Authors: Ioannis Kosmidis

Abstract: The estimation of categorical response models using bias-reducing adjusted score equations has seen extensive theoretical research and applied use. The resulting estimates have been found to have superior frequentist properties to what maximum likelihood generally delivers and to be finite, even in cases where the maximum likelihood estimates are infinite. We briefly review mean and median bias reduction of maximum likelihood estimates via adjusted score equations in an illustration-driven way, and discuss their particular equivariance properties under parameter transformations. We then apply mean and median bias reduction to adjacent-categories logit models for ordinal responses. We show how ready bias reduction procedures for Poisson log-linear models can be used for mean and median bias reduction in adjacent-categories logit models with proportional odds and mean bias-reduced estimation in models with non-proportional odds. As in binomial logistic regression, the reduced-bias estimates are found to be finite even in cases where the maximum likelihood estimates are infinite. We also use the approximation of the bias of transformations of mean bias-reduced estimators to correct for the mean bias of model-based ordinal superiority measures. All developments are motivated and illustrated using real-data case studies and simulations.

Submitted 24 January, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

MSC Class: 62J12; 62F03; 62F12

arXiv:2105.14574 [pdf, other]

Scalable Marked Point Processes for Exchangeable and Non-Exchangeable Event Sequences

Authors: Aristeidis Panos, Ioannis Kosmidis, Petros Dellaportas

Abstract: We adopt the interpretability offered by a parametric, Hawkes-process-inspired conditional probability mass function for the marks and apply variational inference techniques to derive a general and scalable inferential framework for marked point processes. The framework can handle both exchangeable and non-exchangeable event sequences with minimal tuning and without any pre-training. This contrasts with many parametric and non-parametric state-of-the-art methods that typically require pre-training and/or careful tuning, and can only handle exchangeable event sequences. The framework's competitive computational and predictive performance against other state-of-the-art methods is illustrated through real data experiments. Its attractiveness for large-scale applications is demonstrated through a case study involving all events occurring in an English Premier League season.

Submitted 19 February, 2023; v1 submitted 30 May, 2021; originally announced May 2021.

Comments: accepted at AISTATS-2022

arXiv:2103.04647 [pdf, other]

Flexible marked spatio-temporal point processes with applications to event sequences from association football

Authors: Santhosh Narayanan, Ioannis Kosmidis, Petros Dellaportas

Abstract: We develop a new family of marked point processes by focusing the characteristic properties of marked Hawkes processes exclusively on the space of marks, providing the freedom to specify a different model for the occurrence times. This is possible through the decomposition of the joint distribution of marks and times, which allows us to separately specify the conditional distribution of marks given the filtration of the process and the current time. We develop a Bayesian framework for the inference and prediction from this family of marked point processes that can naturally accommodate process and point-specific covariate information to drive cross-excitations, offering wide flexibility and applicability in the modelling of real-world processes. The framework is used here for the modelling of in-game event sequences from association football, resulting not only in inferences about previously unquantified characteristics of game dynamics and extraction of event-specific team abilities, but also in predictions for the occurrence of events of interest, such as goals, corners or fouls in a specified interval of time.

Submitted 17 October, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

MSC Class: 62F15; 62J02; 62M99; 62P99

arXiv:2101.07141 [pdf, other]

Bias Reduction as a Remedy to the Consequences of Infinite Estimates in Poisson and Tobit Regression

Authors: Susanne Köll, Ioannis Kosmidis, Christian Kleiber, Achim Zeileis

Abstract: Data separation is a well-studied phenomenon that can cause problems in the estimation and inference from binary response models. Complete or quasi-complete separation occurs when there is a combination of regressors in the model whose value can perfectly predict one or both outcomes. In such cases, and such cases only, the maximum likelihood estimates and the corresponding standard errors are infinite. It is less widely known that the same can happen in further microeconometric models. One of the few works in the area is Santos Silva and Tenreyro (2010) who note that the finiteness of the maximum likelihood estimates in Poisson regression depends on the data configuration and propose a strategy to detect and overcome the consequences of data separation. However, their approach can lead to notable bias on the parameter estimates when the regressors are correlated. We illustrate how bias-reducing adjustments to the maximum likelihood score equations can overcome the consequences of separation in Poisson and Tobit regression models.

Submitted 18 January, 2021; originally announced January 2021.

Comments: 8 pages, 8 figures

arXiv:2001.03786 [pdf, other]

Empirical bias-reducing adjustments to estimating functions

Authors: Ioannis Kosmidis, Nicola Lunardon

Abstract: We develop a novel and general framework for reduced-bias $M$-estimation from asymptotically unbiased estimating functions. The framework relies on an empirical approximation of the bias by a function of derivatives of estimating function contributions. Reduced-bias $M$-estimation operates either implicitly, by solving empirically-adjusted estimating equations, or explicitly, by subtracting the estimated bias from the original $M$-estimates, and applies to models that are partially- or fully-specified, with either likelihoods or other surrogate objectives. Automatic differentiation can be used to abstract away the only algebra required to implement reduced-bias $M$-estimation. As a result, the bias reduction methods we introduce have markedly broader applicability with more straightforward implementation and less algebraic or computational effort than other established bias-reduction methods that require resampling or evaluation of expectations of products of log-likelihood derivatives. If $M$-estimation is by maximizing an objective, then there always exists a bias-reducing penalized objective. That penalized objective relates closely to information criteria for model selection, and can be further enhanced with plug-in penalties to deliver reduced-bias $M$-estimates with extra properties, like finiteness in models for categorical data. The reduced-bias $M$-estimators have the same asymptotic distribution as the original $M$-estimators, and, hence, standard procedures for inference and model selection apply unaltered with the improved estimates. We demonstrate and assess the properties of reduced-bias $M$-estimation in well-used, prominent modelling settings of varying complexity.

Submitted 9 August, 2023; v1 submitted 11 January, 2020; originally announced January 2020.

MSC Class: 62F10; 62F12; 62J12

arXiv:1909.07123 [pdf, other]

Davidson-Luce model for multi-item choice with ties

Authors: David Firth, Ioannis Kosmidis, Heather Turner

Abstract: This paper introduces a natural extension of the pair-comparison-with-ties model of Davidson (1970, J. Amer. Statist. Assoc.), to allow for ties when more than two items are compared. Properties of the new model are discussed. It is found that this "Davidson-Luce" model retains the many appealing features of Davidson's solution, while extending the scope of application substantially beyond the domain of pair-comparison data. The model introduced here already underpins the handling of tied rankings in the "PlackettLuce" R package.

Submitted 16 September, 2019; originally announced September 2019.

Comments: 11 pages, including Appendix with example R code

MSC Class: 62J15 (Primary) 62J12 (Secondary)

arXiv:1812.01938 [pdf, other]

Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models

Authors: Ioannis Kosmidis, David Firth

Abstract: Penalization of the likelihood by Jeffreys' invariant prior, or by a positive power thereof, is shown to produce finite-valued maximum penalized likelihood estimates in a broad class of binomial generalized linear models. The class of models includes logistic regression, where the Jeffreys-prior penalty is known additionally to reduce the asymptotic bias of the maximum likelihood estimator; and also models with other commonly used link functions such as probit and log-log. Shrinkage towards equiprobability across observations, relative to the maximum likelihood estimator, is established theoretically and is studied through illustrative examples. Some implications of finiteness and shrinkage for inference are discussed, particularly when inference is based on Wald-type procedures. A widely applicable procedure is developed for computation of maximum penalized likelihood estimates, by using repeated maximum likelihood fits with iteratively adjusted binomial responses and totals. These theoretical results and methods underpin the increasingly widespread use of reduced-bias and similarly penalized binomial regression models in many applied fields.

Submitted 23 March, 2020; v1 submitted 5 December, 2018; originally announced December 2018.

MSC Class: 62J12; 62F10; 62F12; 62F03
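
The finiteness property described in the abstract above can be seen on a toy example. The following is a minimal sketch, assuming NumPy and the standard form of the Firth (1993) adjusted score for logistic regression, $X^\top\{y - \pi + h(1/2 - \pi)\}$, with $h$ the hat values. It is not the computational procedure developed in the paper (which uses repeated maximum likelihood fits with adjusted responses and totals), and the function name `firth_logistic` and the data are hypothetical.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-10):
    """Jeffreys-prior penalized logistic regression via quasi-Fisher scoring.

    Illustrative sketch only: no step-halving or other safeguards.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = prob * (1.0 - prob)                    # working weights
        XtWX = X.T @ (X * w[:, None])              # expected information
        # Hat values h_i = w_i * x_i^T (X^T W X)^{-1} x_i
        h = w * np.einsum("ij,ij->i", X @ np.linalg.inv(XtWX), X)
        # Adjusted score: ordinary score plus the Jeffreys-prior term
        score = X.T @ (y - prob + h * (0.5 - prob))
        step = np.linalg.solve(XtWX, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Completely separated toy data (assumption, for illustration only):
# the response is 1 exactly when the covariate is positive, so the
# maximum likelihood estimate of the slope is infinite.
X_sep = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_sep = np.array([0.0, 0.0, 1.0, 1.0])
beta_hat = firth_logistic(X_sep, y_sep)
```

On these data the penalized fit should return a finite, positive slope, whereas plain Fisher scoring for maximum likelihood would drift towards infinity.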

arXiv:1810.12068 [pdf, other]

Modelling rankings in R: the PlackettLuce package

Authors: Heather L. Turner, Jacob van Etten, David Firth, Ioannis Kosmidis

Abstract: This paper presents the R package PlackettLuce, which implements a generalization of the Plackett-Luce model for rankings data. The generalization accommodates both ties (of arbitrary order) and partial rankings (complete rankings of subsets of items). By default, the implementation adds a set of pseudo-comparisons with a hypothetical item, ensuring that the underlying network of wins and losses between items is always strongly connected. In this way, the worth of each item always has a finite maximum likelihood estimate, with finite standard error. The use of pseudo-comparisons also has a regularization effect, shrinking the estimated parameters towards equal item worth. In addition to standard methods for model summary, PlackettLuce provides a method to compute quasi standard errors for the item parameters. This provides the basis for comparison intervals that do not change with the choice of identifiability constraint placed on the item parameters. Finally, the package provides a method for model-based partitioning using covariates whose values vary between rankings, enabling the identification of subgroups of judges or settings that have different item worths. The features of the package are demonstrated through application to classic and novel data sets.

Submitted 14 December, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

Comments: In v2: review of software implementing alternative models to Plackett-Luce; comparison of algorithms provided by the PlackettLuce package; further examples of rankings where the underlying win-loss network is not strongly connected. In addition, general editing to improve organisation and clarity. In v3: corrected headings Table 4, minor edits

arXiv:1807.01623 [pdf, other]

Modeling outcomes of soccer matches

Authors: Alkeos Tsokos, Santhosh Narayanan, Ioannis Kosmidis, Gianluca Baio, Mihai Cucuringu, Gavin Whitaker, Franz J. Király

Abstract: We compare various extensions of the Bradley-Terry model and a hierarchical Poisson log-linear model in terms of their performance in predicting the outcome of soccer matches (win, draw, or loss). The parameters of the Bradley-Terry extensions are estimated by maximizing the log-likelihood, or an appropriately penalized version of it, while the posterior densities of the parameters of the hierarchical Poisson log-linear model are approximated using integrated nested Laplace approximations. The prediction performance of the various modeling approaches is assessed using a novel, context-specific framework for temporal validation that is found to deliver accurate estimates of the test error. The direct modeling of outcomes via the various Bradley-Terry extensions and the modeling of match scores using the hierarchical Poisson log-linear model demonstrate similar behavior in terms of predictive performance.

Submitted 3 August, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

arXiv:1804.04085 [pdf, other]

Mean and median bias reduction in generalized linear models

Authors: Ioannis Kosmidis, Euloge Clovis Kenne Pagui, Nicola Sartori

Abstract: This paper presents an integrated framework for estimation and inference from generalized linear models using adjusted score equations that result in mean and median bias reduction. The framework unifies theoretical and methodological aspects of past research on mean bias reduction and accommodates, in a natural way, new advances on median bias reduction. General expressions for the adjusted score functions are derived in terms of quantities that are readily available in standard software for fitting generalized linear models. The resulting estimating equations are solved using a unifying quasi-Fisher scoring algorithm that is shown to be equivalent to iteratively re-weighted least squares with appropriately adjusted working variates. Formal links between the iterations for mean and median bias reduction are established. Core model invariance properties are used to develop a novel mixed adjustment strategy when the estimation of a dispersion parameter is necessary. It is also shown how median bias reduction in multinomial logistic regression can be done using the equivalent Poisson log-linear model. The estimates coming out from mean and median bias reduction are found to overcome practical issues related to infinite estimates that can occur with positive probability in generalized linear models with multinomial or discrete responses, and can result in valid inferences even in the presence of a high-dimensional nuisance parameter.

Submitted 12 January, 2019; v1 submitted 11 April, 2018; originally announced April 2018.

MSC Class: 62J12; 62F03; 62F12

arXiv:1802.08114 [pdf, other]

Two-way sparsity for time-varying networks, with applications in genomics

Authors: Thomas E. Bartlett, Ioannis Kosmidis, Ricardo Silva

Abstract: We propose a novel way of modelling time-varying networks, by inducing two-way sparsity on local models of node connectivity. This two-way sparsity separately promotes sparsity across time and sparsity across variables (within time). Separation of these two types of sparsity is achieved through a novel prior structure, which draws on ideas from the Bayesian lasso and from copula modelling. We provide an efficient implementation of the proposed model via a Gibbs sampler, and we apply the model to data from neural development. In doing so, we demonstrate that the proposed model is able to identify changes in genomic network structure that match current biological knowledge. Such changes in genomic network structure can then be used by neuro-biologists to identify potential targets for further experimental investigation.

Submitted 18 November, 2020; v1 submitted 22 February, 2018; originally announced February 2018.

arXiv:1801.09002 [pdf, other]

Median bias reduction in random-effects meta-analysis and meta-regression

Authors: Sophia Kyriakou, Ioannis Kosmidis, Nicola Sartori

Abstract: Random-effects models are frequently used to synthesise information from different studies in meta-analysis. While likelihood-based inference is attractive both in terms of limiting properties and of implementation, its application in random-effects meta-analysis may result in misleading conclusions, especially when the number of studies is small to moderate. The current paper shows how methodology that reduces the asymptotic bias of the maximum likelihood estimator of the variance component can also substantially improve inference about the mean effect size. The results are derived for the more general framework of random-effects meta-regression, which allows the mean effect size to vary with study-specific covariates.

Submitted 23 May, 2018; v1 submitted 26 January, 2018; originally announced January 2018.

MSC Class: 62F03; 62F12; 62P10

arXiv:1710.11217 [pdf, other]

Location-adjusted Wald statistics for scalar parameters

Authors: C. Di Caterina, I. Kosmidis

Abstract: Inference about a scalar parameter of interest is a core statistical task that has attracted immense research in statistics. The Wald statistic is a prime candidate for the task, on the grounds of the asymptotic validity of the standard normal approximation to its finite-sample distribution, simplicity and low computational cost. It is well known, though, that this normal approximation can be inadequate, especially when the sample size is small or moderate relative to the number of parameters. A novel, algebraic adjustment to the Wald statistic is proposed, delivering significant improvements in inferential performance with only small implementation and computational overhead, predominantly due to additional matrix multiplications. The Wald statistic is viewed as an estimate of a transformation of the model parameters and is appropriately adjusted, using either maximum likelihood or reduced-bias estimators, bringing its expectation asymptotically closer to zero. The location adjustment depends on the expected information, an approximation to the bias of the estimator, and the derivatives of the transformation, which are all either readily available or easily obtainable in standard software for a wealth of models. An algorithm for the implementation of the location-adjusted Wald statistics in general models is provided, as well as a bootstrap scheme for the further scale correction of the location-adjusted statistic. Ample analytical and numerical evidence is presented for the adoption of the location-adjusted statistic in prominent modelling settings, including inference about log-odds and binomial proportions, logistic regression in the presence of nuisance parameters, beta regression, and gamma regression. The location-adjusted Wald statistics are used for the construction of significance maps for the analysis of multiple sclerosis lesions from MRI data.

Submitted 11 March, 2019; v1 submitted 30 October, 2017; originally announced October 2017.

MSC Class: 62F05; 62F03; 62J02; 62J12; 62P10

arXiv:1710.00001 [pdf, other]

A Bayesian inference approach for determining player abilities in football

Authors: Gavin A. Whitaker, Ricardo Silva, Daniel Edwards, Ioannis Kosmidis

Abstract: We consider the task of determining a football player's ability for a given event type, for example, scoring a goal. We propose an interpretable Bayesian model which is fit using variational inference methods. We implement a Poisson model to capture occurrences of event types, from which we infer player abilities. Our approach also allows the visualisation of differences between players, for a specific ability, through the marginal posterior variational densities. We then use these inferred player abilities to extend the Bayesian hierarchical model of Baio and Blangiardo (2010) which captures a team's scoring rate (the rate at which they score goals). We apply the resulting scheme to the English Premier League, capturing player abilities over the 2013/2014 season, before using output from the hierarchical model to predict whether over or under 2.5 goals will be scored in a given game in the 2014/2015 season. This validates our model as a way of providing insights into team formation and the individual success of sports teams.

Submitted 23 September, 2020; v1 submitted 25 September, 2017; originally announced October 2017.

Comments: 31 pages, 14 figures

arXiv:1509.00650 [pdf, other]

doi 10.1093/biomet/asx001

Improving the accuracy of likelihood-based inference in meta-analysis and meta-regression

Authors: Ioannis Kosmidis, Annamaria Guolo, Cristiano Varin

Abstract: Random-effects models are frequently used to synthesise information from different studies in meta-analysis. While likelihood-based inference is attractive both in terms of limiting properties and of implementation, its application in random-effects meta-analysis may result in misleading conclusions, especially when the number of studies is small to moderate. The current paper shows how methodology that reduces the asymptotic bias of the maximum likelihood estimator of the variance component can also substantially improve inference about the mean effect size. The results are derived for the more general framework of random-effects meta-regression, which allows the mean effect size to vary with study-specific covariates.

Submitted 22 May, 2017; v1 submitted 2 September, 2015; originally announced September 2015.

MSC Class: 62F03; 62F12; 62P10

Journal ref: Biometrika 104 (2017) 489-496

arXiv:1506.01388 [pdf, other]

Linking the performance of endurance runners to training and physiological effects via multi-resolution elastic net

Authors: Ioannis Kosmidis, Louis Passfield

Abstract: A multiplicative effects model is introduced for the identification of the factors that are influential to the performance of highly-trained endurance runners. The model extends the established power-law relationship between performance times and distances by taking into account the effect of the physiological status of the runners, and training effects extracted from GPS records collected over the course of a year. In order to incorporate information on the runners' training into the model, the concept of the training distribution profile is introduced and its ability to capture the characteristics of the training session is discussed. The covariates that are relevant to runner performance as response are identified using a procedure termed multi-resolution elastic net. Multi-resolution elastic net allows the simultaneous identification of scalar covariates and of intervals on the domain of one or more functional covariates that are most influential for the response. The results identify a contiguous group of speed intervals between 5.3 to 5.7 m$\cdot$s$^{-1}$ as influential for the improvement of running performance and extend established relationships between physiological status and runner performance. Another outcome of multi-resolution elastic net is a predictive equation for performance based on the minimization of the mean squared prediction error on a test data set across resolutions.

Submitted 1 July, 2015; v1 submitted 3 June, 2015; originally announced June 2015.

arXiv:1404.4077 [pdf, other]

doi 10.1007/s11222-015-9590-5

Model-based clustering using copulas with applications

Authors: Ioannis Kosmidis, Dimitris Karlis

Abstract: The majority of model-based clustering techniques is based on multivariate Normal models and their variants. In this paper copulas are used for the construction of flexible families of models for clustering applications. The use of copulas in model-based clustering offers two direct advantages over current methods: i) the appropriate choice of copulas provides the ability to obtain a range of exotic shapes for the clusters, and ii) the explicit choice of marginal distributions for the clusters allows the modelling of multivariate data of various modes (either discrete or continuous) in a natural way. This paper introduces and studies the framework of copula-based finite mixture models for clustering applications. Estimation in the general case can be performed using standard EM, and, depending on the mode of the data, more efficient procedures are provided that can fully exploit the copula structure. The closure properties of the mixture models under marginalization are discussed, and for continuous, real-valued data parametric rotations in the sample space are introduced, with a parallel discussion on parameter identifiability depending on the choice of copulas for the components. The exposition of the methodology is accompanied and motivated by the analysis of real and artificial data.

Submitted 2 July, 2015; v1 submitted 15 April, 2014; originally announced April 2014.

Journal ref: Stat.Comput. 26 (2016) 1079-1099

arXiv:1311.6311 [pdf, ps, other]

doi 10.1002/wics.1296

Bias in parametric estimation: reduction and useful side-effects

Authors: Ioannis Kosmidis

Abstract: The bias of an estimator is defined as the difference of its expected value from the parameter to be estimated, where the expectation is with respect to the model. Loosely speaking, small bias reflects the desire that if an experiment is repeated indefinitely then the average of all the resultant estimates will be close to the parameter value that is estimated. The current paper is a review of the still-expanding repository of methods that have been developed to reduce bias in the estimation of parametric models. The review provides a unifying framework where all those methods are seen as attempts to approximate the solution of a simple estimating equation. Of particular focus is the maximum likelihood estimator, which despite being asymptotically unbiased under the usual regularity conditions, has finite-sample bias that can result in significant loss of performance of standard inferential procedures. An informal comparison of the methods is made revealing some useful practical side-effects in the estimation of popular models in practice including: i) shrinkage of the estimators in binomial and multinomial regression models that guarantees finiteness even in cases of data separation where the maximum likelihood estimator is infinite, and ii) inferential benefits for models that require the estimation of dispersion or precision parameters.

Submitted 25 November, 2013; originally announced November 2013.

MSC Class: 62F03; 62F10; 62F12; 62J12

Journal ref: WIREs.Compu.Stat. 6 (2014) 185-196

arXiv:1204.0105 [pdf, other]

doi 10.1111/rssb.12025

Improved estimation in cumulative link models

Authors: Ioannis Kosmidis

Abstract: For the estimation of cumulative link models for ordinal data, the bias-reducing adjusted score equations in Firth (1993) are obtained, whose solution ensures an estimator with smaller asymptotic bias than the maximum likelihood estimator. Their form suggests a parameter-dependent adjustment of the multinomial counts, which, in turn suggests the solution of the adjusted score equations through iterated maximum likelihood fits on adjusted counts, greatly facilitating implementation. Like the maximum likelihood estimator, the reduced-bias estimator is found to respect the invariance properties that make cumulative link models a good choice for the analysis of categorical data. Its additional finiteness and optimal frequentist properties, along with the adequate behaviour of related asymptotic inferential procedures make the reduced-bias estimator attractive as a default choice for practical applications. Furthermore, the proposed estimator enjoys certain shrinkage properties that are defensible from an experimental point of view relating to the nature of ordinal data.

Submitted 28 January, 2013; v1 submitted 31 March, 2012; originally announced April 2012.

MSC Class: 62F03; 62F10; 62F12; 62J12

Journal ref: J.R.Stat.Soc.B 76 (2014) 169-196

arXiv:1201.1314 [pdf, other]

Some discussions of D. Fearnhead and D. Prangle's Read Paper "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation"

Authors: Christophe Andrieu, Simon Barthelme, Nicolas Chopin, Julien Cornebise, Arnaud Doucet, Mark Girolami, Ioannis Kosmidis, Ajay Jasra, Anthony Lee, Jean-Michel Marin, Pierre Pudlo, Christian P. Robert, Mohammed Sedki, Sumeetpal S. Singh

Abstract: This report is a collection of comments on the Read Paper of Fearnhead and Prangle (2011), to appear in the Journal of the Royal Statistical Society Series B, along with a reply from the authors.

Submitted 5 January, 2012; originally announced January 2012.

Comments: 10 pages

arXiv:0907.4018 [pdf, ps, other]

Simulating Events of Unknown Probabilities via Reverse Time Martingales

Authors: Krzysztof Latuszynski, Ioannis Kosmidis, Omiros Papaspiliopoulos, Gareth O. Roberts

Abstract: Assume that one aims to simulate an event of unknown probability $s\in (0,1)$ which is uniquely determined, however only its approximations can be obtained using a finite computational effort. Such settings are often encountered in statistical simulations. We consider two specific examples. First, the exact simulation of non-linear diffusions, second, the celebrated Bernoulli factory problem of generating an $f(p)-$coin given a sequence $X_1,X_2,...$ of independent tosses of a $p-$coin (with known $f$ and unknown $p$). We describe a general framework and provide algorithms where this kind of problems can be fitted and solved. The algorithms are straightforward to implement and thus allow for effective simulation of desired events of probability $s.$ In the case of diffusions, we obtain the algorithm of \cite{BeskosRobertsEA1} as a specific instance of the generic framework developed here. In the case of the Bernoulli factory, our work offers a statistical understanding of the Nacu-Peres algorithm for $f(p) = \min\{2p, 1-2\varepsilon\}$ (which is central to the general question) and allows for its immediate implementation that avoids algorithmic difficulties of the original version.

Submitted 21 November, 2009; v1 submitted 23 July, 2009; originally announced July 2009.

Comments: referees suggestions incorporated

Report number: University of Warwick CRiSM research report No. 09-30

Showing 1–25 of 25 results for author: Kosmidis, I