-
Permutation-based multiple testing when fitting many generalized linear models
Authors:
Riccardo De Santis,
Jelle J. Goeman,
Samuel Davenport,
Jesse Hemerik,
Livio Finos
Abstract:
The multiple testing problem appears when fitting multivariate generalized linear models for high dimensional data. We show that the sign-flip test can be combined with permutation-based procedures for assessing the multiple testing problem.
Submitted 4 March, 2024;
originally announced March 2024.
-
Robust Inference for Generalized Linear Mixed Models: An Approach Based on Score Sign Flipping
Authors:
Angela Andreella,
Jelle Goeman,
Jesse Hemerik,
Livio Finos
Abstract:
Despite the versatility of generalized linear mixed models in handling complex experimental designs, they often suffer from misspecification and convergence problems. This makes inference on the values of coefficients problematic. To address these challenges, we propose a robust extension of the score-based statistical test using sign-flipping transformations. Our approach efficiently handles within-variance structure and heteroscedasticity, ensuring accurate regression coefficient testing. The approach is illustrated by analyzing the reduction of health issues over time for newly adopted children. The model is characterized by a binomial response with unbalanced frequencies and several categorical and continuous predictors. The proposed approach efficiently deals with critical problems related to longitudinal nonlinear models, surpassing common statistical approaches such as generalized estimating equations and generalized linear mixed models.
Submitted 31 January, 2024;
originally announced January 2024.
-
A novel CFA+EFA model to detect aberrant respondents
Authors:
Niccolò Cao,
Livio Finos,
Luigi Lombardi,
Antonio Calcagnì
Abstract:
Aberrant respondents are common yet extremely detrimental to the quality of social surveys or questionnaires. Recently, factor mixture models have been employed to identify individuals providing deceptive or careless responses. We propose a comprehensive factor mixture model that combines confirmatory and exploratory factor models to represent both the non-aberrant and aberrant components of the responses. The flexibility of the proposed solution allows for the identification of two of the most common aberrant response styles, namely faking and careless responding. We validated our approach by means of two simulations and two case studies. The results indicate the effectiveness of the proposed model in handling aberrant responses in social and behavioral surveys.
Submitted 27 November, 2023;
originally announced November 2023.
-
Revealing Cortical Layers In Histological Brain Images With Self-Supervised Graph Convolutional Networks Applied To Cell-Graphs
Authors:
Valentina Vadori,
Antonella Peruffo,
Jean-Marie Graïc,
Giulia Vadori,
Livio Finos,
Enrico Grisan
Abstract:
Identifying cerebral cortex layers is crucial for comparative studies of the cytoarchitecture aiming at providing insights into the relations between brain structure and function across species. The absence of extensive annotated datasets typically limits the adoption of machine learning approaches, leading to the manual delineation of cortical layers by neuroanatomists. We introduce a self-supervised approach to detect layers in 2D Nissl-stained histological slices of the cerebral cortex. It starts with the segmentation of individual cells and the creation of an attributed cell-graph. A self-supervised graph convolutional network generates cell embeddings that encode morphological and structural traits of the cellular environment and are exploited by a community detection algorithm for the final layering. Our method, the first self-supervised method of its kind with no spatial transcriptomics data involved, holds the potential to accelerate cytoarchitecture analyses, sidestepping annotation needs and advancing cross-species investigation.
Submitted 26 November, 2023;
originally announced November 2023.
-
NCIS: Deep Color Gradient Maps Regression and Three-Class Pixel Classification for Enhanced Neuronal Cell Instance Segmentation in Nissl-Stained Histological Images
Authors:
Valentina Vadori,
Antonella Peruffo,
Jean-Marie Graïc,
Livio Finos,
Livio Corain,
Enrico Grisan
Abstract:
Deep learning has proven to be more effective than other methods in medical image analysis, including the seemingly simple but challenging task of segmenting individual cells, an essential step for many biological studies. Comparative neuroanatomy studies are an example where the instance segmentation of neuronal cells is crucial for cytoarchitecture characterization. This paper presents an end-to-end framework to automatically segment single neuronal cells in Nissl-stained histological images of the brain, thus aiming to enable solid morphological and structural analyses for the investigation of changes in the brain cytoarchitecture. A U-Net-like architecture with an EfficientNet as the encoder and two decoding branches is exploited to regress four color gradient maps and classify pixels into contours between touching cells, cell bodies, or background. The decoding branches are connected through attention gates to share relevant features, and their outputs are combined to return the instance segmentation of the cells. The method was tested on images of the cerebral cortex and cerebellum, outperforming other recent deep-learning-based approaches for the instance segmentation of cells.
Submitted 27 June, 2023;
originally announced June 2023.
-
Procrustes-based distances for exploring between-matrices similarity
Authors:
Angela Andreella,
Riccardo De Santis,
Anna Vesely,
Livio Finos
Abstract:
The statistical shape analysis called Procrustes analysis minimizes the distance between matrices by similarity transformations. The method returns a set of optimal orthogonal matrices, which project each matrix into a common space. This manuscript presents two types of distances derived from Procrustes analysis for exploring between-matrices similarity. The first one focuses on the residuals from the Procrustes analysis, i.e., the residual-based distance metric. In contrast, the second one exploits the fitted orthogonal matrices, i.e., the rotational-based distance metric. Thanks to these distances, similarity-based techniques such as the multidimensional scaling method can be applied to visualize and explore patterns and similarities among observations. The proposed distances prove helpful in functional magnetic resonance imaging (fMRI) data analysis. The brain activation measured over space and time can be represented by a matrix. The proposed distances applied to a sample of subjects -- i.e., matrices -- revealed groups of individuals sharing patterns of neural brain activation.
Submitted 15 January, 2023;
originally announced January 2023.
-
MR-NOM: Multi-scale Resolution of Neuronal cells in Nissl-stained histological slices via deliberate Over-segmentation and Merging
Authors:
Valentina Vadori,
Jean-Marie Graïc,
Livio Finos,
Livio Corain,
Antonella Peruffo,
Enrico Grisan
Abstract:
In comparative neuroanatomy, the characterization of brain cytoarchitecture is critical to a better understanding of brain structure and function, as it helps to distill information on the development, evolution, and distinctive features of different populations. The automatic segmentation of individual brain cells is a primary prerequisite and yet remains challenging. A new method (MR-NOM) was developed for the instance segmentation of cells in Nissl-stained histological images of the brain. MR-NOM exploits a multi-scale approach to deliberately over-segment the cells into superpixels and subsequently merge them via a classifier based on shape, structure, and intensity features. The method was tested on images of the cerebral cortex, proving successful in dealing with cells of varying characteristics that partially touch or overlap, showing better performance than two state-of-the-art methods.
Submitted 14 November, 2022;
originally announced November 2022.
-
Post-selection Inference in Multiverse Analysis (PIMA): an inferential framework based on the sign flipping score test
Authors:
Paolo Girardi,
Anna Vesely,
Daniël Lakens,
Gianmarco Altoè,
Massimiliano Pastore,
Antonio Calcagnì,
Livio Finos
Abstract:
When analyzing data, researchers make some decisions that are either arbitrary, based on subjective beliefs about the data generating process, or for which equally justifiable alternative choices could have been made. This wide range of data-analytic choices can be abused, and has been one of the underlying causes of the replication crisis in several fields. Recently, the introduction of multiverse analysis has provided researchers with a method to evaluate the stability of the results across reasonable choices that could be made when analyzing data. Multiverse analysis is confined to a descriptive role, lacking a proper and comprehensive inferential procedure. Recently, specification curve analysis has added an inferential procedure to multiverse analysis, but this approach is limited to simple cases related to the linear model, and only allows researchers to infer whether at least one specification rejects the null hypothesis, but not which specifications should be selected. In this paper we present a Post-selection Inference approach to Multiverse Analysis (PIMA) which is a flexible and general inferential approach that accounts for all possible models, i.e., the multiverse of reasonable analyses. The approach allows for a wide range of data specifications (i.e. pre-processing) and any generalized linear model; it allows testing the null hypothesis of a given predictor not being associated with the outcome, by merging information from all reasonable models of multiverse analysis, and provides strong control of the family-wise error rate such that it allows researchers to claim that the null-hypothesis can be rejected for each specification that shows a significant effect. The inferential proposal is based on a conditional resampling procedure. To be continued...
Submitted 3 October, 2023; v1 submitted 6 October, 2022;
originally announced October 2022.
-
Inference in generalized linear models with robustness to misspecified variances
Authors:
Riccardo De Santis,
Jelle J. Goeman,
Jesse Hemerik,
Livio Finos
Abstract:
Generalized linear models usually assume a common dispersion parameter. This assumption is seldom true in practice, and may cause appreciable loss of type I error control if standard parametric methods are used. We present an alternative semi-parametric group invariance method based on sign flipping of score contributions. Our method requires only the correct specification of the mean model, but is robust against any misspecification of the variance. The method is available in the R library flipscores.
Submitted 26 October, 2022; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Enhanced hyperalignment via spatial prior information
Authors:
Angela Andreella,
Livio Finos,
Martin A Lindquist
Abstract:
Functional alignment between subjects is an important assumption of functional magnetic resonance imaging (fMRI) group-level analysis. However, it is often violated in practice, even after alignment to a standard anatomical template. Hyperalignment, based on sequential Procrustes orthogonal transformations, has been proposed as a method of aligning shared functional information into a common high-dimensional space and thereby improving inter-subject analysis. Though successful, current hyperalignment algorithms have a number of shortcomings, including difficulties interpreting the transformations, a lack of uniqueness of the procedure, and difficulties performing whole-brain analysis. To resolve these issues, we propose the ProMises (Procrustes von Mises-Fisher) model. We reformulate functional alignment as a statistical model and impose a prior distribution on the orthogonal parameters (the von Mises-Fisher distribution). This allows for the embedding of anatomical information into the estimation procedure by penalizing the contribution of spatially distant voxels when creating the shared functional high-dimensional space. Importantly, the transformations, aligned images, and related results are all unique. In addition, the proposed method allows for efficient whole-brain functional alignment. In simulations and application to data from four fMRI studies we find that ProMises improves inter-subject classification in terms of between-subject accuracy and interpretability compared to standard hyperalignment algorithms.
Submitted 16 September, 2022;
originally announced September 2022.
-
Resampling-Based Multisplit Inference for High-Dimensional Regression
Authors:
Anna Vesely,
Jelle J. Goeman,
Livio Finos
Abstract:
We propose a novel resampling-based method to construct an asymptotically exact test for any subset of hypotheses on coefficients in high-dimensional linear regression. It can be embedded into any multiple testing procedure to make confidence statements on relevant predictor variables. The method constructs permutation test statistics for any individual hypothesis by means of repeated splits of the data and a variable selection technique; then it defines a test for any subset by suitably aggregating its variables' test statistics. The resulting procedure is extremely flexible, as it allows different selection techniques and several combining functions. We present it in two ways: an exact method and an approximate one, that requires less memory usage and shorter computation time, and can be scaled up to higher dimensions. We illustrate the performance of the method with simulations and the analysis of real gene expression data.
Submitted 25 May, 2022;
originally announced May 2022.
-
Advances in Multi-Variate Analysis Methods for New Physics Searches at the Large Hadron Collider
Authors:
Anna Stakia,
Tommaso Dorigo,
Giovanni Banelli,
Daniela Bortoletto,
Alessandro Casa,
Pablo de Castro,
Christophe Delaere,
Julien Donini,
Livio Finos,
Michele Gallinaro,
Andrea Giammanco,
Alexander Held,
Fabricio Jiménez Morales,
Grzegorz Kotkowski,
Seng Pei Liew,
Fabio Maltoni,
Giovanna Menardi,
Ioanna Papavergou,
Alessia Saggio,
Bruno Scarpa,
Giles C. Strong,
Cecilia Tosciri,
João Varela,
Pietro Vischia,
Andreas Weiler
Abstract:
Between the years 2015 and 2019, members of the Horizon 2020-funded Innovative Training Network named "AMVA4NewPhysics" studied the customization and application of advanced multivariate analysis methods and statistical learning tools to high-energy physics problems, as well as developed entirely new ones. Many of those methods were successfully used to improve the sensitivity of data analyses performed by the ATLAS and CMS experiments at the CERN Large Hadron Collider; several others, still in the testing phase, promise to further improve the precision of measurements of fundamental physics parameters and the reach of searches for new phenomena. In this paper, the most relevant new tools, among those studied and developed, are presented along with the evaluation of their performances.
Submitted 22 November, 2021; v1 submitted 16 May, 2021;
originally announced May 2021.
-
Permutation-Based True Discovery Guarantee by Sum Tests
Authors:
Anna Vesely,
Livio Finos,
Jelle J. Goeman
Abstract:
Sum-based global tests are highly popular in multiple hypothesis testing. In this paper we propose a general closed testing procedure for sum tests, which provides lower confidence bounds for the proportion of true discoveries (TDP), simultaneously over all subsets of hypotheses. These simultaneous inferences come for free, i.e., without any adjustment of the alpha-level, whenever a global test is used. Our method allows for an exploratory approach, as simultaneity ensures control of the TDP even when the subset of interest is selected post hoc. It adapts to the unknown joint distribution of the data through permutation testing. Any sum test may be employed, depending on the desired power properties. We present an iterative shortcut for the closed testing procedure, based on the branch and bound algorithm, which converges to the full closed testing results, often after few iterations; even if it is stopped early, it controls the TDP. We compare the properties of different choices for the sum test through simulations, then we illustrate the feasibility of the method for high dimensional data on brain imaging and genomics data.
Submitted 18 January, 2023; v1 submitted 23 February, 2021;
originally announced February 2021.
-
Permutation-based true discovery proportions for functional Magnetic Resonance Imaging cluster analysis
Authors:
Angela Andreella,
Jesse Hemerik,
Wouter Weeda,
Livio Finos,
Jelle Goeman
Abstract:
We propose a permutation-based method for testing a large collection of hypotheses simultaneously. Our method provides lower bounds for the number of true discoveries in any selected subset of hypotheses. These bounds are simultaneously valid with high confidence. The methodology is particularly useful in functional Magnetic Resonance Imaging cluster analysis, where it provides a confidence statement on the percentage of truly activated voxels within clusters of voxels, avoiding the well-known spatial specificity paradox. We offer a user-friendly tool to estimate the percentage of true discoveries for each cluster while controlling the family-wise error rate for multiple testing and taking into account that the cluster was chosen in a data-driven way. The method adapts to the spatial correlation structure that characterizes functional Magnetic Resonance Imaging data, gaining power over parametric approaches.
Submitted 26 January, 2023; v1 submitted 1 December, 2020;
originally announced December 2020.
-
Procrustes analysis for high-dimensional data
Authors:
Angela Andreella,
Livio Finos
Abstract:
The Procrustes-based perturbation model (Goodall, 1991) allows minimization of the Frobenius distance between matrices by similarity transformation. However, it suffers from non-identifiability, critical interpretation of the transformed matrices, and inapplicability in high-dimensional data. We provide an extension of the perturbation model focused on the high-dimensional data framework, called the ProMises (Procrustes von Mises-Fisher) model. The ill-posed and interpretability problems are solved by imposing a proper prior distribution for the orthogonal matrix parameter (i.e., the von Mises-Fisher distribution) which is a conjugate prior, resulting in a fast estimation process. Furthermore, we present the Efficient ProMises model for the high-dimensional framework, useful in neuroimaging, where the problem has much more than three dimensions. We found a great improvement in functional magnetic resonance imaging (fMRI) connectivity analysis because the ProMises model permits incorporation of topological brain information in the alignment's estimation process.
Submitted 20 May, 2022; v1 submitted 11 August, 2020;
originally announced August 2020.
-
Permutation testing in high-dimensional linear models: an empirical investigation
Authors:
Jesse Hemerik,
Magne Thoresen,
Livio Finos
Abstract:
Permutation testing in linear models, where the number of nuisance coefficients is smaller than the sample size, is a well-studied topic. The common approach of such tests is to permute residuals after regressing on the nuisance covariates. Permutation-based tests are valuable in particular because they can be highly robust to violations of the standard linear model, such as non-normality and heteroscedasticity. Moreover, in some cases they can be combined with existing, powerful permutation-based multiple testing methods. Here, we propose permutation tests for models where the number of nuisance coefficients exceeds the sample size. The performance of the novel tests is investigated with simulations. In a wide range of simulation scenarios our proposed permutation methods provided appropriate type I error rate control, unlike some competing tests, while having good power.
Submitted 8 October, 2020; v1 submitted 6 January, 2020;
originally announced January 2020.
-
(Mis)Information Operations: An Integrated Perspective
Authors:
Matteo Cinelli,
Mauro Conti,
Livio Finos,
Francesco Grisolia,
Petra Kralj Novak,
Antonio Peruzzi,
Maurizio Tesconi,
Fabiana Zollo,
Walter Quattrociocchi
Abstract:
The massive diffusion of social media fosters disintermediation and changes the way users are informed, the way they process reality, and the way they engage in public debate. The cognitive layer of users and the related social dynamics define the nature and the dimension of informational threats. Users show the tendency to interact with information adhering to their preferred narrative and to ignore dissenting information. Confirmation bias seems to account for users' decisions about consuming and spreading content; and, at the same time, aggregation of favored information within those communities reinforces group polarization. In this work, the authors address the problem of (mis)information operations with a holistic and integrated approach. Cognitive weaknesses induced by this new information environment are considered. Moreover, (mis)information operations, with particular reference to the Italian context, are considered; and the fact that the phenomenon is more complex than expected is highlighted. The paper concludes by providing an integrated research roadmap accounting for the possible future technological developments.
Submitted 23 December, 2019;
originally announced December 2019.
-
Enhancing statistical inference in psychological research via prospective and retrospective design analysis
Authors:
Gianmarco Altoè,
Giulia Bertoldo,
Claudio Zandonella Callegher,
Enrico Toffalini,
Antonio Calcagnì,
Livio Finos,
Massimiliano Pastore
Abstract:
In the past two decades, psychological science has experienced an unprecedented replicability crisis which uncovered several issues. Among others, statistical inference is too often viewed as an isolated procedure limited to the analysis of data that have already been collected. We build on and further develop an idea proposed by Gelman and Carlin (2014) termed "prospective and retrospective design analysis". Rather than focusing only on the statistical significance of a result and on the classical control of type I and type II errors, a comprehensive design analysis involves reasoning about what can be considered a plausible effect size. Furthermore, it introduces two relevant inferential risks: the exaggeration ratio or Type M error (i.e., the predictable average overestimation of an effect that emerges as statistically significant), and the sign error or Type S error (i.e., the risk that a statistically significant effect is estimated in the wrong direction). Another important aspect of design analysis is that it can be usefully carried out both in the planning phase of a study and for the evaluation of studies that have already been conducted, thus increasing researchers' awareness during all phases of a research project. We use a familiar example in psychology where the researcher is interested in analyzing the differences between two independent groups. We examine the case in which the plausible effect size is formalized as a single value, and propose a method in which uncertainty concerning the magnitude of the effect is formalized via probability distributions. Through several examples, we show that even though a design analysis requires big effort, it has the potential to contribute to planning more robust and replicable studies. Finally, future developments in the Bayesian framework are discussed.
Submitted 30 September, 2019;
originally announced September 2019.
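The Type M and Type S risks described in the abstract above can be approximated by simulation. The following is a minimal sketch, not the authors' implementation: it assumes the effect estimate is normally distributed around a single plausible true effect with a known standard error, and the function name `design_analysis` and its parameters are hypothetical.

```python
import random
from statistics import NormalDist


def design_analysis(true_effect, se, alpha=0.05, n_sims=20000, seed=0):
    """Simulate power, Type S and Type M error for a plausible effect size.

    Assumption: the estimate is drawn from Normal(true_effect, se) and is
    declared significant by a two-sided z-test at level alpha.
    """
    if true_effect == 0:
        raise ValueError("true_effect must be non-zero")
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Keep only the simulated estimates that reach statistical significance.
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate) > z_crit * se:
            significant.append(estimate)
    if not significant:
        raise ValueError("no significant replicate; increase n_sims")
    power = len(significant) / n_sims
    # Type S: share of significant estimates with the wrong sign.
    type_s = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    # Type M: average overestimation among significant estimates.
    type_m = sum(abs(e) for e in significant) / len(significant) / abs(true_effect)
    return power, type_s, type_m


power, type_s, type_m = design_analysis(true_effect=0.5, se=1.0)
print(f"power={power:.3f}  type_S={type_s:.3f}  type_M={type_m:.2f}")
```

In this low-power setting every significant estimate exceeds about 1.96 in absolute value, so the exaggeration ratio is necessarily well above 1.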
-
Robust testing in generalized linear models by sign-flipping score contributions
Authors:
Jesse Hemerik,
Jelle J Goeman,
Livio Finos
Abstract:
Generalized linear models are often misspecified due to overdispersion, heteroscedasticity and ignored nuisance variables. Existing quasi-likelihood methods for testing in misspecified models often do not provide satisfactory type-I error rate control. We provide a novel semi-parametric test, based on sign-flipping individual score contributions. The tested parameter is allowed to be multi-dimensional and even high-dimensional. Our test is often robust against the mentioned forms of misspecification and provides better type-I error control than its competitors. When nuisance parameters are estimated, our basic test becomes conservative. We show how to take nuisance estimation into account to obtain an asymptotically exact test. Our proposed test is asymptotically equivalent to its parametric counterpart.
Submitted 8 May, 2020; v1 submitted 9 September, 2019;
originally announced September 2019.
-
A Maximum Entropy Procedure to Solve Likelihood Equations
Authors:
Antonio Calcagnì,
Livio Finos,
Gianmarco Altoè,
Massimiliano Pastore
Abstract:
In this article we provide initial findings regarding the problem of solving likelihood equations by means of a maximum entropy approach. Unlike standard procedures that require equating the score function of the maximum-likelihood problem to zero, we propose an alternative strategy where the score is instead used as an external informative constraint to the maximization of the convex Shannon's entropy function. The problem involves the re-parameterization of the score parameters as expected values of discrete probability distributions where probabilities need to be estimated. This leads to a simpler situation where parameters are searched in a smaller (hyper) simplex space. We assessed our proposal by means of empirical case studies and a simulation study, the latter involving the most critical case of logistic regression under data separation. The results suggested that the maximum entropy re-formulation of the score problem solves the likelihood equation problem. Similarly, when maximum-likelihood estimation is difficult, as in the case of logistic regression under separation, the maximum entropy proposal achieved results (numerically) comparable to those obtained by Firth's bias-corrected approach. Overall, these first findings reveal that a maximum entropy solution can be considered as an alternative technique to solve the likelihood equation.
Submitted 13 June, 2019; v1 submitted 22 April, 2019;
originally announced April 2019.
-
Hemisphere Mixing: a Fully Data-Driven Model of QCD Multijet Backgrounds for LHC Searches
Authors:
P. De Castro Manzano,
M. Dall'Osso,
T. Dorigo,
L. Finos,
G. Kotkowski,
G. Menardi,
B. Scarpa
Abstract:
A novel method is proposed here to precisely model the multi-dimensional features of QCD multi-jet events in hadron collisions. The method relies on the schematization of high-pT QCD processes as 2->2 reactions made complex by sub-leading effects. The construction of libraries of hemispheres from experimental data and the definition of a suitable nearest-neighbor-based association map allow for the generation of artificial events that reproduce with surprising accuracy the kinematics of the QCD component of original data, while remaining insensitive to small signal contaminations. The method is succinctly described and its performance is tested in the case of the search for the hh->bbbb process at the LHC.
Submitted 7 December, 2017;
originally announced December 2017.