-
Case-crossover designs and overdispersion with application in air pollution epidemiology
Authors:
Samuel Perreault,
Gracia Y. Dong,
Alex Stringer,
Hwashin Shin,
Patrick Brown
Abstract:
Over the last three decades, case-crossover designs have found many applications in health sciences, especially in air pollution epidemiology. They are typically used, in combination with partial likelihood techniques, to define a conditional logistic model for the responses, usually health outcomes, conditional on the exposures. Despite the fact that conditional logistic models have been shown equivalent, in typical air pollution epidemiology setups, to specific instances of the well-known Poisson time series model, it is often claimed that they cannot allow for overdispersion. This paper clarifies the relationship between case-crossover designs, the models that ensue from their use, and overdispersion. In particular, we propose to relax the assumption of independence between individuals traditionally made in case-crossover analyses, in order to explicitly introduce overdispersion in the conditional logistic model. As we show, the resulting overdispersed conditional logistic model coincides with the overdispersed, conditional Poisson model, in the sense that their likelihoods are simple re-expressions of one another. We further provide the technical details of a Bayesian implementation of the proposed case-crossover model, which we use to demonstrate, by means of a large simulation study, that standard case-crossover models can lead to dramatically underestimated coverage probabilities, while the proposed models do not. We also perform an illustrative analysis of the association between air pollution and morbidity in Toronto, Canada, which shows that the proposed models are more robust than standard ones to outliers such as those associated with public holidays.
Submitted 25 January, 2024;
originally announced January 2024.
-
Structured Learning in Time-dependent Cox Models
Authors:
Guanbo Wang,
Yi Lian,
Archer Y. Yang,
Robert W. Platt,
Rui Wang,
Sylvie Perreault,
Marc Dorais,
Mireille E. Schnitzer
Abstract:
Cox models with time-dependent coefficients and covariates are widely used in survival analysis. In high-dimensional settings, sparse regularization techniques are employed for variable selection, but existing methods for time-dependent Cox models lack flexibility in enforcing specific sparsity patterns (i.e., covariate structures). We propose a flexible framework for variable selection in time-dependent Cox models, accommodating complex selection rules. Our method can adapt to arbitrary grouping structures, including interaction selection, temporal, spatial, tree, and directed acyclic graph structures. It achieves accurate estimation with low false alarm rates. We develop the sox package, implementing a network flow algorithm for efficiently solving models with complex covariate structures. sox offers a user-friendly interface for specifying grouping structures and delivers fast computation. Through examples, including a case study on identifying predictors of time to all-cause death in atrial fibrillation patients, we demonstrate the practical application of our method with specific selection rules.
Submitted 6 January, 2024; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Clustered Archimax Copulas
Authors:
Simon Chatelain,
Samuel Perreault,
Johanna G. Nešlehová,
Anne-Laure Fougères
Abstract:
When modeling multivariate phenomena, properly capturing the joint extremal behavior is often one of the many concerns. Archimax copulas appear as successful candidates in case of asymptotic dependence. In this paper, the class of Archimax copulas is extended via their stochastic representation to a clustered construction. These clustered Archimax copulas are characterized by a partition of the random variables into groups linked by a radial copula; each cluster is Archimax and therefore defined by its own Archimedean generator and stable tail dependence function. The proposed extension allows for both asymptotic dependence and independence between the clusters, a property which is sought, for example, in applications in environmental sciences and finance. The model also inherits the ability of Archimax copulas to capture dependence between variables at pre-extreme levels. The asymptotic behavior of the model is established, leading to a rich class of stable tail dependence functions.
Submitted 27 October, 2022;
originally announced October 2022.
-
Integrating complex selection rules into the latent overlapping group Lasso for constructing coherent prediction models
Authors:
Guanbo Wang,
Sylvie Perreault,
Robert W. Platt,
Rui Wang,
Marc Dorais,
Mireille E. Schnitzer
Abstract:
The construction of coherent prediction models holds great importance in medical research as such models enable health researchers to gain deeper insights into disease epidemiology and clinicians to identify patients at higher risk of adverse outcomes. One commonly employed approach to developing prediction models is variable selection through penalized regression techniques. Integrating natural variable structures into this process not only enhances model interpretability but can also increase the likelihood of recovering the true underlying model and boost prediction accuracy. However, a challenge lies in determining how to effectively integrate potentially complex selection dependencies into the penalized regression. In this work, we demonstrate how to represent selection dependencies mathematically, provide algorithms for deriving the complete set of potential models, and offer a structured approach for integrating complex rules into variable selection through the latent overlapping group Lasso. To illustrate our methodology, we applied these techniques to construct a coherent prediction model for major bleeding in hypertensive patients recently hospitalized for atrial fibrillation and subsequently prescribed oral anticoagulants. In this application, we account for a proxy of anticoagulant adherence and its interaction with dosage and the type of oral anticoagulants in addition to drug-drug interactions.
Submitted 15 January, 2024; v1 submitted 10 June, 2022;
originally announced June 2022.
-
Simultaneous computation of Kendall's tau and its jackknife variance
Authors:
Samuel Perreault
Abstract:
We present efficient algorithms for simultaneously computing Kendall's tau and the jackknife estimator of its variance. For the classical pairwise tau, we describe a modification of Knight's algorithm (originally designed to compute only tau) that does so while preserving its $O(n \log_2 n)$ runtime in the number of observations $n$. We also introduce a novel algorithm computing a multivariate extension of tau and its jackknife variance in $O(n \log_2^p n)$ time.
Submitted 24 June, 2024; v1 submitted 8 June, 2022;
originally announced June 2022.
-
Hypothesis tests for structured rank correlation matrices
Authors:
Samuel Perreault,
Johanna Neslehova,
Thierry Duchesne
Abstract:
Joint modeling of a large number of variables often requires dimension reduction strategies that lead to structural assumptions of the underlying correlation matrix, such as equal pair-wise correlations within subsets of variables. The underlying correlation matrix is thus of interest for both model specification and model validation. In this paper, we develop tests of the hypothesis that the entries of the Kendall rank correlation matrix are linear combinations of a smaller number of parameters. The asymptotic behavior of the proposed test statistics is investigated both when the dimension is fixed and when it grows with the sample size. We pay special attention to the restricted hypothesis of partial exchangeability, which contains full exchangeability as a special case. We show that under partial exchangeability, the test statistics and their large-sample distributions simplify, which leads to computational advantages and better performance of the tests. We propose various scalable numerical strategies for implementation of the proposed procedures, investigate their behavior through simulations and power calculations under local alternatives, and demonstrate their use on a real dataset of mean sea levels at various geographical locations.
Submitted 23 November, 2021; v1 submitted 19 July, 2020;
originally announced July 2020.
-
Detection of Block-Exchangeable Structure in Large-Scale Correlation Matrices
Authors:
Samuel Perreault,
Thierry Duchesne,
Johanna G. Nešlehová
Abstract:
Correlation matrices are omnipresent in multivariate data analysis. When the number d of variables is large, the sample estimates of correlation matrices are typically noisy and conceal underlying dependence patterns. We consider the case when the variables can be grouped into K clusters with exchangeable dependence; this assumption is often made in applications, e.g., in finance and econometrics. Under this partial exchangeability condition, the corresponding correlation matrix has a block structure and the number of unknown parameters is reduced from d(d-1)/2 to at most K(K+1)/2. We propose a robust algorithm based on Kendall's rank correlation to identify the clusters without assuming the knowledge of K a priori or anything about the margins except continuity. The corresponding block-structured estimator performs considerably better than the sample Kendall rank correlation matrix when K < d. The new estimator can also be much more efficient in finite samples even in the unstructured case K = d, although there is no gain asymptotically. When the distribution of the data is elliptical, the results extend to linear correlation matrices and their inverses. The procedure is illustrated on financial stock returns.
Submitted 24 October, 2018; v1 submitted 19 June, 2017;
originally announced June 2017.