-
Structuring, Sequencing, Staging, Selecting: the 4S method for the longitudinal analysis of multidimensional measurement scales in chronic diseases
Authors:
Tiphaine Saulnier,
Wassilios G. Meissner,
Margherita Fabbri,
Alexandra Foubert-Samier,
Cécile Proust-Lima
Abstract:
In clinical studies, measurement scales are often collected to report disease-related manifestations from clinician or patient perspectives. Their analysis can help identify relevant manifestations throughout the disease course, enhancing knowledge of disease progression and guiding clinicians in providing appropriate support. However, the analysis of measurement scales in health studies is not straightforward, as they consist of repeated, ordinal, and potentially multidimensional item data. Their sum-score summaries may considerably reduce information and impede interpretation, their change over time occurs along clinical progression, and, as with many other longitudinal processes, their observation may be truncated by events. This work establishes a comprehensive strategy in four consecutive steps to leverage repeated data from multidimensional measurement scales. The 4S method successively (1) identifies the scale structure into subdimensions satisfying three calibration assumptions (unidimensionality, conditional independence, increasing monotonicity), (2) describes each subdimension progression using a joint latent process model which includes a continuous-time item response theory model for the longitudinal subpart, (3) aligns each subdimension's progression with disease stages through a projection approach, and (4) identifies the most informative items across disease stages using Fisher's information. The method is comprehensively illustrated in multiple system atrophy (MSA), an alpha-synucleinopathy, with the analysis of daily activity and motor impairments over disease progression. The 4S method provides an effective and complete analytical strategy for any measurement scale repeatedly collected in health studies.
Submitted 11 July, 2024;
originally announced July 2024.
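Step (4) of the abstract above ranks items by Fisher's information. The following is a minimal sketch of that idea, assuming a logistic graded response model for each ordinal item; the link function, the parameter values, and the item names `item_A` and `item_B` are illustrative assumptions, not taken from the paper.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def grm_item_information(theta, discrimination, thresholds):
    """Fisher information of one ordinal item at latent level theta.

    Assumed model (logistic graded response model):
    P(Y >= k | theta) = logistic(discrimination * (theta - thresholds[k - 1])),
    with thresholds sorted in increasing order.
    """
    # Cumulative probabilities P(Y >= k), padded with 1 (k = 0) and 0 (k = K + 1).
    cum = [1.0] + [logistic(discrimination * (theta - b)) for b in thresholds] + [0.0]
    # Derivative of each cumulative probability with respect to theta.
    dcum = [discrimination * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(cum) - 1):
        prob = cum[k] - cum[k + 1]      # P(Y = k | theta)
        dprob = dcum[k] - dcum[k + 1]   # d P(Y = k | theta) / d theta
        if prob > 0.0:
            info += dprob ** 2 / prob
    return info


def most_informative_item(theta, items):
    """Return the name of the item with the largest information at theta.

    `items` maps a (hypothetical) item name to (discrimination, thresholds).
    """
    return max(items, key=lambda name: grm_item_information(theta, *items[name]))


# Hypothetical items: item_A discriminates at low latent levels, item_B at high ones.
items = {"item_A": (2.0, [-1.0, 0.0]), "item_B": (1.0, [1.0, 2.0])}
early_stage_item = most_informative_item(-0.5, items)
late_stage_item = most_informative_item(3.0, items)
```

Evaluating the information at latent levels that correspond to different disease stages shows how the most informative item can change along the disease course.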
-
Continuous-time mediation analysis for repeatedly measured mediators and outcomes
Authors:
K. Le Bourdonnec,
L. Valeri,
C. Proust-Lima
Abstract:
Mediation analysis aims to decipher the underlying causal mechanisms between an exposure, an outcome, and intermediate variables called mediators. Initially developed for fixed-time mediator and outcome, it has been extended to the framework of longitudinal data by discretizing the assessment times of mediator and outcome. Yet, processes in play in longitudinal studies are usually defined in continuous time and measured at irregular and subject-specific visits. This is the case in dementia research when cerebral and cognitive changes measured at planned visits in cohorts are of interest. We thus propose a methodology to estimate the causal mechanisms between a time-fixed exposure ($X$), a mediator process ($\mathcal{M}_t$) and an outcome process ($\mathcal{Y}_t$) both measured repeatedly over time in the presence of a time-dependent confounding process ($\mathcal{L}_t$). We consider three types of causal estimands, the natural effects, path-specific effects and randomized interventional analogues to natural effects, and provide identifiability assumptions. We employ a dynamic multivariate model based on differential equations for their estimation. The performance of the methods is explored in simulations, and we illustrate the method in two real-world examples motivated by the 3C cerebral aging study to assess: (1) the effect of educational level on functional dependency through depressive symptomatology and cognitive functioning, and (2) the effect of a genetic factor on cognitive functioning potentially mediated by vascular brain lesions and confounded by neurodegeneration.
Submitted 5 April, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
Functional principal component analysis as an alternative to mixed-effect models for describing sparse repeated measures in presence of missing data
Authors:
Corentin Ségalas,
Catherine Helmer,
Robin Genuer,
Cécile Proust-Lima
Abstract:
Analyzing longitudinal data in health studies is challenging due to sparse and error-prone measurements, strong within-individual correlation, missing data and various trajectory shapes. While mixed-effect models (MM) effectively address these challenges, they remain parametric models and may incur computational costs. In contrast, Functional Principal Component Analysis (FPCA) is a non-parametric approach developed for regular and dense functional data that flexibly describes temporal trajectories at a potentially lower computational cost. This paper presents an empirical simulation study evaluating the behaviour of FPCA with sparse and error-prone repeated measures and its robustness under different missing data schemes in comparison with MM. The results show that FPCA is well-suited in the presence of missing at random data caused by dropout, except in scenarios involving most frequent and systematic dropout. Like MM, FPCA fails under missing not at random mechanism. The FPCA was applied to describe the trajectories of four cognitive functions before clinical dementia and contrast them with those of matched controls in a case-control study nested in a population-based aging cohort. The average cognitive declines of future dementia cases showed a sudden divergence from those of their matched controls with a sharp acceleration 5 to 2.5 years prior to diagnosis.
Submitted 10 July, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Patient-perceived progression in multiple system atrophy: natural history of quality of life
Authors:
Tiphaine Saulnier,
Margherita Fabbri,
Mélanie Le Goff,
Catherine Helmer,
Anne Pavy-Le Traon,
Wassilios G. Meissner,
Olivier Rascol,
Cécile Proust-Lima,
Alexandra Foubert-Samier
Abstract:
Health-related quality of life (Hr-QoL) scales provide crucial information on neurodegenerative disease progression, help improve patient care, and constitute a meaningful endpoint for therapeutic research. However, Hr-QoL progression is usually poorly documented, as for multiple system atrophy (MSA), a rare and rapidly progressing alpha-synucleinopathy. This work aimed to describe Hr-QoL progression during the natural course of MSA, explore disparities between patients, and identify informative items using a four-step statistical strategy. We leveraged the data of the French MSA cohort comprising annual assessments with the MSA-QoL questionnaire for more than 500 patients over up to 11 years. The four-step strategy (1) determined the subdimensions of Hr-QoL in MSA; (2) modelled the subdimension trajectories over time, accounting for the risk of death; (3) mapped the sequence of item impairments with disease stages; and (4) identified the most informative items specific to each disease stage. Among the 536 patients included, 50% were women and the mean age at entry was 65.1 years. Among them, 63.1% died during the follow-up. Four dimensions were identified. In addition to the original motor, nonmotor, and emotional domains, an oropharyngeal component was highlighted. While the motor and oropharyngeal domains deteriorated rapidly, the nonmotor and emotional aspects were already slightly to moderately impaired at cohort entry and deteriorated slowly over the course of the disease. Impairments were associated with sex, diagnosis subtype, and delay since symptom onset. Except for the emotional domain, each dimension was driven by key identified items. Hr-QoL is a multidimensional concept that deteriorates progressively over the course of MSA and brings essential knowledge for improving patient care.
As exemplified with MSA, the thorough description of Hr-QoL using the 4-step original analysis can provide new perspectives on neurodegenerative diseases' management to ultimately deliver better support focused on the patient's perspective.
Submitted 22 September, 2023;
originally announced September 2023.
-
Random Forests for time-fixed and time-dependent predictors: The DynForest R package
Authors:
Anthony Devaux,
Cécile Proust-Lima,
Robin Genuer
Abstract:
The R package DynForest implements random forests for predicting a continuous, a categorical or a (multiple causes) time-to-event outcome based on time-fixed and time-dependent predictors. The main originality of DynForest is that it handles time-dependent predictors that can be endogenous (i.e., impacted by the outcome process), measured with error and measured at subject-specific times. At each recursive step of the tree building process, the time-dependent predictors are internally summarized into individual features on which the split can be done. This is achieved using flexible linear mixed models (thanks to the R package lcmm) whose specification is defined by the user. DynForest returns the mean for a continuous outcome, the category with a majority vote for a categorical outcome or the cumulative incidence function over time for a survival outcome. DynForest also computes variable importance and minimal depth to inform on the most predictive variables or groups of variables. This paper aims to guide the user with step-by-step examples for fitting random forests using DynForest.
Submitted 11 April, 2024; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Disease progression model anchored around clinical diagnosis in longitudinal cohorts: example of Alzheimer's disease and related dementia
Authors:
Jérémie Lespinasse,
Carole Dufouil,
Cécile Proust-Lima,
the MEMENTO study group
Abstract:
Background: Alzheimer's disease and related dementia (ADRD) are characterized by multiple and progressive anatomo-clinical changes. Yet, modeling changes over disease course from cohort data is challenging as the usual timescales are inappropriate and time-to-clinical diagnosis is available on small subsamples of participants with short follow-up durations prior to diagnosis. One solution to circumvent this challenge is to define the disease time as a latent variable.
Methods: We developed a multivariate mixed model approach that realigns individual trajectories into the latent disease time to describe disease progression. Our methodology exploits the clinical diagnosis information as a partially observed and approximate reference to guide the estimation of the latent disease time. The model estimation was carried out in the Bayesian framework using Stan. We applied the methodology to 2186 participants of the MEMENTO study with 5-year follow-up. Repeated measures of 12 ADRD markers stemming from cerebrospinal fluid (CSF), brain imaging and cognitive tests were analyzed.
Results: The estimated latent disease time spanned over twenty years before the clinical diagnosis. Considering the profile of a woman aged 70 with a high level of education and APOE4 carrier (the main genetic risk factor for ADRD), CSF markers of tau protein accumulation preceded markers of brain atrophy by 5 years and cognitive decline by 10 years. We observed that individual characteristics could substantially modify the sequence and timing of these changes.
Conclusion: Our disease progression model does not merely realign trajectories in the most homogeneous way. It accounts for the inherent residual inter-individual variability in dementia progression to describe the long-term changes according to the years preceding clinical diagnosis, and to provide clinically meaningful information on the sequence of events.
Submitted 28 January, 2023;
originally announced January 2023.
-
Analysis of the 24-Hour Activity Cycle: An illustration examining the association with cognitive function in the Adult Changes in Thought (ACT) Study
Authors:
Yinxiang Wu,
Dori E. Rosenberg,
Mikael Anne Greenwood-Hickman,
Susan M. McCurry,
Cecile Proust-Lima,
Jennifer C. Nelson,
Paul K. Crane,
Andrea Z. LaCroix,
Eric B. Larson,
Pamela A. Shaw
Abstract:
The 24-hour activity cycle (24HAC) is a new paradigm for studying activity behaviors in relation to health outcomes. This approach captures the interrelatedness of the daily time spent in physical activity (PA), sedentary behavior (SB), and sleep. We illustrate and compare the use of three popular approaches, namely isotemporal substitution model (ISM), compositional data analysis (CoDA), and latent profile analysis (LPA) for modeling outcome associations with the 24HAC. We apply these approaches to assess an association with a cognitive outcome, measured by CASI item response theory (IRT) score, in a cohort of 1034 older adults (mean [range] age = 77 [65-100]; 55.8% female; 90% White) who were part of the Adult Changes in Thought (ACT) Activity Monitoring (ACT-AM) sub-study. PA and SB were assessed with thigh-worn activPAL accelerometers for 7 days. We highlight differences in assumptions between the three approaches, discuss statistical challenges, and provide guidance on interpretation and selecting an appropriate approach. ISM is easiest to apply and interpret; however, the typical ISM model assumes a linear association. CoDA specifies a non-linear association through isometric logratio transformations that are more challenging to apply and interpret. LPA can classify individuals into groups with similar time-use patterns. Inference on associations of latent profiles with health outcomes needs to account for the uncertainty of the LPA classifications, which is often ignored. The selection of the most appropriate method should be guided by the scientific questions of interest and the applicability of each model's assumptions. The analytic results did not suggest that less time spent on SB and more in PA was associated with better cognitive function. Further research is needed into the health implications of the distinct 24HAC patterns identified in this cohort.
Submitted 19 January, 2023;
originally announced January 2023.
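The isometric logratio transformation mentioned in the abstract above can be sketched for a three-part daily composition (PA, SB, sleep). This is a minimal illustration using one common choice of basis (pivot coordinates); the function name and the choice of basis are assumptions and may differ from the ones used in the paper.

```python
import math


def ilr_three_parts(pa, sb, sleep):
    """Isometric logratio coordinates of a 3-part time-use composition.

    Inputs can be in any common unit (hours, minutes): only the relative
    shares matter. All three parts must be strictly positive.
    """
    total = pa + sb + sleep
    x1, x2, x3 = pa / total, sb / total, sleep / total
    # First coordinate: PA relative to the geometric mean of SB and sleep.
    z1 = math.sqrt(2.0 / 3.0) * math.log(x1 / math.sqrt(x2 * x3))
    # Second coordinate: SB relative to sleep.
    z2 = math.sqrt(1.0 / 2.0) * math.log(x2 / x3)
    return z1, z2


# Hypothetical day: 2 h of PA, 10 h of SB, 12 h of sleep.
z1, z2 = ilr_three_parts(2.0, 10.0, 12.0)
```

A regression of the outcome on `z1` and `z2` is linear in the logratios and therefore non-linear in the original durations, which is one reason the coefficients are harder to interpret than those of an isotemporal substitution model.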
-
Random survival forests with multivariate longitudinal endogenous covariates
Authors:
Anthony Devaux,
Catherine Helmer,
Robin Genuer,
Cécile Proust-Lima
Abstract:
Predicting the individual risk of a clinical event using the complete patient history is still a major challenge for personalized medicine. Among the methods developed to compute individual dynamic predictions, the joint models have the assets of using all the available information while accounting for dropout. However, they are restricted to a very small number of longitudinal predictors. Our objective was to propose an innovative alternative solution to predict an event probability using a possibly large number of longitudinal predictors. We developed DynForest, an extension of competing-risk random survival forests that handles endogenous longitudinal predictors. At each node of the tree, the time-dependent predictors are translated into time-fixed features (using mixed models) to be used as candidates for splitting the subjects into two subgroups. The individual event probability is estimated in each tree by the Aalen-Johansen estimator of the leaf in which the subject is classified according to his/her history of predictors. The final individual prediction is given by the average of the tree-specific individual event probabilities. We carried out a simulation study to demonstrate the performances of DynForest both in a small dimensional context (in comparison with joint models) and in a large dimensional context (in comparison with a regression calibration method that ignores informative dropout). We also applied DynForest to (i) predict the individual probability of dementia in the elderly according to repeated measures of cognitive, functional, vascular and neuro-degeneration markers, and (ii) quantify the importance of each type of markers for the prediction of dementia. Implemented in the R package DynForest, our methodology provides a novel and appropriate solution for the prediction of events from any number of longitudinal endogenous predictors.
Submitted 9 February, 2023; v1 submitted 11 August, 2022;
originally announced August 2022.
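The prediction step described in the abstract above (an Aalen-Johansen estimate in each leaf, then an average over trees) can be illustrated with a minimal sketch. It assumes that each leaf is given as a list of observed times and cause labels, with 0 denoting right-censoring; the function names and data layout are hypothetical and do not reflect the DynForest R package interface.

```python
def aalen_johansen(times, causes, cause_of_interest, horizon):
    """Cumulative incidence of one cause by `horizon` under competing risks.

    causes[i] is 0 for a right-censored subject, or a positive cause label.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    survival = 1.0  # probability of being event-free just before the current time
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        j = i
        n_any = 0       # events of any cause at time t
        n_interest = 0  # events of the cause of interest at time t
        while j < len(data) and data[j][0] == t:
            if data[j][1] != 0:
                n_any += 1
            if data[j][1] == cause_of_interest:
                n_interest += 1
            j += 1
        cif += survival * n_interest / at_risk
        survival *= 1.0 - n_any / at_risk
        at_risk -= j - i  # events and censored subjects leave the risk set
        i = j
    return cif


def forest_cumulative_incidence(leaves, cause_of_interest, horizon):
    """Average the leaf-level estimates over trees.

    `leaves` holds, for each tree, the (times, causes) of the leaf in which
    the subject to predict falls.
    """
    estimates = [aalen_johansen(t, c, cause_of_interest, horizon) for t, c in leaves]
    return sum(estimates) / len(estimates)


# Two hypothetical trees, each contributing one leaf for the subject to predict.
leaves = [([1, 2, 3, 4], [1, 2, 0, 1]), ([1, 2], [2, 2])]
risk = forest_cumulative_incidence(leaves, cause_of_interest=1, horizon=4)
```

The sketch leaves out how the trees are grown, in particular how the time-dependent predictors are summarized into time-fixed features at each node.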
-
Fast and flexible inference for joint models of multivariate longitudinal and survival data using Integrated Nested Laplace Approximations
Authors:
Denis Rustand,
Janet van Niekerk,
Elias Teixeira Krainski,
Håvard Rue,
Cécile Proust-Lima
Abstract:
Modeling longitudinal and survival data jointly offers many advantages such as addressing measurement error and missing data in the longitudinal processes, understanding and quantifying the association between the longitudinal markers and the survival events and predicting the risk of events based on the longitudinal markers. A joint model involves multiple submodels (one for each longitudinal/survival outcome) usually linked together through correlated or shared random effects. Their estimation is computationally expensive (particularly due to a multidimensional integration of the likelihood over the random effects distribution) so that inference methods rapidly become intractable, which restricts applications of joint models to a small number of longitudinal markers and/or random effects. We introduce a Bayesian approximation based on the Integrated Nested Laplace Approximation algorithm implemented in the R package R-INLA to alleviate the computational burden and allow the estimation of multivariate joint models with fewer restrictions. Our simulation studies show that R-INLA substantially reduces the computation time and the variability of the parameter estimates compared to alternative estimation strategies. We further apply the methodology to analyze 5 longitudinal markers (3 continuous, 1 count, 1 binary, and 16 random effects) and competing risks of death and transplantation in a clinical trial on primary biliary cholangitis. R-INLA provides a fast and reliable inference technique for applying joint models to the complex multivariate data encountered in health research.
Submitted 12 July, 2023; v1 submitted 11 March, 2022;
originally announced March 2022.
-
Describing complex disease progression using joint latent class models for multivariate longitudinal markers and clinical endpoints
Authors:
Cécile Proust-Lima,
Tiphaine Saulnier,
Viviane Philipps,
Anne Pavy-Le Traon,
Patrice Péran,
Olivier Rascol,
Wassilios G Meissner,
Alexandra Foubert-Samier
Abstract:
Neurodegenerative diseases are characterized by numerous markers of progression and clinical endpoints. For instance, Multiple System Atrophy (MSA), a rare neurodegenerative synucleinopathy, is characterized by various combinations of progressive autonomic failure and motor dysfunction, and a very poor prognosis. Describing the progression of such complex and multi-dimensional diseases is particularly difficult. One has to simultaneously account for the assessment of multivariate markers over time, the occurrence of clinical endpoints, and a highly suspected heterogeneity between patients. Yet, such description is crucial for understanding the natural history of the disease, staging patients diagnosed with the disease, unravelling subphenotypes, and predicting the prognosis. Through the example of MSA progression, we show how a latent class approach modeling multiple repeated markers and clinical endpoints can help describe complex disease progression and identify subphenotypes for exploring new pathological hypotheses. The proposed joint latent class model includes class-specific multivariate mixed models to handle multivariate repeated biomarkers possibly summarized into latent dimensions and class-and-cause-specific proportional hazard models to handle time-to-event data. A maximum likelihood estimation procedure, validated through simulations, is available in the lcmm R package. In the French MSA cohort comprising data on 598 patients followed for up to 13 years, five subphenotypes of MSA were identified that differ by the sequence and shape of biomarker degradation, and the associated risk of death. In posterior analyses, the five subphenotypes were used to explore the association between clinical progression and external imaging and fluid biomarkers, while properly accounting for the uncertainty in subphenotype membership.
Submitted 31 January, 2023; v1 submitted 10 February, 2022;
originally announced February 2022.
-
Joint models for the longitudinal analysis of measurement scales in the presence of informative dropout
Authors:
Tiphaine Saulnier,
Viviane Philipps,
Wassilios G Meissner,
Olivier Rascol,
Anne Pavy-Le Traon,
Alexandra Foubert-Samier,
Cécile Proust-Lima
Abstract:
In health cohort studies, repeated measures of markers are often used to describe the natural history of a disease. Joint models make it possible to study their evolution while taking into account the possible informative dropout usually due to clinical events. However, joint modeling developments mostly focused on continuous Gaussian markers while, in an increasing number of studies, the actual quantity of interest is not directly measurable; it constitutes a latent variable evaluated by a set of observed indicators from questionnaires or measurement scales. Classical examples include anxiety, fatigue, and cognition. In this work, we explain how joint models can be extended to the framework of a latent quantity measured over time by indicators of different natures (e.g. continuous, binary, ordinal). The longitudinal submodel describes the evolution over time of the quantity of interest defined as a latent process in a structural mixed model, and links the latent process to each observation of the indicators through appropriate measurement models. Simultaneously, the risk of multi-cause event is modelled via a proportional cause-specific hazard model that includes a function of the mixed model elements as linear predictor to take into account the association between the latent process and the risk of event. Estimation, carried out in the maximum likelihood framework and implemented in the R-package JLPM, has been validated by simulations. The methodology is illustrated in the French cohort on Multiple-System Atrophy (MSA), a rare and fatal neurodegenerative disease, with the study of dysphagia progression over time stopped by the occurrence of death.
Submitted 31 March, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Modeling repeated self-reported outcome data: a continuous-time longitudinal Item Response Theory model
Authors:
Cécile Proust-Lima,
Viviane Philipps,
Bastien Perrot,
Myriam Blanchin,
Véronique Sébille
Abstract:
Item Response Theory (IRT) models have received growing interest in health science for analyzing latent constructs such as depression, anxiety, quality of life, or cognitive functioning from the information provided by each individual's item responses. However, in the presence of repeated item measures, IRT methods usually assume that the measurement occasions are made at the exact same time for all patients. In this paper, we show how the IRT methodology can be combined with the mixed model theory to provide a longitudinal IRT model which exploits the information of a measurement scale provided at the item level while simultaneously handling observation times that may vary across individuals and items. The latent construct is a latent process defined in continuous time that is linked to the observed item responses through a measurement model at each individual- and occasion-specific observation time; we focus here on a Graded Response Model for binary and ordinal items. The Maximum Likelihood Estimation procedure of the model is available in the R package lcmm. The proposed approach is contextualized in a clinical example in end-stage renal disease, the PREDIALA study. The objective is to study the trajectories of depressive symptomatology (as measured by 7 items of the Hospital Anxiety and Depression scale) according to the time from registration on the renal transplant waiting list and the renal replacement therapy. We also illustrate how the method can be used to assess Differential Item Functioning and lack of measurement invariance over time.
Submitted 28 December, 2021; v1 submitted 27 September, 2021;
originally announced September 2021.
-
A multistate approach for mediation analysis in the presence of semi-competing risks with application in cancer survival disparities
Authors:
Linda Valeri,
Cécile Proust-Lima,
Weijia Fan,
Jarvis T. Chen,
Hélène Jacqmin-Gadda
Abstract:
We propose a novel methodology to quantify the effect of stochastic interventions on non-terminal time-to-events that lie on the pathway between an exposure and a terminal time-to-event outcome. Investigating these effects is particularly important in health disparities research when we seek to quantify inequities in timely delivery of treatment and its impact on patients' survival time. Current approaches fail to account for semi-competing risks arising in this setting. Under the potential outcome framework, we define and provide identifiability conditions for causal estimands for stochastic direct and indirect effects. Causal contrasts are estimated in continuous time within a multistate modeling framework and analytic formulae for the estimators of the causal contrasts are developed. We show via simulations that ignoring censoring in mediator and/or outcome time-to-event processes, or ignoring competing risks may give misleading results. This work demonstrates that rigorous definition of the direct and indirect effects and joint estimation of the outcome and mediator time-to-event distributions in the presence of semi-competing risks are crucial for valid investigation of mechanisms in continuous time. We employ this novel methodology to investigate the role of delaying treatment uptake in explaining racial disparities in cancer survival in a cohort study of colon cancer patients.
Submitted 25 February, 2021;
originally announced February 2021.
-
Individual dynamic prediction of clinical endpoint from large dimensional longitudinal biomarker history: a landmark approach
Authors:
Anthony Devaux,
Robin Genuer,
Karine Pérès,
Cécile Proust-Lima
Abstract:
The individual data collected throughout patient follow-up constitute crucial information for assessing the risk of a clinical event, and eventually for adapting a therapeutic strategy. Joint models and landmark models have been proposed to compute individual dynamic predictions from repeated measures of one or two markers. However, they hardly extend to the case where the complete patient history possibly includes many more repeated markers. Our objective was thus to propose a solution for the dynamic prediction of a health event that may exploit repeated measures of a possibly large number of markers. We combined a landmark approach extended to endogenous markers history with machine learning methods adapted to survival data. Each marker trajectory is modeled using the information collected up to landmark time, and summary variables that best capture the individual trajectories are derived. These summaries and additional covariates are then included in different prediction methods. To handle a possibly large dimensional history, we rely on machine learning methods adapted to survival data, namely regularized regressions and random survival forests, to predict the event from the landmark time, and we show how they can be combined into a superlearner. Then, the performances are evaluated by cross-validation using estimators of Brier Score and the area under the Receiver Operating Characteristic curve adapted to censored data. We demonstrate in a simulation study the benefits of machine learning survival methods over standard survival models, especially in the case of numerous and/or nonlinear relationships between the predictors and the event. We then applied the methodology in two prediction contexts: a clinical context with the prediction of death for patients with primary biliary cholangitis, and a public health context with the prediction of death in the general elderly population at different ages.
Our methodology, implemented in R, enables the prediction of an event using the entire longitudinal patient history, even when the number of repeated markers is large. Although introduced with mixed models for the repeated markers and methods for a single right-censored time-to-event, our method can be used with any other appropriate modeling technique for the markers and can be easily extended to the competing risks setting.
Submitted 21 January, 2022; v1 submitted 2 February, 2021;
originally announced February 2021.
-
Robust and Efficient Optimization Using a Marquardt-Levenberg Algorithm with R Package marqLevAlg
Authors:
Viviane Philipps,
Boris P Hejblum,
Mélanie Prague,
Daniel Commenges,
Cécile Proust-Lima
Abstract:
Implementations in R of classical general-purpose algorithms for local optimization generally have two major limitations which cause difficulties in applications to complex problems: too loose convergence criteria and too long calculation time. By relying on a Marquardt-Levenberg algorithm (MLA), a Newton-like method particularly robust for solving local optimization problems, we provide, with the marqLevAlg package, an efficient and general-purpose local optimizer which (i) prevents convergence to saddle points by using a stringent convergence criterion based on the relative distance to minimum/maximum in addition to the stability of the parameters and of the objective function; and (ii) reduces the computation time in complex settings by allowing parallel calculations at each iteration. We demonstrate through a variety of cases from the literature that our implementation reliably and consistently reaches the optimum (even when other optimizers fail), and also largely reduces computational time in complex settings through the example of maximum likelihood estimation of different sophisticated statistical models.
Submitted 26 November, 2021; v1 submitted 8 September, 2020;
originally announced September 2020.
-
Time-varying exposure history and subsequent health outcomes: a two-stage approach to identify critical windows
Authors:
Maude Wagner,
Francine Grodstein,
Karen Leffondre,
Cécilia Samieri,
Cécile Proust-Lima
Abstract:
Long-term behavioral and health risk factors constitute a primary focus of research on the etiology of chronic diseases. Yet, identifying critical time-windows during which risk factors have the strongest impact on disease risk is challenging. To assess the trajectory of association of an exposure history with an outcome, the weighted cumulative exposure index (WCIE) has been proposed, with weights reflecting the relative importance of exposures at different times. However, WCIE is restricted to a completely observed, error-free exposure, whereas exposures are often measured with intermittent missingness and error. Moreover, it rarely explores exposure history that is very distant from the outcome, as usually sought in life-course epidemiology. We extend the WCIE methodology to (i) exposures that are intermittently measured with error, and (ii) contexts where the exposure time-window precedes the outcome time-window using a landmark approach. First, the individual exposure history up to the landmark time is estimated using a mixed model that handles missing data and error in exposure measurement, and the predicted complete error-free exposure history is derived. Then the WCIE methodology is applied to assess the trajectory of association between the predicted exposure history and the health outcome collected after the landmark time. In our context, the health outcome is a longitudinal marker analyzed using a mixed model. A simulation study first demonstrates the correct inference obtained with this approach. The approach is then applied to the Nurses' Health Study (19,415 women) to investigate the association between BMI history (collected from midlife) and subsequent cognitive decline after age 70. In conclusion, this approach, easy to implement, provides a flexible tool for studying complex dynamic relationships and identifying critical time windows while accounting for exposure measurement errors.
Submitted 25 February, 2021; v1 submitted 26 August, 2020;
originally announced August 2020.
-
Dynamic Modelling of Multivariate Dimensions and Their Temporal Relationships using Latent Processes: Application to Alzheimer's Disease
Authors:
Bachirou O. Taddé,
Hélène Jacqmin-Gadda,
Jean-François Dartigues,
Daniel Commenges,
Cécile Proust-Lima
Abstract:
Alzheimer's disease gradually affects several components including the cerebral dimension with brain atrophies, the cognitive dimension with a decline in various functions and the functional dimension with impairment in the daily living activities. Understanding how such dimensions interconnect is crucial for AD research. However, this requires simultaneously capturing the dynamic and multidimensional aspects, and exploring temporal relationships between dimensions. We propose an original dynamic structural model that accounts for all these features. The model defines dimensions as latent processes and combines a multivariate linear mixed model and a system of difference equations to model trajectories and temporal relationships between latent processes in finely discrete time. Dimensions are simultaneously related to their observed (possibly multivariate) markers through nonlinear equations of observation. Parameters are estimated in the maximum likelihood framework, which enjoys a closed form for the likelihood. We demonstrate in a simulation study that this dynamic model in discrete time benefits from the same causal interpretation of temporal relationships as models defined in continuous time, as long as the discretization step remains small. The model is then applied to the data of the Alzheimer's Disease Neuroimaging Initiative. Three longitudinal dimensions (cerebral anatomy, cognitive ability and functional autonomy) measured by 6 markers are analyzed and their temporal structure is contrasted between different clinical stages of Alzheimer's disease. Keywords: causality, difference equations, latent process, longitudinal data, mixed models, multivariate data.
Submitted 14 October, 2019; v1 submitted 10 June, 2018;
originally announced June 2018.
-
A joint model for multiple dynamic processes and clinical endpoints: application to Alzheimer's disease
Authors:
Cécile Proust-Lima,
Viviane Philipps,
Jean-François Dartigues
Abstract:
Like other neurodegenerative diseases, Alzheimer's disease, the most frequent dementia in the elderly, is characterized by multiple progressive impairments in the brain structure and in clinical functions such as cognitive functioning and functional disability. Until recently, these components were mostly studied independently since no joint model for multivariate longitudinal data and time to event was available in the statistical community. Yet, these components are fundamentally inter-related in the degradation process towards dementia and should be analyzed together. We thus propose a joint model to simultaneously describe the dynamics of multiple correlated components. Each component, defined as a latent process, is measured by one or several continuous markers (not necessarily Gaussian). Rather than considering the associated time to diagnosis as in standard joint models, we assume diagnosis corresponds to the passing above a covariate-specific threshold (to be estimated) of a pathological process which is modelled as a combination of the component-specific latent processes. This definition captures the clinical complexity of diagnoses such as dementia diagnosis but also benefits from simplifications for the computation of Maximum Likelihood Estimates. We show that the model and estimation procedure can also handle competing clinical endpoints. The estimation procedure, implemented in an R package, is validated by simulations and the method is illustrated on a large French population-based cohort of cerebral aging in which we focused on the dynamics of three clinical manifestations and the associated risk of dementia and death before dementia.
Submitted 27 May, 2019; v1 submitted 27 March, 2018;
originally announced March 2018.
-
Individual dynamic predictions using landmarking and joint modelling: validation of estimators and robustness assessment
Authors:
Loïc Ferrer,
Hein Putter,
Cécile Proust-Lima
Abstract:
After the diagnosis of a disease, one major objective is to predict cumulative probabilities of events such as clinical relapse or death from the individual information collected up to a prediction time, usually including repeated biomarker measurements. Several competing estimators have been proposed to calculate these individual dynamic predictions, mainly from two approaches: joint modelling and landmarking. These approaches differ by the information used, the model assumptions and the complexity of the computational procedures. It is essential to properly validate the estimators derived from joint models and landmark models, quantify their variability and compare them in order to provide key elements for the development and use of individual dynamic predictions in clinical follow-up of patients. Motivated by the prediction of two competing causes of progression of prostate cancer from the history of prostate-specific antigen, we conducted an in-depth simulation study to validate and compare the dynamic predictions derived from these two methods. Specifically, we formally defined the quantity to estimate and its estimators, proposed techniques to assess the uncertainty around predictions and validated them. We also compared the individual dynamic predictions derived from joint models and landmark models in terms of prediction error, discriminatory power, efficiency and robustness to model assumptions. We show that these prediction tools should be handled with care, in particular by properly specifying models and estimators.
Submitted 7 August, 2018; v1 submitted 12 July, 2017;
originally announced July 2017.
-
Joint modelling of longitudinal and multi-state processes: application to clinical progressions in prostate cancer
Authors:
Loïc Ferrer,
Virginie Rondeau,
James J. Dignam,
Tom Pickles,
Hélène Jacqmin-Gadda,
Cécile Proust-Lima
Abstract:
Joint modelling of longitudinal and survival data is increasingly used in clinical trials on cancer. In prostate cancer for example, these models make it possible to account for the link between longitudinal measures of prostate-specific antigen (PSA) and the time of clinical recurrence when studying the risk of relapse. In practice, multiple types of relapse may occur successively. Distinguishing these transitions between health states would make it possible to evaluate, for example, how PSA trajectory and classical covariates impact the risk of dying after a distant recurrence post-radiotherapy, or to predict the risk of one specific type of clinical recurrence post-radiotherapy from the PSA history. In this context, we present a joint model for a longitudinal process and a multi-state process which is divided into two sub-models: a linear mixed sub-model for longitudinal data, and a multi-state sub-model with proportional hazards for transition times, both linked by shared random effects. Parameters of this joint multi-state model are estimated within the maximum likelihood framework using an EM algorithm coupled to a quasi-Newton algorithm in case of slow convergence. It is implemented in R, by combining and extending the mstate and JM packages. The estimation program is validated by simulations and applied to pooled data from two cohorts of men with localized prostate cancer treated by radiotherapy. Thanks to the classical covariates available at baseline and the PSA measurements collected repeatedly during the follow-up, we are able to assess the biomarker's trajectory, define the risks of transitions between health states, and quantify the impact of the PSA dynamics on each transition intensity.
Submitted 29 June, 2015; v1 submitted 24 June, 2015;
originally announced June 2015.
-
Joint latent class model for longitudinal data and interval-censored semi-competing events: Application to dementia
Authors:
Anaïs Rouanet,
Pierre Joly,
Jean-François Dartigues,
Cécile Proust-Lima,
Hélène Jacqmin-Gadda
Abstract:
Joint models are used in ageing studies to investigate the association between longitudinal markers and a time-to-event, and have been extended to multiple markers and/or competing risks. The competing risk of death must be considered in the elderly because death and dementia have common risk factors. Moreover, in cohort studies, time-to-dementia is interval-censored because dementia is only assessed intermittently, so subjects can become demented and die between two follow-up visits without being diagnosed. To study pre-dementia cognitive decline, we propose a joint latent class model combining a (possibly multivariate) mixed model and an illness-death model handling both interval censoring (by accounting for a possible unobserved transition to dementia) and semi-competing risks. Parameters are estimated by maximum likelihood handling interval censoring. The correlation between the marker and the times-to-events is captured by latent classes, homogeneous groups with specific risks of death and dementia and profiles of cognitive decline. We propose Markovian and semi-Markovian versions. Both approaches are compared to a joint latent class model for standard competing risks through a simulation study, and then applied in a prospective cohort study of cerebral and functional ageing to distinguish different profiles of cognitive decline associated with risks of dementia and death. The comparison highlights that among demented subjects, mortality depends more on age than on duration of dementia. This model distinguishes the so-called terminal pre-death decline (among non-demented subjects) from the pre-dementia decline.
Submitted 24 June, 2015;
originally announced June 2015.
-
Estimation of extended mixed models using latent classes and latent processes: the R package lcmm
Authors:
Cécile Proust-Lima,
Viviane Philipps,
Benoit Liquet
Abstract:
The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory. It includes the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes (hlme), curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear multivariate outcomes (multlcmm), as well as joint latent class mixed models (Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a time-to-event that can be possibly left-truncated, right-censored and defined in a competing setting. Maximum likelihood estimators are obtained using a modified Marquardt algorithm with strict convergence criteria based on the parameters and likelihood stability, and on the negativity of the second derivatives. The package also provides various post-fit functions including goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic prediction of the event and predictive accuracy assessment. This paper constitutes a companion paper to the package by introducing each family of models, the estimation technique, some implementation details and giving examples through a dataset on cognitive aging.
Submitted 24 January, 2016; v1 submitted 3 March, 2015;
originally announced March 2015.
-
Joint modelling of repeated multivariate cognitive measures and competing risks of dementia and death: a latent process and latent class approach
Authors:
Cécile Proust-Lima,
Jean-François Dartigues,
Hélène Jacqmin-Gadda
Abstract:
Joint models initially dedicated to a single longitudinal marker and a single time-to-event need to be extended to account for the rich longitudinal data of cohort studies. Multiple causes of clinical progression are indeed usually observed, and multiple longitudinal markers are collected when the true latent trait of interest is hard to capture (e.g. quality of life, functional dependency, cognitive level). These multivariate and longitudinal data also usually have nonstandard distributions (discrete, asymmetric, bounded, ...). We propose a joint model based on a latent process and latent classes to analyze simultaneously such multiple longitudinal markers of different natures, and multiple causes of progression. A latent process model describes the latent trait of interest and links it to the observed longitudinal outcomes using flexible measurement models adapted to different types of data, and a latent class structure links the longitudinal and the cause-specific survival models. The joint model is estimated in the maximum likelihood framework. A score test is developed to evaluate the assumption of conditional independence of the longitudinal markers and each cause of progression given the latent classes. In addition, individual dynamic cumulative incidences of each cause of progression based on the repeated marker data are derived. The methodology is validated in a simulation study and applied to real data on cognitive aging from a large population-based study. The aim is to predict the risk of dementia by accounting for the competing death according to the profiles of semantic memory measured by two asymmetric psychometric tests.
Submitted 24 August, 2015; v1 submitted 26 September, 2014;
originally announced September 2014.
-
A universal approximate cross-validation criterion and its asymptotic distribution
Authors:
Daniel Commenges,
Cécile Proust-Lima,
Cécilia Samieri,
Benoit Liquet
Abstract:
A general framework is that the estimators of a distribution are obtained by minimizing a function (the estimating function) and they are assessed through another function (the assessment function). The estimating and assessment functions generally estimate risks. A classical case is that both functions estimate an information risk (specifically cross entropy); in that case the Akaike information criterion (AIC) is relevant. In more general cases, the assessment risk can be estimated by leave-one-out crossvalidation. Since leave-one-out crossvalidation is computationally very demanding, an approximation formula can be very useful. A universal approximate crossvalidation criterion (UACV) for the leave-one-out crossvalidation is given. This criterion can be adapted to different types of estimators, including penalized likelihood and maximum a posteriori estimators, and of assessment risk functions, including information risk functions and the continuous ranked probability score (CRPS). This formula reduces to the Takeuchi information criterion (TIC) when cross entropy is the risk for both estimation and assessment. The asymptotic distribution of UACV and of a difference of UACV is given. UACV can be used for comparing estimators of the distributions of ordered categorical data derived from threshold models and models based on continuous approximations. A simulation study and an analysis of real psychometric data are presented.
Submitted 8 June, 2012;
originally announced June 2012.