A mixed effects cosinor modelling framework for circadian gene expression
Abstract
The cosinor model is frequently used to represent gene expression given the 24-hour day-night cycle time at which a corresponding tissue sample is collected. However, the timing of many biological processes is based on individual-specific internal timing systems that are offset relative to day-night cycle time. When these offsets are unknown, they pose a challenge in performing statistical analyses with a cosinor model. Specifically, when sample collection times are mis-recorded, cosinor regression can yield attenuated parameter estimates, which in turn attenuate test statistics. This attenuation bias inflates type II error rates in identifying genes with oscillatory behavior. This paper proposes a heuristic method to account for unknown offsets when tissue samples are collected in a longitudinal design. Specifically, this method involves first estimating individual-specific cosinor models for each gene. The times of sample collection for that individual are then translated based on the estimated phase-shifts across every gene. Simulation studies confirm that this method mitigates bias in estimation and inference. Illustrations with real data from three circadian biology studies highlight that this method produces parameter estimates and inferences akin to those obtained when each individual’s offset is known.
Keywords: Circadian biology; Dim-light melatonin onset; Measurement error; Mixed effects models; Two-stage methods
1 Introduction
Human physiology, metabolism, and behavior all exhibit diurnal variations consistent with a circadian rhythm (Marcheva et al., 2013). These rhythms are generated by an internal self-sustained oscillator located in the hypothalamic suprachiasmatic nucleus, which is often referred to as a circadian clock (Bae et al., 2001; Hastings et al., 2018; Herzog et al., 2017; Yamaguchi et al., 2003). An understanding of a patient’s circadian clock has implications in maximizing their quality of care, as survival rates for open-heart surgery (Montaigne et al., 2018), effectiveness of chemotherapy (Dallmann et al., 2016), and response to a vaccine (Long et al., 2016) each vary based on the time of day that treatment is administered. However, there is currently limited use of the circadian clock in the timing of treatment administration (Dallmann et al., 2014; Wittenbrink et al., 2018).
One challenge in integrating the circadian clock into clinical decision-making is that an individual’s clock time (internal circadian time, or ICT) is often offset relative to day-night cycle time (Zeitgeber time, or ZT) (Duffy et al., 2011; Lewy, 1999; Wittenbrink et al., 2018). Laboratory tests can determine the time at which melatonin onset occurs under dim-light conditions, or DLMO time, which is the gold-standard marker of this offset (Hughey, 2017; Kennaway, 2023; Lewy, 1999; Ruiz et al., 2020; Wittenbrink et al., 2018). However, these laboratory tests require that multiple tissue samples are collected from an individual under controlled conditions and analyzed (Kantermann et al., 2015; Kennaway, 2019, 2020; Reid, 2019). This process places a costly and labor-intensive burden on investigators who aim to identify treatment strategies based on the circadian clock. These tests can also fail to produce precise DLMO time estimates depending on the devices used in their determination (Kennaway, 2019, 2020). At the same time, the use of ZT instead of ICT in these research efforts can introduce bias in the parameter estimates of regression models commonly used to represent biological phenomena over time, which would bias statistical inferences (Gorczyca et al., 2023, 2024; Sollberger, 1962; Weaver and Branden, 1995).
This paper is motivated by the challenge of determining DLMO time, and considers a scenario where a longitudinal design is utilized to record biological phenomena from each individual. Initially, we assess the suitability of a linear mixed effects cosinor model, commonly used to represent gene expression over time, in accounting for the offset of ICT relative to ZT when the offset is unknown (Archer et al., 2014; del Olmo et al., 2022; Fontana et al., 2012; Hou et al., 2021; Möller-Levet et al., 2013). This assessment reveals that fixed parameter estimates for this model and test statistics computed from it are biased towards zero, or suffer from attenuation bias, when each individual possesses a unique offset. To remove this bias without performing laboratory tests, this paper proposes a method to identify an individual’s offset prior to regression. Specifically, given multiple samples collected from each individual over time, this method involves first estimating a cosinor model from each individual’s data, and then adjusting the sample collection times for that individual based on their individual-specific phase-shift estimate.
The remainder of this paper is organized as follows. In Section 2, background on the mixed effects cosinor model, motivating results, and an overview of the proposed method are presented. In Section 3, Monte Carlo simulation studies are performed to assess the utility of the proposed method. In Section 4, the proposed method is applied on real data from three circadian biology studies, where it is found to produce parameter estimates and inferences comparable to those where each individual’s DLMO time has been determined from laboratory tests. Finally, in Section 5, the proposed method and directions for future work are discussed.
2 Methodology
2.1 Background and notation
Suppose a longitudinal circadian biology experiment is conducted with individuals, where tissue samples are collected from the -th individual over time. In this paper, it is assumed that each sample is processed to extract the expression levels of different genes. Specifically, let denote an vector , in which denotes the -th recording of the -th gene for the -th individual. Further, let denote an vector , in which denotes the time at which occurs for all . It is emphasized that this paper adopts a notation convention where estimators, statistics, and other quantities related to the -th gene are denoted with in their superscript.
When each sample is independent of every other sample, a cosinor model is often specified for modelling gene expression given time, which has the nonlinear amplitude-phase representation
(1) |
Here, denotes the mean expression levels of the -th gene; denotes the amplitude of the -th gene, or the deviation from mean expression levels to peak expression levels; and denotes the phase-shift of the -th gene, which relates to the time at which gene expression levels peak (Cornelissen,, 2014). In this paper, the gene-specific random noise . When the model in (1) is specified, it can be transformed into the linear model
(2) |
where the identities
(3) |
can be used to transform the linear model in (2) into the nonlinear model in (1), and vice versa (Tong,, 1976). It is noted that transforming this nonlinear model into a linear model enables unbiased estimation of its parameters without additional technical assumptions (Boos and Stefanski,, 2013, Theorem 6.7).
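To make the relationship between the two representations concrete, the following is a minimal sketch that fits the linear form by least squares and maps the coefficients back to an amplitude and phase-shift. It assumes a 24 hour period and the sign convention y = M + A cos(2πt/24 + φ), so that the cosine coefficient equals A cos(φ) and the sine coefficient equals −A sin(φ); the function names `fit_cosinor` and `to_amplitude_phase` are illustrative and do not come from the paper.

```python
import numpy as np

PERIOD = 24.0  # hours; assumed period of the day-night cycle


def fit_cosinor(t, y, period=PERIOD):
    """Least-squares fit of y = b0 + b1*cos(w t) + b2*sin(w t)."""
    w = 2.0 * np.pi / period
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def to_amplitude_phase(beta):
    """Map (b0, b1, b2) to (mean level, amplitude, phase-shift).

    Assumed convention: y = M + A*cos(w t + phi), which gives
    b1 = A*cos(phi) and b2 = -A*sin(phi).
    """
    b0, b1, b2 = beta
    return b0, np.hypot(b1, b2), np.arctan2(-b2, b1)


# Noise-free example: one sample every two hours over 48 hours.
t = np.arange(0.0, 48.0, 2.0)
y = 1.0 + 0.5 * np.cos(2.0 * np.pi * t / PERIOD + 0.8)
mesor, amp, phase = to_amplitude_phase(fit_cosinor(t, y))
```

Other sign conventions for the phase-shift are in common use; under a different convention only the `arctan2` arguments change.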
A common extension of the model in (1) is to enable each individual to influence their repeated outcomes. Specifically, mixed effects models can enable this influence (Davidian and Giltinan, 1995; Hedeker and Gibbons, 2006; McCulloch and Searle, 2000), where the cosinor model would instead be specified as
(4) |
Here, the parameters , , and from (1) maintain their interpretation, and represent constant parameter estimands for the population, or fixed effects. Additionally, denotes the -th individual’s influence on their mean expression levels of the -th gene; denotes the -th individual’s influence on their amplitude of the -th gene; and denotes the -th individual’s influence on their phase-shift of the -th gene. It is emphasized that , or parameters related to an individual’s influence on their repeated outcomes are considered random effects generated from a gene-specific probability distribution . In this paper, it is assumed that the expectation of each random effect is zero. Further, it is not assumed that each gene-specific equals the offset of an individual’s internal circadian time (ICT) relative to the 24 hour day-night cycle time (Zeitgeber time, or ZT).
Many circadian biology studies assume that (4) can be expressed as the linear mixed effects model
(5) |
where and are independent of the random noise for all (Archer et al.,, 2014; del Olmo et al.,, 2022; Fontana et al.,, 2012; Hou et al.,, 2021; Möller-Levet et al.,, 2013). Under these assumptions,
where denotes the vector of fixed effect estimands for the -th gene,
denotes the design matrix for estimating fixed effects, and
denotes the covariance matrix of the -th gene’s expression levels for the -th individual, with denoting an identity matrix. The corresponding likelihood function that is maximized for estimating the parameters of this model is defined as
When and are known, the expression
yields maximum likelihood estimates of the fixed effects (Davidian and Giltinan, 1995, Page 78; McCulloch and Searle, 2000, Equation 6.19). When and are unknown, it is not possible to obtain a closed-form expression of these parameter estimates, and an expectation-maximization algorithm is utilized to obtain them (Hedeker and Gibbons, 2006, Chapter 4; Laird et al., 1987; McCulloch and Searle, 2000, Chapter 14).
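The closed-form estimate with known variance components can be sketched as a generalized least squares computation. This sketch assumes that the random effects enter through the same design matrix as the fixed effects, so that each individual's covariance matrix is X D Xᵀ + σ² I; the names `gls_fixed_effects`, `D`, and `sigma2` are illustrative assumptions, not the paper's notation.

```python
import numpy as np


def gls_fixed_effects(X_list, y_list, D, sigma2):
    """GLS estimate of the fixed effects with known variance components.

    X_list, y_list: per-individual design matrices and response vectors.
    D: assumed covariance matrix of the random effects.
    sigma2: assumed variance of the random noise.
    """
    p = X_list[0].shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for X, y in zip(X_list, y_list):
        V = X @ D @ X.T + sigma2 * np.eye(len(y))  # marginal covariance
        Vinv_X = np.linalg.solve(V, X)             # V^{-1} X
        lhs += X.T @ Vinv_X                        # X' V^{-1} X
        rhs += Vinv_X.T @ y                        # X' V^{-1} y (V symmetric)
    return np.linalg.solve(lhs, rhs)


# Noise-free check with five individuals sampled every four hours.
w = 2.0 * np.pi / 24.0
t = np.arange(0.0, 48.0, 4.0)
X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
beta_true = np.array([1.0, 0.3, -0.2])
D = np.diag([0.1, 0.05, 0.05])
beta_hat = gls_fixed_effects([X] * 5, [X @ beta_true] * 5, D, sigma2=0.04)
```

When the variance components are unknown, a mixed model routine from a statistics package would replace this step.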
After parameter estimation, an investigator could perform hypothesis tests to assess the statistical significance of a gene’s oscillatory behavior. This paper considers hypothesis testing with the fixed effects parameter vector , with which Wald-type test statistics can be computed as
(6) |
Here, denotes a matrix, where , and denotes the empirical parameter vector estimate under the null hypothesis, which is (Halekoh and Højsgaard,, 2014). This paper defines
and , which can be considered a determination of whether or not a gene displays oscillatory behavior (Zong et al.,, 2023). This test statistic has an asymptotic distribution, and the null hypothesis would be rejected at a pre-determined -level if surpasses the percentile of this distribution (Halekoh and Højsgaard,, 2014).
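As an illustration of the test for oscillatory behavior, the sketch below computes a Wald-type statistic for the null hypothesis that both the cosine and sine coefficients are zero. For simplicity it uses an ordinary least squares fit rather than a mixed effects model, and it assumes a chi-square reference distribution with two degrees of freedom, whose upper tail probability is exp(−W/2). The function name `wald_oscillation_test` is hypothetical.

```python
import numpy as np


def wald_oscillation_test(t, y, period=24.0):
    """Wald statistic and p-value for H0: cosine and sine coefficients are zero."""
    w = 2.0 * np.pi / period
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    # Contrast matrix selecting the two oscillatory coefficients.
    L = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    diff = L @ beta
    stat = float(diff @ np.linalg.solve(L @ cov_beta @ L.T, diff))
    # Chi-square with 2 degrees of freedom has survival function exp(-x/2).
    p_value = float(np.exp(-stat / 2.0))
    return stat, p_value


rng = np.random.default_rng(0)
t = np.arange(0.0, 48.0, 2.0)
noise = rng.normal(0.0, 0.2, size=t.size)
stat_osc, p_osc = wald_oscillation_test(t, 0.5 * np.cos(2 * np.pi * t / 24) + noise)
stat_flat, p_flat = wald_oscillation_test(t, noise)
```

In small samples an F-type reference distribution may be preferable to the chi-square approximation assumed here.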
2.2 Fixed effects estimation in the presence of random phase-shifts
A long-standing challenge in computing parameter estimates and test statistics for a cosinor model is that these quantities are biased when each individual has a distinct, unknown phase-shift parameter (Sollberger, 1962; Weaver and Branden, 1995). Recent investigations into this bias were motivated by the underlying offset of an individual’s ICT relative to ZT. Specifically, these investigations interpreted this offset as covariate measurement error, or mis-recorded time data, and theoretically as well as empirically quantified how cosinor model parameter estimates and test statistics are biased when ZT is used for regression (Gorczyca et al., 2023, 2024). Building upon these insights, this paper evaluates the accuracy of approximating the nonlinear mixed cosinor model in (4) with the linear mixed cosinor model in (5) when each individual’s random phase-shift parameter is unknown. Specifically, we first introduce the following proposition, which quantifies this accuracy.
Proposition 1.
Suppose each and . Assume the model in (5) is specified to estimate the response in (4), with both and known such that , or equals a diagonal matrix where denotes the -th element along the diagonal. Further, assume , where the corresponding probability density function of is symmetric around zero. Then the expectation of the fixed parameter estimates
where denotes the characteristic function for evaluated at . Further, the Wald test statistic in (6) computed with the expectation of these parameter estimates can be expressed as
A derivation for this result is provided in Appendix B. To clarify the setup for this result, when each , tissue samples are collected from an equispaced design. This design is optimal for minimizing the estimation variance of a cosinor model under multiple statistical criteria (Pukelsheim,, 2006, Pages 241-243). Further, when each , no samples are removed prior to estimation, which can occur when a collected tissue sample is poor quality (Laurie et al.,, 2010). It is noted that symmetry is assumed for the probability density function of . As a consequence, the marginal density function of is also symmetric, which results in being a real-valued function that is bounded by one in magnitude.
The significance of Proposition 1 is that it presents a scenario where an investigator employs a common experimental design and has additional knowledge of and . In this scenario, the magnitudes of the fixed parameter estimates and are biased towards zero, as the characteristic function is bounded by one in magnitude. Further, application of (3) yields
which indicates that the fixed amplitude parameter estimate is biased, while the fixed phase-shift parameter estimate is unbiased. These results also indicate that the test statistic is attenuated by relative to the scenario where there are no random phase-shift parameters.
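The attenuation described above can be checked numerically. The sketch below assumes normally distributed random phase-shifts measured in radians, for which the characteristic function evaluated at one equals exp(−σ²/2); the simulated values (amplitude 0.5, phase-shift 0.8, standard deviation 0.7) are arbitrary choices for illustration. Because the design is balanced, a pooled least squares fit is equivalent to fitting the average curve across individuals.

```python
import numpy as np

rng = np.random.default_rng(1)
w = 2.0 * np.pi / 24.0
t = np.arange(0.0, 48.0, 2.0)

A, phi, sd_theta = 0.5, 0.8, 0.7   # illustrative values, not from the paper
n_ind = 5000
theta = rng.normal(0.0, sd_theta, size=n_ind)  # individual random phase-shifts

# Noise-free expression curves, one row per individual.
Y = A * np.cos(w * t[None, :] + phi + theta[:, None])

# Pooled fit that ignores the random phase-shifts.
X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
beta, *_ = np.linalg.lstsq(X, Y.mean(axis=0), rcond=None)
amp_hat = np.hypot(beta[1], beta[2])
phase_hat = np.arctan2(-beta[2], beta[1])

# Attenuated amplitude implied by the normal characteristic function at one.
expected_amp = A * np.exp(-sd_theta**2 / 2.0)
```

In this sketch the estimated amplitude falls close to `expected_amp` and below the true amplitude, while the estimated phase-shift remains close to its true value, consistent with the statement that only the amplitude is biased.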
To understand when the linear mixed effects model in (5) produces unbiased parameter estimates despite the random phase-shifts being unknown, consider the equivalent representation
(7) |
which separates fixed and random effects into different nonlinear components (Mikulich et al.,, 2003). The following proposition transforms the model in (7) into the model in (4).
Proposition 2.
2.3 Overview of method
Propositions 1 and 2 highlight that parameter estimates and inferences of a linear mixed effects cosinor model are biased when the random phase-shift parameters are unknown. A practical consequence of not addressing these biases is that study conclusions made from a linear mixed effects cosinor model would be inaccurate. For example, the attenuation bias in parameter estimates and test statistics presented in Proposition 1 would result in genes being misclassified as having statistically insignificant oscillatory behavior in a circadian biology study.
To address this bias, an investigator could collect a sufficient number of samples from each individual to estimate individual-specific cosinor models for each gene. In this scenario, an investigator could utilize two-stage methods, which would leverage individual-specific nonlinear model parameter estimates as building blocks for estimation and inference (Davidian and Giltinan, 1995, Chapter 5). It is noted that a two-stage method has been proposed for the cosinor model to address the presence of random phase-shifts (Weaver and Branden, 1995).
A challenge in utilizing two-stage methods in circadian biology studies is that each is distinct across genes, and each is not necessarily indicative of the underlying offset between an individual’s internal circadian time (ICT) and Zeitgeber time (ZT). Specifically, each gene displays different oscillatory behavior, with approximately 50% of mammalian genes displaying tissue-dependent oscillations (Mure et al., 2018; Ruben et al., 2018; Zhang et al., 2014). Further, certain biological processes, such as the secretion of some hormones relevant to thyroid function or pregnancy, can exhibit minimal diurnal variations (Kennaway, 2019, 2020). This challenge is exacerbated by the limited number of samples typically collected in a circadian biology study (Brooks et al., 2023), which could produce imprecise individual-specific parameter estimates. Consequently, two-stage methods may not yield parameter estimates and inferences that align with those derived from knowing the offset of an individual’s ICT relative to ZT.
To address this challenge and recover parameter estimates and inferences that would be obtained when each individual’s offset is known, we propose a heuristic based on two-stage methods that involves the following steps. For each step, relevant comments are provided to clarify its significance and implications:
- Step 1. Estimate a linear mixed effects cosinor model for each gene with data across every individual.
- Step 2. For each gene-specific (indexed by ) cosinor model estimated in Step 1, compute the amplitude parameter estimate () and phase-shift parameter estimate () with the identities in (3).
- Step 3. Estimate individual-specific linear cosinor models for each gene.
- Step 4. For each individual-specific (indexed by ) and gene-specific (indexed by ) cosinor model estimated in Step 3, compute the amplitude parameter estimate () and phase-shift parameter estimate () with the identities in (3).
- Step 5. Define
  Here, denotes the estimated variance for , and denotes the estimated variance for .
- Step 6. Define , where
- Step 7. Estimate a linear mixed effects cosinor model with the data for each gene.
Steps 1-4 obtain individual-specific and population parameter estimates. Step 5 first computes an inverse-variance weighted average of and (Cochran and Carroll,, 1953), where the variance quantities are derived from the respective amplitude estimates. It is emphasized that the difference between this average and is an estimate of , where
is bounded between zero and one. The estimated weight can be interpreted as balancing the contribution of relative to based on their respective amplitude variances, where a derivation for the amplitude variance is provided in Appendix D. In Step 6, an additional inverse-variance weighted average of each across every gene is computed with the corresponding amplitude variance . The times of sample collection for the -th individual are then translated by the quantity obtained from this weighted average, which is denoted as . In Step 7, a linear mixed effects model is estimated with this translated data.
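The overall flow of the steps above can be illustrated with a deliberately simplified sketch. It is not the paper's exact procedure: pooled least squares stands in for the linear mixed effects model, the shrinkage weight of Step 5 is omitted, and a weighted circular mean replaces the weighted average in Step 6. All function names and simulation values are assumptions made for illustration.

```python
import numpy as np

W = 2.0 * np.pi / 24.0  # angular frequency for an assumed 24 hour period


def fit(t, y):
    """OLS cosinor fit: amplitude, phase-shift, delta-method amplitude variance."""
    X = np.column_stack([np.ones_like(t), np.cos(W * t), np.sin(W * t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    cov = (resid @ resid / (len(y) - 3)) * np.linalg.inv(X.T @ X)
    amp = np.hypot(beta[1], beta[2])
    grad = beta[1:] / amp
    return amp, np.arctan2(-beta[2], beta[1]), grad @ cov[1:, 1:] @ grad


def estimate_offsets(t, Y):
    """Y has shape (individuals, genes, times); returns offsets in radians."""
    n_ind, n_gene, _ = Y.shape
    # Population phase-shift per gene (pooled fit as a stand-in for Steps 1-2).
    pooled_t = np.tile(t, n_ind)
    pop_phase = np.array([fit(pooled_t, Y[:, g, :].ravel())[1] for g in range(n_gene)])
    offsets = np.zeros(n_ind)
    for i in range(n_ind):
        # Individual-specific fits for every gene (Steps 3-4).
        fits = [fit(t, Y[i, g, :]) for g in range(n_gene)]
        diffs = np.array([f[1] for f in fits]) - pop_phase
        weights = 1.0 / np.array([f[2] for f in fits])  # inverse amplitude variance
        # Weighted circular mean across genes avoids wrap-around at +/- pi.
        offsets[i] = np.angle(np.sum(weights * np.exp(1j * diffs)))
    return offsets


rng = np.random.default_rng(2)
t = np.arange(0.0, 48.0, 2.0)
n_ind, n_gene = 30, 20
theta = rng.normal(0.0, 0.7, size=n_ind)            # unknown individual offsets
gene_phase = rng.uniform(-np.pi, np.pi, size=n_gene)
Y = 0.5 * np.cos(W * t[None, None, :] + gene_phase[None, :, None] + theta[:, None, None])
Y = Y + rng.normal(0.0, 0.2, size=Y.shape)

offsets = estimate_offsets(t, Y)
t_shifted = t[None, :] + offsets[:, None] / W       # translated collection times
amp_raw = fit(np.tile(t, n_ind), Y[:, 0, :].ravel())[0]
amp_adj = fit(t_shifted.ravel(), Y[:, 0, :].ravel())[0]
```

In this simulated example the amplitude estimated from the translated times (`amp_adj`) is close to the true value of 0.5, while the amplitude estimated from the recorded times (`amp_raw`) is attenuated.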
It is noted that the population phase-shift estimates obtained with this method can be different from those obtained with a linear mixed effects model where the times of sample collection are not translated. If an investigator is interested in analyzing estimates of , this paper recommends that an additional adjustment is performed to ensure each is the same regardless of whether or is used for estimation. This recommendation is supported by Proposition 1, which implies that phase-shift estimates are unbiased when generated by a symmetric probability distribution.
3 Simulation Study
A simulation study is conducted to evaluate the method proposed in Section 2.3, where the design of this study is detailed in Appendix E. Briefly, six settings are considered from the BioCycle simulation software (Agostinelli et al., 2016; Ceglia et al., 2018). Specifically, each setting represents a scenario in the software where a simulated gene displays periodic behavior over a 24 hour interval, where the cosinor model is mis-specified for many of these settings. For each simulation setting, 2,000 simulation trials are performed, and in each simulation trial, two data sets are generated: one in which each individual has a random phase-shift parameter, and one in which there are no random phase-shift parameters. It is noted that the second data set represents a scenario where the random phase-shifts are known.
This simulation study considers three estimation frameworks to obtain linear mixed effects cosinor models from these two data sets:
- Framework 1. A linear mixed effects cosinor model is estimated using the method in Section 2.3 with the data set where each individual has an unknown random phase-shift parameter.
- Framework 2. A linear mixed effects cosinor model is estimated with the data set where each individual has an unknown random phase-shift parameter.
- Framework 3. A linear mixed effects cosinor model is estimated with the data set where there are no random phase-shift parameters.
It is emphasized that only one gene is generated in each simulation trial. As a consequence, Step 6 of the method in Framework 1 would set .
After conducting 2,000 simulation trials, the mean and standard deviation of the following quantities are reported:
1. The estimated amplitude in a simulation trial.
2. The Wald test statistic computed from (6) in a simulation trial.
It is noted that the phase-shift estimate is not considered, as Framework 1 can be modified to produce the same phase-shifts as Framework 2.
Table 1 presents the results for each simulation setting. Framework 1, or the method in Section 2.3, consistently produces amplitude estimates and test statistics that align with those produced if each individual’s is known. Framework 2, on the other hand, produces corresponding quantities that suffer from attenuation bias.
4 Illustrations
Publicly available data from three longitudinal circadian biology studies are considered for illustration: Archer et al. (2014) produced data from two cohorts (a control cohort and an intervention cohort); Braun et al. (2018) produced data from a single cohort; and Möller-Levet et al. (2013) produced data from two cohorts (a control cohort and an intervention cohort). Each data set has been described in detail in its respective study and summarized by Gorczyca et al. (2024). Additionally, each data set has been processed by Huang and Braun (2024) and made publicly available. The significance of these data sets is that their corresponding studies performed laboratory tests to determine the offset of each study participant’s internal circadian time (ICT) relative to Zeitgeber time (ZT). However, gene expression measurements and offsets are unavailable for some study participants in these processed data sets. In this illustration, genes with missing expression measurements and samples from study participants with missing offsets within cohort-specific data are excluded before estimation and inference. It is noted that, for illustrative purposes, two additional cohorts are constructed: one where data from both cohorts for Archer et al. (2014) are combined, and another where data from both cohorts for Möller-Levet et al. (2013) are combined.
The illustrations follow the evaluation procedure from Section 3 using data from each of the seven cohorts considered. Specifically, Wald test statistics in (6) and fixed amplitude parameters in (4) are computed from Frameworks 1 and 2 given ZT. Framework 3 computes these same quantities given ICT, which are considered the true quantities in this illustration. It is emphasized that each cohort-specific data set consists of multiple genes. To account for this, a linear model-based evaluation is presented in this illustration (Gorczyca et al.,, 2024). Specifically, the covariate of this linear model is specified as a quantity obtained from a framework given ZT, while the corresponding response variable is the corresponding true quantity obtained from Framework 3. No intercept term is specified in this linear model, which results in a single regression parameter estimate, . This paper utilizes to assess the relationship between the quantities obtained from each framework. If , a linear relationship exists between the quantity obtained with ZT and a quantity obtained with ICT. If , then quantities obtained with ICT are consistently larger than quantities obtained with ZT. If , then quantities obtained with ZT are consistently larger than quantities obtained with ICT. This assessment also reports the coefficient of determination () for each linear model, where higher values signify greater precision in linear model fit. It is noted that a stronger performing framework would obtain values closer to one and a larger value.
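The linear model-based evaluation described above can be sketched as a regression through the origin. The source does not state which definition of the coefficient of determination is used for the no-intercept model; the sketch assumes the uncentered version, which is one common convention. The function name `slope_through_origin` and the example values are hypothetical.

```python
import numpy as np


def slope_through_origin(x, y):
    """No-intercept least squares slope and uncentered R^2.

    x: quantities obtained from a framework given ZT (covariate).
    y: corresponding quantities obtained given ICT (response).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = (x @ y) / (x @ x)
    resid = y - slope * x
    r2 = 1.0 - (resid @ resid) / (y @ y)  # uncentered; an assumed convention
    return slope, r2


# Hypothetical amplitudes: ICT-based values are 10% larger than ZT-based values.
zt_amplitudes = np.array([0.20, 0.30, 0.25, 0.40])
ict_amplitudes = 1.1 * zt_amplitudes
slope, r2 = slope_through_origin(zt_amplitudes, ict_amplitudes)
```

A slope above one, as in this toy example, corresponds to ZT-based quantities that are attenuated relative to their ICT-based counterparts.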
Table 2 presents linear model parameter estimates and coefficients of determination computed from each sample population’s data. Notably, Framework 1, or the method proposed in Section 2.3, reduces bias in amplitude estimation and Wald test statistic calculation, often yielding estimates approximately equal to one. Framework 2 outperforms Framework 1 in amplitude estimation with intervention cohort data from Archer et al. (2014), and in Wald test statistic calculation with control cohort data from Archer et al. (2014). Table 2 also highlights that the estimated regression parameters are larger than one for Framework 2 in all but one case (amplitude estimation with intervention cohort data from Archer et al. (2014), where the estimate is 0.994), which implies that estimation of a linear mixed effects model given ZT generally results in attenuated parameter estimates and test statistics.
5 Discussion
In this paper, a heuristic method is proposed to account for the offset of each individual’s internal circadian time (ICT) relative to Zeitgeber time (ZT) when these offsets are unknown. If these offsets are left unaddressed by an investigator during statistical analysis, the parameter estimates for a cosinor model and test statistics computed with these estimates suffer from attenuation bias (Sollberger, 1962; Weaver and Branden, 1995; Gorczyca et al., 2023, 2024), which would inflate type II error rates in identifying genes with oscillatory behavior. The method proposed in this paper requires that a sufficient number of samples are collected from each individual in a longitudinal design. The collection of samples in a longitudinal design enables estimation of individual-specific cosinor models for each gene, which the proposed method uses to translate the ZTs at which samples are collected.
We recognize that there are limitations to this study. First, the cosinor model is biased towards the identification of genes that display sinusoidal oscillations. However, this model is common for representing gene expression over time (Archer et al., 2014; del Olmo et al., 2022; Fontana et al., 2012; Hou et al., 2021; Möller-Levet et al., 2013), and enables an investigator to obtain interpretable parameter estimates and perform hypothesis tests. Second, the translated times of sample collection obtained from this method do not necessarily correspond to translating the times of sample collection by each individual’s true offset. To clarify, the method translates sample collection times based on the estimated times that genes peak in their expression levels. The resulting quantity identified for translating sample collection times could be offset from an individual’s melatonin onset time under dim-light conditions, or DLMO time, which is a gold-standard biomarker for an individual’s true offset. However, the proposed method produces amplitude estimates and Wald test statistics akin to those obtained when each individual’s DLMO time is known.
The results presented in this paper open up avenues for future research. First, other frameworks have been proposed to identify genes with oscillatory behavior (Mei et al., 2020). There could be value in studying how statistical analyses with these frameworks are affected by the presence of individual-specific unknown offsets. Second, depending on the assumptions an investigator makes about the oscillatory behavior of a gene over time, there could be utility in extending recent non-parametric methods for data contaminated with covariate measurement error to longitudinal designs (Delaigle and Hall, 2016; Di Marzio et al., 2021, 2023; Nghiem and Potgieter, 2018; Nghiem et al., 2020).
Acknowledgements
The author would like to thank Forest Agostinelli at the University of South Carolina for conversation concerning the BioCycle simulation data sets, which supported development of the simulation study design in Section 3; Thomas Brooks at the University of Pennsylvania for input on factors that could affect reproducibility in circadian biology studies, which helped refine the scope of this paper and method development; and David Kennaway at the University of Adelaide for sharing insights on the cost of DLMO time determination and providing examples of biological processes that exhibit minimal diurnal variations, which enhanced the presentation of this paper.
Table 1: Mean (standard deviation) of the amplitude estimates and Wald test statistics over 2,000 simulation trials for each simulation setting and framework.
Simulation Setting | Framework | Amplitude estimate, mean (SD) | Wald test statistic, mean (SD) |
---|---|---|---|
1 | 1 | 0.292 (0.067) | 10.606 (6.166) |
1 | 2 | 0.273 (0.068) | 9.174 (5.958) |
1 | 3 | 0.306 (0.071) | 10.816 (6.852) |
2 | 1 | 0.322 (0.102) | 4.528 (2.989) |
2 | 2 | 0.290 (0.103) | 3.663 (2.730) |
2 | 3 | 0.322 (0.107) | 4.459 (3.192) |
3 | 1 | 0.266 (0.079) | 5.375 (3.391) |
3 | 2 | 0.237 (0.081) | 4.261 (3.098) |
3 | 3 | 0.305 (0.084) | 6.878 (4.303) |
4 | 1 | 0.150 (0.057) | 2.731 (2.237) |
4 | 2 | 0.116 (0.057) | 1.678 (1.825) |
4 | 3 | 0.250 (0.062) | 8.280 (4.783) |
5 | 1 | 0.193 (0.067) | 3.950 (2.888) |
5 | 2 | 0.165 (0.069) | 2.949 (2.578) |
5 | 3 | 0.251 (0.072) | 6.689 (4.313) |
6 | 1 | 0.306 (0.094) | 5.833 (3.969) |
6 | 2 | 0.254 (0.096) | 3.979 (3.443) |
6 | 3 | 0.314 (0.085) | 7.509 (4.800) |
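Table 2: Regression parameter estimates, with coefficients of determination in parentheses, from the no-intercept linear models relating quantities obtained from Frameworks 1 and 2 to those obtained from Framework 3.
Sample Population | Framework | Amplitude: estimate (R²) | Wald test statistic: estimate (R²) |
---|---|---|---|
Archer (Control) | 1 | 1.014 (0.978) | 1.119 (0.944) |
Archer (Control) | 2 | 1.019 (0.990) | 1.071 (0.978) |
Archer (Intervention) | 1 | 0.976 (0.935) | 0.965 (0.904) |
Archer (Intervention) | 2 | 0.994 (0.936) | 1.042 (0.911) |
Archer (Combined) | 1 | 1.030 (0.967) | 1.052 (0.942) |
Archer (Combined) | 2 | 1.072 (0.970) | 1.195 (0.953) |
Braun | 1 | 1.004 (0.996) | 1.030 (0.980) |
Braun | 2 | 1.021 (0.995) | 1.068 (0.977) |
Möller-Levet (Control) | 1 | 1.099 (0.990) | 1.158 (0.967) |
Möller-Levet (Control) | 2 | 1.136 (0.987) | 1.242 (0.956) |
Möller-Levet (Intervention) | 1 | 1.007 (0.989) | 1.052 (0.967) |
Möller-Levet (Intervention) | 2 | 1.063 (0.991) | 1.135 (0.974) |
Möller-Levet (Combined) | 1 | 1.080 (0.994) | 1.145 (0.979) |
Möller-Levet (Combined) | 2 | 1.110 (0.994) | 1.191 (0.978) |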
Appendix A Supporting Lemmas
Lemma 1.
Suppose such that the probability density function for , denoted as , is symmetric with a mean of zero. Then .
Proof.
Given that is symmetric, the corresponding marginal probability density function is also symmetric. For , it follows that
The derivation is equivalent for . ∎
Lemma 2.
Suppose each , is a diagonal matrix, and . Further, define
Then each element of the matrix can be expressed as
where
Proof.
To simplify presentation, the superscript is omitted. Recall the identity presented in Davidian and Giltinan (1995, page 78),
which yields
To compute this matrix, first note
(8) |
where (8) is due to orthogonality of distinct terms in a trigonometric basis (Tsybakov,, 2009, Lemma 1.7). It follows that
(9) |
The identity in (9) can be utilized to obtain
which implies
To conclude,
∎
Lemma 3.
Suppose such that probability density function of , denoted as , is symmetric with a mean of zero. Then
Proof.
The superscript is omitted. The result follows from the derivation of Theorem 1 in Gorczyca et al., (2023), where we find
(10) | ||||
(11) | ||||
(12) |
Here, (10) is due to Lemma 1; (11) is attributed to Euler’s formula, which yields when the probability density function for is symmetric with mean zero; and (12) is by application of the identities in (3). ∎
Appendix B Derivation of Proposition 1
Proof.
The superscript is again omitted to simplify presentation. The expected parameter estimates are first obtained. Given that is known, estimation of is analogous to solving the normal equation for generalized least squares estimation, or identifying the quantity that satisfies the equality
which under expectation is expressed as
given , , and are non-random quantities. First, note that application of Lemma 2 yields
for elements in the first row of ,
for elements in the second row of , and
for elements in the third row of . From these three expressions, it follows that
(13) |
where (13) is due to orthogonality of distinct terms in a trigonometric basis noted in Lemma 2 (Tsybakov,, 2009, Lemma 1.7). Now, by application of Lemma 3, we find
which yields
for the first element of ,
for the second element of , and
for the third element of . The expected fixed parameter estimates can subsequently be expressed as
(14) |
Consideration is now given towards computing the in (6) given these parameter estimates. First, note that
Further,
which implies that
To conclude, the Wald test statistic can be expressed as
∎
Appendix C Derivation of Proposition 2
Appendix D Derivation of Amplitude Variance
Define , which implies
Further, let denote the estimated covariance matrix for the parameter vector . By application of the Delta method (Boos and Stefanski,, 2013, Theorem 1.3),
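For reference, the Delta method calculation for an amplitude can be written out as follows. The notation is assumed for illustration and may differ from the paper's: $\hat{\beta}_1$ and $\hat{\beta}_2$ denote the estimated cosine and sine coefficients, and $\hat{\sigma}_{11}$, $\hat{\sigma}_{12}$, $\hat{\sigma}_{22}$ denote the entries of their estimated covariance matrix $\hat{\Sigma}$.

```latex
\begin{align*}
A &= \sqrt{\beta_1^2 + \beta_2^2},
\qquad
\nabla A = \frac{1}{A}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \\
\widehat{\mathrm{Var}}(\hat{A})
&\approx \nabla \hat{A}^{\top} \, \hat{\Sigma} \, \nabla \hat{A}
= \frac{\hat{\beta}_1^2 \hat{\sigma}_{11}
      + 2 \hat{\beta}_1 \hat{\beta}_2 \hat{\sigma}_{12}
      + \hat{\beta}_2^2 \hat{\sigma}_{22}}
     {\hat{\beta}_1^2 + \hat{\beta}_2^2}.
\end{align*}
```

The approximation is first-order and is unreliable when the estimated amplitude is close to zero, since the gradient is undefined at the origin.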
Appendix E Overview of Simulation Study Design from Section 3
We present the following six simulation settings, where the names in parentheses are from the BioCycle software:
- Setting 1 (Cosine).
-
, , , , , , , , , , and
- Setting 2 (Cosine + Outlier).
-
, , , , , , , , , , , and
where
- Setting 3 (Cosine2).
-
, , , , , , , , , , and
- Setting 4 (Peak).
-
, , , , , , , , , , and
- Setting 5 (Triangle).
-
, , , , , , , , , , and
- Setting 6 (Square).
-
, , , , , , , , , , and
Here, denotes a normal distribution with mean and variance ; denotes a truncated normal distribution with mean , variance , lower bound , and upper bound ; and denotes a uniform distribution with support from to . For every simulation setting, the true population amplitude parameter , which has been reported as being frequently observed across genes (Möller-Levet et al., 2013). Further, for Settings 1 and 4, a sample is collected from each individual once every two hours over a 48-hour period; for Settings 2 and 5, once every three hours over a 48-hour period; and for Settings 3 and 6, once every four hours over a 48-hour period. A 48-hour period is considered given guidelines for circadian biology experimental design (Hughes et al., 2017).
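The sampling schedules described above can be written out explicitly. The sketch below assumes that the first sample is taken at hour 0 and that a sample at hour 48 is not collected, neither of which is stated in the text. The dictionary and function names are hypothetical.

```python
import numpy as np

# Hours between consecutive samples for each simulation setting.
SAMPLING_INTERVAL_HOURS = {1: 2, 2: 3, 3: 4, 4: 2, 5: 3, 6: 4}

def sample_times(setting, duration=48):
    """Sample collection times in hours for one individual (assumed start at hour 0)."""
    step = SAMPLING_INTERVAL_HOURS[setting]
    return np.arange(0, duration, step)

# Number of samples per individual implied by each schedule.
n_samples = {s: len(sample_times(s)) for s in SAMPLING_INTERVAL_HOURS}
print(n_samples)
```

Under these assumptions an individual contributes 24, 16, or 12 samples depending on the setting. Including the hour-48 sample would add one to each count.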
References
- Agostinelli et al., (2016) Agostinelli, F., Ceglia, N., Shahbaba, B., Sassone-Corsi, P., and Baldi, P. (2016). What time is it? Deep learning approaches for circadian rhythms. Bioinformatics, 32(12):i8–i17.
- Archer et al., (2014) Archer, S. N., Laing, E. E., Möller-Levet, C. S., van der Veen, D. R., Bucca, G., Lazar, A. S., Santhi, N., Slak, A., Kabiljo, R., von Schantz, M., Smith, C. P., and Dijk, D.-J. (2014). Mistimed sleep disrupts circadian regulation of the human transcriptome. Proceedings of the National Academy of Sciences, 111(6).
- Bae et al., (2001) Bae, K., Jin, X., Maywood, E. S., Hastings, M. H., Reppert, S. M., and Weaver, D. R. (2001). Differential functions of mPer1, mPer2, and mPer3 in the SCN circadian clock. Neuron, 30(2):525–536.
- Boos and Stefanski, (2013) Boos, D. D. and Stefanski, L. A. (2013). Essential statistical inference. Springer Texts in Statistics. Springer, New York, NY, 2013 edition.
- Braun et al., (2018) Braun, R., Kath, W. L., Iwanaszko, M., Kula-Eversole, E., Abbott, S. M., Reid, K. J., Zee, P. C., and Allada, R. (2018). Universal method for robust detection of circadian state from gene expression. Proceedings of the National Academy of Sciences, 115(39).
- Brooks et al., (2023) Brooks, T. G., Manjrekar, A., Mrčela, A., and Grant, G. R. (2023). Meta-analysis of diurnal transcriptomics in mouse liver reveals low repeatability of rhythm analyses. Journal of Biological Rhythms, 38(6):556–570.
- Ceglia et al., (2018) Ceglia, N., Liu, Y., Chen, S., Agostinelli, F., Eckel-Mahan, K., Sassone-Corsi, P., and Baldi, P. (2018). CircadiOmics: circadian omic web portal. Nucleic Acids Research, 46(W1):W157–W162.
- Cochran and Carroll, (1953) Cochran, W. G. and Carroll, S. P. (1953). A sampling investigation of the efficiency of weighting inversely as the estimated variance. Biometrics, 9(4):447.
- Cornelissen, (2014) Cornelissen, G. (2014). Cosinor-based rhythmometry. Theoretical Biology and Medical Modelling, 11(1).
- Dallmann et al., (2014) Dallmann, R., Brown, S. A., and Gachon, F. (2014). Chronopharmacology: New insights and therapeutic implications. Annual Review of Pharmacology and Toxicology, 54(1):339–361.
- Dallmann et al., (2016) Dallmann, R., Okyar, A., and Lévi, F. (2016). Dosing-time makes the poison: Circadian regulation and pharmacotherapy. Trends in Molecular Medicine, 22(5):430–445.
- Davidian and Giltinan, (1995) Davidian, M. and Giltinan, D. M. (1995). Nonlinear models for repeated measurement data. Chapman & Hall/CRC Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, Philadelphia, PA.
- del Olmo et al., (2022) del Olmo, M., Spörl, F., Korge, S., Jürchott, K., Felten, M., Grudziecki, A., de Zeeuw, J., Nowozin, C., Reuter, H., Blatt, T., Herzel, H., Kunz, D., Kramer, A., and Ananthasubramaniam, B. (2022). Inter-layer and inter-subject variability of diurnal gene expression in human skin. NAR Genomics and Bioinformatics, 4(4).
- Delaigle and Hall, (2016) Delaigle, A. and Hall, P. (2016). Methodology for non-parametric deconvolution when the error distribution is unknown. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(1):231–252.
- Di Marzio et al., (2021) Di Marzio, M., Fensore, S., Panzera, A., and Taylor, C. C. (2021). Density estimation for circular data observed with errors. Biometrics, 78(1):248–260.
- Di Marzio et al., (2023) Di Marzio, M., Fensore, S., and Taylor, C. C. (2023). Kernel regression for errors-in-variables problems in the circular domain. Statistical Methods & Applications.
- Duffy et al., (2011) Duffy, J. F., Cain, S. W., Chang, A.-M., Phillips, A. J. K., Münch, M. Y., Gronfier, C., Wyatt, J. K., Dijk, D.-J., Wright, K. P., and Czeisler, C. A. (2011). Sex difference in the near-24-hour intrinsic period of the human circadian timing system. Proceedings of the National Academy of Sciences, 108(supplement_3):15602–15608.
- Fontana et al., (2012) Fontana, A., Copetti, M., Mazzoccoli, G., Kypraios, T., and Pellegrini, F. (2012). A linear mixed model approach to compare the evolution of multiple biological rhythms. Statistics in Medicine, 32(7):1125–1135.
- Gorczyca et al., (2024) Gorczyca, M., McDonald, T., and Sefas, J. (2024). A corrected score function framework for modelling circadian gene expression. arXiv:2401.01998.
- Gorczyca et al., (2023) Gorczyca, M. T., McDonald, T. M., and Sefas, J. D. (2023). Trigonometric regression in the presence of measurement error. Lawrence Livermore National Laboratory Technical Report.
- Halekoh and Højsgaard, (2014) Halekoh, U. and Højsgaard, S. (2014). A Kenward-Roger approximation and parametric bootstrap methods for tests in linear mixed models - the R package pbkrtest. Journal of Statistical Software, 59(9).
- Hastings et al., (2018) Hastings, M. H., Maywood, E. S., and Brancaccio, M. (2018). Generation of circadian rhythms in the suprachiasmatic nucleus. Nature Reviews Neuroscience, 19(8):453–469.
- Hedeker and Gibbons, (2006) Hedeker, D. and Gibbons, R. D. (2006). Longitudinal Data Analysis. Wiley.
- Herzog et al., (2017) Herzog, E. D., Hermanstyne, T., Smyllie, N. J., and Hastings, M. H. (2017). Regulating the suprachiasmatic nucleus (SCN) circadian clockwork: Interplay between cell-autonomous and circuit-level mechanisms. Cold Spring Harbor Perspectives in Biology, 9(1):a027706.
- Hou et al., (2021) Hou, R., Tomalin, L. E., and Suárez-Fariñas, M. (2021). cosinoRmixedeffects: an R package for mixed-effects cosinor models. BMC Bioinformatics, 22(1).
- Huang and Braun, (2024) Huang, Y. and Braun, R. (2024). Platform-independent estimation of human physiological time from single blood samples. Proceedings of the National Academy of Sciences, 121(3).
- Hughes et al., (2017) Hughes, M. E., Abruzzi, K. C., Allada, R., Anafi, R., Arpat, A. B., Asher, G., Baldi, P., de Bekker, C., Bell-Pedersen, D., Blau, J., Brown, S., Ceriani, M. F., Chen, Z., Chiu, J. C., Cox, J., Crowell, A. M., DeBruyne, J. P., Dijk, D.-J., DiTacchio, L., Doyle, F. J., Duffield, G. E., Dunlap, J. C., Eckel-Mahan, K., Esser, K. A., FitzGerald, G. A., Forger, D. B., Francey, L. J., Fu, Y.-H., Gachon, F., Gatfield, D., de Goede, P., Golden, S. S., Green, C., Harer, J., Harmer, S., Haspel, J., Hastings, M. H., Herzel, H., Herzog, E. D., Hoffmann, C., Hong, C., Hughey, J. J., Hurley, J. M., de la Iglesia, H. O., Johnson, C., Kay, S. A., Koike, N., Kornacker, K., Kramer, A., Lamia, K., Leise, T., Lewis, S. A., Li, J., Li, X., Liu, A. C., Loros, J. J., Martino, T. A., Menet, J. S., Merrow, M., Millar, A. J., Mockler, T., Naef, F., Nagoshi, E., Nitabach, M. N., Olmedo, M., Nusinow, D. A., Ptáček, L. J., Rand, D., Reddy, A. B., Robles, M. S., Roenneberg, T., Rosbash, M., Ruben, M. D., Rund, S. S., Sancar, A., Sassone-Corsi, P., Sehgal, A., Sherrill-Mix, S., Skene, D. J., Storch, K.-F., Takahashi, J. S., Ueda, H. R., Wang, H., Weitz, C., Westermark, P. O., Wijnen, H., Xu, Y., Wu, G., Yoo, S.-H., Young, M., Zhang, E. E., Zielinski, T., and Hogenesch, J. B. (2017). Guidelines for genome-scale analysis of biological rhythms. Journal of Biological Rhythms, 32(5):380–393.
- Hughey, (2017) Hughey, J. J. (2017). Machine learning identifies a compact gene set for monitoring the circadian clock in human blood. Genome Medicine, 9(1).
- Kantermann et al., (2015) Kantermann, T., Sung, H., and Burgess, H. J. (2015). Comparing the Morningness-Eveningness Questionnaire and Munich ChronoType Questionnaire to the dim light melatonin onset. Journal of Biological Rhythms, 30(5):449–453.
- Kennaway, (2019) Kennaway, D. J. (2019). A critical review of melatonin assays: Past and present. Journal of Pineal Research, 67(1).
- Kennaway, (2020) Kennaway, D. J. (2020). Measuring melatonin by immunoassay. Journal of Pineal Research, 69(1).
- Kennaway, (2023) Kennaway, D. J. (2023). The dim light melatonin onset across ages, methodologies, and sex and its relationship with morningness/eveningness. Sleep, 46(5).
- Laird et al., (1987) Laird, N., Lange, N., and Stram, D. (1987). Maximum likelihood computations with repeated measures: Application of the EM algorithm. Journal of the American Statistical Association, 82(397):97–105.
- Laurie et al., (2010) Laurie, C. C., Doheny, K. F., Mirel, D. B., Pugh, E. W., Bierut, L. J., Bhangale, T., Boehm, F., Caporaso, N. E., Cornelis, M. C., Edenberg, H. J., Gabriel, S. B., Harris, E. L., Hu, F. B., Jacobs, K. B., Kraft, P., Landi, M. T., Lumley, T., Manolio, T. A., McHugh, C., Painter, I., Paschall, J., Rice, J. P., Rice, K. M., Zheng, X., and Weir, B. S. (2010). Quality control and quality assurance in genotypic data for genome‐wide association studies. Genetic Epidemiology, 34(6):591–602.
- Lewy, (1999) Lewy, A. J. (1999). The dim light melatonin onset, melatonin assays and biological rhythm research in humans. Neurosignals, 8(1-2):79–83.
- Long et al., (2016) Long, J. E., Drayson, M. T., Taylor, A. E., Toellner, K. M., Lord, J. M., and Phillips, A. C. (2016). Morning vaccination enhances antibody response over afternoon vaccination: A cluster-randomised trial. Vaccine, 34(24):2679–2685.
- Marcheva et al., (2013) Marcheva, B., Ramsey, K. M., Peek, C. B., Affinati, A., Maury, E., and Bass, J. (2013). Circadian Clocks and Metabolism, pages 127–155. Springer Berlin Heidelberg.
- McCulloch and Searle, (2000) McCulloch, C. E. and Searle, S. R. (2000). Generalized, Linear, and Mixed Models. Wiley.
- Mei et al., (2020) Mei, W., Jiang, Z., Chen, Y., Chen, L., Sancar, A., and Jiang, Y. (2020). Genome-wide circadian rhythm detection methods: systematic evaluations and practical guidelines. Briefings in Bioinformatics, 22(3).
- Mikulich et al., (2003) Mikulich, S. K., Zerbe, G. O., Jones, R. H., and Crowley, T. J. (2003). Comparing linear and nonlinear mixed model approaches to cosinor analysis. Statistics in Medicine, 22(20):3195–3211.
- Möller-Levet et al., (2013) Möller-Levet, C. S., Archer, S. N., Bucca, G., Laing, E. E., Slak, A., Kabiljo, R., Lo, J. C. Y., Santhi, N., von Schantz, M., Smith, C. P., and Dijk, D.-J. (2013). Effects of insufficient sleep on circadian rhythmicity and expression amplitude of the human blood transcriptome. Proceedings of the National Academy of Sciences, 110(12).
- Montaigne et al., (2018) Montaigne, D., Marechal, X., Modine, T., Coisne, A., Mouton, S., Fayad, G., Ninni, S., Klein, C., Ortmans, S., Seunes, C., Potelle, C., Berthier, A., Gheeraert, C., Piveteau, C., Deprez, R., Eeckhoute, J., Duez, H., Lacroix, D., Deprez, B., Jegou, B., Koussa, M., Edme, J.-L., Lefebvre, P., and Staels, B. (2018). Daytime variation of perioperative myocardial injury in cardiac surgery and its prevention by Rev-Erbα antagonism: a single-centre propensity-matched cohort study and a randomised study. The Lancet, 391(10115):59–69.
- Mure et al., (2018) Mure, L. S., Le, H. D., Benegiamo, G., Chang, M. W., Rios, L., Jillani, N., Ngotho, M., Kariuki, T., Dkhissi-Benyahya, O., Cooper, H. M., and Panda, S. (2018). Diurnal transcriptome atlas of a primate across major neural and peripheral tissues. Science, 359(6381).
- Nghiem and Potgieter, (2018) Nghiem, L. and Potgieter, C. J. (2018). Density estimation in the presence of heteroscedastic measurement error of unknown type using phase function deconvolution. Statistics in Medicine, 37(25):3679–3692.
- Nghiem et al., (2020) Nghiem, L. H., Byrd, M. C., and Potgieter, C. J. (2020). Estimation in linear errors-in-variables models with unknown error distribution. Biometrika, 107(4):841–856.
- Pukelsheim, (2006) Pukelsheim, F. (2006). Optimal Design of Experiments. Springer Texts in Statistics. Society for Industrial and Applied Mathematics, Philadelphia, PA.
- Reid, (2019) Reid, K. J. (2019). Assessment of circadian rhythms. Neurologic Clinics, 37(3):505–526.
- Ruben et al., (2018) Ruben, M. D., Wu, G., Smith, D. F., Schmidt, R. E., Francey, L. J., Lee, Y. Y., Anafi, R. C., and Hogenesch, J. B. (2018). A database of tissue-specific rhythmically expressed human genes has potential applications in circadian medicine. Science Translational Medicine, 10(458).
- Ruiz et al., (2020) Ruiz, F. S., Beijamini, F., Beale, A. D., da Silva B. Gonçalves, B., Vartanian, D., Taporoski, T. P., Middleton, B., Krieger, J. E., Vallada, H., Arendt, J., Pereira, A. C., Knutson, K. L., Pedrazzoli, M., and von Schantz, M. (2020). Early chronotype with advanced activity rhythms and dim light melatonin onset in a rural population. Journal of Pineal Research, 69(3).
- Sollberger, (1962) Sollberger, A. (1962). General properties of biological rhythms. Annals of the New York Academy of Sciences, 98(4):757–774.
- Tong, (1976) Tong, Y. L. (1976). Parameter estimation in studying circadian rhythms. Biometrics, 32(1):85.
- Tsybakov, (2009) Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer New York.
- Weaver and Branden, (1995) Weaver, R. J. and Branden, M. N. (1995). Nonlinear mixed effects methodology for rhythmic data. US Army Conference on Applied Statistics, 18.
- Wittenbrink et al., (2018) Wittenbrink, N., Ananthasubramaniam, B., Münch, M., Koller, B., Maier, B., Weschke, C., Bes, F., de Zeeuw, J., Nowozin, C., Wahnschaffe, A., Wisniewski, S., Zaleska, M., Bartok, O., Ashwal-Fluss, R., Lammert, H., Herzel, H., Hummel, M., Kadener, S., Kunz, D., and Kramer, A. (2018). High-accuracy determination of internal circadian time from a single blood sample. Journal of Clinical Investigation, 128(9):3826–3839.
- Yamaguchi et al., (2003) Yamaguchi, S., Isejima, H., Matsuo, T., Okura, R., Yagita, K., Kobayashi, M., and Okamura, H. (2003). Synchronization of cellular clocks in the suprachiasmatic nucleus. Science, 302(5649):1408–1412.
- Zhang et al., (2014) Zhang, R., Lahens, N. F., Ballance, H. I., Hughes, M. E., and Hogenesch, J. B. (2014). A circadian gene expression atlas in mammals: Implications for biology and medicine. Proceedings of the National Academy of Sciences, 111(45):16219–16224.
- Zong et al., (2023) Zong, W., Seney, M. L., Ketchesin, K. D., Gorczyca, M. T., Liu, A. C., Esser, K. A., Tseng, G. C., McClung, C. A., and Huo, Z. (2023). Experimental design and power calculation in omics circadian rhythmicity detection using the cosinor model. Statistics in Medicine, 42(18):3236–3258.