Fast Gibbs sampling for the local and global trend Bayesian exponential smoothing model

Xueying Long
Daniel F. Schmidt
Christoph Bergmeir
Slawek Smyl
Abstract

In Smyl et al. [Local and global trend Bayesian exponential smoothing models. International Journal of Forecasting, 2024.], a generalised exponential smoothing model was proposed that is able to capture strong trends and volatility in time series. This method achieved state-of-the-art performance in many forecasting tasks, but its fitting procedure, which is based on the NUTS sampler, is very computationally expensive. In this work, we propose several modifications to the original model, as well as a bespoke Gibbs sampler for posterior exploration; these changes improve sampling time by an order of magnitude, thus rendering the model much more practically relevant. The new model, and sampler, are evaluated on the M3 dataset and are shown to be competitive, or superior, in terms of accuracy to the original method, while being substantially faster to run.

Keywords: Exponential smoothing; Gibbs sampling; Scale mixtures

1 Introduction

Exponential smoothing (ETS) remains a standard forecasting procedure used in practice due to its simplicity, robustness and accuracy. In its most basic version, forecasts are produced as a weighted sum of past observations, with the weights decaying exponentially in time. This basic version has been further extended to model trend and seasonality in either an additive or multiplicative form [6]; this is often referred to as the classical Holt-Winters method [27]. Many additional extensions of the classical framework exist; perhaps most notably, Gardner and McKenzie [4] proposed a damped version of the trend to make forecasts more conservative, particularly when the forecast horizon is long. Modern implementations of the ETS model, such as in the R forecast package [8] and the more recent fable package [17], can provide practitioners with fully automatic model selection, in which no expert knowledge is required to make forecasts. In order to facilitate the generation of probabilistic forecasts, assumptions must be made regarding the distribution of the errors, or innovations. The classical choice is to assume that the errors are normally distributed with zero mean and a constant variance over time [9].

In the existing literature, most implementations of the ETS model are approached from a frequentist perspective. However, implementations within a Bayesian framework, such as those of Andrawis and Atiya [1], Bermúdez et al. [2, 3], have demonstrated promising results. The drawback of Bayesian approaches has traditionally been that implementation requires a certain amount of specialised expertise, specifically with regards to posterior sampling via MCMC. The development of generic Bayesian tools such as Stan [23] and JAGS [18] has eased the pain of the modelling and programming process, but Bayesian inference still sees relatively little application within the context of exponential smoothing models. This is presumably due to the fact that existing Bayesian implementations have not shown large accuracy gains vis à vis frequentist implementations while usually being extremely slow to fit, particularly as the models become more sophisticated.

The recently proposed local and global trend (LGT) exponential smoothing model [22] extends the classical ETS model to capture trends that grow faster than linear but slower than exponential, and relaxes the error assumption to accommodate non-normally distributed and heteroscedastic errors. This model has been able to achieve outstanding accuracy on well-established benchmarks, attaining state-of-the-art performance on univariate forecasting tasks. However, while effective, a major issue with the LGT model is the high computational complexity of the Bayesian sampling procedure used to explore the posterior. In [22] the proposed LGT models were primarily implemented via Stan [24], with only some preliminary results being provided for a bespoke Gibbs sampling implementation of a simplified, non-seasonal version of the model. Although this simplified Gibbs sampler promised speed improvements of an order of magnitude over the Stan implementations while retaining comparable accuracy, Smyl et al. [22] did not provide details on its derivation or implementation.

In this paper, we consolidate the seasonal and non-seasonal variants of LGT within a single model formulation, and then extend the preliminary Gibbs sampling procedure to handle this unified model. We also provide all derivations and details required for implementation. Comprehensive experiments performed on the M3 competition benchmarking dataset [14] demonstrate that the proposed Gibbs sampler is not only highly accurate, but crucially, orders of magnitude faster than the original Stan implementations. This dramatic speedup has the advantage of rendering the Bayesian global and local exponential smoothing model useable in practice, yielding a procedure with acceptable time complexity that achieves state-of-the-art accuracy in many forecasting tasks. Moreover, through the novel use of the powerful horseshoe shrinkage prior for estimation of the seasonality adjustments, the resulting procedure is highly robust to the potential misspecification of seasonality. In addition to the original Stan implementation, this newly proposed Gibbs sampler is available in the R Rlgt package on GitHub (https://github.com/cbergmeir/Rlgt). This package provides a complete implementation of the proposed procedure, with detailed documentation and comprehensive examples.

This paper is structured as follows. In Section 2, we review the LGT model originally proposed in Smyl et al. [22]. In Section 3, we review the basics of Bayesian inference and Markov chain Monte Carlo (MCMC) sampling approaches for posterior approximation. Section 4 introduces the modified LGT model and details the proposed Gibbs sampling procedure for exploring the posterior distribution. A comprehensive experimental study on the M3 dataset is provided in Section 6. We further discuss the robustness of seasonal priors under different scenarios with an ablation study in Section 7. Section 8 concludes our work.

2 The local and global trend model

Let $y_t$ denote the realisation of the time series at time $t$. Then, Smyl et al. [22] define the local and global trend exponential smoothing model as:

$$y_{t+1} \sim t(\nu, \hat{y}_{t+1}, \hat{\sigma}_{t+1}), \tag{1}$$

where

\begin{align}
\hat{y}_{t+1} &= l_{t} + \gamma l_{t}^{\rho} + \lambda b_{t}, \tag{2} \\
l_{t+1} &= \alpha y_{t+1} + (1-\alpha)\, l_{t}, \tag{3} \\
b_{t+1} &= \beta\, (l_{t+1} - l_{t}) + (1-\beta)\, b_{t}, \tag{4} \\
\hat{\sigma}_{t+1} &= \sigma \hat{y}_{t+1}^{\tau} + \xi. \tag{5}
\end{align}

$t(\nu, \mu, \sigma)$ denotes a Student $t$-distribution with degrees of freedom $\nu > 0$, location $\mu$ and scale $\sigma > 0$. Table 1 details the parameters of this model and their interpretations. The one-step-ahead forecast $\hat{y}_{t+1}$ is formed as a linear combination of the (smoothed) level value, $l_t$, and local trend, $b_t$, at the previous time step. The LGT model extends the classical non-seasonal, (damped) linear trend ETS model in three major ways. First, in contrast to the classical choice of normally distributed errors, the values of the series under the LGT are instead assumed to follow a Student $t$-distribution with degrees of freedom $\nu$, location $\hat{y}_{t+1}$ and scale $\hat{\sigma}_{t+1}$. The additional degrees-of-freedom parameter $\nu$ controls the heaviness of the tails of the distribution; as $\nu$ tends to infinity, the $t$-distribution converges to a normal distribution, and as $\nu \to 0$ the tails of the $t$-distribution become increasingly heavy. Such a generalised, heavy-tailed distribution allows the LGT model to better capture the volatility in a time series, and provides resistance to the influence of outliers in the series. The second important difference from the classical ETS model is the introduction of the “global” trend term used when forming the one-step-ahead forecast (2). The linear weight and power parameters $\gamma$ and $\rho$ are constant over the entire series, and in this sense are global to the series. The expression $\gamma l_t^{\rho}$ is a generalisation of the linear and exponential trends [22], and has been demonstrated to perform well in capturing trends that grow faster than linear but slower than exponential (for $\rho > 0$). This term can also model the damped trend that is popular in forecasting [4] if $\rho$ is taken to be negative. The third difference from the classical ETS models is the introduction of heteroscedasticity through the use of a dynamic scale term $\hat{\sigma}_{t+1}$, given by (5), which is formed as a linear combination of a powered version of the prediction $\hat{y}_{t+1}$ and an offset term $\xi > 0$. In practice, the scale of errors is very likely to vary with time, and (5) accommodates the possibility of a larger scale of error for larger values of the series, with the rate of growth controlled by the power parameter $\tau > 0$.
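To make the recursions concrete, the following is a minimal Python sketch of equations (2)-(5) run as a filter over a strictly positive series. It is an illustration only, not the implementation used in the paper or in the Rlgt package: the function name `lgt_filter`, the initialisation $l_1 = y_1$, and the example parameter values are all assumptions made here.

```python
def lgt_filter(y, alpha, beta, gamma, rho, lam, sigma, tau, xi, b1=0.0):
    """Run recursions (2)-(5) over a strictly positive series y.

    Returns (forecasts, scales): entry t holds the one-step-ahead forecast
    and scale for the observation that follows y[t]; the final entry is the
    out-of-sample forecast.
    """
    level = float(y[0])   # ASSUMPTION: l_1 = y_1 (initialisation is not given in the text)
    trend = float(b1)     # initial local trend b_1
    forecasts, scales = [], []
    for t in range(len(y)):
        y_hat = level + gamma * level ** rho + lam * trend    # eq. (2)
        sigma_hat = sigma * y_hat ** tau + xi                 # eq. (5)
        forecasts.append(y_hat)
        scales.append(sigma_hat)
        if t + 1 < len(y):
            new_level = alpha * y[t + 1] + (1 - alpha) * level        # eq. (3)
            trend = beta * (new_level - level) + (1 - beta) * trend   # eq. (4)
            level = new_level
    return forecasts, scales


if __name__ == "__main__":
    series = [10.0, 12.0, 11.0, 13.0, 15.0]   # made-up data for illustration
    f, s = lgt_filter(series, alpha=0.5, beta=0.3, gamma=0.1, rho=0.5,
                      lam=0.9, sigma=0.05, tau=0.8, xi=0.1)
    for y_hat, sig in zip(f, s):
        print(f"forecast={y_hat:.3f}  scale={sig:.3f}")
```

Setting `gamma=0` and `lam=0` with `alpha=1` reduces the forecast to the last observation, which is a convenient sanity check on the indexing.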

| Parameter | Description |
|---|---|
| $\nu$ | degrees-of-freedom parameter of the Student $t$-distribution |
| $\gamma$ | coefficient of the global trend |
| $\rho$ | power coefficient of the global trend, in $[-0.5, 1]$ |
| $\lambda$ | damping coefficient of the local trend, in $[0, 1]$ |
| $\alpha$ | level smoothing parameter, in $[0, 1]$ |
| $\beta$ | local trend smoothing parameter, in $[0, 1]$ |
| $\zeta$ | seasonality smoothing parameter, in $[0, 1]$ |
| $\sigma$ | coefficient of the size of error, positive |
| $\tau$ | power coefficient of the size of error, in $[0, 1]$ |
| $\xi$ | minimum value of the size of error, positive |
| $b_1$ | initial local trend |
| $s_i$ | initial seasonality, positive, $i = 1, \ldots, m$ |

Table 1: Parameters of the LGT and SGT models in the original paper.

A seasonal version of the LGT, called the seasonal global trend (SGT) model, was also introduced in Smyl et al. [22]. Under the SGT, the time series $y_t$ is again modelled using a Student $t$-distribution, as per (1), but the model forecasts $\hat{y}_{t+1}$ and $\hat{\sigma}_{t+1}$ are modified to handle multiplicative seasonality effects:

\begin{align}
\hat{y}_{t+1} &= (l_{t} + \gamma l_{t}^{\rho})\, s_{t+1}, \tag{6} \\
l_{t+1} &= \alpha \frac{y_{t+1}}{s_{t+1}} + (1-\alpha)\, l_{t}, \tag{7} \\
s_{t+m} &= \zeta \frac{y_{t}}{l_{t}} + (1-\zeta)\, s_{t}, \tag{8} \\
\hat{\sigma}_{t+1} &= \sigma \hat{y}_{t+1}^{\tau} + \xi, \tag{9}
\end{align}

with

$$\frac{1}{m}\sum_{i=1}^{m} s_{i} = 1. \tag{10}$$

The SGT model is an extension of the classical Holt-Winters model that also possesses the improvements discussed above for the non-seasonal version. Its parameters are described in Table 1, and it is presented in Smyl et al. [22] as a separate model from the non-seasonal LGT. For simplicity, the local trend component is not included in (6) when forming the one-step-ahead forecasts, as empirical evidence suggested this term provided no benefit when forecasting seasonal series. The seasonality terms are multiplicative factors, and their overall effect should not change the scale of the data; the sum constraint (10) is introduced to ensure this. For a thorough discussion of the LGT and SGT models and their parameter space, we refer the reader to Smyl et al. [22].
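The seasonal recursions (6)-(9) can be sketched in the same way. The snippet below is an illustrative Python filter only: the function name `sgt_filter`, the initialisation $l_1 = y_1 / s_1$ and the toy data are assumptions made here and are not taken from Smyl et al. [22]. The initial seasonal factors are checked against the constraint (10).

```python
def sgt_filter(y, init_seasonal, alpha, zeta, gamma, rho, sigma, tau, xi):
    """Run recursions (6)-(9) over a strictly positive series y.

    init_seasonal holds s_1, ..., s_m and must average to one, as in (10).
    Returns (forecasts, scales) for the observation that follows each y[t].
    """
    s = [float(v) for v in init_seasonal]
    if abs(sum(s) / len(s) - 1.0) > 1e-8:
        raise ValueError("initial seasonal factors must average to one, see (10)")
    level = y[0] / s[0]   # ASSUMPTION: l_1 = y_1 / s_1 (not specified in the text)
    forecasts, scales = [], []
    for t in range(len(y)):
        # eq. (8): seasonal factor one full period ahead of the current time point
        s.append(zeta * y[t] / level + (1 - zeta) * s[t])
        y_hat = (level + gamma * level ** rho) * s[t + 1]     # eq. (6)
        sigma_hat = sigma * y_hat ** tau + xi                 # eq. (9)
        forecasts.append(y_hat)
        scales.append(sigma_hat)
        if t + 1 < len(y):
            level = alpha * y[t + 1] / s[t + 1] + (1 - alpha) * level   # eq. (7)
    return forecasts, scales


if __name__ == "__main__":
    # Made-up series with period m = 2 and a stable level of 10.
    f, _ = sgt_filter([5.0, 15.0, 5.0, 15.0], [0.5, 1.5], alpha=1.0, zeta=0.0,
                      gamma=0.0, rho=0.5, sigma=0.1, tau=1.0, xi=0.5)
    print(f)
```

With `zeta=0` the seasonal factors simply repeat with period $m$, so on this toy series the forecasts alternate between the high and low seasons.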

While these models are flexible and powerful extensions of the classical ETS model that have demonstrated state-of-the-art performance in forecasting benchmarks, they are also substantially more complex and contain a number of additional free parameters that must be fitted. As the series to which these techniques can be applied may often be short, a Bayesian framing of the problem was used in Smyl et al. [22] for model fitting and forecasting. Markov chain Monte Carlo (MCMC) sampling via the generic sampling tool “Stan” was used for posterior approximation. This process is computationally expensive, meaning that the overall fitting time, even for short series, is often prohibitively long. This is the primary weakness of the LGT/SGT models in comparison to other forecasting techniques, and is partly due to the use of a generic tool which cannot directly exploit properties of the model, and partly due to the model formulation, which introduces additional dependencies between model parameters that are known to have detrimental effects on posterior exploration via MCMC. To address this weakness, this paper proposes a unified, modified LGT/SGT model and an accompanying Gibbs sampler that dramatically speeds up the MCMC sampling process. The next section discusses the fundamentals of Bayesian inference and MCMC sampling procedures, in preparation for the formal introduction of the proposed sampler later.

3 Markov Chain Monte Carlo: A Brief Review

In the Bayesian framework, the parameters $\theta$ of the model $p(y \,|\, \theta)$ are assumed to be random variables that follow a prior distribution $\pi(\theta)$. The posterior distribution of $\theta$, after seeing the sample data $y$, is given by

$$p(\theta \,|\, y_1, \ldots, y_n) = \frac{p(y_1, \ldots, y_n \,|\, \theta)\, \pi(\theta)}{p(y_1, \ldots, y_n)} \propto p(y_1, \ldots, y_n \,|\, \theta)\, \pi(\theta),$$

where $p(y_1, \ldots, y_n \,|\, \theta)$ is the likelihood function and

$$p(y_1, \ldots, y_n) = \int_{\Theta} p(y_1, \ldots, y_n \,|\, \theta)\, \pi(\theta)\, d\theta$$

is the marginal probability of the data. From the posterior distribution, one can obtain full information about the model parameters. However, the high-dimensional integral required to compute the normalising term $p(y_1, \ldots, y_n)$ is usually intractable, so in practice a simulation approach, such as Markov chain Monte Carlo (MCMC), is frequently used to approximate the posterior distribution. In the MCMC approach the posterior is approximated via a set of samples, say $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(m)}$, that are randomly drawn (simulated) from the posterior distribution. A key strength of the MCMC approach is that it is simulation consistent, in the sense that the sample distribution will converge almost surely to the exact posterior distribution as $m \to \infty$. Estimates of parameters or other posterior quantities such as intervals can readily be obtained from the collection of posterior samples.
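As a small illustration of how posterior quantities are read off a collection of samples, the sketch below uses a toy Bernoulli model with a uniform Beta(1, 1) prior, for which the posterior is known in closed form. This toy model, the data and the function name `posterior_summary` are assumptions chosen here for clarity; independent draws from the known posterior stand in for the output of an MCMC sampler.

```python
import random

# Hypothetical Bernoulli observations (7 successes out of 10 trials).
DATA = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]


def posterior_summary(data, m, seed=0):
    """Approximate the posterior mean and a 95% interval from m samples.

    With a Beta(1, 1) prior the exact posterior is Beta(1 + k, 1 + n - k),
    so the samples can be drawn directly instead of by MCMC.
    """
    rng = random.Random(seed)
    k, n = sum(data), len(data)
    draws = sorted(rng.betavariate(1 + k, 1 + n - k) for _ in range(m))
    mean = sum(draws) / m
    lower = draws[int(0.025 * m)]        # empirical 2.5% quantile
    upper = draws[int(0.975 * m) - 1]    # empirical 97.5% quantile
    return mean, (lower, upper)


if __name__ == "__main__":
    for m in (100, 10_000):
        mean, (lo, hi) = posterior_summary(DATA, m)
        print(f"m={m}: mean={mean:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

The exact posterior mean here is $8/12$, and the sample mean approaches it as $m$ grows, which is the simulation consistency property described above.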

Recently, powerful generic Bayesian tools such as Stan have become available for Bayesian modelling and posterior exploration. These allow the non-specialist to define a Bayesian hierarchy and obtain posterior samples via a Hamiltonian Monte Carlo approach. However, this generality comes at a price, as the No-U-Turn sampler (NUTS) [5] used by Stan can be computationally expensive for even moderate numbers of model parameters. This can lead to low sampling efficiency relative to run-time, particularly if some of the model parameters exhibit a high statistical dependency. In contrast, by exploiting the specific statistical properties of a problem one can often apply computationally cheap algorithms, such as Gibbs sampling, for relatively sophisticated models. As such, depending on the structure of the model and prior distributions in question, generic Bayesian tools may be unnecessarily computationally costly, and it may be possible to obtain large computational speed-ups by developing bespoke sampling algorithms. We will now briefly review several key random number generation algorithms which are often used as building blocks for sampling algorithms.

3.1 Gibbs sampling

Gibbs sampling is a key MCMC algorithm. The idea underlying Gibbs sampling is that we may sample from a joint density by iteratively sampling each random variable (in our case, each model parameter) from its conditional density given the others [20]. This means that instead of sampling from a high-dimensional joint distribution directly, the sampling process reduces to sampling from a sequence of conditional posteriors that are potentially easier to sample from. For example, if we want to sample from a posterior distribution $p(\theta_1, \theta_2, \theta_3 \,|\, y)$, then we may instead iteratively sample from $p(\theta_1 \,|\, \theta_2, \theta_3, y)$, $p(\theta_2 \,|\, \theta_1, \theta_3, y)$ and $p(\theta_3 \,|\, \theta_1, \theta_2, y)$ (the order in which we choose to sample is irrelevant). Such a process allows us a free choice of sampling algorithm for each of the conditional distributions, and is most efficient when the conditional distributions for the parameters can be identified as well-studied distributions for which efficient random sampling algorithms exist. A weakness of Gibbs sampling is the high degree of dependency that is often present in the random samples, particularly if the parameters exhibit a high degree of statistical dependency. Variants of the basic Gibbs sampler have been introduced to mitigate this problem, specifically grouping and collapsing [12]. When sampling from the joint distribution of $\theta_1$ and $\theta_2$ is possible, we can group $\theta_1$ and $\theta_2$, and the sampling process becomes

  1. Sample $\theta_1$, $\theta_2$ from $p(\theta_1, \theta_2 \,|\, \theta_3, y)$;

  2. Sample $\theta_3$ from $p(\theta_3 \,|\, \theta_1, \theta_2, y)$.

This will generally act to reduce correlation in the Markov chain. Alternatively, if the marginal distribution obtained by integrating out the auxiliary variable $\theta_3$, say $p(\theta_1, \theta_2 \,|\, y) = \int p(\theta_1, \theta_2, \theta_3 \,|\, y)\, d\theta_3$, is easy to sample from, one can implement a collapsed Gibbs sampler:

  1. Sample $\theta_1$ from $p(\theta_1 \,|\, \theta_2, y)$;

  2. Sample $\theta_2$ from $p(\theta_2 \,|\, \theta_1, y)$;

  3. Sample the auxiliary variable $\theta_3$ from $p(\theta_3 \,|\, \theta_1, \theta_2, y)$.

In this case, by integrating out parameters we can again reduce correlation in the resulting Markov chain. In this paper, a bespoke sampler is developed based on Gibbs sampling. A majority of the conditional posteriors can be written as recognisable distributions that are straightforward to sample from using standard random sampling algorithms. For those conditional distributions for which this is not the case, we utilise either the Metropolis-Hastings algorithm or the grid sampling algorithm to generate random samples.
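The basic Gibbs update can be illustrated on a textbook target that is unrelated to the LGT model: a standard bivariate normal with correlation $r$, for which each full conditional is the univariate normal $N(r\theta_{\text{other}}, 1 - r^2)$. The sketch below is a generic example written for this review; the function name `gibbs_bivariate_normal` and all settings are assumptions made here, not part of the sampler developed in this paper.

```python
import math
import random


def gibbs_bivariate_normal(r, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation r.

    Each sweep draws theta1 | theta2 and then theta2 | theta1 from their
    exact normal conditionals, N(r * other, 1 - r**2).
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - r * r)
    theta1, theta2 = 0.0, 0.0            # arbitrary starting point
    draws = []
    for i in range(n_samples + burn_in):
        theta1 = rng.gauss(r * theta2, cond_sd)   # sample theta1 | theta2
        theta2 = rng.gauss(r * theta1, cond_sd)   # sample theta2 | theta1
        if i >= burn_in:                          # discard the warm-up sweeps
            draws.append((theta1, theta2))
    return draws


def sample_correlation(draws):
    """Empirical correlation between the two coordinates of the draws."""
    n = len(draws)
    m1 = sum(d[0] for d in draws) / n
    m2 = sum(d[1] for d in draws) / n
    cov = sum((d[0] - m1) * (d[1] - m2) for d in draws) / n
    v1 = sum((d[0] - m1) ** 2 for d in draws) / n
    v2 = sum((d[1] - m2) ** 2 for d in draws) / n
    return cov / math.sqrt(v1 * v2)


if __name__ == "__main__":
    out = gibbs_bivariate_normal(0.8, 20_000)
    print(f"sample correlation = {sample_correlation(out):.3f}")
```

As $r$ approaches one the conditional standard deviation shrinks and successive draws move only a short distance, which is the dependency problem that grouping and collapsing are intended to mitigate.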

3.1.1 Scale Mixtures

As noted, Gibbs sampling is most effective when the conditional distributions can be identified as well-studied distributions. Usually, this is only the case when the prior and likelihood are conjugate, which in turn requires that the distributions involved be members of the exponential family. In the case that non-exponential family distributions are used, it is still possible to retain conditional conjugacy by the use of continuous mixtures. The most common mixture representation is known as the scale-mixture-of-normals. A density $q(x)$ is representable by a scale-mixture-of-normals if it can be written as

$$q(x) = \int \phi(x \,|\, m, s^2)\, p(s^2)\, ds^2,$$

where $\phi(x \,|\, m, s^2)$ denotes the probability density of a normal distribution with mean $m$ and standard deviation $s$, and $p(s^2)$ is a mixing density. For example, the Student $t$-distribution can be expressed as a normal-inverse-gamma mixture [11]. That is, if a random variable follows a Student $t$-distribution with degrees of freedom $\nu$, location $\mu$ and scale $\tau$, i.e.,

$$y \,|\, \nu, \mu, \tau \sim t(\nu, \mu, \tau),$$

then the density can be equivalently expressed as the following mixture,

\begin{align}
y \,|\, \mu, \tau, \omega &\sim N(\mu, \tau^2 \omega^2), \tag{11} \\
\omega^2 \,|\, \nu &\sim \mathrm{IG}\left(\frac{\nu}{2}, \frac{\nu}{2}\right), \tag{12}
\end{align}

where $\mathrm{IG}(\alpha, \beta)$ denotes the inverse-gamma distribution with shape $\alpha$ and scale $\beta$. Here, the variable $\omega$ is often referred to as a latent or auxiliary variable. The Student $t$-distribution density can be recovered by marginalising over the latent variable $\omega^2$. While the introduction of a latent variable into a Bayesian hierarchy increases the number of random variables that need to be sampled, it brings the considerable advantage that the Student $t$-distribution, which is not a member of the exponential family, is now representable as a mixture of exponential family distributions. This opens the possibility of conditional conjugacy by choice of appropriate prior distributions, which itself substantially facilitates efficient Gibbs sampling. We use this technique extensively in this paper when constructing the Gibbs sampler in Section 5.
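The mixture representation (11)-(12) can be checked numerically. The sketch below draws $\omega^2$ from the inverse-gamma mixing density and then $y$ from the conditional normal, and compares the sample variance with the known Student $t$ variance $\tau^2 \nu / (\nu - 2)$, which holds for $\nu > 2$. The function name `student_t_via_mixture` and the parameter values are illustrative assumptions. The inverse-gamma draw is obtained as the reciprocal of a gamma variate, using the fact that Python's `random.gammavariate` takes a shape and a scale (not a rate).

```python
import random


def student_t_via_mixture(nu, mu, tau, rng):
    """Draw one Student t(nu, mu, tau) variate via the mixture (11)-(12)."""
    # omega^2 ~ IG(nu/2, nu/2): reciprocal of a Gamma with shape nu/2 and
    # rate nu/2, i.e. scale 2/nu in gammavariate's (shape, scale) convention.
    omega2 = 1.0 / rng.gammavariate(nu / 2.0, 2.0 / nu)
    # y | omega ~ N(mu, tau^2 * omega^2); gauss expects a standard deviation.
    return rng.gauss(mu, tau * omega2 ** 0.5)


def sample_mean_and_variance(values):
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / (n - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    nu, mu, tau = 10.0, 2.0, 3.0          # arbitrary illustrative values
    ys = [student_t_via_mixture(nu, mu, tau, rng) for _ in range(100_000)]
    mean, var = sample_mean_and_variance(ys)
    print(f"sample mean     = {mean:.3f} (location mu = {mu})")
    print(f"sample variance = {var:.3f} (theory = {tau**2 * nu / (nu - 2):.3f})")
```

Conditional on $\omega^2$ the observation is Gaussian, which is what allows conjugate updates for the remaining parameters inside a Gibbs sweep.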

3.2 The Metropolis-Hastings algorithm

The Metropolis-Hastings algorithm is a versatile and powerful sampling method that is particularly useful when direct sampling from the target distribution $p(\theta)$ is difficult. Given a proposal distribution, $q(\theta^{*}\,|\,\theta)$, samples can be generated according to the following procedure [19]:

  1. Generate a proposal $\theta^{*}$ from $q(\theta^{*}\,|\,\theta^{\{i-1\}})$;

  2. generate $u\sim U(0,1)$, and accept $\theta^{*}$ (i.e., set $\theta^{\{i\}}=\theta^{*}$) if

    $$u<\frac{p(\theta^{*})}{p(\theta^{\{i-1\}})}\frac{q(\theta^{\{i-1\}}\,|\,\theta^{*})}{q(\theta^{*}\,|\,\theta^{\{i-1\}})};$$

    otherwise, retain the current state, $\theta^{\{i\}}=\theta^{\{i-1\}}$.

Here, the quantity $\theta^{\{i\}}$ denotes the $i$-th sample in the Markov chain. A very common choice of proposal distribution is

$$\theta^{*}\,|\,\theta^{\{i-1\}}\sim{\rm MVN}\left(\mu\left(\theta^{\{i-1\}}\right),\Sigma\left(\alpha,\theta^{\{i-1\}}\right)\right),$$

where ${\rm MVN}(\cdot)$ denotes a multivariate normal distribution, with $\mu(\cdot)$ and $\Sigma(\cdot)$ functions that determine the mean and covariance of the multivariate normal distribution, respectively. The parameter $\alpha$ is often called a “step-size”, and usually controls the overall scale of the proposal; this generally needs to be chosen so that the Metropolis-Hastings procedure yields an acceptance rate around 50% to 60%. In this paper we use a specific variation of Metropolis-Hastings, derived from the algorithm in Titsias and Papaspiliopoulos [25], in which $\mu(\cdot)$ is determined using the gradient of the negative log-likelihood and $\Sigma(\cdot)$ is determined based on both the step-size and the curvature of the prior distribution. The step-size is automatically tuned using the algorithm presented in Schmidt and Makalic [21].
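The two-step procedure above can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it uses a plain random-walk normal proposal on a scalar parameter, not the gradient-based proposal of [25] used in this paper, and the names `metropolis_hastings`, `rw_propose`, `rw_log_q` and the step-size value are hypothetical choices made for the example.

```python
import math
import random


def metropolis_hastings(log_p, propose, log_q, theta0, n_samples, rng):
    """Generic Metropolis-Hastings sampler.

    log_p(theta): log target density, up to an additive constant.
    propose(theta, rng): draws theta* from q(. | theta).
    log_q(a, b): log q(a | b), up to a constant shared by both directions.
    """
    theta = theta0
    chain, n_accept = [], 0
    for _ in range(n_samples):
        theta_star = propose(theta, rng)
        # Log of the acceptance ratio from step 2.
        log_ratio = (log_p(theta_star) - log_p(theta)
                     + log_q(theta, theta_star) - log_q(theta_star, theta))
        # Accept if u < ratio, compared on the log scale (u lies in (0, 1]).
        if math.log(1.0 - rng.random()) < log_ratio:
            theta = theta_star
            n_accept += 1
        chain.append(theta)  # a rejected proposal repeats the current state
    return chain, n_accept / n_samples


# Illustrative target: a standard normal density.
def log_target(theta):
    return -0.5 * theta * theta


step = 2.4  # illustrative step-size (proposal standard deviation)


def rw_propose(theta, rng):
    return rng.gauss(theta, step)


def rw_log_q(a, b):
    return -0.5 * ((a - b) / step) ** 2


chain, acc_rate = metropolis_hastings(
    log_target, rw_propose, rw_log_q, 0.0, 20000, random.Random(0))
```

Because the random-walk proposal is symmetric, the two `log_q` terms cancel; they are kept so that the same function also works with asymmetric proposals.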

3.3 Grid sampling

A finite grid approximation (“grid sampling”) is a simple and fast way to approximate a posterior distribution [16]. Generating a sample using a grid sampler consists of the following steps. First, generate a finite set of candidates, say $\bar{\Theta}=\{\bar{\theta}_{1},\ldots,\bar{\theta}_{q}\}\subset\Theta$, from the parameter space $\Theta$, and evaluate the conditional posterior density, $p(\theta\,|\,\cdots)$, at each of these candidates. Then, normalise these density values and treat them as a multinomial distribution over the set $\bar{\Theta}$, i.e.,

$$\mathbb{P}(\theta=\bar{\theta}_{i})\propto p(\bar{\theta}_{i}\,|\,\cdots).$$

Finally, we draw a sample from this multinomial distribution. It is important to note that samples drawn from a grid sampler will represent only a quantised approximation of the original continuous distribution; however, in many cases, this may be adequate if $q$ is chosen to be sufficiently large, or if the model is relatively insensitive to the precise value of the parameter being sampled. An advantage of grid sampling is that each draw is independent of the parameter's previous value, which helps to reduce overall correlation in the Markov chain. With an appropriately chosen grid, grid sampling can be both an efficient and accurate alternative to methods such as Metropolis-Hastings or rejection sampling.
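The grid sampling steps can be sketched as follows. This is a minimal illustration using the Python standard library; the helper names `grid_probabilities` and `grid_sample`, the example density and the grid of 100 midpoints are assumptions made for the example only. Densities are handled on the log scale and shifted by their maximum before exponentiating, which avoids numerical underflow.

```python
import math
import random


def grid_probabilities(log_density, grid):
    """Normalise density values on the grid into probabilities."""
    log_vals = [log_density(theta) for theta in grid]
    max_log = max(log_vals)  # subtract the maximum to avoid underflow
    weights = [math.exp(v - max_log) for v in log_vals]
    total = sum(weights)
    return [w / total for w in weights]


def grid_sample(log_density, grid, rng):
    """Draw one value from the quantised conditional posterior."""
    probs = grid_probabilities(log_density, grid)
    return rng.choices(grid, weights=probs, k=1)[0]


# Hypothetical conditional posterior: an unnormalised Beta(2, 5) shape on (0, 1).
def log_density(theta):
    return math.log(theta) + 4.0 * math.log(1.0 - theta)


grid = [(i + 0.5) / 100 for i in range(100)]  # q = 100 candidate values
probs = grid_probabilities(log_density, grid)
draw = grid_sample(log_density, grid, random.Random(0))
```

The normalising constant of the conditional posterior is never needed, since it cancels when the weights are normalised.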

4 The Local-Seasonal-Global Trend (LSGT) Model

In this work, we present a unified version of the LGT and SGT, which we call the LSGT model. In addition to unifying the two models into a single formulation, we also make several adjustments to the model specification; these are designed to reduce statistical dependency between model parameters, as well as to simplify posterior sampling. The LSGT models observation $y_{t+1}$ as

$$y_{t+1}\,|\,\hat{y}_{t+1},\hat{\sigma}_{t+1},\nu\sim t(\nu,\hat{y}_{t+1},\hat{\sigma}_{t+1}), \tag{13}$$

where

$$\hat{y}_{t+1}=(l_{t}+\gamma l_{t}^{\rho}+\lambda b_{t})s_{t+1-m}, \tag{14}$$
$$l_{t}=\alpha\left(\frac{y_{t}}{s_{t-m}}\right)+(1-\alpha)l_{t-1}, \tag{15}$$
$$b_{t}=\beta(l_{t}-l_{t-1})+(1-\beta)b_{t-1}, \tag{16}$$
$$\log s_{t}=\zeta\log\frac{y_{t}}{l_{t}}+(1-\zeta)\log s_{t-m}, \tag{17}$$
$$\hat{\sigma}_{t+1}^{2}=\chi^{2}(\phi^{2}+(1-\phi)^{2}l_{t}^{2\tau}), \tag{18}$$

subject to

$$\sum_{i=1}^{m}\log s_{i}=0. \tag{19}$$

The parameters of this model are described in Table 2. The LSGT model includes both the seasonal and non-seasonal variants with no specific distinction between the two; instead, we can recover either variant by setting some of the model parameters to specific constants. When all the seasonality factors $s_{t}$ are set to 1, i.e., there is no seasonal modification, the LSGT model reduces to a version of the original LGT model. Setting $\lambda=0$ yields the seasonal version of the LSGT.

The LSGT model also makes several changes in model formulation to the LGT and SGT models of Smyl et al. [22] discussed in Section 2. An important modification is in the way in which heteroscedasticity is incorporated into the model. In the LSGT model, the conditional scale $\hat{\sigma}_{t+1}$, given by (18), depends on the level $l_{t}$ rather than the one-step-ahead forecast $\hat{y}_{t+1}$ as in the original LGT/SGT models. This has two effects: (i) it decouples the scale from the location (forecast) in the Student-$t$ distribution, reducing correlation between the parameters $(\gamma,\rho,\lambda)$ and $(\chi,\phi,\tau)$; and (ii) the conditional distribution for the weights $\lambda$ and $\gamma$ reduces to a linear regression. A second point of difference is that in the original LGT/SGT formulation, the heteroskedasticity is handled by summing the standard deviations of the homoskedastic and time varying components. This formulation is somewhat unnatural, as the variances of sums of random variables are additive, rather than their standard deviations. This formulation also introduces substantial correlation between the standard deviation of the homoskedastic component, $\xi$, and the scale $\sigma$ of the heteroskedastic component, as both must be adjusted simultaneously to maintain the same overall scale of errors. In contrast, from (18) it is clear that LSGT directly models the conditional variance of $y_{t+1}$ as a scaled mixture of the homoskedastic and heteroskedastic terms.
The parameter $\chi$ controls the overall scale of the error terms, while the mixing parameter $\phi$ controls how much contribution is made to the variance by the homoskedastic and heteroskedastic terms, with the model reducing to a purely homoskedastic form when $\phi=1$. This formulation has two benefits: (i) it reduces the correlation between the parameters that determine $\hat{\sigma}_{t+1}^{2}$ substantially, and (ii) it allows us to easily utilise a scale-mixture representation of the Student-$t$ distribution to simplify sampling. The final modification relates to the way in which the seasonality adjustments are handled. As these quantities appear as multiplicative factors when forecasting the level (15), they are smoothed on the logarithmic scale in the LSGT model, as per (17). As the seasonal factors should not introduce an overall change in scale, the sum-constraint (19) ensures that they have a zero sum in the logarithmic scale, or equivalently, that their product is equal to one.
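To make the recursions concrete, the following sketch computes the one-step-ahead forecasts (14) and conditional variances (18) for a series, updating the states through (15)–(17). It is an illustration under stated assumptions rather than a reference implementation: the function name `lsgt_one_step`, the initialisation of the level as $l_{1}=y_{1}/s_{1-m}$, and the convention that the $m$ initial seasonal factors are those applied to the first $m$ observations are all choices made for this example, and the constraint (19) is not enforced.

```python
import math


def lsgt_one_step(y, s_init, b1, alpha, beta, zeta, gamma, rho, lam, chi, phi, tau):
    """One-step-ahead forecasts and variances from (14)-(18).

    y: positive observations y_1..y_T; s_init: the m initial seasonal factors.
    Returns (y_hat, sigma2), each covering times 2..T.
    Assumptions (not from the paper): l_1 = y_1 / s_init[0], and s_init holds
    the factors applied to the first m observations.
    """
    m = len(s_init)
    s = list(s_init)       # s[k] stores s_{k+1-m}, so s_{t+1-m} is s[t]
    level = y[0] / s[0]    # assumed initial level l_1
    trend = b1
    # Seasonal update (17) at t = 1.
    s.append(math.exp(zeta * math.log(y[0] / level) + (1 - zeta) * math.log(s[0])))
    y_hat, sigma2 = [], []
    for t in range(1, len(y)):   # t observations have been seen so far
        s_next = s[t]            # s_{t+1-m}
        y_hat.append((level + gamma * level ** rho + lam * trend) * s_next)  # (14)
        sigma2.append(chi ** 2 * (phi ** 2 + (1 - phi) ** 2 * level ** (2 * tau)))  # (18)
        new_level = alpha * y[t] / s_next + (1 - alpha) * level  # (15)
        trend = beta * (new_level - level) + (1 - beta) * trend  # (16)
        level = new_level
        s.append(math.exp(zeta * math.log(y[t] / level)
                          + (1 - zeta) * math.log(s_next)))      # (17)
    return y_hat, sigma2


# Non-seasonal example: one seasonal factor fixed at 1, no trend terms.
y_hat, sigma2 = lsgt_one_step(
    [10.0, 12.0, 11.0, 13.0], [1.0], b1=0.0, alpha=1.0, beta=0.5, zeta=0.0,
    gamma=0.0, rho=0.5, lam=0.0, chi=2.0, phi=1.0, tau=0.5)
```

With $\alpha=1$ and $\gamma=\lambda=0$ the forecast is simply the previous observation, and with $\phi=1$ the variance is constant at $\chi^{2}$, which gives a quick sanity check of the recursions.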

| Parameter | Description |
|---|---|
| $\nu$ | degrees-of-freedom parameter of the Student $t$-distribution |
| $\gamma$ | coefficient of the global trend |
| $\rho$ | power coefficient of the global trend, in $[-0.5,1]$ |
| $\lambda$ | damping coefficient of the local trend, in $[-100,1]$ |
| $\alpha$ | level smoothing parameter, in $[0,1]$ |
| $\beta$ | local trend smoothing parameter, in $[0,1]$ |
| $\zeta$ | seasonality smoothing parameter, in $[0,1]$ |
| $\chi$ | scale of error, positive, constant for each time period |
| $\phi$ | mixing parameter between the homoscedastic and heteroscedastic error, in $[0,1]$ |
| $\tau$ | power coefficient of the heteroscedastic error, in $[0,1]$ |
| $b_{1}$ | initial local trend |
| $s_{i}$ | initial seasonality, positive, $i=1,\dots,m$ |

Table 2: Parameters of the LSGT model.

4.1 Prior distributions

As we are using a Bayesian approach to learn the LSGT model we require the specification of suitable prior distributions over all model parameters. To avoid our choice of prior distributions introducing a strong estimation bias we choose to use weakly informative priors where appropriate. The overall error scale $\chi^{2}$ is assigned a standard uninformative scale-invariant prior $\chi^{2}\sim(1/\chi^{2})\,d\chi^{2}$. The coefficients $\gamma$ and $\lambda$, and the initial value of the local trend $b_{1}$, are all assigned weakly informative Cauchy prior distributions:

$$\gamma\sim C(0,s_{\gamma}),\;\;\lambda\sim C(0,s_{\lambda})\;\;{\rm and}\;\;b_{1}\sim C(0,s_{b}).$$

This choice of prior distribution favours smaller values of the coefficients a priori, while still allowing large values to be plausible. By default we take $s_{\lambda}=1$ and $s_{\gamma}=s_{b}=\max({\bf y})/100$, allowing the prior distributions for $\gamma$ and $b_{1}$ to automatically adapt to the scale of the time series. The smoothing parameters are defined on $(0,1)$, and are assigned beta prior distributions

$$\alpha,\beta,\zeta\sim{\rm Beta}(a,b).$$

The default choice of hyperparameters is $a=1$ and $b=1/2$; this distribution masses more prior probability near $\alpha=1$ (say) than $\alpha=0$. This is appropriate as small changes to $\alpha$ when $\alpha$ is close to one result in much larger changes in model response than similar magnitude changes when $\alpha$ is close to zero. The heteroscedastic mixing parameter $\phi$ is assigned a uniform distribution on $(0,1)$.
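To quantify how the default beta prior distributes its mass, the density and distribution function for $a=1$, $b=1/2$ can be written in closed form. The short worked calculation below uses only the standard definition of the beta density; the probability of exceeding $1/2$ is included purely as an illustration.

```latex
p(\alpha) = \frac{\alpha^{a-1}(1-\alpha)^{b-1}}{B(a,b)}\bigg|_{a=1,\,b=1/2}
          = \frac{1}{2}(1-\alpha)^{-1/2},
\qquad B\!\left(1,\tfrac{1}{2}\right) = \frac{\Gamma(1)\,\Gamma(1/2)}{\Gamma(3/2)} = 2,

\mathbb{P}(\alpha \le x) = \int_{0}^{x}\frac{1}{2}(1-u)^{-1/2}\,du = 1-\sqrt{1-x},
\qquad
\mathbb{P}\!\left(\alpha > \tfrac{1}{2}\right) = \sqrt{\tfrac{1}{2}} \approx 0.707 .
```

The density is unbounded as $\alpha\to 1$ and equals $1/2$ at $\alpha=0$, so about 71% of the prior mass lies above $1/2$.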

The power parameters $\tau$ and $\rho$ are sampled using a grid sampler (see Section 3.3). We use a uniformly spaced grid of candidate values for both parameters (over the range of permissible values, see Table 2). The degrees-of-freedom parameter $\nu$ is also grid sampled; however, in this case a simple uniform spacing is inappropriate. This is because the change in the behaviour of the $t$-distribution as $\nu$ varies is not uniform on the real line: increasing $\nu$ from $\nu_{0}$ to $\nu_{0}+\nu_{\delta}$ is not equivalent to increasing $\nu$ from $2\nu_{0}$ to $2\nu_{0}+\nu_{\delta}$, i.e., the effect of increasing $\nu$ by some amount $\nu_{\delta}$ depends on the value of $\nu$. Taking this into account, we choose the candidates in the $\nu$-grid so that the symmetric Kullback–Leibler (KL) divergence [10] between all neighbouring pairs in the candidate set is equal, i.e., all neighbouring $t$-distributions are equally “distant” in terms of symmetric KL divergence.
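One way such a grid could be built is sketched below. This is a rough illustration and not the authors' construction, which is not specified here: the symmetric KL divergence is approximated by a simple midpoint rule after the substitution $x=\tan(u)$, and the grid is grown forwards from a starting value by bisection so that each neighbouring pair has divergence equal to a chosen target. The function names, the target value 0.05 and the range $[1,30]$ are assumptions for the example.

```python
import math


def t_logpdf(x, nu):
    """Log density of the standard Student t-distribution."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1) / 2 * math.log1p(x * x / nu))


def symmetric_kl(nu1, nu2, n=200):
    """Approximate symmetric KL divergence between t(nu1) and t(nu2).

    Uses the midpoint rule after substituting x = tan(u), which maps the
    half-line to (0, pi/2); the integrand is even in x, so the sum is doubled.
    """
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        x = math.tan((k + 0.5) * h)
        lp1, lp2 = t_logpdf(x, nu1), t_logpdf(x, nu2)
        total += (math.exp(lp1) - math.exp(lp2)) * (lp1 - lp2) * (1 + x * x)
    return 2 * total * h


def nu_grid(nu_min, nu_max, target):
    """Grid from nu_min upwards; neighbours are `target` apart in symmetric KL."""
    grid = [nu_min]
    while symmetric_kl(grid[-1], nu_max) > target:
        lo, hi = grid[-1], nu_max
        for _ in range(40):  # bisection for the next grid point
            mid = 0.5 * (lo + hi)
            if symmetric_kl(grid[-1], mid) < target:
                lo = mid
            else:
                hi = mid
        grid.append(0.5 * (lo + hi))
    return grid


grid = nu_grid(1.0, 30.0, 0.05)
```

In this construction the grid points are dense for small $\nu$, where the shape of the $t$-distribution changes quickly, and sparse for large $\nu$. The last point generally falls short of `nu_max` by less than one step.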

Instead of using Cauchy priors as in Smyl et al. [22], we assign the initial seasonal factors horseshoe priors. The prior hierarchy for the horseshoe prior is

$$\log s_{i}\sim N(0,\psi^{2}_{s_{i}}\delta^{2}),\;i=1,\dots,m, \tag{20}$$
$$\psi_{s_{i}}\sim C^{+}(0,1),\;i=1,\dots,m, \tag{21}$$
$$\delta\sim C^{+}(0,1). \tag{22}$$

An important characteristic of the horseshoe prior is its infinitely tall spike (pole) at zero. This massing of prior probability at the origin means that if the true effects are zero, or close to zero, they will be aggressively shrunk away. In the case of the log-seasonal terms, $\log s_{i}=0$ implies $s_{i}=1$, i.e., no seasonality. This property gives the LSGT model greater robustness to the misspecification of seasonality effects than the Cauchy priors used in the original SGT model.
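A draw from the prior hierarchy (20)–(22) can be sketched as follows. This is an illustrative example only: the function names are hypothetical, half-Cauchy variates are generated by inverting the distribution function, and the sum-to-zero constraint (19) is not imposed on the draws.

```python
import math
import random


def half_cauchy(scale, rng):
    """Draw from C+(0, scale) by inverting the distribution function."""
    return scale * math.tan(0.5 * math.pi * rng.random())


def sample_log_seasonals(m, rng):
    """Draw (log s_1, ..., log s_m) from the hierarchy (20)-(22)."""
    delta = half_cauchy(1.0, rng)                      # global scale, (22)
    psi = [half_cauchy(1.0, rng) for _ in range(m)]    # local scales, (21)
    return [rng.gauss(0.0, p * delta) for p in psi]    # (20)


rng = random.Random(3)
log_s = sample_log_seasonals(12, rng)
```

A small global scale `delta` pulls all $m$ log-seasonal terms towards zero together, while an occasional large local scale allows an individual term to escape the shrinkage.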

5 Posterior sampling for the LSGT model

We now describe a Gibbs sampler for the LSGT model (13)–(19) using the prior distributions discussed in Section 4.1.

5.1 Scale-mixture representations

The continuous scale-mixture technique described in Section 3.1.1 is employed to simplify posterior sampling. The use of scale-mixture representations allows for conditional conjugacy even in the case of non-exponential family distributions (such as the Cauchy), at the expense of the introduction of additional latent variables that must also be sampled. We use the scale-mixture-of-normals representation of the $t$-distribution given by (11)–(12) to rewrite the response model (13) as

$$y_{t+1}\,|\,\hat{y}_{t+1},\hat{\sigma}_{t+1},\omega_{t+1}^{2}\sim N(\hat{y}_{t+1},\hat{\sigma}_{t+1}^{2}\omega_{t+1}^{2}),$$
$$\omega_{t+1}^{2}\,|\,\nu\sim{\rm IG}\left(\frac{\nu}{2},\frac{\nu}{2}\right).$$

Moreover, the Cauchy distribution is a special case of the Student t-distribution with $\nu=1$. The Cauchy prior distribution for the parameter $\gamma$ with scale $s_{\gamma}$ can therefore be written as

$$\gamma\,|\,\xi_{\gamma}^{2},s_{\gamma}\sim N(0,\xi_{\gamma}^{2}s_{\gamma}^{2})\;\;{\rm and}\;\;\xi_{\gamma}^{2}\sim{\rm IG}\left(\frac{1}{2},\frac{1}{2}\right),$$

by introducing the latent variable $\xi_{\gamma}^{2}$, with similar representations for the parameters $\lambda$ and $b_{1}$. The half-Cauchy distribution, used in the horseshoe prior, can also be expressed as an inverse-gamma scale-mixture of inverse-gamma distributions [26]. Specifically, if

$$y^{2}\,|\,\eta\sim{\rm IG}\left(\frac{1}{2},\frac{1}{\eta}\right)\;\;{\rm and}\;\;\eta\sim{\rm IG}\left(\frac{1}{2},\frac{1}{a^{2}}\right),$$

then $y\,|\,a\sim C^{+}(0,a)$. Using this, the horseshoe prior for the seasonality factors in (20)–(22) can be written as [13],

$$\log s_{i}\,|\,\psi^{2}_{s_{i}},\delta^{2}\sim N(0,\psi^{2}_{s_{i}}\delta^{2}), \tag{23}$$
$$\psi_{s_{i}}^{2}\,|\,\eta_{s_{i}}\sim{\rm IG}\left(\frac{1}{2},\frac{1}{\eta_{s_{i}}}\right), \tag{24}$$
$$\delta^{2}\,|\,\eta_{\delta}\sim{\rm IG}\left(\frac{1}{2},\frac{1}{\eta_{\delta}}\right), \tag{25}$$
$$\eta_{s_{1}},\dots,\eta_{s_{m}},\eta_{\delta}\sim{\rm IG}\left(\frac{1}{2},1\right), \tag{26}$$

where $\eta_{s_{1}},\dots,\eta_{s_{m}}$ and $\eta_{\delta}$ are latent variables. For convenience, the complete Bayesian LSGT hierarchy, including the scale-mixture representations, is given in Appendix A.
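The inverse-gamma mixture representation of the half-Cauchy distribution used in this section can be verified by simulation. The sketch below is illustrative only (the function names and the choice $a=2$ are assumptions): it draws $\eta$ and then $y^{2}$ from the two inverse-gamma levels and checks that the median of $y$ is close to $a$, as it must be for a $C^{+}(0,a)$ variable.

```python
import math
import random


def inv_gamma(shape, scale, rng):
    """Draw from IG(shape, scale): scale divided by a Gamma(shape, 1) draw."""
    return scale / rng.gammavariate(shape, 1.0)


def half_cauchy_via_mixture(a, rng):
    """Draw y ~ C+(0, a) through the two-level inverse-gamma mixture."""
    eta = inv_gamma(0.5, 1.0 / a ** 2, rng)   # eta ~ IG(1/2, 1/a^2)
    y2 = inv_gamma(0.5, 1.0 / eta, rng)       # y^2 | eta ~ IG(1/2, 1/eta)
    return math.sqrt(y2)


rng = random.Random(7)
a = 2.0
draws = [half_cauchy_via_mixture(a, rng) for _ in range(40000)]
# The median of C+(0, a) is a, so about half the draws should fall below a.
frac_below_a = sum(d < a for d in draws) / len(draws)
```

Both levels are inverse-gamma, which is what makes the conditional updates of $\psi_{s_{i}}^{2}$, $\delta^{2}$ and their latent variables available in closed form.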

5.2 The Gibbs sampler

Consider a time series $\mathbf{y}=(y_{1},y_{2},\dots,y_{T})$, and the corresponding one-step-ahead forecasts produced by the LSGT model, $\hat{\mathbf{y}}=(\hat{y}_{2},\dots,\hat{y}_{T})$. We now present a Gibbs sampling procedure for sampling from the posterior of the LSGT model. The Gibbs sampler uses the scale-mixture-of-normals representation for the Student $t$-distribution in Steps 1 to 5, and integrates out the latent variables, $\omega_{t}^{2}$, for Step 6 onwards. The Gibbs sampler repeatedly iterates the following steps:

  1. Sample the global variance $\chi^{2}$ from the inverse-gamma distribution

    $$\chi^{2}\,|\,\cdots\sim{\rm IG}\left(\frac{T-1}{2},\,\sum_{t=1}^{T-1}\frac{(y_{t+1}-\hat{y}_{t+1})^{2}}{2\omega_{t+1}^{2}(\phi^{2}+(1-\phi)^{2}l_{t}^{2\tau})}\right).$$

    Note that if the model is homoscedastic, $\phi=1$.

  2. Sample the latent variables $\omega_{t+1}^{2}$ from the inverse-gamma distributions

    $$\omega_{t+1}^{2}\,|\,\cdots\sim{\rm IG}\left(\frac{\nu+1}{2},\,\tilde{\beta}=\frac{(y_{t+1}-\hat{y}_{t+1})^{2}}{2\hat{\sigma}_{t+1}^{2}}+\frac{\nu}{2}\right)$$

    for $t=1,\ldots,T-1$.

  3. 3.

    Sample the degrees-of-freedom ν𝜈\nuitalic_ν using a grid sampler (see Section 5.2.3).

  4. Sample the global trend coefficient $\gamma$ from the normal distribution $N(\tilde{\mu},\tilde{\sigma}^{2})$, where

    $$\tilde{\mu}=\frac{1}{\tilde{\sigma}^{2}}\sum_{t=1}^{T-1}\left[\frac{l_{t}^{\rho}s_{t}(y_{t+1}-(l_{t}+\lambda b_{t})s_{t})}{\omega_{t+1}^{2}\hat{\sigma}_{t+1}^{2}}\right]\;\;{\rm and}\;\;\tilde{\sigma}^{2}=\left(\sum_{t=1}^{T-1}\frac{l_{t}^{2\rho}s_{t}^{2}}{\omega_{t+1}^{2}\hat{\sigma}_{t+1}^{2}}+\frac{1}{\xi_{\gamma}^{2}s_{\gamma}^{2}}\right)^{-1},$$

    and $\hat{\sigma}^{2}_{t+1}$ is given by (18); then sample the latent variable $\xi_{\gamma}$ from the inverse-gamma distribution

    $$\xi_{\gamma}^{2}\,|\,\cdots\sim{\rm IG}\left(1,\,\frac{\gamma^{2}}{2s_{\gamma}^{2}}+\frac{1}{2}\right).$$
  5. If we are using a non-seasonal model:

    (a) Sample the local trend coefficient $\lambda$ from the normal distribution $N(\tilde{\mu},\tilde{\sigma}^{2})$, where

      $$\tilde{\mu}=\frac{1}{\tilde{\sigma}^{2}}\sum_{t=1}^{T-1}\left[\frac{b_{t}s_{t}(y_{t+1}-(l_{t}+\gamma l_{t}^{\rho})s_{t})}{\omega_{t+1}^{2}\hat{\sigma}_{t+1}^{2}}\right]\;\;{\rm and}\;\;\tilde{\sigma}^{2}=\left(\sum_{t=1}^{T-1}\frac{b_{t}^{2}s_{t}^{2}}{\omega_{t+1}^{2}\hat{\sigma}_{t+1}^{2}}+\frac{1}{\xi_{\lambda}^{2}s_{\lambda}^{2}}\right)^{-1},$$

      and sample the latent variable $\xi_{\lambda}$ from the inverse-gamma distribution

      $$\xi_{\lambda}^{2}\,|\,\cdots\sim{\rm IG}\left(1,\,\frac{\lambda^{2}}{2s_{\lambda}^{2}}+\frac{1}{2}\right).$$
    (b) Sample the initial local trend $b_{1}$ from the normal distribution $N(\tilde{\mu},\tilde{\sigma}^{2})$, where

      $$\tilde{\mu}=\frac{1}{\tilde{\sigma}^{2}}\sum_{t=1}^{T-1}\frac{\lambda^{2}(1-\beta)^{2(t-1)}b_{1}+\lambda(1-\beta)^{t-1}(y_{t+1}-\hat{y}_{t+1})}{\omega_{t+1}^{2}\hat{\sigma}_{t+1}^{2}}\;\;{\rm and}\;\;\tilde{\sigma}^{2}=\left(\sum_{t=1}^{T-1}\frac{\lambda^{2}(1-\beta)^{2(t-1)}}{\omega_{t+1}^{2}\hat{\sigma}_{t+1}^{2}}+\frac{1}{\xi_{b_{1}}^{2}s_{b_{1}}^{2}}\right)^{-1},$$

      and sample the latent variable $\xi_{b_{1}}$ from the inverse-gamma distribution

      $$\xi_{b_{1}}^{2}\,|\,\cdots\sim{\rm IG}\left(1,\,\frac{b_{1}^{2}}{2s_{b_{1}}^{2}}+\frac{1}{2}\right).$$
  6. Sample the global trend smoothing parameter $\alpha$, the local trend smoothing parameter $\beta$ and (if required) the seasonal smoothing parameter $\zeta$ using a gradient-assisted Metropolis-Hastings algorithm [21] (see Section 5.2.1).

  7. If we are using a seasonal model:

    (a) Sample the initial seasonal values, $\log s_{i}$, using a gradient-assisted Metropolis-Hastings algorithm (see Section 5.2.2).

    (b) Sample the horseshoe variances $\psi_{s_{i}}^{2}$ and $\delta^{2}$ from the inverse-gamma distributions [13]

      $$\begin{aligned}
      \psi_{s_{i}}^{2}\,|\,\cdots &\sim{\rm IG}\left(1,\frac{1}{\eta_{s_{i}}}+\frac{(\log s_{i})^{2}}{2\delta^{2}}\right),\;i=1,\dots,m-1,\\
      \delta^{2}\,|\,\cdots &\sim{\rm IG}\left(\frac{m}{2},\frac{1}{\eta_{\delta}}+\frac{1}{2}\sum_{i=1}^{m}\frac{(\log s_{i})^{2}}{\psi_{s_{i}}^{2}}\right).
      \end{aligned}$$
    (c) Sample the horseshoe latent variables $\eta_{s_{i}}$ and $\eta_{\delta}$ from the inverse-gamma distributions

      $$\begin{aligned}
      \eta_{s_{i}}\,|\,\cdots &\sim{\rm IG}\left(1,1+\frac{1}{\psi_{s_{i}}^{2}}\right),\;i=1,\dots,m-1,\\
      \eta_{\delta}\,|\,\cdots &\sim{\rm IG}\left(1,1+\frac{1}{\delta^{2}}\right).
      \end{aligned}$$
  8. Sample the global trend power parameter $\rho$ using a grid sampler (see Section 5.2.3).

  9. If we are using a heteroscedastic model, sample the heteroscedastic power parameter $\tau$ and the heteroscedastic mixing parameter $\phi$ using a grid sampler (see Section 5.2.3).

Derivations of the conditional distributions for the coefficients $\gamma$ and $\lambda$, and for the initial local trend $b_{1}$, are detailed in Appendices B and C, respectively.
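To illustrate the closed-form updates, the inverse-gamma draw in step 2 can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the function names `sample_inverse_gamma` and `sample_omega2`, the toy inputs, and the use of the standard-library random generator are assumptions made for illustration. The sketch assumes the shape-scale parameterisation of the inverse-gamma distribution, under which $b/G \sim {\rm IG}(a,b)$ when $G \sim {\rm Gamma}(a,1)$.

```python
import random


def sample_inverse_gamma(shape, scale, rng):
    """Draw from IG(shape, scale), density proportional to x^(-shape-1) * exp(-scale / x)."""
    # If G ~ Gamma(shape, 1), then scale / G ~ IG(shape, scale).
    return scale / rng.gammavariate(shape, 1.0)


def sample_omega2(y_next, y_hat, sigma2_hat, nu, rng):
    """One draw of omega_{t+1}^2 per one-step-ahead residual (step 2 of the sampler)."""
    draws = []
    for obs, pred, s2 in zip(y_next, y_hat, sigma2_hat):
        resid2 = (obs - pred) ** 2
        # beta-tilde in step 2: squared residual over 2 * sigma-hat^2, plus nu / 2
        beta_tilde = resid2 / (2.0 * s2) + nu / 2.0
        draws.append(sample_inverse_gamma((nu + 1.0) / 2.0, beta_tilde, rng))
    return draws


# Toy inputs (hypothetical values, for illustration only).
rng = random.Random(0)
y_next = [10.2, 11.0, 9.7, 15.5]
y_hat = [10.0, 10.8, 10.1, 10.5]
sigma2_hat = [0.25, 0.25, 0.25, 0.25]
omega2 = sample_omega2(y_next, y_hat, sigma2_hat, nu=4.0, rng=rng)
```

The same helper could in principle serve the other inverse-gamma updates in the list (for example those for $\xi_{\gamma}^{2}$, $\psi_{s_{i}}^{2}$ and $\eta_{\delta}$) by changing the shape and scale arguments.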

5.2.1 Sampling $\alpha$, $\beta$ and $\zeta$

We group $\alpha$, $\beta$ and $\zeta$ (if we are using a seasonal model) and sample them in a single step using a gradient-assisted Metropolis-Hastings algorithm [21]. As $\alpha,\beta,\zeta\in(0,1)$, a logit transformation is first applied to map the parameter space onto the real line, i.e., we sample $\log(\alpha/(1-\alpha))$ rather than $\alpha$. The latent variables $\omega_{t}^{2}$ are integrated out of the likelihood for better sampling convergence. The negative log-likelihood is

$$\begin{aligned}
L(\alpha,\beta,\zeta) &=\frac{\nu+1}{2}\sum_{t=1}^{T-1}\log\left(1+\frac{e_{t+1}^{2}}{\nu\,\hat{\sigma}_{t+1}^{2}}\right)+\frac{1}{2}\sum_{t=1}^{T-1}\log\hat{\sigma}_{t+1}^{2}\\
&=-\frac{\nu}{2}\sum_{t=1}^{T-1}\log\hat{\sigma}_{t+1}^{2}+\frac{\nu+1}{2}\sum_{t=1}^{T-1}\log(\nu\,\hat{\sigma}_{t+1}^{2}+e_{t+1}^{2})+C,
\end{aligned}\tag{27}$$

where $e_{t+1}=y_{t+1}-\hat{y}_{t+1}$. Unlike the basic Metropolis-Hastings algorithm, this sampler uses the gradients of $L(\alpha,\beta,\zeta)$ with respect to $\alpha$, $\beta$ and $\zeta$ to improve efficiency. Note that (27) depends on $\alpha$, $\beta$ and $\zeta$ through the one-step-ahead predictions $\hat{y}_{t+1}(\alpha,\beta,\zeta)$ and scales $\hat{\sigma}_{t+1}(\alpha,\zeta)$. The gradients for $\alpha$ and $\beta$ for the non-seasonal model can be calculated using the chain rule, with details provided in Appendix D. For the $\zeta$ parameter, the gradients are time-consuming to compute, so we do not utilize them (i.e., we set the gradient for $\zeta$ to zero). As the underlying algorithm is a Metropolis-Hastings algorithm, the gradients can be computed approximately (or not computed at all, as in the case of $\zeta$) without affecting the correctness of the sampling; more accurate computations simply lead to improved efficiency.
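The two ingredients of this step that do not depend on the specific proposal mechanism, namely the logit reparameterisation and the two equivalent forms of (27), can be checked numerically. The following is a minimal sketch under stated assumptions: the residuals `e` and scales `sigma2` are fixed toy values rather than outputs of the model recursions, the function names are hypothetical, and the constant is taken to be $C=-\frac{\nu+1}{2}(T-1)\log\nu$, which is what the algebra gives when the first line of (27) is expanded.

```python
import math


def logit(p):
    """Map a parameter in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a real value back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def nll_first_form(e, sigma2, nu):
    """First line of (27): Student-t negative log-likelihood with omega^2 integrated out."""
    t1 = sum(math.log1p(et ** 2 / (nu * s2)) for et, s2 in zip(e, sigma2))
    t2 = sum(math.log(s2) for s2 in sigma2)
    return 0.5 * (nu + 1.0) * t1 + 0.5 * t2


def nll_second_form(e, sigma2, nu):
    """Second line of (27), without the additive constant C."""
    t1 = sum(math.log(s2) for s2 in sigma2)
    t2 = sum(math.log(nu * s2 + et ** 2) for et, s2 in zip(e, sigma2))
    return -0.5 * nu * t1 + 0.5 * (nu + 1.0) * t2


# Toy residuals and squared scales (hypothetical values).
e = [0.4, -1.2, 0.3, 2.5]
sigma2 = [0.5, 0.7, 0.6, 0.9]
nu = 5.0

# Constant implied by expanding log(1 + e^2 / (nu * sigma^2)); len(e) plays the role of T - 1.
C = -0.5 * (nu + 1.0) * len(e) * math.log(nu)
gap = nll_first_form(e, sigma2, nu) - (nll_second_form(e, sigma2, nu) + C)

alpha = 0.3
alpha_roundtrip = inv_logit(logit(alpha))
```

Here `gap` should be zero up to floating-point error, and `alpha_roundtrip` should recover `alpha`. When a proposal is made on the logit scale, the Jacobian of the transformation also enters the acceptance ratio; that detail is omitted from this sketch.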

5.2.2 Sampling the initial seasonal factors

From (19), the last initial seasonal term $s_{m}$ can be obtained as $s_{m}=\exp(-\sum_{i=1}^{m-1}\log s_{i})$. We integrate out the latent variables, $\omega_{t}^{2}$, and the negative log-likelihood $L(s_{1},\dots,s_{m-1})$ is essentially the same as (27). The relevant derivatives are presented in Appendix E.
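The formula for $s_{m}$ implies that the log seasonal factors sum to zero, so the $m$ factors multiply to one. A small sketch of recovering the full set of initial seasonal factors from the $m-1$ sampled values is given below; the function name `complete_seasonal_factors` and the quarterly toy values are illustrative assumptions.

```python
import math


def complete_seasonal_factors(log_s):
    """Given log s_1, ..., log s_{m-1}, return s_1, ..., s_m with s_m = exp(-sum of log s_i)."""
    factors = [math.exp(v) for v in log_s]
    s_m = math.exp(-sum(log_s))
    return factors + [s_m]


# Toy example with m = 4 (hypothetical sampled values of log s_1, log s_2, log s_3).
log_s = [0.10, -0.05, 0.20]
s = complete_seasonal_factors(log_s)
product = math.prod(s)
```

In this sketch `product` equals one up to rounding, which is the constraint that makes only $m-1$ of the factors free parameters.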

5.2.3 Sampling with a grid sampler

A grid sampler (see Section 3.3) is implemented for sampling $\nu$, $\rho$, $\phi$, and $\tau$. The negative log posterior for $\nu$, conditional on the latent variables $\omega_{t}^{2}$, is given by

$$L(\nu\,|\,\bm{\omega}^{2})=-(T-1)\frac{\nu}{2}\log\frac{\nu}{2}+(T-1)\log\Gamma\left(\frac{\nu}{2}\right)+\frac{\nu+1}{2}\sum_{t=1}^{T-1}\log\omega_{t+1}^{2}+\frac{\nu}{2}\sum_{t=1}^{T-1}\frac{1}{\omega_{t+1}^{2}},$$

where $\Gamma(\cdot)$ denotes the gamma function. We use the set of candidate $\nu$ values determined using the procedure in Section 4.1. When sampling the power parameters $\rho$ and $\tau$, and the heteroscedastic mixing parameter $\phi$, we integrate the latent variables $\omega^{2}_{t}$ out of the likelihood. The negative log conditional posteriors for these parameters are given by

$$\begin{aligned}
L(\rho\,|\,\mathbf{\hat{y}},\nu,\chi^{2},\phi,\tau)&=\frac{\nu+1}{2}\sum_{t=1}^{T-1}\log\left(1+\frac{e_{t+1}^{2}}{\nu\hat{\sigma}_{t+1}^{2}}\right)+\log(\rho^{2}+1)+C,\\
L(\phi\,|\,\mathbf{\hat{y}},\nu,\chi^{2},\tau)&=\frac{\nu+1}{2}\sum_{t=1}^{T-1}\log\left(1+\frac{e_{t+1}^{2}}{\nu\hat{\sigma}_{t+1}^{2}}\right)+\frac{1}{2}\sum_{t=1}^{T-1}\log\hat{\sigma}_{t+1}^{2}+C,\\
L(\tau\,|\,\mathbf{\hat{y}},\nu,\chi^{2},\phi)&=\frac{\nu+1}{2}\sum_{t=1}^{T-1}\log\left(1+\frac{e_{t+1}^{2}}{\nu\hat{\sigma}_{t+1}^{2}}\right)+\frac{1}{2}\sum_{t=1}^{T-1}\log\hat{\sigma}_{t+1}^{2}+C,
\end{aligned}$$

where $e_{t+1}^{2}=(y_{t+1}-\hat{y}_{t+1})^{2}$. The quantities $\hat{y}_{t+1}$ and $\hat{\sigma}_{t+1}^{2}$ are formed using (14) and (18), respectively. The candidate values for these three parameters are spaced uniformly over the corresponding parameter ranges.
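A grid update of this kind can be sketched as follows for $\nu$. The sketch assumes that the grid sampler evaluates the negative log conditional posterior at every candidate value, converts these values into normalised probabilities, and draws one candidate accordingly; the exact procedure is the one described in Section 3.3, which may differ in detail. The candidate grid, the toy $\omega^{2}$ values and the function names are illustrative assumptions, not the values produced by the procedure in Section 4.1.

```python
import math
import random


def neg_log_post_nu(nu, omega2):
    """Negative log posterior of nu given the latent omega^2 values, as in the expression above."""
    n = len(omega2)  # plays the role of T - 1
    half_nu = nu / 2.0
    sum_log = sum(math.log(w) for w in omega2)
    sum_inv = sum(1.0 / w for w in omega2)
    return (-n * half_nu * math.log(half_nu)
            + n * math.lgamma(half_nu)
            + 0.5 * (nu + 1.0) * sum_log
            + half_nu * sum_inv)


def grid_sample(neg_log_post, grid, rng):
    """Draw one grid point with probability proportional to exp(-neg_log_post)."""
    values = [neg_log_post(g) for g in grid]
    lowest = min(values)
    # Subtract the minimum before exponentiating to avoid underflow.
    weights = [math.exp(lowest - v) for v in values]
    total = sum(weights)
    probs = [w / total for w in weights]
    draw = rng.choices(grid, weights=probs, k=1)[0]
    return draw, probs


# Toy latent variables and candidate grid (hypothetical values).
rng = random.Random(0)
omega2 = [0.8, 1.3, 0.6, 2.1, 1.0, 0.9]
nu_grid = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0]
nu_draw, nu_probs = grid_sample(lambda nu: neg_log_post_nu(nu, omega2), nu_grid, rng)
```

The same `grid_sample` helper would apply to $\rho$, $\tau$ and $\phi$ by passing the corresponding negative log conditional posterior, with the additive constant $C$ dropped since it cancels in the normalisation.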

6 Experiments

The proposed model extends the classical ETS model, which is a univariate forecasting procedure that does not utilize global learning across series. The M3 competition [14] provides a standard benchmark dataset for univariate methods. It consists of a mix of seasonal and non-seasonal series: 645 yearly series, 756 quarterly series, 1428 monthly series, and 174 other series. We use the M3 dataset from the Mcomp R package [7]. Table 3 summarizes the series lengths ($T$) and corresponding forecast horizons ($h$) in each category.

| Category | $T$ | $h$ |
|---|---|---|
| Yearly | 14-41 | 6 |
| Monthly | 48-126 | 18 |
| Quarterly | 16-64 | 8 |
| Other | 63-96 | 8 |
Table 3: Length of observations and forecast horizon for the M3 dataset in each category.

6.1 Evaluation metrics

Following the M3 competition, and the experimental analysis in Smyl et al. [22], we use the symmetric mean absolute percentage error (sMAPE) and the mean absolute scaled error (MASE) metrics to measure forecasting performance. These metrics are given by

sMAPEsMAPE\displaystyle{\rm sMAPE}roman_sMAPE =200ht=1h|yT+ty^T+t||yT+t|+|y^T+t|,absent200superscriptsubscript𝑡1subscript𝑦𝑇𝑡subscript^𝑦𝑇𝑡subscript𝑦𝑇𝑡subscript^𝑦𝑇𝑡\displaystyle=\frac{200}{h}\sum_{t=1}^{h}\frac{|y_{T+t}-\hat{y}_{T+t}|}{|y_{T+% t}|+|\hat{y}_{T+t}|},= divide start_ARG 200 end_ARG start_ARG italic_h end_ARG ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_h end_POSTSUPERSCRIPT divide start_ARG | italic_y start_POSTSUBSCRIPT italic_T + italic_t end_POSTSUBSCRIPT - over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_T + italic_t end_POSTSUBSCRIPT | end_ARG start_ARG | italic_y start_POSTSUBSCRIPT italic_T + italic_t end_POSTSUBSCRIPT | + | over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_T + italic_t end_POSTSUBSCRIPT | end_ARG , (28)
MASEMASE\displaystyle{\rm MASE}roman_MASE =h1t=1h|yT+ty^T+t|(Ts)1t=s+1T|ytyts|,absentsuperscript1superscriptsubscript𝑡1subscript𝑦𝑇𝑡subscript^𝑦𝑇𝑡superscript𝑇𝑠1superscriptsubscript𝑡𝑠1𝑇subscript𝑦𝑡subscript𝑦𝑡𝑠\displaystyle=\frac{h^{-1}\sum_{t=1}^{h}|y_{T+t}-\hat{y}_{T+t}|}{(T-s)^{-1}% \sum_{t=s+1}^{T}|y_{t}-y_{t-s}|},= divide start_ARG italic_h start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_h end_POSTSUPERSCRIPT | italic_y start_POSTSUBSCRIPT italic_T + italic_t end_POSTSUBSCRIPT - over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_T + italic_t end_POSTSUBSCRIPT | end_ARG start_ARG ( italic_T - italic_s ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_t = italic_s + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT | italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT italic_t - italic_s end_POSTSUBSCRIPT | end_ARG , (29)

respectively. The denominator in (29) is the average error of the in-sample (seasonal) naïve forecasts, where $s$ denotes the periodicity; $s$ is set to 1 for non-seasonal series (such as yearly series), to 4 for quarterly series, and to 12 for monthly series. Probabilistic forecasts are evaluated using the mean scaled interval score (MSIS), as per Makridakis et al. [15]. The MSIS is given by

\begin{equation}
{\rm MSIS}=\frac{h^{-1}\sum_{t=T+1}^{T+h}\left(q_{t}^{[u]}-q_{t}^{[l]}+\frac{2}{\alpha}(q_{t}^{[l]}-y_{t})\mathbbm{1}_{y_{t}<q_{t}^{[l]}}+\frac{2}{\alpha}(y_{t}-q_{t}^{[u]})\mathbbm{1}_{y_{t}>q_{t}^{[u]}}\right)}{(T-s)^{-1}\sum_{t=s+1}^{T}|y_{t}-y_{t-s}|}, \tag{30}
\end{equation}

where $\mathbbm{1}_{x}$ denotes the indicator function, which returns one if the condition $x$ is true and zero otherwise. The quantities $q_{t}^{[u]}$ and $q_{t}^{[l]}$ that appear in the numerator of (30) denote the upper and lower bounds of the prediction interval, respectively. The quantity $1-\alpha$ is the desired level of coverage; for example, $\alpha=0.1$ if we are considering a $90\%$ prediction interval. The MSIS is an omnibus measure that penalises both the width of the prediction interval and any failure of the interval to cover the observed value.
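As a minimal sketch, the three metrics in (28), (29) and (30) could be computed as follows. The function names and the toy numbers are illustrative assumptions and are not taken from the authors' code; `history` holds the in-sample values $y_1,\dots,y_T$ and `actual` holds the $h$ out-of-sample values.

```python
def smape(actual, forecast):
    """Symmetric MAPE over the forecast horizon, as in (28); result is in percent."""
    h = len(actual)
    return 200.0 / h * sum(
        abs(y - f) / (abs(y) + abs(f)) for y, f in zip(actual, forecast)
    )


def naive_scale(history, s):
    """In-sample mean absolute error of the (seasonal) naive forecast with period s."""
    T = len(history)
    return sum(abs(history[t] - history[t - s]) for t in range(s, T)) / (T - s)


def mase(actual, forecast, history, s=1):
    """Mean absolute scaled error, as in (29)."""
    h = len(actual)
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / h
    return mae / naive_scale(history, s)


def msis(actual, lower, upper, history, s=1, alpha=0.1):
    """Mean scaled interval score, as in (30), for a (1 - alpha) prediction interval."""
    h = len(actual)
    total = 0.0
    for y, lo, up in zip(actual, lower, upper):
        total += up - lo                      # interval width
        if y < lo:
            total += 2.0 / alpha * (lo - y)   # penalty: actual below the lower bound
        if y > up:
            total += 2.0 / alpha * (y - up)   # penalty: actual above the upper bound
    return (total / h) / naive_scale(history, s)


# Toy data, made up purely for illustration.
history = [10.0, 12.0, 11.0, 13.0, 12.0]
actual = [14.0, 13.0]
forecast = [13.0, 13.0]
lower = [11.0, 11.0]
upper = [13.5, 15.0]

print(smape(actual, forecast))
print(mase(actual, forecast, history, s=1))
print(msis(actual, lower, upper, history, s=1, alpha=0.1))
```

Note that the sMAPE term is undefined when an actual value and its forecast are both zero; the sketch does not guard against this case.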

6.2 Results and analysis

We consider both homoscedastic and heteroscedastic variants of LSGT using the Gibbs sampler. The left-hand columns of Table 4 present accuracy in terms of sMAPE and MASE, and the average running time per series, for both the LSGT and the original L/SGT model (with heteroscedastic errors) sampled using Stan. The running time reported is the average running time of the models on the first 100 series in each category, executed with a single core on the same machine, for maximal comparability. The LGT Stan models have previously achieved state-of-the-art performance on the M3 dataset, as reported in Smyl et al. [22]. Compared with the Stan sampler, the Gibbs implementations obtain slightly improved accuracy in both measures, with the improvements largest for the yearly (non-seasonal) series. When considering the model fitting time, the proposed Gibbs sampler takes substantially less computation time than the Stan implementation (roughly 11 to 41 times less, depending on the category, per Table 4), and renders the LSGT model a feasible tool for deployment in practice. With regard to the different error models, the heteroscedastic models perform better than homoscedastic models on all categories except for yearly series.

| Model | sMAPE | MASE | Avg runtime (s) | Below 99p | Below 95p | Below 5p | Below 1p | MSIS 90p | MSIS 98p |
|---|---|---|---|---|---|---|---|---|---|
| Yearly series | | | | | | | | | |
| LSGT Gibbs (homoscedastic error) | 14.91 | 2.55 | 3.79 | 97.44 | 90.80 | 7.80 | 2.43 | 19.36 | 40.57 |
| LSGT Gibbs (heteroscedastic error) | 14.99 | 2.50 | 4.63 | 98.94 | 94.11 | 4.96 | 1.19 | 16.47 | 27.92 |
| LGT Stan | 15.18 | 2.48 | 60.03 | 97.16 | 91.42 | 6.23 | 2.04 | 17.38 | 32.64 |
| Monthly series | | | | | | | | | |
| LSGT Gibbs (homoscedastic error) | 13.94 | 0.83 | 12.82 | 97.93 | 93.66 | 5.18 | 1.40 | 5.38 | 8.67 |
| LSGT Gibbs (heteroscedastic error) | 13.76 | 0.82 | 14.67 | 98.36 | 94.64 | 4.64 | 1.24 | 5.22 | 8.52 |
| SGT Stan | 13.77 | 0.83 | 163.84 | 97.51 | 92.55 | 5.21 | 1.69 | 5.10 | 8.20 |
| Quarterly series | | | | | | | | | |
| LSGT Gibbs (homoscedastic error) | 8.78 | 1.06 | 9.22 | 97.30 | 92.26 | 10.20 | 3.27 | 7.46 | 14.23 |
| LSGT Gibbs (heteroscedastic error) | 8.78 | 1.06 | 10.70 | 97.59 | 92.97 | 8.61 | 2.12 | 7.19 | 13.06 |
| SGT Stan | 8.87 | 1.07 | 374.12 | 96.13 | 90.16 | 11.79 | 4.76 | 7.64 | 15.95 |
| Other series | | | | | | | | | |
| LSGT Gibbs (homoscedastic error) | 4.21 | 1.70 | 5.19 | 99.64 | 97.34 | 4.02 | 0.50 | 10.8 | 17.1 |
| LSGT Gibbs (heteroscedastic error) | 4.16 | 1.69 | 7.71 | 99.64 | 97.27 | 4.45 | 0.86 | 10.51 | 16.53 |
| LGT Stan | 4.25 | 1.72 | 150.88 | 99.43 | 97.49 | 4.60 | 1.44 | 10.69 | 16.68 |
Table 4: Accuracy results, runtime (seconds), interval coverage and MSIS scores of the models on the M3 dataset by category.
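To make the runtime comparison concrete, the short sketch below divides the Stan runtime by each Gibbs runtime for every category. The runtimes are copied from Table 4; the dictionary layout and function name are illustrative assumptions.

```python
# Average runtime per series in seconds, copied from Table 4.
runtimes = {
    "Yearly": {"gibbs_homo": 3.79, "gibbs_hetero": 4.63, "stan": 60.03},
    "Monthly": {"gibbs_homo": 12.82, "gibbs_hetero": 14.67, "stan": 163.84},
    "Quarterly": {"gibbs_homo": 9.22, "gibbs_hetero": 10.70, "stan": 374.12},
    "Other": {"gibbs_homo": 5.19, "gibbs_hetero": 7.71, "stan": 150.88},
}


def speedups(table):
    """Return the Stan runtime divided by each Gibbs runtime, per category."""
    return {
        cat: {k: r["stan"] / r[k] for k in ("gibbs_homo", "gibbs_hetero")}
        for cat, r in table.items()
    }


for cat, ratio in speedups(runtimes).items():
    print(f"{cat:9s} homo: {ratio['gibbs_homo']:5.1f}x  hetero: {ratio['gibbs_hetero']:5.1f}x")
```

Every ratio exceeds ten, which is consistent with the order-of-magnitude improvement in sampling time claimed for the Gibbs sampler.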

The right-hand side of Table 4 reports interval coverage and MSIS scores for the 90% and 98% prediction intervals of the two samplers. The Gibbs samplers achieve better coverage than the Stan implementation in most categories, with “Other series” being the exception. More generally, the results suggest that the M3 series tend to be better modelled using heteroscedasticity assumptions, particularly the quarterly series. It is also worth pointing out that Smyl et al. [22] commented that the L/SGT models can produce slightly narrow intervals. The intervals generated by the LSGT Gibbs sampler tend to be wider, as we specify a larger space of candidate $\nu$ values in our grid. In contrast, the values used in the original Stan implementation appear to be insufficiently diverse; reducing the minimum $\nu$ in the Stan implementation could potentially fix this problem, though sampling small values of $\nu$ could also make the underlying sampling algorithm quite unstable, as Stan is known to have some issues handling heavy-tailed distributions. Additionally, the implicit prior on $\nu$ used in the LSGT model (in which the candidates are equidistant in terms of symmetric KL divergence) would potentially be quite difficult to implement in Stan, as it does not allow for easy sampling from discrete parameter spaces. With regard to the MSIS scores, the LSGT Gibbs sampler achieves superior results when compared to the Stan version in all categories but monthly, and remains competitive even in this setting.
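The "Below 99p", "Below 95p", "Below 5p" and "Below 1p" columns of Table 4 can be read as empirical coverage figures: assuming each reports the percentage of actual values falling below the corresponding forecast percentile, a well-calibrated model would give values close to 99, 95, 5 and 1. A minimal sketch of that calculation, with a hypothetical function name and made-up numbers, is:

```python
def percent_below(actuals, quantile_forecasts):
    """Percentage of actual values that fall below the corresponding forecast quantile."""
    pairs = list(zip(actuals, quantile_forecasts))
    return 100.0 * sum(y < q for y, q in pairs) / len(pairs)


# Toy example (made-up numbers): four actuals against hypothetical 95th-percentile forecasts.
actuals = [10.0, 12.0, 9.0, 15.0]
q95 = [11.0, 13.0, 10.0, 14.0]
print(percent_below(actuals, q95))  # 75.0
```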

We additionally performed Wilcoxon signed-rank tests comparing the two proposed Gibbs variants and the original Stan model. We rank the methods based on per-series performance with respect to sMAPE, MASE, MSIS90, and MSIS98. Table 5 provides the average per-series rankings and the corresponding $p$-values of the tests. From Table 4, it can be seen that the Gibbs samplers achieved better accuracy than the original Stan L/SGT with respect to the point forecast evaluation metrics. In line with these results, Table 5 shows that the Gibbs samplers rank higher than the Stan version, even though the differences are not statistically significant at the $0.05$ level. In terms of interval forecasting, the Stan model ranks slightly higher on average than both Gibbs variants. From Table 4, we see that the Gibbs variants achieve better MSIS scores for all but the monthly series. However, the monthly series constitute approximately half of the overall M3 dataset, and it is therefore expected that the ranking results will be largely dominated by the performance on the monthly series; additionally, rankings do not take into account the degree of difference in performance, so the larger improvements of the LSGT on yearly series, for example, are not as impactful. Overall, however, the proposed Gibbs samplers are clearly highly accurate and strongly competitive with, if not superior to, the original Stan implementation in terms of forecasting metrics, while being substantially faster.

| | Gibbs (homo) - Stan | Gibbs (hetero) - Stan | Gibbs (homo) - Gibbs (hetero) |
|---|---|---|---|
| Testing metric: sMAPE | | | |
| Method left avg rank | 1.44 | 1.44 | 1.51 |
| Method right avg rank | 1.56 | 1.56 | 1.49 |
| p-value | 0.75 | 0.69 | 0.94 |
| Testing metric: MASE | | | |
| Method left avg rank | 1.44 | 1.44 | 1.51 |
| Method right avg rank | 1.56 | 1.56 | 1.49 |
| p-value | 0.65 | 0.62 | 0.96 |
| Testing metric: MSIS90 | | | |
| Method left avg rank | 1.66 | 1.66 | 1.47 |
| Method right avg rank | 1.34 | 1.34 | 1.53 |
| p-value | 0.002 | 0.003 | 0.94 |
| Testing metric: MSIS98 | | | |
| Method left avg rank | 1.80 | 1.83 | 1.39 |
| Method right avg rank | 1.20 | 1.17 | 1.61 |
| p-value | 5.93e-14 | 5.04e-16 | 0.44 |
Table 5: Per-series ranking and statistical significance results in terms of the accuracy and probabilistic metrics.
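The two quantities reported in Table 5 can be illustrated with the sketch below. It assumes a pairwise ranking in which the method with the lower error on a series receives rank 1 and the other rank 2 (ties share 1.5), so that the two average ranks in each comparison sum to 3, as they do in Table 5. The $p$-value uses the normal approximation to the Wilcoxon signed-rank statistic without tie or continuity corrections; the exact test configuration used for Table 5 is not stated in the text, so this is an assumption, and the function names are hypothetical.

```python
import math


def average_ranks(scores_a, scores_b):
    """Average per-series rank of two methods (1 = lower error; ties get 1.5 each)."""
    ranks_a, ranks_b = [], []
    for a, b in zip(scores_a, scores_b):
        if a < b:
            ra, rb = 1.0, 2.0
        elif a > b:
            ra, rb = 2.0, 1.0
        else:
            ra = rb = 1.5
        ranks_a.append(ra)
        ranks_b.append(rb)
    n = len(ranks_a)
    return sum(ranks_a) / n, sum(ranks_b) / n


def wilcoxon_signed_rank(scores_a, scores_b):
    """Two-sided Wilcoxon signed-rank p-value via the normal approximation.

    Zero differences are dropped and tied |differences| share their average rank.
    No tie or continuity correction is applied, so this is only a rough sketch.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)   # sum of positive ranks
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))                # two-sided p-value
```

Because the signed-rank statistic uses only the ordering of the absolute differences, a method that is slightly better on many series can rank higher than one that is much better on a few, which is the effect described above for the monthly and yearly series.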

7 Ablation study

Instead of assigning Cauchy priors to the initial seasonal factors, as per the original LGT model in Smyl et al. [22], we utilise horseshoe priors (see the hierarchy in (23) to (26)). As previously discussed (Section 4.1), these are a special class of priors that encourage sparsity by concentrating prior probability mass around the origin. If the log-seasonality terms are all shrunk to zero, then the multiplicative seasonality terms will be equal to one and no seasonality adjustment will occur. The motivation behind using these types of priors is to provide some robustness in the case that the user specifies seasonality, but there is no evidence in the data to support it. It is therefore of interest to test the performance of the horseshoe priors vis-à-vis Cauchy priors, which do not encourage sparsity.
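The difference between the two priors near the origin can be checked numerically. The sketch below compares the prior probability of $|x| < a$ under a Cauchy $C(0,1)$ prior (closed form) with a Monte Carlo estimate under a horseshoe prior, written as a normal with a half-Cauchy local scale. For simplicity the global scale is fixed to one, which is an assumption of this sketch rather than the hierarchy used in the model; the function names are hypothetical.

```python
import math
import random


def cauchy_mass(a, scale=1.0):
    """P(|x| < a) for x ~ Cauchy(0, scale), in closed form."""
    return 2.0 / math.pi * math.atan(a / scale)


def horseshoe_mass(a, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(|x| < a) for x | lam ~ N(0, lam^2), lam ~ half-Cauchy(0, 1).

    The global scale is fixed to one here (a simplifying assumption).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        lam = math.tan(0.5 * math.pi * rng.random())   # half-Cauchy draw via the inverse CDF
        if lam <= 0.0:
            total += 1.0                               # degenerate case: all mass at zero
        else:
            # P(|x| < a | lam) for a zero-mean normal with standard deviation lam
            total += math.erf(a / (lam * math.sqrt(2.0)))
    return total / n_draws


a = 0.1
print("Cauchy C(0,1):", cauchy_mass(a))
print("Horseshoe    :", horseshoe_mass(a))
```

Under these assumptions the horseshoe places more prior mass within $\pm 0.1$ of zero than the $C(0,1)$ prior, while both retain heavy tails.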

Table 6 summarizes the results of the ablation test. We applied the LSGT and Stan SGT/LGT models with seasonality to the monthly and yearly series. For the monthly series, we tried both horseshoe and Cauchy $C(0,1)$ priors for the LSGT with a seasonality of 12. The upper part of Table 6 shows the results for these two priors; for this data, which likely has strong seasonal effects, there is no real difference between the performance of the Cauchy and horseshoe priors, as they both have heavy tails. We are also interested in how robust the two priors are, and in their ability to detect that no seasonality is present even when a seasonal model is assumed. The lower half of Table 6 compares the results of the non-seasonal models and seasonal models with an arbitrary periodicity of $4$ applied to the yearly series. Models that use horseshoe priors remain competitive, while models with Cauchy priors perform worse under both accuracy metrics. This suggests that the horseshoe priors are more robust and likely to achieve better results even when a seasonal model is accidentally chosen for series that may not have much evidence of seasonality. The original SGT Stan implementation used a larger scale parameter for the Cauchy prior, $C(0,4)$, which spreads the prior mass more widely and therefore has a poorer ability to shrink towards zero. The final entry of Table 6 shows that this choice of prior results in very poor performance in comparison to the use of the horseshoe prior.

| Model | sMAPE | MASE |
|---|---|---|
| Monthly series | | |
| LSGT Gibbs (homoscedastic, horseshoe prior) | 13.94 | 0.83 |
| LSGT Gibbs (heteroscedastic, horseshoe prior) | 13.76 | 0.82 |
| LSGT Gibbs (homoscedastic, Cauchy prior) | 13.92 | 0.83 |
| LSGT Gibbs (heteroscedastic, Cauchy prior) | 13.78 | 0.83 |
| Yearly series | | |
| LSGT Gibbs (homoscedastic error) | 14.91 | 2.55 |
| LSGT Gibbs (heteroscedastic error) | 14.99 | 2.50 |
| LSGT Gibbs (homoscedastic, horseshoe prior) | 15.35 | 2.62 |
| LSGT Gibbs (heteroscedastic, horseshoe prior) | 15.37 | 2.55 |
| LSGT Gibbs (homoscedastic, Cauchy prior) | 15.81 | 2.72 |
| LSGT Gibbs (heteroscedastic, Cauchy prior) | 15.56 | 2.61 |
| SGT Stan (Cauchy prior $C(0,4)$) | 16.56 | 2.74 |
Table 6: Ablation study of the priors for the initial seasonal factors.

8 Conclusion

In this paper we have presented a fast and accurate Gibbs sampler for posterior exploration of the LSGT model. The LSGT is an extension of the classical exponential smoothing model which has the ability to capture a heteroscedastic error structure, and super-linear but sub-exponential trends, with non-normal errors. We have combined the seasonal and non-seasonal variants presented in the work of Smyl et al. [22] into a single formulation, and modified the model to improve statistical coherence and the efficiency of the sampling process. In comparison to the original Stan implementation, the proposed Gibbs sampler demonstrated highly accurate performance and, importantly, is much faster, significantly reducing the computational effort required to explore the posterior distribution. The novel use of horseshoe priors in place of Cauchy priors for the seasonal factors has been demonstrated to improve the robustness of the model under both seasonal and non-seasonal conditions.

Although the new Gibbs sampler is considerably faster than the Stan implementation, it remains orders of magnitude slower than the classic ETS models. However, the LGT model is designed for the data-scarce case, rather than the big-data setting, where global models are potentially more suitable. The promising features of the LSGT model, coupled with an efficient sampling algorithm, mean that the LSGT is a feasible and attractive algorithm for real-world univariate forecasting applications, both seasonal and non-seasonal.

Appendix A Bayesian hierarchy for the LSGT model

The complete Bayesian hierarchy, including scale-mixture expansions, for the LSGT model is given below:

\begin{equation*}
y_{t+1}\,|\,\hat{y}_{t+1},\hat{\sigma}_{t+1},\omega_{t+1}^{2}\sim N(\hat{y}_{t+1},\hat{\sigma}_{t+1}^{2}\omega_{t+1}^{2}),\;t=1,\ldots,T-1
\end{equation*}

with

\begin{align*}
\omega_{t+1}^{2}\,|\,\nu &\sim {\rm IG}\left(\frac{\nu}{2},\frac{\nu}{2}\right),\;t=1,\ldots,T-1\\
\nu &\sim U(\nu_{l},\nu_{u}),\;\;(\nu_{l}=1.6,\,\nu_{u}=1000)\\
\rho &\sim U(-0.5,1),\\
\alpha,\beta,\zeta &\sim {\rm Beta}(1,1/2),\\
\gamma\,|\,\xi_{\gamma}^{2},s_{\gamma} &\sim N(0,\xi_{\gamma}^{2}s_{\gamma}^{2}),\;\xi_{\gamma}^{2}\sim {\rm IG}\left(\frac{1}{2},\frac{1}{2}\right),\\
\lambda\,|\,\xi_{\lambda}^{2},s_{\lambda} &\sim N(0,\xi_{\lambda}^{2}s_{\lambda}^{2}),\;\xi_{\lambda}^{2}\sim {\rm IG}\left(\frac{1}{2},\frac{1}{2}\right),\\
b_{1}\,|\,\xi_{b_{1}}^{2},s_{b_{1}} &\sim N(0,\xi_{b_{1}}^{2}s_{b_{1}}^{2}),\;\xi_{b_{1}}^{2}\sim {\rm IG}\left(\frac{1}{2},\frac{1}{2}\right),\;\;b_{1}\in(-100,1)\\
s_{\lambda} &= 1,\;s_{\gamma},s_{b_{1}}={\rm max}({\bf y})/100,\\
\chi^{2} &\sim \frac{1}{\chi^{2}}d{\chi^{2}},\\
\phi &\sim U(0,1),\\
\tau &\sim U(0,1),\\
{\rm log}\,s_{i}\,|\,\psi^{2}_{s_{i}},\delta^{2} &\sim N(0,\psi^{2}_{s_{i}}\delta^{2}),\\
\psi_{s_{i}}^{2}\,|\,\eta_{s_{i}} &\sim {\rm IG}\left(\frac{1}{2},\frac{1}{\eta_{s_{i}}}\right),\\
\delta\,|\,\eta_{\delta} &\sim {\rm IG}\left(\frac{1}{2},\frac{1}{\eta_{\delta}}\right),\\
\eta_{s_{1}},\dots,\eta_{s_{m}},\eta_{\delta} &\sim {\rm IG}\left(\frac{1}{2},1\right).
\end{align*}

where

\begin{align*}
\hat{y}_{t+1} &= (l_{t}+\gamma l_{t}^{\rho}+\lambda b_{t})s_{t+1-m},\\
l_{t} &= \alpha\left(\frac{y_{t}}{s_{t-m}}\right)+(1-\alpha)l_{t-1},\\
b_{t} &= \beta(l_{t}-l_{t-1})+(1-\beta)b_{t-1},\\
{\rm log}\,s_{t} &= \zeta\,{\rm log}\frac{y_{t}}{l_{t}}+(1-\zeta)\,{\rm log}\,s_{t-m},\\
\hat{\sigma}_{t+1}^{2} &= \chi^{2}(\phi^{2}+(1-\phi)^{2}l_{t}^{2\tau}),
\end{align*}

subject to

\begin{equation*}
\sum_{i}^{m}{\rm log}\,s_{i}=0.
\end{equation*}
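For fixed parameter values, the deterministic recursions above amount to a single forward pass over the series. The following is a minimal sketch of that pass, not the authors' implementation: the function name, the 0-based indexing and the initialisation of the level at `y[0] / seas[0]` are assumptions made for illustration. The series must be strictly positive, because of the logarithm and the fractional power of the level.

```python
import math


def lsgt_filter(y, m, alpha, beta, zeta, gamma, lam, rho, chi2, phi, tau, b1, init_seas):
    """One forward pass of the LSGT recursions for fixed parameter values.

    Returns the one-step-ahead means and variances for y[1], ..., y[T-1].
    Indexing is 0-based: seas[t] is the factor applied to y[t], i.e. s_{t-m} in the
    notation of the hierarchy. Initialising the level at y[0] / seas[0] is an assumption.
    """
    seas = list(init_seas)                 # the m initial seasonal factors
    assert len(seas) == m
    level = y[0] / seas[0]                 # assumed initial level
    trend = b1                             # initial local trend b_1
    seas.append(math.exp(zeta * math.log(y[0] / level) + (1 - zeta) * math.log(seas[0])))
    means, variances = [], []
    for t in range(1, len(y)):
        # forecast of y[t] from the states available at time t-1
        means.append((level + gamma * level ** rho + lam * trend) * seas[t])
        variances.append(chi2 * (phi ** 2 + (1 - phi) ** 2 * level ** (2 * tau)))
        # state updates after observing y[t]
        new_level = alpha * y[t] / seas[t] + (1 - alpha) * level
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
        seas.append(math.exp(zeta * math.log(y[t] / level) + (1 - zeta) * math.log(seas[t])))
    return means, variances


# With alpha = 1 and no trend terms or seasonality, the forecast reduces to the last observation.
y = [10.0, 12.0, 11.0, 13.0]
means, variances = lsgt_filter(y, m=1, alpha=1.0, beta=0.5, zeta=0.0, gamma=0.0, lam=0.0,
                               rho=0.5, chi2=2.0, phi=1.0, tau=0.5, b1=0.0, init_seas=[1.0])
print(means)
```

In a Gibbs sampler these means and variances would be recomputed whenever a parameter that enters the recursions is updated, since the likelihood depends on the parameters only through them.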

Appendix B Derivation of normal conjugate prior

The posterior for a normal (joint) likelihood with a normal prior also follows a normal distribution. Specifically, if

\begin{equation}
y_{i}\,|\,w\sim N((wx_{i}+c_{i})s_{i},\sigma_{i}^{2}),\;i=1,\dots,n,\;\;{\rm and}\;\;w\sim N(0,A^{2}), \tag{31}
\end{equation}

then the posterior is $w\,|\,y_{1},\dots,y_{n}\sim N(\tilde{\mu},\tilde{\sigma}^{2})$, where

\begin{equation}
\tilde{\mu}=\tilde{\sigma}^{2}\sum_{i=1}^{n}\frac{x_{i}s_{i}(y_{i}-c_{i}s_{i})}{\sigma_{i}^{2}}\;\;{\rm and}\;\;\tilde{\sigma}^{2}=\left(\sum_{i=1}^{n}\frac{x_{i}^{2}s_{i}^{2}}{\sigma_{i}^{2}}+\frac{1}{A^{2}}\right)^{-1}. \tag{32}
\end{equation}
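As a sanity check on this conjugate update, the sketch below accumulates the posterior precision and the coefficient of $w$ that arise when the square in the exponent is completed, and returns the posterior mean as the posterior variance multiplied by that coefficient. The function name and argument layout are illustrative assumptions; `sigma2` holds the variances $\sigma_i^2$ and `A2` the prior variance $A^2$.

```python
def normal_posterior(y, x, c, s, sigma2, A2):
    """Posterior mean and variance of w for y_i | w ~ N((w*x_i + c_i)*s_i, sigma2_i), w ~ N(0, A2)."""
    precision = 1.0 / A2       # prior precision
    linear = 0.0               # coefficient of w in the exponent
    for yi, xi, ci, si, s2 in zip(y, x, c, s, sigma2):
        precision += (xi * si) ** 2 / s2
        linear += xi * si * (yi - ci * si) / s2
    var = 1.0 / precision      # posterior variance
    mean = var * linear        # posterior mean
    return mean, var


# Single observation y = 2 with x = s = 1, c = 0 and unit variances:
# the posterior precision is 2, so the variance is 0.5 and the mean is 1.0.
print(normal_posterior([2.0], [1.0], [0.0], [1.0], [1.0], 1.0))  # (1.0, 0.5)
```

At the returned mean the derivative of the log-posterior with respect to $w$ is zero, which provides a simple numerical check of the formulas for any choice of inputs.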

The derivation is given as follows. The conditional posterior is obtained by multiplying the normal likelihood by the normal prior density,

\begin{equation*}
p(w\,|\,y_{1},y_{2},\dots,y_{n})\propto\prod_{i=1}^{n}\left(\frac{1}{2\pi\sigma_{i}^{2}}\right)^{1/2}{\rm exp}\left(-\frac{1}{2\sigma_{i}^{2}}(y_{i}-(wx_{i}+c_{i})s_{i})^{2}\right)\left(\frac{1}{2\pi A^{2}}\right)^{1/2}{\rm exp}\left(-\frac{w^{2}}{2A^{2}}\right).
\end{equation*}

As only the exponential terms depend on $w$, we drop the other terms and obtain

\begin{align*}
p(w\,|\,y_{1},y_{2},\dots,y_{n}) &\propto\prod_{i=1}^{n}{\rm exp}\left(-\frac{1}{2\sigma_{i}^{2}}(y_{i}-(wx_{i}+c_{i})s_{i})^{2}\right){\rm exp}\left(-\frac{w^{2}}{2A^{2}}\right)\\
&\propto{\rm exp}\left(-\frac{1}{2}\sum_{i=1}^{n}\frac{(y_{i}-(wx_{i}+c_{i})s_{i})^{2}}{\sigma_{i}^{2}}-\frac{w^{2}}{2A^{2}}\right).
\end{align*}

Then we expand the square term in the summation,

\[
p(w\,|\,y_{1},y_{2},\dots,y_{n})\propto\exp\left(-\frac{1}{2}\left(w^{2}\sum_{i=1}^{n}\frac{x_{i}^{2}s_{i}^{2}}{\sigma_{i}^{2}}-2w\sum_{i=1}^{n}\frac{x_{i}s_{i}(y_{i}-c_{i}s_{i})}{\sigma_{i}^{2}}+\frac{w^{2}}{A^{2}}+\sum_{i=1}^{n}\frac{(y_{i}-c_{i}s_{i})^{2}}{\sigma_{i}^{2}}\right)\right).
\]

Again, if we drop the constants,

\begin{align*}
p(w\,|\,y_{1},y_{2},\dots,y_{n}) &\propto\exp\left(-\frac{1}{2}\left(w^{2}\sum_{i=1}^{n}\frac{x_{i}^{2}s_{i}^{2}}{\sigma_{i}^{2}}-2w\sum_{i=1}^{n}\frac{x_{i}s_{i}(y_{i}-c_{i}s_{i})}{\sigma_{i}^{2}}+\frac{w^{2}}{A^{2}}\right)\right)\\
&\propto\exp\left(-\frac{1}{2}\left(w^{2}\left(\sum_{i=1}^{n}\frac{x_{i}^{2}s_{i}^{2}}{\sigma_{i}^{2}}+\frac{1}{A^{2}}\right)-2w\sum_{i=1}^{n}\frac{x_{i}s_{i}(y_{i}-c_{i}s_{i})}{\sigma_{i}^{2}}\right)\right),
\end{align*}

which is proportional to a normal distribution with mean $\tilde{\mu}$ and variance $\tilde{\sigma}^{2}$. If we further tidy up the above posterior,

\[
p(w\,|\,y_{1},y_{2},\dots,y_{n})\propto\exp\left(-\frac{1}{2}\left(\sum_{i=1}^{n}\frac{x_{i}^{2}s_{i}^{2}}{\sigma_{i}^{2}}+\frac{1}{A^{2}}\right)\left(w^{2}-2w\,\frac{\sum_{i=1}^{n}\frac{x_{i}s_{i}(y_{i}-c_{i}s_{i})}{\sigma_{i}^{2}}}{\sum_{i=1}^{n}\frac{x_{i}^{2}s_{i}^{2}}{\sigma_{i}^{2}}+\frac{1}{A^{2}}}\right)\right),
\]

thus we get

\[
\tilde{\sigma}^{2}=\left(\sum_{i=1}^{n}\frac{x_{i}^{2}s_{i}^{2}}{\sigma_{i}^{2}}+\frac{1}{A^{2}}\right)^{-1},
\]

and

\[
\tilde{\mu}=\tilde{\sigma}^{2}\sum_{i=1}^{n}\frac{x_{i}s_{i}(y_{i}-c_{i}s_{i})}{\sigma_{i}^{2}},
\]

which matches (32).
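As a numerical sanity check of this derivation, the following minimal Python sketch computes the conditional mean and variance of $w$ by completing the square: the precision is the coefficient of $w^{2}$, and the mean is the linear coefficient divided by that precision. It then confirms that the unnormalised log-posterior is exactly the corresponding quadratic in $w$. The function names, the simulated data and the value of $A$ are illustrative assumptions and are not taken from the original implementation.

```python
import math
import random


def posterior_w(y, x, c, s, sigma2, A):
    """Mean and variance of w | y for y_i ~ N((w*x_i + c_i)*s_i, sigma2_i), w ~ N(0, A^2)."""
    # Coefficient of w^2 in the exponent: the posterior precision.
    precision = sum((xi * si) ** 2 / v for xi, si, v in zip(x, s, sigma2)) + 1.0 / A**2
    # Coefficient of w in the exponent (with the factor -2 taken out).
    linear = sum(xi * si * (yi - ci * si) / v
                 for yi, xi, ci, si, v in zip(y, x, c, s, sigma2))
    variance = 1.0 / precision
    return variance * linear, variance


def log_post(w, y, x, c, s, sigma2, A):
    """Unnormalised log-posterior of w: likelihood exponent plus prior exponent."""
    fit = sum((yi - (w * xi + ci) * si) ** 2 / v
              for yi, xi, ci, si, v in zip(y, x, c, s, sigma2))
    return -0.5 * fit - w**2 / (2.0 * A**2)


# Simulated inputs (all values are made up for the illustration).
rng = random.Random(0)
n, A, w_true = 20, 3.0, 0.7
x = [rng.uniform(0.5, 2.0) for _ in range(n)]
c = [rng.uniform(-1.0, 1.0) for _ in range(n)]
s = [rng.uniform(0.8, 1.2) for _ in range(n)]
sigma2 = [rng.uniform(0.5, 1.5) for _ in range(n)]
y = [(w_true * xi + ci) * si + rng.gauss(0.0, math.sqrt(v))
     for xi, ci, si, v in zip(x, c, s, sigma2)]

mean, var = posterior_w(y, x, c, s, sigma2, A)

# For a normal density, log p(mean + d) - log p(mean) = -d^2 / (2 var) for every d.
base = log_post(mean, y, x, c, s, sigma2, A)
max_gap = max(
    abs(log_post(mean + d, y, x, c, s, sigma2, A) - base + d * d / (2.0 * var))
    for d in (-2.0, -0.5, 0.1, 1.0, 3.0)
)
print(mean, var, max_gap)
```

The reported gap should be at the level of floating-point rounding, which indicates that the closed-form mean and variance describe the same density as the unnormalised posterior.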

Appendix C Derivation of the conditional for $b_{1}$

The initial value of the local trend $b_{1}$ is fitted for the non-seasonal model, i.e., $s_{t}=1$. From (16), (14) can be expressed w.r.t. $b_{1}$,

\[
\hat{y}_{t+1}=\lambda(1-\beta)^{t-1}b_{1}+c_{t+1},
\]

where $c_{t}$ denotes the remaining constant at time $t$. The posterior distribution follows the pattern in (31) and can therefore be derived using (32), with $x_{i}=\lambda(1-\beta)^{t-1}$ and $s_{i}=1$. However, it is much more complicated to calculate the remaining constant directly in this case. Since $\hat{y}_{i}=x_{i}w+c_{i}$, the posterior mean can be expressed in an alternative form of (32) by substituting $c_{i}$ with $\hat{y}_{i}-x_{i}w$, so that

\[
\tilde{\mu}=\tilde{\sigma}^{2}\sum_{i=1}^{n}\frac{x_{i}(y_{i}-c_{i})}{\sigma_{i}^{2}}=\tilde{\sigma}^{2}\sum_{i=1}^{n}\frac{x_{i}^{2}w+x_{i}(y_{i}-\hat{y}_{i})}{\sigma_{i}^{2}},
\]

given the current value of $w$.
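The substitution above means that the weighted sum entering the posterior mean can be computed from the current one-step-ahead forecasts and residuals, without ever forming $c_{i}$ explicitly. The short Python sketch below illustrates this identity on made-up numbers; the values of $\lambda$, $\beta$, the constants, the variances and the observations are all hypothetical, and in the sampler the forecasts $\hat{y}_{i}$ would instead come from running the model recursions at the current value of $b_{1}$.

```python
def sum_from_constants(x, y, c, sigma2):
    """Weighted sum using the remaining constants c_i directly."""
    return sum(xi * (yi - ci) / v for xi, yi, ci, v in zip(x, y, c, sigma2))


def sum_from_residuals(x, y, y_hat, w, sigma2):
    """Same weighted sum, written with the residuals y_i - y_hat_i and the current w."""
    return sum((xi**2 * w + xi * (yi - yh)) / v
               for xi, yi, yh, v in zip(x, y, y_hat, sigma2))


lam, beta = 0.8, 0.3                                            # hypothetical lambda, beta
n = 8
x = [lam * (1.0 - beta) ** (t - 1) for t in range(1, n + 1)]    # x = lambda (1 - beta)^(t-1)
c = [10.0 + 0.5 * t for t in range(1, n + 1)]                   # remaining constants (made up)
sigma2 = [1.0 + 0.1 * t for t in range(1, n + 1)]               # variances (made up)
y = [10.3, 11.2, 11.4, 12.6, 12.4, 13.5, 13.9, 14.6]            # observations (made up)
w = 0.4                                                         # current value of b_1

# Forecasts at the current w; here built from c only to demonstrate the identity.
y_hat = [xi * w + ci for xi, ci in zip(x, c)]

direct = sum_from_constants(x, y, c, sigma2)
via_residuals = sum_from_residuals(x, y, y_hat, w, sigma2)
print(direct, via_residuals)
```

Both expressions agree up to rounding for any value of `w`, which is why the current draw can be used on the right-hand side.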

Appendix D Derivation of the gradients for sampling $\alpha$ and $\beta$

The gradients are calculated using the chain rule. We first derive the gradient with respect to $\alpha$. With $L(\alpha,\beta,\zeta)$ defined in (27),

\[
\frac{\partial L(\alpha,\beta,\zeta)}{\partial\alpha}=-\frac{\nu}{2}\sum_{t=1}^{T-1}\frac{1}{\hat{\sigma}_{t+1}^{2}}\frac{\partial\hat{\sigma}_{t+1}^{2}}{\partial\alpha}+\frac{\nu+1}{2}\sum_{t=1}^{T-1}\frac{\nu\dfrac{\partial\hat{\sigma}_{t+1}^{2}}{\partial\alpha}+\dfrac{\partial e_{t+1}^{2}}{\partial\alpha}}{\nu\hat{\sigma}_{t+1}^{2}+e_{t+1}^{2}}.
\]

Since $e_{t+1}^{2}=(y_{t+1}-\hat{y}_{t+1})^{2}$, we get

\[
\frac{\partial e_{t+1}^{2}}{\partial\alpha}=-2(y_{t+1}-\hat{y}_{t+1})\frac{\partial\hat{y}_{t+1}}{\partial\alpha}.
\]

From (18), we have

\[
\frac{\partial\hat{\sigma}_{t+1}^{2}}{\partial\alpha}=2\tau\chi^{2}(1-\phi)^{2}l_{t}^{2\tau-1}\frac{\partial l_{t}}{\partial\alpha}.
\]

Then, we calculate $\partial\hat{y}_{t+1}/\partial\alpha$ and $\partial l_{t}/\partial\alpha$ recursively. According to (14), with $s_{t}=1$ for the non-seasonal model, we get

\[
\frac{\partial\hat{y}_{t+1}}{\partial\alpha}=\frac{\partial l_{t}}{\partial\alpha}+\gamma\rho l_{t}^{\rho-1}\frac{\partial l_{t}}{\partial\alpha}+\lambda\frac{\partial b_{t}}{\partial\alpha},
\]

and with (15), we then derive

\[
\frac{\partial l_{t}}{\partial\alpha}=y_{t}-l_{t-1}+(1-\alpha)\frac{\partial l_{t-1}}{\partial\alpha},
\]

and

\[
\frac{\partial b_{t}}{\partial\alpha}=\beta\left(\frac{\partial l_{t}}{\partial\alpha}-\frac{\partial l_{t-1}}{\partial\alpha}\right)+(1-\beta)\frac{\partial b_{t-1}}{\partial\alpha},
\]

with the initial states being

\[
\frac{\partial l_{1}}{\partial\alpha}=0,\qquad\frac{\partial b_{1}}{\partial\alpha}=0.
\]
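The recursions for $\alpha$ can be checked against a central finite difference, as in the minimal Python sketch below. Equations (14)-(16), (18) and (27) are not restated in this appendix, so the sketch assumes forms that are consistent with the derivatives given here: $l_{t}=\alpha y_{t}+(1-\alpha)l_{t-1}$, $b_{t}=\beta(l_{t}-l_{t-1})+(1-\beta)b_{t-1}$, $\hat{y}_{t+1}=l_{t}+\gamma l_{t}^{\rho}+\lambda b_{t}$, $\hat{\sigma}_{t+1}^{2}=\texttt{offset}+\chi^{2}(1-\phi)^{2}l_{t}^{2\tau}$, and $L=\sum_{t}\bigl[-\tfrac{\nu}{2}\log\hat{\sigma}_{t+1}^{2}+\tfrac{\nu+1}{2}\log(\nu\hat{\sigma}_{t+1}^{2}+e_{t+1}^{2})\bigr]$. The constant `offset`, the initialisation $l_{1}=y_{1}$, the function name and all numerical values are hypothetical.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
P = dict(beta=0.3, gamma=0.1, rho=0.5, lam=0.8, chi=0.5, phi=0.2,
         tau=0.6, nu=5.0, offset=0.05, b1=0.5)


def loss_and_grad_alpha(y, alpha, beta, gamma, rho, lam, chi, phi, tau, nu, offset, b1):
    """Return (L, dL/dalpha) under the assumed non-seasonal recursions."""
    l, b = y[0], b1                  # assumed initial states l_1 and b_1
    dl, db = 0.0, 0.0                # d l_1 / d alpha = d b_1 / d alpha = 0
    scale = chi**2 * (1.0 - phi) ** 2
    loss, grad = 0.0, 0.0
    for t in range(1, len(y)):       # t = 1, ..., T-1 is the 1-based state index
        if t > 1:                    # advance the states and derivatives from t-1 to t
            y_t = y[t - 1]
            dl_new = y_t - l + (1.0 - alpha) * dl
            l_new = alpha * y_t + (1.0 - alpha) * l
            db = beta * (dl_new - dl) + (1.0 - beta) * db
            b = beta * (l_new - l) + (1.0 - beta) * b
            l, dl = l_new, dl_new
        y_hat = l + gamma * l**rho + lam * b
        dy_hat = dl + gamma * rho * l ** (rho - 1.0) * dl + lam * db
        s2 = offset + scale * l ** (2.0 * tau)
        ds2 = 2.0 * tau * scale * l ** (2.0 * tau - 1.0) * dl
        e = y[t] - y_hat             # y[t] holds y_{t+1} in 0-based indexing
        de2 = -2.0 * e * dy_hat
        denom = nu * s2 + e * e
        loss += -0.5 * nu * math.log(s2) + 0.5 * (nu + 1.0) * math.log(denom)
        grad += -0.5 * nu * ds2 / s2 + 0.5 * (nu + 1.0) * (nu * ds2 + de2) / denom
    return loss, grad


y = [10.0, 10.8, 11.1, 12.3, 12.9, 14.2, 14.8, 16.1, 16.5, 17.9]   # made-up series
alpha, h = 0.4, 1e-6
_, grad = loss_and_grad_alpha(y, alpha, **P)
fd = (loss_and_grad_alpha(y, alpha + h, **P)[0]
      - loss_and_grad_alpha(y, alpha - h, **P)[0]) / (2.0 * h)
print(grad, fd)
```

The derivative of each state is updated before the state itself is overwritten, because the recursions for $\partial l_{t}/\partial\alpha$ and $\partial b_{t}/\partial\alpha$ refer to $l_{t-1}$ and to the previous derivatives.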

The calculation is similar for $\beta$. First, from (27),

\[
\frac{\partial L(\alpha,\beta,\zeta)}{\partial\beta}=-\frac{\nu}{2}\sum_{t=1}^{T-1}\frac{1}{\hat{\sigma}_{t+1}^{2}}\frac{\partial\hat{\sigma}_{t+1}^{2}}{\partial\beta}+\frac{\nu+1}{2}\sum_{t=1}^{T-1}\frac{\nu\dfrac{\partial\hat{\sigma}_{t+1}^{2}}{\partial\beta}+\dfrac{\partial e_{t+1}^{2}}{\partial\beta}}{\nu\hat{\sigma}_{t+1}^{2}+e_{t+1}^{2}},
\]

and

\[
\frac{\partial e_{t+1}^{2}}{\partial\beta}=-2(y_{t+1}-\hat{y}_{t+1})\frac{\partial\hat{y}_{t+1}}{\partial\beta}.
\]

The scale of the error term, $\hat{\sigma}_{t+1}$, does not depend on $\beta$, which means $\partial\hat{\sigma}_{t+1}^{2}/\partial\beta=0$. Then from (14) we get

\[
\frac{\partial\hat{y}_{t+1}}{\partial\beta}=\lambda\frac{\partial b_{t}}{\partial\beta},
\]

as $\partial l_{t}/\partial\beta=0$. Similarly, with (16), we derive

\[
\frac{\partial b_{t}}{\partial\beta}=(l_{t}-l_{t-1})-b_{t-1}+(1-\beta)\frac{\partial b_{t-1}}{\partial\beta},
\]

with the initial state being $\partial b_{1}/\partial\beta=0$.
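The gradient with respect to $\beta$ is simpler because only the local trend carries a dependence on $\beta$. A minimal Python sketch of this case is given below, again compared with a central finite difference. As before, the forms of the level, trend, forecast and scale equations and of $L$ are assumptions chosen to be consistent with the derivatives in this appendix, and the constant `offset`, the initialisation $l_{1}=y_{1}$, the function name and all numerical values are hypothetical.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
P = dict(alpha=0.4, gamma=0.1, rho=0.5, lam=0.8, chi=0.5, phi=0.2,
         tau=0.6, nu=5.0, offset=0.05, b1=0.5)


def loss_and_grad_beta(y, beta, alpha, gamma, rho, lam, chi, phi, tau, nu, offset, b1):
    """Return (L, dL/dbeta) under the assumed non-seasonal recursions."""
    l, b, db = y[0], b1, 0.0         # assumed l_1, b_1 and d b_1 / d beta = 0
    scale = chi**2 * (1.0 - phi) ** 2
    loss, grad = 0.0, 0.0
    for t in range(1, len(y)):       # t = 1, ..., T-1 is the 1-based state index
        if t > 1:
            l_new = alpha * y[t - 1] + (1.0 - alpha) * l   # the level is free of beta
            db = (l_new - l) - b + (1.0 - beta) * db       # needs b_{t-1}: update before b
            b = beta * (l_new - l) + (1.0 - beta) * b
            l = l_new
        y_hat = l + gamma * l**rho + lam * b
        s2 = offset + scale * l ** (2.0 * tau)             # no dependence on beta
        e = y[t] - y_hat                                   # y[t] holds y_{t+1}
        de2 = -2.0 * e * lam * db                          # d y_hat / d beta = lam * db
        denom = nu * s2 + e * e
        loss += -0.5 * nu * math.log(s2) + 0.5 * (nu + 1.0) * math.log(denom)
        grad += 0.5 * (nu + 1.0) * de2 / denom             # the sigma-hat terms vanish
    return loss, grad


y = [10.0, 10.8, 11.1, 12.3, 12.9, 14.2, 14.8, 16.1, 16.5, 17.9]   # made-up series
beta, h = 0.3, 1e-6
_, grad_beta = loss_and_grad_beta(y, beta, **P)
fd_beta = (loss_and_grad_beta(y, beta + h, **P)[0]
           - loss_and_grad_beta(y, beta - h, **P)[0]) / (2.0 * h)
print(grad_beta, fd_beta)
```

Only the second sum of the gradient expression contributes here, since the derivative of $\hat{\sigma}_{t+1}^{2}$ with respect to $\beta$ is zero.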

Appendix E Derivation of the gradients for sampling initial seasonal factors

From (27), we first get

\[
\frac{\partial L(s_{1},\dots,s_{m-1})}{\partial\log s_{i}}=-\frac{\nu}{2}\sum_{t=1}^{T-1}\frac{1}{\hat{\sigma}_{t+1}^{2}}\frac{\partial\hat{\sigma}_{t+1}^{2}}{\partial\log s_{i}}+\frac{\nu+1}{2}\sum_{t=1}^{T-1}\frac{\nu\dfrac{\partial\hat{\sigma}_{t+1}^{2}}{\partial\log s_{i}}+\dfrac{\partial e_{t+1}^{2}}{\partial\log s_{i}}}{\nu\hat{\sigma}_{t+1}^{2}+e_{t+1}^{2}}.
\]

Since $e_{t+1}^{2}=(y_{t+1}-\hat{y}_{t+1})^{2}$,

\[
\frac{\partial e_{t+1}^{2}}{\partial\log s_{i}}=-2(y_{t+1}-\hat{y}_{t+1})\frac{\partial\hat{y}_{t+1}}{\partial\log s_{i}}.
\]

With (14) we can obtain

$$
\dfrac{\partial\hat{y}_{t+1}}{\partial \log s_{i}}
=\left(\dfrac{\partial l_{t}}{\partial \log s_{i}}+\gamma\rho\, l_{t}^{\rho-1}\dfrac{\partial l_{t}}{\partial \log s_{i}}\right)s_{t+1-m}
+\left(l_{t}+\gamma l_{t}^{\rho}\right)\dfrac{\partial s_{t+1-m}}{\partial \log s_{i}},
$$
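As a minimal sketch, the product rule above and the squared-error derivative can be written as small Python functions. The forecast form $\hat{y}_{t+1}=(l_{t}+\gamma l_{t}^{\rho})\,s_{t+1-m}$ used here is inferred from the derivative itself, and the identity $\partial s/\partial\log s_{i}=s\,\partial\log s/\partial\log s_{i}$ converts between the two seasonal derivatives. All function names are hypothetical.

```python
def forecast(l, s, gamma, rho):
    """One-step forecast with level l and seasonal factor s (assumed form)."""
    return (l + gamma * l ** rho) * s

def dforecast(l, s, dl, dlogs, gamma, rho):
    """Derivative of the forecast with respect to log s_i.

    dl    : d l_t / d log s_i
    dlogs : d log s_{t+1-m} / d log s_i
    """
    ds = s * dlogs  # d s / d log s_i obtained from the log-scale derivative
    return (dl + gamma * rho * l ** (rho - 1.0) * dl) * s + (l + gamma * l ** rho) * ds

def dsq_error(y, yhat, dyhat):
    """Derivative of e^2 = (y - yhat)^2 with respect to log s_i."""
    return -2.0 * (y - yhat) * dyhat
```

Working with the log-scale derivative of the seasonal factor keeps the interface aligned with the log-scale sensitivities used in the rest of the derivation.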

where terms containing $\lambda$ are dropped for simplicity, since they are equal to zero in the seasonal version. Then, from (15) and (17), we obtain the following recursions:

$$
\begin{aligned}
\dfrac{\partial l_{t}}{\partial \log s_{i}} &= -\frac{\alpha y_{t}}{s_{t-m}}\,\dfrac{\partial \log s_{t-m}}{\partial \log s_{i}}+(1-\alpha)\,\dfrac{\partial l_{t-1}}{\partial \log s_{i}},\\
\dfrac{\partial \log s_{t}}{\partial \log s_{i}} &= -\frac{\zeta}{l_{t}}\,\dfrac{\partial l_{t}}{\partial \log s_{i}}+(1-\zeta)\,\dfrac{\partial \log s_{t-m}}{\partial \log s_{i}},
\end{aligned}
$$

with initial states being

$$
\dfrac{\partial \log s_{t}}{\partial \log s_{i}}=0,\;\; t\leq m,\ i<m,\ t\neq i,
\qquad
\dfrac{\partial \log s_{t}}{\partial \log s_{i}}=1,\;\; t\leq m,\ t=i,
\qquad\text{and}\qquad
\dfrac{\partial \log s_{m}}{\partial \log s_{i}}=-1,\;\; i<m;
$$

and

$$
\dfrac{\partial l_{1}}{\partial \log s_{i}}=\frac{y_{1}}{s_{1}},\;\; i=1;
\qquad\text{and}\qquad
\dfrac{\partial l_{1}}{\partial \log s_{i}}=0,\;\; 1<i<m;
$$
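The two recursions can be propagated forward in a single pass over the series. The Python sketch below is one possible arrangement and is not taken from the original implementation. Its indexing is an assumption: `y[k]` and `l[k]` hold $y_{k+1}$ and $l_{k+1}$, and `s[k]` holds the seasonal factor paired with `y[k]`, so that `s[:m]` are the $m$ initial seasonal factors. The starting value `dl1` is passed in by the caller, and `initial_dlogs` encodes the initial seasonal derivatives stated above, with the constrained $m$-th factor set to $-1$.

```python
def initial_dlogs(i, m):
    """Initial d log s_t / d log s_i for t = 1..m, given a free index i < m."""
    d = [0.0] * m
    d[i - 1] = 1.0   # the factor being differentiated
    d[m - 1] = -1.0  # the constrained m-th factor
    return d

def state_sensitivities(y, l, s, dl1, dlogs_init, alpha, zeta):
    """Propagate d l_t / d log s_i and d log s_t / d log s_i through time.

    Returns (dl, dlogs), where dl has length T and dlogs has length T + m;
    dlogs[k] is the log-scale derivative of the factor stored in s[k].
    """
    T, m = len(y), len(dlogs_init)
    dl = [0.0] * T
    dlogs = list(dlogs_init) + [0.0] * T
    dl[0] = dl1
    for k in range(T):
        if k > 0:
            # Level recursion
            dl[k] = -alpha * y[k] / s[k] * dlogs[k] + (1.0 - alpha) * dl[k - 1]
        # Seasonal recursion, written m steps ahead of the factor it updates
        dlogs[k + m] = -zeta / l[k] * dl[k] + (1.0 - zeta) * dlogs[k]
    return dl, dlogs
```

For example, `state_sensitivities(y, l, s, dl1, initial_dlogs(1, m), alpha, zeta)` returns the sensitivities with respect to $\log s_{1}$. The cost is linear in $T$ for each free seasonal factor.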

From (18), we have

$$
\dfrac{\partial\hat{\sigma}_{t+1}^{2}}{\partial \log s_{i}}=2\tau\chi^{2}(1-\phi)^{2}\,l_{t}^{2\tau-1}\,\dfrac{\partial l_{t}}{\partial \log s_{i}},
$$

which can be evaluated by the chain rule, using the components derived above.
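The variance derivative is a pointwise function of the level and its sensitivity, as the following Python sketch illustrates. The function name is hypothetical, and the sketch simply transcribes the expression above. It presumes that the only level-dependent part of $\hat{\sigma}_{t+1}^{2}$ is $\chi^{2}(1-\phi)^{2}l_{t}^{2\tau}$, which is an inference from the derivative rather than a restatement of (18).

```python
def dsigma2(l, dl, tau, chi, phi):
    """Derivative of sigma-hat^2_{t+1} with respect to log s_i.

    l  : level l_t (must be positive for non-integer powers)
    dl : d l_t / d log s_i
    """
    return 2.0 * tau * chi ** 2 * (1.0 - phi) ** 2 * l ** (2.0 * tau - 1.0) * dl
```

Its output, together with the squared-error derivative, supplies the two per-step inputs needed to evaluate the gradient of $L$.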

References

  • Andrawis and Atiya [2009] Robert R Andrawis and Amir F Atiya. A new Bayesian formulation for Holt’s exponential smoothing. Journal of Forecasting, 28(3):218–234, 2009.
  • Bermúdez et al. [2009] José D Bermúdez, Ana Corberán-Vallet, and Enriqueta Vercher. Multivariate exponential smoothing: A Bayesian forecast approach based on simulation. Mathematics and Computers in Simulation, 79(5):1761–1769, 2009.
  • Bermúdez et al. [2010] José D Bermúdez, José Vicente Segura, and Enriqueta Vercher. Bayesian forecasting with the Holt–Winters model. Journal of the Operational Research Society, 61(1):164–171, 2010.
  • Gardner and Mckenzie [1985] Everette S. Gardner and Ed. Mckenzie. Forecasting trends in time series. Management Science, 31(10):1237–1246, 1985. doi: 10.1287/mnsc.31.10.1237.
  • Hoffman et al. [2014] Matthew D Hoffman, Andrew Gelman, et al. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res., 15(1):1593–1623, 2014.
  • Holt [2004] Charles C. Holt. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting, 20(1):5–10, 2004. ISSN 0169-2070. doi: https://doi.org/10.1016/j.ijforecast.2003.09.015. URL https://www.sciencedirect.com/science/article/pii/S0169207003001134.
  • Hyndman [2018] Rob Hyndman. Mcomp: Data from the M-Competitions, 2018. URL https://CRAN.R-project.org/package=Mcomp. R package version 2.8.
  • Hyndman et al. [2024] Rob Hyndman, George Athanasopoulos, Christoph Bergmeir, Gabriel Caceres, Leanne Chhay, Mitchell O’Hara-Wild, Fotios Petropoulos, Slava Razbash, Earo Wang, and Farah Yasmeen. forecast: Forecasting functions for time series and linear models, 2024. URL http://pkg.robjhyndman.com/forecast. R package version 8.7.
  • Hyndman and Athanasopoulos [2021] Rob J Hyndman and George Athanasopoulos. Forecasting: principles and practice, 3rd edition. OTexts, 2021.
  • Kullback and Leibler [1951] Solomon Kullback and Richard A Leibler. On information and sufficiency. The annals of mathematical statistics, 22(1):79–86, 1951.
  • Lange et al. [1989] Kenneth L Lange, Roderick JA Little, and Jeremy MG Taylor. Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84(408):881–896, 1989.
  • Liu [1994] Jun S Liu. The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem. Journal of the American Statistical Association, 89(427):958–966, 1994.
  • Makalic and Schmidt [2015] Enes Makalic and Daniel F Schmidt. A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters, 23(1):179–182, 2015.
  • Makridakis and Hibon [2000] Spyros Makridakis and Michele Hibon. The M3-competition: results, conclusions and implications. International journal of forecasting, 16(4):451–476, 2000.
  • Makridakis et al. [2020] Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The M4 competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1):54–74, 2020.
  • McElreath [2018] Richard McElreath. Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC, 2018.
  • O’Hara-Wild et al. [2021] Mitchell O’Hara-Wild, Rob Hyndman, and Earo Wang. fable: Forecasting Models for Tidy Time Series, 2021. URL https://CRAN.R-project.org/package=fable. R package version 0.3.1.
  • Plummer [2003] Martyn Plummer. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. 3rd International Workshop on Distributed Statistical Computing (DSC 2003); Vienna, Austria, 124, 2003.
  • Robert [2015] Christian P. Robert. The Metropolis–Hastings Algorithm, pages 1–15. John Wiley & Sons, Ltd, 2015. ISBN 9781118445112. doi: https://doi.org/10.1002/9781118445112.stat07834. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118445112.stat07834.
  • Robert et al. [1999] Christian P Robert and George Casella. Monte Carlo statistical methods, volume 2. Springer, 1999.
  • Schmidt and Makalic [2020] Daniel F Schmidt and Enes Makalic. Bayesian generalized horseshoe estimation of generalized linear models. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part II, pages 598–613. Springer, 2020.
  • Smyl et al. [2024] Slawek Smyl, Christoph Bergmeir, Alexander Dokumentov, Xueying Long, Erwin Wibowo, and Daniel Schmidt. Local and global trend Bayesian exponential smoothing models. International Journal of Forecasting, 2024.
  • Stan Development Team [2022] Stan Development Team. Stan Modeling Language Users Guide and Reference Manual, 2022. URL http://mc-stan.org/. Version 2.31.0.
  • Stan Development Team [2023] Stan Development Team. RStan: the R interface to Stan, 2023. URL https://mc-stan.org/. R package version 2.21.8.
  • Titsias and Papaspiliopoulos [2018] Michalis K Titsias and Omiros Papaspiliopoulos. Auxiliary gradient-based sampling algorithms. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(4):749–767, 2018.
  • Wand et al. [2011] Matthew P Wand, John T Ormerod, Simone A Padoan, and Rudolf Frühwirth. Mean field variational Bayes for elaborate distributions. 2011.
  • Winters [1960] Peter R Winters. Forecasting sales by exponentially weighted moving averages. Management science, 6(3):324–342, 1960.