
Optimal Stock Portfolio Selection with a Multivariate Hidden Markov Model
Reetam Majumder¹, Qing Ji² and Nagaraj K. Neerchal¹
¹University of Maryland, Baltimore County; ²Procter & Gamble

Abstract

The underlying market trends that drive stock price fluctuations are often referred to in terms of bull and bear markets. Optimal stock portfolio selection methods need to take into account these market trends; however, the bull and bear market states tend to be unobserved and can only be assigned retrospectively. We fit a linked hidden Markov model (LHMM) to relative stock price changes for S&P 500 stocks from 2011–2016 based on weekly closing values. The LHMM consists of a multivariate state process whose individual components correspond to HMMs for each of the 12 sectors of the S&P 500 stocks. The state processes are linked using a Gaussian copula so that the states of the component chains are correlated at any given time point. The LHMM allows us to capture more heterogeneity in the underlying market dynamics for each sector. In this study, stock performances are evaluated in terms of capital gains using the LHMM by utilizing historical stock price data. Based on the fitted LHMM, optimal stock portfolios are constructed to maximize capital gain while balancing reward and risk. Under out-of-sample testing, the annual capital gain of the portfolios for 2016–2017 is calculated. Portfolios constructed using the LHMM are able to generate returns comparable to the S&P 500 index.

Key words: Linked hidden Markov model, Multivariate Markov chain, Stochastic simulations, Portfolio allocation, Gaussian copula

1 Introduction

A stock portfolio refers to a collection of stocks selected and owned by an investor, and stock portfolio selection has been at the center of investment methodology research for many years. Depending on the investment goals, various methods have been developed by researchers for selecting stocks and allocating assets. The modern portfolio selection methodology developed by Markowitz (1952) has guided a large section of portfolio research. There are two essential components to the portfolio selection procedure, namely the evaluation of stocks and the allocation of portfolio assets. A good introductory reference for the topic of portfolio selection is Malkiel (2019). In this paper, we follow the groundwork laid out by Ji and Neerchal (2019) of connecting portfolio selection to the estimation of the underlying statistical model. We first build statistical models using past data on stock prices. Then, optimal stock portfolios are constructed based on the technique in Markowitz (1952) to maximize the capital gain while balancing reward and risk. The performance of the optimal portfolios is evaluated by comparing annual gains based on the portfolios against the S&P 500 gains for the same time period.

Stock markets around the world use the terms bull and bear to describe market trends. Stock prices are relatively stable and generally increasing in a bull market. A bear market, on the other hand, indicates strong market volatility with decreasing stock prices. A bull to bear market switch or vice versa is recognized after an increase/decrease of 20% or more in multiple stock indices (Kole and Dijk, 2016). While bull and bear markets cannot be directly observed, the behaviour of individual stocks points to the state of the market. The current state of a stock can be estimated by analysts, but the true state is unknown unless stocks are evaluated retrospectively. Therefore, the state of a stock can be treated as an unobserved (latent) random variable, and the prices of the stock are the observed values. In addition, the market conditions can switch states at any time point. Given these characteristics, a hidden Markov model (HMM) is well suited for modeling the bull/bear trend of the market.

Figure 1: A directed acyclic graph (DAG) specifying the conditional independence structure for a hidden Markov model.

An HMM is a discrete-time stochastic process that is controlled through a Markov chain with latent (hidden) states. A Markov chain (MC) is a well-known stochastic model that describes a sequence of discrete events. Let a sequence of random variables $(Z_{1},Z_{2},\ldots,Z_{n})$ form a Markov chain. The characterizing property of a first-order Markov chain states that

$$P(Z_{t}=z_{t}\mid Z_{t-1}=z_{t-1},\ldots,Z_{1}=z_{1})=P(Z_{t}=z_{t}\mid Z_{t-1}=z_{t-1}). \tag{1}$$

Assuming the Markov chain is stationary and has $J$ states, the transition probabilities $P(Z_{t}=z_{t}\mid Z_{t-1}=z_{t-1})$ can be arranged into a $J$ by $J$ matrix known as the transition probability matrix,

$$\mathbf{\Pi}=\begin{bmatrix}\pi_{11}&\pi_{12}&\cdots&\pi_{1J}\\ \pi_{21}&\pi_{22}&\cdots&\pi_{2J}\\ \vdots&\vdots&\ddots&\vdots\\ \pi_{J1}&\pi_{J2}&\cdots&\pi_{JJ}\end{bmatrix},$$

where $\pi_{hj}=P(Z_{t}=j\mid Z_{t-1}=h)$ is the probability of transitioning from state $h$ to state $j$ at any $t$, and $\sum^{J}_{j=1}\pi_{hj}=1$ for any $h$. In an HMM, observations are assumed to be drawn from one of several sub-distributions determined by the unobserved variable $Z_{t}$. Formally, an HMM consists of a pair of random processes $\{Z_{t},Y_{t}\}_{t\geq 1}$, where $\{Z_{t}\}$ is a Markov chain with $J$ states. Conditional on $\{Z_{t}\}$, $\{Y_{t}\}$ is a sequence of independent random variables such that the distribution of $Y_{t}$ depends only on $Z_{t}$. The conditional distribution of $Y_{t}$ given $Z_{t}$ is given by

$$Y_{t}\mid Z_{t}=j\sim f_{j}(y\mid\boldsymbol{\theta}_{j}),\quad j=1,\ldots,J, \tag{2}$$

where $f_{1},\ldots,f_{J}$ are the different sub-distributions. $\{Z_{t}\}$ is known as the state process of the HMM, and $\{Y_{t}\}$ is known as the emission process. Note that at any time point $t$, $Y_{t}$ could follow either a univariate or a multivariate distribution. Figure 1 depicts the graphical representation of an HMM. The variables in this representation are the nodes of the graph, and the arrows connecting them are the edges, which represent the dependence among the nodes.
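The generative structure in (1) and (2) can be illustrated with a short simulation. The following is a minimal sketch, assuming Normal sub-distributions and purely hypothetical parameter values; the function name `simulate_hmm` and the 0-based state labels are illustrative choices and do not come from the paper.

```python
import random

def simulate_hmm(alpha, Pi, means, sds, n, seed=0):
    """Simulate (Z_t, Y_t), t = 1..n, from a J-state HMM with Normal emissions."""
    rng = random.Random(seed)
    states = list(range(len(alpha)))
    z = rng.choices(states, weights=alpha)[0]      # Z_1 drawn from the initial distribution
    zs, ys = [], []
    for _ in range(n):
        zs.append(z)
        ys.append(rng.gauss(means[z], sds[z]))     # Y_t | Z_t = j ~ N(mu_j, sd_j^2), as in (2)
        z = rng.choices(states, weights=Pi[z])[0]  # next state depends only on the current one, as in (1)
    return zs, ys

# Hypothetical values: state 0 plays the role of "bull", state 1 of "bear".
alpha = [0.5, 0.5]
Pi = [[0.95, 0.05],
      [0.10, 0.90]]
zs, ys = simulate_hmm(alpha, Pi, means=[0.004, -0.006], sds=[0.015, 0.040], n=260)
```

Only `ys` would be observed in practice; `zs` is the latent path that the estimation procedure has to recover.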

Previous work on predicting stock prices based on HMMs includes Hassan and Nath (2005) and Nguyen (2018). Their models were trained directly using the stock closing values and were used to predict stock prices in the near future; there is, however, no extension to portfolio selection in their work. Hamilton (1989) combined the HMM structure with autoregressive models in order to capture the market trend, where parameters of an autoregressive model were considered to arise from an HMM. Elliott and van der Hoek (1997) and Elliott et al. (2010) further extended the work of Hamilton (1989) to include a portfolio selection procedure.

Although we may largely expect the bull and bear states of the market to be consistently reflected in weekly stock price changes, it is likely that stocks in different sectors will have different underlying dynamics of the state process. This paper considers a model where each sector is driven by a different state process with its own bull and bear states. This results in a multivariate state process, whose individual components are Markov chains. The dependence between the individual Markov chains of the multivariate Markov chain (MMC) can propagate in different ways, and Majumder (2021) discusses some of the common ways an MMC has been specified in previous studies. Additional work in the area of HMMs and correlations among the prices of different assets or markets includes Ensor and Koev (2014), who investigated the correlations between different stock sectors while applying a regime-switching model to the correlation matrix, and Fiecas et al. (2017), who address the estimation of a multivariate HMM using shrinkage estimators. More recently, Xu and Cao (2021) have incorporated a vine copula into an artificial neural network so that the inter-market correlations were considered while estimating the return of a portfolio. We thank one of the anonymous referees for bringing these references to our attention.

In this paper, we assume that the state processes of the MMC evolve in lockstep, i.e., the nodes of two Markov chains are connected by an edge if and only if the nodes are at the same time point. The resulting multivariate HMM is known as a linked HMM (LHMM). Figure 2 represents our approach through an example where $\textbf{Z}_{t}^{\prime}=(Z_{1,t},Z_{2,t})$ is a bivariate state process corresponding to bull/bear states for two sectors of the stock market, and $\textbf{Y}_{t}^{\prime}=(\textbf{Y}_{1,t},\textbf{Y}_{2,t})$ are the stock returns for all stocks within the two sectors. This is a modification of the default LHMM specification; the state processes of the different sectors can be considered to evolve in lockstep, and each state process affects the stock price changes for that sector’s stocks. Partitioning the stocks by sector allows for more heterogeneity in the market dynamics while still using two-state latent processes with an intuitive bull/bear labeling. We can extend this idea to an LHMM with $D$ clusters corresponding to $D$ sectors. We specify the dependency structure for the $D$-variate LHMM using a Gaussian copula, which allows us to generate correlated states from the MMC at every time point (Majumder, 2021). To demonstrate our LHMM, we propose a stock portfolio selection method based on the work of Ji and Neerchal (2019).

The rest of this paper is structured as follows. In Section 2, we describe an LHMM with its dependency structure specified using a Gaussian copula which can be used to model weekly stock price changes. Section 3 introduces the methods to evaluate stocks and portfolios and explains the portfolio selection methodology. In Section 4, we validate the portfolio selection method using historical S&P 500 stock data. Finally, Section 5 discusses our results and proposes ways that our approach can be improved.

2 Parameter Estimation for a Linked Hidden Markov Model

2.1 Parameterizing an LHMM using a Gaussian copula

Suppose a stock portfolio consists of $K$ stocks. For the $k$th stock, $k=1,2,\ldots,K$, let $X_{k,t}$ be the closing price at the end of the $t$th week, $t=1,2,\ldots,n$. The relative price changes are given by

$$Y_{k,t}=\frac{X_{k,t}-X_{k,t-1}}{X_{k,t-1}}.$$
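As a small worked illustration of this definition, the sketch below computes the weekly relative changes for a single stock from a list of closing prices. The helper name `relative_changes` and the price values are hypothetical.

```python
def relative_changes(prices):
    """Return Y_t = (X_t - X_{t-1}) / X_{t-1} for consecutive weekly closing prices."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]

# Hypothetical weekly closes for one stock: up 2%, then down 2%.
closes = [100.0, 102.0, 99.96]
changes = relative_changes(closes)
```

A series of $n+1$ closing prices yields $n$ relative changes, so the first week only serves as a baseline.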

Let $\textbf{Y}_{t}=(Y_{1,t},\ldots,Y_{K,t})$ be the vector of stock price changes at the end of the $t$th week, and $Z_{t}$ be the binary latent state at that time point. The HMM for the stock price changes is given by

$$\textbf{Y}_{t}\mid Z_{t}=j\sim\prod_{k=1}^{K}N(\mu_{k,j},\sigma^{2}_{k,j}),\quad j=1,2, \tag{3}$$

where $(Z_{1},\ldots,Z_{n})$ is a Markov chain with initial distribution $\boldsymbol{\alpha}$ and a $2\times 2$ transition matrix ${\bf{\Pi}}$. The latent states represent the bull or bear state of the market, which is observed in the emission process as the buy or sell trend for each of the stocks. The emission distribution assumes a conditional independence structure at each time point, i.e., the price change of any stock is independent of those of the remaining stocks conditional on the state. Following the notation established in (2), we use $\boldsymbol{\theta}$ to denote all the parameters of the emission distribution. Parameter estimation for an HMM of this form is carried out using the Baum-Welch (B-W) algorithm (Baum and Petrie, 1966), which is a special case of the expectation-maximization (EM) algorithm (Dempster et al., 1977). A comprehensive tutorial on parameter estimation in HMMs is provided by Rabiner (1989).

Now, let us consider the case where the $K$ stocks belong to $D$ different sectors. If we assign each sector its own underlying state process, we can denote the state of the LHMM at the end of the $t$th week as $\textbf{Z}_{t}=(Z_{1,t},\ldots,Z_{D,t})$. If the $d$th sector consists of $n_{d}$ stocks, the HMM for price changes in the $d$th sector is given by

$$\mathbf{Y}_{d,t}\mid Z_{d,t}=j\sim\prod_{k_{d}=1}^{n_{d}}N(\mu_{k_{d},j},\sigma^{2}_{k_{d},j}),\quad j=1,2, \tag{4}$$

with $\sum_{d=1}^{D}n_{d}=K$. As before, $Z_{d}^{\prime}=(Z_{d,1},\ldots,Z_{d,n})$ is the latent state process. The HMM for sector $d$ is parameterized by the initial distribution $\boldsymbol{\alpha}_{d}$, transition matrix $\mathbf{\Pi}_{d}$, and emission distribution parameters $\boldsymbol{\theta}_{d}$. Furthermore, $\{Z_{1},\ldots,Z_{D}\}$ is a $D$-component MMC, and $Y_{d,t}$ given $Z_{d,t}$ is independent of $Y_{d^{\prime},t^{\prime}}$ and $Z_{d^{\prime},t^{\prime}}$ for any $d^{\prime}\neq d$. The full likelihood of the LHMM at time $t$ can be written as

$$f(\mathbf{Y}_{1,t},\ldots,\mathbf{Y}_{D,t},Z_{1,t},\ldots,Z_{D,t})=f(Z_{1,t},\ldots,Z_{D,t})\prod_{d=1}^{D}f(\mathbf{Y}_{d,t}\mid Z_{d,t}), \tag{5}$$

where the parameter dependencies have been suppressed for convenience. We want to parameterize the association between the component Markov chains of the MMC at every time point, and our approach to that end is to construct a Gaussian copula for the state processes. Let $F$ be the $D$-dimensional joint CDF of $\{Z_{1},\ldots,Z_{D}\}$, and let $F_{1},\ldots,F_{D}$ be the marginal CDFs of $Z_{1},\ldots,Z_{D}$ respectively. We define a Gaussian copula over the state processes as:

$$\begin{aligned}F(Z_{1},\ldots,Z_{D})&=\mathcal{C}\bigl(F_{1}(Z_{1};\boldsymbol{\alpha}_{1},\mathbf{\Pi}_{1}),\ldots,F_{D}(Z_{D};\boldsymbol{\alpha}_{D},\mathbf{\Pi}_{D})\bigr)\\ &={\Phi}_{D}\bigl(\Phi^{-1}(U_{1}),\ldots,\Phi^{-1}(U_{D});\Sigma\bigr)\\ &={\Phi}_{D}\bigl(W_{1},\ldots,W_{D};\Sigma\bigr),\end{aligned} \tag{6}$$

where $U_{1},\ldots,U_{D}$ are $Uniform(0,1)$ variates and $W_{1},\ldots,W_{D}$ are standard Normal variates. ${\Phi}_{D}$ is a $D$-dimensional multivariate Normal CDF with correlation matrix $\Sigma$, while $\Phi^{-1}$ is the inverse CDF of a univariate standard Normal distribution. Note that $W_{d}=\Phi^{-1}\bigl(F_{d}(Z_{d};\boldsymbol{\alpha}_{d},\mathbf{\Pi}_{d})\bigr)$, and therefore the joint distribution of the state processes can be obtained by using the chain rule as

$$\begin{aligned}f(Z_{1},\ldots,Z_{D})&=\frac{{\phi}_{D}\bigl(W_{1},\ldots,W_{D};\Sigma\bigr)}{\phi(W_{1})\times\ldots\times\phi(W_{D})}\prod_{d=1}^{D}f_{d}(Z_{d})\\ &=c(Z_{1},\ldots,Z_{D};\Sigma)\prod_{d=1}^{D}f_{d}(Z_{d}),\end{aligned} \tag{7}$$

where $\phi_{D}$ and $\phi$ are the density functions corresponding to $\Phi_{D}$ and $\Phi$, $f_{d}(\cdot)$ denotes the distribution of $Z_{d}$, and $c(\cdot)$ denotes the copula density. The likelihood in (5) can thus be simplified to

$$\begin{aligned}f(\mathbf{Y}_{1,t},\ldots,\mathbf{Y}_{D,t},Z_{1,t},\ldots,Z_{D,t})&=c(Z_{1,t},\ldots,Z_{D,t})\prod_{d=1}^{D}f(\mathbf{Y}_{d,t}\mid Z_{d,t})f_{d}(Z_{d,t})\\ &=c(Z_{1,t},\ldots,Z_{D,t})\prod_{d=1}^{D}f(\mathbf{Y}_{d,t},Z_{d,t}).\end{aligned} \tag{8}$$

The copula-augmented model has an LHMM structure similar to Figure 2. A discussion of the fundamentals and theoretical properties of copulas can be found in Nelsen (2006). Copulas of continuous variables are well defined and have been extensively studied, but constructing a copula for discrete variables is not as straightforward. Since a Markov chain is either a nominal or an ordinal random variable, finding an appropriate measure of association between latent state processes to construct a copula can be challenging. To address this, we will take advantage of a unique relationship which exists between the Spearman and Pearson correlations of a bivariate Normal distribution. Since the Spearman correlation can be used as a measure of association for ordinal data, we choose a Gaussian copula parameterized by a correlation matrix $\Sigma$. We assume that the state processes evolve in lockstep, and correlated $D$-vectors from $\Phi_{D}(\cdot\mid\Sigma)$ can be linked to an MMC with correlated Markov chains by means of an appropriate transformation. One such method is described below.

2.2 Constructing an MMC from Uniform random variates

Since the copula is a $D$-dimensional CDF with Uniform marginals, we discuss a method to generate an MMC from a Gaussian copula in this section. This will be relevant to our method of estimating the copula parameters.

Let us first review a method to generate a univariate Markov chain; Serfozo (2009) describes how to construct a Markov chain from a Uniform variable. Suppose that the desired Markov chain has the initial distribution vector $\boldsymbol{\alpha}$ and the transition probability matrix ${\bf\Pi}$. Let $h(u)$ and $f(i,u)$ be functions transforming continuous values into categorical values $\mathcal{J}=\{1,2,\ldots,J\}$. They are given by

$$h(u)=j\text{ if }u\in I_{j}\text{ for some }j\in\mathcal{J}, \tag{9}$$

where $I_{1}=\left[0,\alpha_{1}\right)$ and $I_{j}=\left[\sum^{j-1}_{l=1}\alpha_{l},\sum^{j}_{l=1}\alpha_{l}\right)$ for any $j>1$, and

$$f(i,u)=j\text{ if }u\in I_{ij}\text{ for some }j\in\mathcal{J}, \tag{10}$$

where $I_{i1}=\left[0,\pi_{i1}\right)$ and $I_{ij}=\left[\sum^{j-1}_{l=1}\pi_{il},\sum^{j}_{l=1}\pi_{il}\right)$ for any $j>1$.

Let ${\bf U}=(U_{1},U_{2},\ldots,U_{n})$ be a vector of independent random variables where $U_{t}$ has a uniform distribution on $[0,1]$. We will denote $h(U_{1})$ as $Z_{1}$ and $f(Z_{t-1},U_{t})$ as $Z_{t}$ for any $t>1$. Serfozo (2009) showed that $(Z_{1},Z_{2},\ldots,Z_{n})$ is a Markov chain with the initial distribution $\boldsymbol{\alpha}$ and the transition probability matrix ${\bf\Pi}$. Ji (2019) modified this method in order to generate correlated Markov chains. In a univariate Markov chain, the random value at the $t$th time point, $Z_{t}$, is generated from a single random variable $U_{t}$. To create an MMC with $D$ Markov chains, we need a vector of possibly correlated random variables $(U_{1,t},\ldots,U_{D,t})$ at each time point $t$. Therefore, we will use a $D$-dimensional Normal distribution to generate correlated random values. Suppose that an MMC has a length of $n$ with $D$ sequences. Let the stationary distribution and the transition probability matrix of the $d$th sequence be $\boldsymbol{\eta}_{d}$ and ${\bf\Pi}_{d}$ respectively. We use the inverse transform method to create Uniform variables from Normal variables (Rizzo, 2019). For the $t$th time step, $1\leq t\leq n$, let ${\bf W}_{t}=(W_{1,t},\ldots,W_{D,t})$ and ${\bf W}_{t}\overset{\text{i.i.d.}}{\sim}\text{MVN}(\boldsymbol{0},{\Sigma})$, where ${\Sigma}$ is a correlation matrix. For each $d=1,\ldots,D$ and $t=1,\ldots,n$, a Uniform random variable is created using the inverse transform method, namely $U_{d,t}=\Phi(W_{d,t})$. Each $U_{d,t}$ is thus uniformly distributed on $[0,1]$. The correlations among $U_{1,t},\ldots,U_{D,t}$ stem from the correlations among $W_{1,t},\ldots,W_{D,t}$.

Now let us apply (9) and (10) to $U_{d,t}$. A random variable $Z_{d,t}$ is created for each $(d,t)$ pair where $Z_{d,1}=h(U_{d,1})$ and $Z_{d,t}=f(Z_{d,t-1},U_{d,t})$. Thus, we have an MMC $\{{\bf Z}_{1},\ldots,{\bf Z}_{n}\}$ where ${\bf Z}_{t}=(Z_{1,t},Z_{2,t},\ldots,Z_{D,t})$ and $\{Z_{d,1},Z_{d,2},\ldots,Z_{d,n}\}$ is a Markov chain marginally. In addition, $Z_{1,t},\ldots,Z_{D,t}$ are correlated at the $t$th time step. The functions in (9) and (10) are collectively referred to as $g(\cdot)$ going forward, and describe the overall process of transforming marginally Uniform random vectors into an MMC.
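The construction in this section can be sketched in a few lines of code. The following is a minimal illustration using only the Python standard library; the function names (`cholesky`, `to_state`, `simulate_mmc`), the 0-based state labels, and all parameter values are assumptions made for the example and are not taken from the paper or its software.

```python
import math
import random
from statistics import NormalDist

def cholesky(S):
    """Lower-triangular L with L L^T = S, for a positive definite matrix S."""
    D = len(S)
    L = [[0.0] * D for _ in range(D)]
    for i in range(D):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def to_state(u, probs):
    """Return the index j whose cumulative interval contains u, as in (9) and (10)."""
    cum = 0.0
    for j, p in enumerate(probs):
        cum += p
        if u < cum:
            return j
    return len(probs) - 1  # guards against rounding when u is very close to 1

def simulate_mmc(alphas, Pis, Sigma, n, seed=0):
    """Simulate D linked Markov chains of length n from a Gaussian copula."""
    rng = random.Random(seed)
    Phi = NormalDist().cdf
    L = cholesky(Sigma)
    D = len(alphas)
    chains = [[] for _ in range(D)]
    for t in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(D)]
        # W_t = L e has a multivariate Normal distribution with correlation Sigma
        w = [sum(L[d][k] * e[k] for k in range(d + 1)) for d in range(D)]
        for d in range(D):
            u = Phi(w[d])  # U_{d,t} = Phi(W_{d,t}) is Uniform(0, 1)
            probs = alphas[d] if t == 0 else Pis[d][chains[d][-1]]
            chains[d].append(to_state(u, probs))
    return chains

# Hypothetical two-sector example.
alphas = [[0.6, 0.4], [0.5, 0.5]]
Pis = [[[0.9, 0.1], [0.2, 0.8]],
       [[0.85, 0.15], [0.25, 0.75]]]
Sigma = [[1.0, 0.6], [0.6, 1.0]]
chains = simulate_mmc(alphas, Pis, Sigma, n=500)
```

Each chain uses its own initial distribution and transition matrix, so the marginal Markov structure is untouched; only the Uniform variates driving the chains are correlated across sectors.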

2.3 Two-stage parameter estimation for the LHMM

The construction of a copula for the state processes requires knowledge of the states that give rise to the data. This is usually obtained as the most likely sequence of states using the Viterbi algorithm (Viterbi, 1967). The Viterbi algorithm is applied after the model parameters have been estimated; this means that we need to resort to a two-stage estimation process. In the first stage, the parameters for each sector’s HMM are estimated independently using the B-W algorithm. The Viterbi algorithm then provides us the most likely sequence of states to have generated the data, which is used to estimate the copula correlation matrix $\Sigma$. Afterwards, the marginal parameters can be re-estimated conditioned on the correlation structure.
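To make the state-decoding step concrete, the sketch below implements the standard Viterbi recursion in log space. It is a simplified illustration that assumes a single observed series per chain with Normal emissions (the sector models in (4) would instead sum the log densities over the stocks in the sector), strictly positive initial and transition probabilities, and hypothetical parameter values; the function names are illustrative.

```python
import math

def log_normal_pdf(y, mu, sd):
    """Log density of N(mu, sd^2) evaluated at y."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((y - mu) / sd) ** 2

def viterbi(ys, alpha, Pi, means, sds):
    """Most likely state path for a Normal-emission HMM (all probabilities > 0)."""
    J = len(alpha)
    # delta[j]: best log-probability of any state path that ends in state j
    delta = [math.log(alpha[j]) + log_normal_pdf(ys[0], means[j], sds[j]) for j in range(J)]
    back = []
    for y in ys[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(J):
            best = max(range(J), key=lambda h: prev[h] + math.log(Pi[h][j]))
            ptr.append(best)
            delta.append(prev[best] + math.log(Pi[best][j]) + log_normal_pdf(y, means[j], sds[j]))
        back.append(ptr)
    # Backtrack from the best final state
    path = [max(range(J), key=lambda j: delta[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical, well-separated example: three observations near +1, then three near -1.
obs = [1.1, 0.9, 1.0, -1.2, -0.8, -1.0]
path = viterbi(obs, alpha=[0.5, 0.5], Pi=[[0.9, 0.1], [0.1, 0.9]],
               means=[1.0, -1.0], sds=[0.5, 0.5])
```

Applying this decoder to each sector separately gives the $n\times D$ matrix of estimated states that the second stage uses.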

Estimating $\Sigma$ in (6) is challenging using conventional approaches like the inversion method (Nelsen, 2006) or the inference functions for margins method (Joe and Xu, 1996), since neither the CDF $F_{d}(Z_{d})$ nor its associated probability mass function that appear in (6) and (7) can be evaluated easily. Instead, we choose $\Sigma$ in a manner such that states generated from the Gaussian copula using the methodology discussed in Section 2.2 can be used to reproduce a desired measure of association for the MMC. For each stock, we assume that the HMM has two hidden states, the bear state and the bull state. However, the B-W algorithm produces two states, State 1 and State 2, without labels identifying them as bear/bull. So, without loss of generality, we relabel the states for the $d$th Markov chain such that for State 1,

$$\sum_{k_{d}=1}^{n_{d}}\dfrac{\mu_{k_{d},1}}{\sigma_{k_{d},1}}>\sum_{k_{d}=1}^{n_{d}}\dfrac{\mu_{k_{d},2}}{\sigma_{k_{d},2}},$$

where $\mu_{k_{d},1},\mu_{k_{d},2},\sigma_{k_{d},1},\mbox{ and }\sigma_{k_{d},2}$ are as defined in (4). State 1 has a higher return to volatility ratio and can be considered a good state in which to buy the stock (Nguyen and Nguyen, 2015). It would thus correspond to a bull market, and State 2 can be considered to be a bear market. Since the states are now ordinal in nature, the pairwise Spearman correlation for Markov chains in the MMC is chosen as the desired measure of association. However, there is no obvious way to estimate a $D\times D$ matrix $\Sigma$ whose pairwise correlations are functions of the Spearman correlations between the Markov chains. We make a simplifying assumption for the copula and rewrite (6) as:

$${\Phi}_{D}\bigl(W_{1},\ldots,W_{D};\Sigma\bigr)\approx\prod_{d_{1}=1}^{D-1}\prod_{d_{2}=d_{1}+1}^{D}{\Phi}_{2}(W_{d_{1}},W_{d_{2}};\rho_{d_{1}d_{2}}), \tag{11}$$

where $\rho_{d_{1}d_{2}}$ denotes the Pearson correlation between $W_{d_{1}}$ and $W_{d_{2}}$, and corresponds to the $(d_{1},d_{2})$th element of $\Sigma$. This formulation can be interpreted in a manner similar to a pairwise simplified regular vine (R-vine) copula (Brechmann et al., 2012), with all pair-copula terms involving a conditioning set replaced by bivariate Gaussian copulas. We refer to this as the pair-copula approximation, and it consists of $D(D-1)/2$ terms. The copula density associated with (11) can also be interpreted as a composite likelihood (Varin et al., 2011). In practice, this will allow us to estimate the individual elements $\rho_{d_{1}d_{2}}$ of $\Sigma$ using the right-hand side of (11), but simulate data from the copula using the left-hand side of (11), as long as we can ensure that $\Sigma$ is a positive-definite matrix. Kruskal (1958) provided a relationship between the Pearson correlation $\rho$ and the Spearman correlation $\rho^{*}$ for bivariate Normal variables $(W_{1},W_{2})$ that we will use to estimate $\rho_{d_{1}d_{2}}$:

$$\rho=2\sin\biggl[\pi\frac{\rho^{*}}{6}\biggr]. \tag{12}$$
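The relationship in (12) and its inverse are easy to evaluate numerically. The sketch below is a small illustration; the function names are hypothetical.

```python
import math

def spearman_to_pearson(rho_s):
    """Pearson correlation of a bivariate Normal pair with Spearman correlation rho_s, as in (12)."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)

def pearson_to_spearman(rho):
    """Inverse of (12): Spearman correlation implied by a Pearson correlation rho."""
    return (6.0 / math.pi) * math.asin(rho / 2.0)

rho = spearman_to_pearson(0.5)       # 2 * sin(pi / 12), roughly 0.518
rho_s_back = pearson_to_spearman(rho)
```

The two correlations coincide at $-1$, $0$, and $1$, and differ only slightly in between, with the Pearson correlation being the larger of the two in absolute value.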

Note that $\rho^{*}(W_{d_{1}},W_{d_{2}})=\rho^{*}(U_{d_{1}},U_{d_{2}})$ since the Spearman correlation coefficient is invariant under monotone transforms. Recall that we defined $g(\cdot)$ in Section 2.2 as the function which transforms Uniform variates into a Markov chain. Let $g_{1}(\cdot)$ and $g_{2}(\cdot)$ be similar functions such that $g_{1}(U_{d_{1}})=Z_{d_{1}}$ and $g_{2}(U_{d_{2}})=Z_{d_{2}}$, with $g_{1}(\cdot)\neq g_{2}(\cdot)$ if $d_{1}\neq d_{2}$. The relationship in (12) and the assumption made in (11) together mean that it is sufficient to estimate $\rho^{*}_{d_{1}d_{2}}=\rho^{*}(U_{d_{1}},U_{d_{2}})$ to obtain an estimate of $\rho_{d_{1}d_{2}}=\rho(W_{d_{1}},W_{d_{2}})$. If we denote the corresponding estimators as $\hat{\rho}^{*}_{d_{1}d_{2}}$ and $\hat{\rho}_{d_{1}d_{2}}$ respectively, the estimate $\hat{\rho}^{*}_{d_{1}d_{2}}$ can be obtained as the numerical solution to

$$\begin{aligned}r_{d_{1}d_{2}}&=\rho^{*}(g_{1}(U_{d_{1}}),g_{2}(U_{d_{2}});\rho^{*}_{d_{1}d_{2}})\\ &=\rho^{*}(Z_{d_{1}},Z_{d_{2}};\rho^{*}_{d_{1}d_{2}}),\end{aligned} \tag{13}$$

where $r_{d_{1}d_{2}}$ is the sample Spearman correlation between states of the LHMM, which is fixed given the data, and $\rho^{*}(Z_{d_{1}},Z_{d_{2}};\rho^{*}_{d_{1}d_{2}})$ is its population version. Note that it is not possible to invert the relationship in (13) and obtain an analytical expression for $\hat{\rho}^{*}_{d_{1}d_{2}}$ as one perhaps would in a method of moments approach. However, given any value of $\rho^{*}_{d_{1}d_{2}}$ it is straightforward to generate data from the MMC and obtain a large sample estimate $r^{*}_{d_{1}d_{2}}$ of $\rho^{*}(Z_{d_{1}},Z_{d_{2}};\rho^{*}_{d_{1}d_{2}})$. The Spearman correlation is not preserved by this transformation and $\rho^{*}(g_{1}(U_{d_{1}}),g_{2}(U_{d_{2}}))\neq\rho^{*}(U_{d_{1}},U_{d_{2}})$ except in trivial cases. Majumder (2021) has empirically shown that a monotonically increasing relationship exists between $\rho^{*}(g_{1}(U_{d_{1}}),g_{2}(U_{d_{2}}))$ and $\rho^{*}(U_{d_{1}},U_{d_{2}})$, and that $\rho^{*}(g_{1}(U_{d_{1}}),g_{2}(U_{d_{2}}))<\rho^{*}(U_{d_{1}},U_{d_{2}})$. The inequality is a consequence of $g_{1}(\cdot)$ and $g_{2}(\cdot)$ discretizing continuous variables $U_{d_{1}}$ and $U_{d_{2}}$ into $Z_{d_{1}}$ and $Z_{d_{2}}$, which are ordinal variables with 2 levels and possible ties. This attenuates the maximum and minimum values that the Spearman correlation between the 2 state processes can take. Mhanna and Bauwens (2012) have also demonstrated similar behaviour using empirical studies when Uniform variables are discretized to Bernoulli variables. The monotone relationship between $\rho^{*}(g_{1}(U_{d_{1}}),g_{2}(U_{d_{2}}))$ and $\rho^{*}(U_{d_{1}},U_{d_{2}})$ means that for a given target value of $r_{d_{1}d_{2}}$, it is possible to use a line search to identify the value of $\rho^{*}_{d_{1}d_{2}}$ which generates states with a sample Spearman correlation of $r^{*}_{d_{1}d_{2}}$ arbitrarily close to $r_{d_{1}d_{2}}$. The $D(D-1)/2$ unique elements of $\Sigma$ can thus be estimated using pairs of state sequences.

2.4 Algorithm to estimate Gaussian copula parameters

Recall that for the LHMM, the states for each sector’s HMM are obtained using the Viterbi algorithm once the marginal parameters have been estimated. Let $\{r_{d_{1}d_{2}}\}$ denote the observed Spearman correlations between the states of each $(d_{1},d_{2})\in\mathcal{D}^{2}$ pair of the $D$ component HMMs. These values are fixed given the marginal models and the data. Given the $n\times D$ matrix of states, the initial distribution $\boldsymbol{\alpha}_{d}$, and the transition matrix $\mathbf{\Pi}_{d}$ for each $Z_{d}$, we want to construct a Gaussian copula that can generate an MMC with pairwise Spearman correlations $r^{*}_{d_{1}d_{2}}$ coinciding with $\{r_{d_{1}d_{2}}\}$. Let $\hat{\rho}_{d_{1}d_{2}}$ be the estimate of the copula correlation in (11) between $(W_{d_{1}},W_{d_{2}})$, and let $\hat{\rho}_{d_{1}d_{2}}^{*}$ be the corresponding estimate of the Spearman correlation using (12). Since (13) cannot be solved analytically for $\rho^{*}_{d_{1}d_{2}}$, we resort to a simulation approach to compute $\hat{\rho}_{d_{1}d_{2}}^{*}$ and $\hat{\rho}_{d_{1}d_{2}}$. We initialize $\hat{\rho}_{d_{1}d_{2}}^{*}$ with $r_{d_{1}d_{2}}$ for each pair of Markov chains $(Z_{d_{1}},Z_{d_{2}})$ and simulate an MMC from the Gaussian copula. We compute the pairwise Spearman correlations between the Markov chains in the MMC and denote them by $r_{d_{1}d_{2}}^{*}$. If $r_{d_{1}d_{2}}^{*}<r_{d_{1}d_{2}}$, we increment $\hat{\rho}_{d_{1}d_{2}}^{*}$ by a step size $\tau$ and repeat the process. We stop when $|r_{d_{1}d_{2}}^{*}-r_{d_{1}d_{2}}|\leq\epsilon$, for some predefined tolerance $\epsilon$. The procedure is formalized in Algorithm 1 below.

1. Segment $y_{1:K}$ into its $D$ sectors according to the S&P 500.
2. Estimate the marginal HMM parameters $\boldsymbol{\alpha}_{d}$, $\mathbf{\Pi}_{d}$, and $\boldsymbol{\theta}_{d}$ for sectors $d=1,\ldots,D$ using the Baum-Welch algorithm.
3. Estimate $Z_{d,1},\ldots,Z_{d,n}$ using the Viterbi algorithm for sectors $d=1,\ldots,D$.
4. Set the step size $\tau$ and the tolerance $\epsilon$.
5. For each pair of sectors $(d_{1},d_{2})$ with $d_{1}=1,\ldots,D-1$ and $d_{2}=d_{1}+1,\ldots,D$, do:
   5.1. Compute the observed Spearman correlation $r_{d_{1}d_{2}}$ as in (13).
   5.2. Initialize $\hat{\rho}_{d_{1}d_{2}}^{*}=r_{d_{1}d_{2}}$.
   5.3. Initialize $r_{d_{1}d_{2}}^{*}=0$.
   5.4. While $|r_{d_{1}d_{2}}^{*}-r_{d_{1}d_{2}}|>\epsilon$, do:
      (a) Increment $\hat{\rho}_{d_{1}d_{2}}^{*}$ by $\tau$.
      (b) Compute the Pearson correlation $\hat{\rho}_{d_{1}d_{2}}$ from $\hat{\rho}_{d_{1}d_{2}}^{*}$ using (12).
      (c) Generate a correlated bivariate sequence from $N_{2}\biggl(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}1&\hat{\rho}_{d_{1}d_{2}}\\ \hat{\rho}_{d_{1}d_{2}}&1\end{pmatrix}\biggr)$.
      (d) Use the estimates of $\boldsymbol{\alpha}_{d_{1}}$, $\boldsymbol{\alpha}_{d_{2}}$, $\mathbf{\Pi}_{d_{1}}$, $\mathbf{\Pi}_{d_{2}}$, and the correlated sequences to generate synthetic states.
      (e) Calculate the Spearman correlation $r_{d_{1}d_{2}}^{*}$ of the synthetic states as an estimate of $\rho^{*}(Z_{d_{1}},Z_{d_{2}};\rho^{*}_{d_{1}d_{2}})$ as in (13).
   5.5. End while.
6. End for.
7. Construct the correlation matrix $\hat{\Sigma}$ with off-diagonals $\hat{\rho}_{d_{1}d_{2}}$ and diagonals set to 1.
8. If $\hat{\Sigma}$ is not positive definite, then:
   8.1. Eigendecompose $\hat{\Sigma}$ as $\hat{\Sigma}=VRV^{T}$.
   8.2. Replace negative and zero eigenvalues in $R$ with $10^{-6}$; call the new matrix $R^{*}$.
   8.3. Recalculate $\hat{\Sigma}=VR^{*}V^{T}$.
9. End if.

Algorithm 1: Algorithm to construct a Gaussian copula for an LHMM.
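The inner loop of Algorithm 1 for a single pair of sectors can be sketched as follows. This is an illustrative simplification rather than the authors' implementation: it assumes two-state chains with 0-based labels, a positive target correlation, and hypothetical parameter values, and it adds two stopping guards that are not part of Algorithm 1 (stop once the simulated correlation reaches the target, and never push $\hat{\rho}^{*}$ above 1). For two-level states the midranks are an affine function of the state, so the Spearman correlation equals the Pearson correlation of the 0/1 codes, which is what `pearson` computes.

```python
import math
import random
from statistics import NormalDist

def pearson(x, y):
    """Sample Pearson correlation; equals Spearman for two-level variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_pair(rho, alphas, Pis, n, rng):
    """Two linked two-state chains driven by a bivariate Normal with correlation rho."""
    Phi = NormalDist().cdf
    z1, z2 = [], []
    for t in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w1 = e1
        w2 = rho * e1 + math.sqrt(max(0.0, 1.0 - rho * rho)) * e2
        for w, z, a, P in ((w1, z1, alphas[0], Pis[0]), (w2, z2, alphas[1], Pis[1])):
            probs = a if t == 0 else P[z[-1]]
            z.append(0 if Phi(w) < probs[0] else 1)  # (9) and (10) with J = 2
    return z1, z2

def fit_pair(r_obs, alphas, Pis, tau=0.01, eps=0.02, n=5000, seed=0):
    """Line search for the copula correlation that reproduces r_obs (assumed > 0)."""
    rng = random.Random(seed)
    rho_s = r_obs                                    # initialize rho-hat-star at r_obs
    while True:
        rho = 2.0 * math.sin(math.pi * rho_s / 6.0)  # relationship (12)
        z1, z2 = simulate_pair(rho, alphas, Pis, n, rng)
        r_sim = pearson(z1, z2)
        # Extra guards (assumptions of this sketch): stop on overshoot or at the boundary
        if abs(r_sim - r_obs) <= eps or r_sim >= r_obs or rho_s + tau > 1.0:
            return rho, rho_s, r_sim
        rho_s += tau

# Hypothetical marginal parameters shared by both sectors.
alphas = [[0.5, 0.5], [0.5, 0.5]]
Pis = [[[0.9, 0.1], [0.2, 0.8]], [[0.9, 0.1], [0.2, 0.8]]]
rho_hat, rho_s_hat, r_sim = fit_pair(0.2, alphas, Pis)
```

Because the simulated correlation is noisy, the tolerance $\epsilon$ should not be set much below the Monte Carlo error implied by the simulation length `n`; otherwise the search can step past the target without ever landing inside the tolerance band.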

Since the entries of $\hat{\Sigma}$ are constructed independently, the resultant matrix is not guaranteed to be positive definite. The final steps of our algorithm ensure the positive-definiteness of $\hat{\Sigma}$. An alternative approach suggested by one of the anonymous referees is to add a similar small positive quantity to all diagonal elements of $\Sigma$. In cases when $\Sigma$ is high-dimensional and the eigendecomposition is computationally expensive, this would be a much faster way of ensuring the positive definiteness of $\Sigma$.

After $\Sigma$ has been estimated, we can use the correlation structure to re-estimate the marginal parameters $\boldsymbol{\alpha}_{d}$, $\mathbf{\Pi}_{d}$, and $\boldsymbol{\theta}_{d}$. One way of doing so is to generate a sequence of states from the MMC and use the states as initial values in the Baum-Welch algorithm. Alternatively, we can re-estimate $\boldsymbol{\alpha}_{d}$ and $\mathbf{\Pi}_{d}$ for all sectors from synthetic states generated from the MMC, and use the estimates as the initial distributions in the Baum-Welch algorithm to re-estimate all marginal parameters. For this study, we have followed the second approach.

3 Stock Portfolio Selection using an LHMM

The return of the $k$th stock over $n$ weeks is defined as follows,

$$R_{k}=\prod^{n}_{t=1}(1+Y_{k,t}). \tag{14}$$
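As a quick numerical illustration of (14), the sketch below compounds a short series of weekly relative changes; the function name and the values are hypothetical.

```python
import math

def cumulative_return(changes):
    """Compound weekly relative changes into R_k = prod_t (1 + Y_{k,t}), as in (14)."""
    return math.prod(1.0 + y for y in changes)

# A 2% gain followed by a 2% loss leaves the stock slightly below its starting value.
R = cumulative_return([0.02, -0.02])  # 1.02 * 0.98
```

A value of $R_{k}$ above 1 corresponds to a capital gain over the period and a value below 1 to a loss.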

Our desired portfolio generates a high return with a low risk over a period of time, so we seek stocks with these characteristics as well. To evaluate each of the stocks, we use the random variable $R_{k}$. Given a portfolio of $K$ stocks with allocations $\boldsymbol{w}=(w_{1},\ldots,w_{K})$, its return over $n$ weeks is defined as,

$$R(w_{1},w_{2},\ldots,w_{K})=\sum^{K}_{k=1}w_{k}R_{k}, \tag{15}$$

where the weight $w_{k}$ represents the proportion of the portfolio wealth invested in the $k$th stock. Thus, the expected return of a portfolio is given by

$$E(R)=\sum^{K}_{k=1}w_{k}E(R_{k}). \tag{16}$$

The variance of return is given by,

V(R)=\sum^{K}_{k=1}w_{k}^{2}\mathrm{Var}(R_{k})+\sum^{K}_{k=1}\sum_{l\neq k}w_{k}w_{l}\mathrm{Cov}(R_{k},R_{l}).

(17)
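Equations (14), (16) and (17) can be checked numerically in a few lines. The sketch below writes (16) and (17) in matrix form, $\boldsymbol{w}^{\top}\boldsymbol{\mu}$ and $\boldsymbol{w}^{\top}C\boldsymbol{w}$, and compares the latter with the term-by-term double sum. The mean vector, covariance matrix and weights are made-up values for illustration, and the function names are hypothetical.

```python
import numpy as np

def stock_return(weekly_changes):
    """Equation (14): product of (1 + Y_t) over the weeks."""
    return float(np.prod(1.0 + np.asarray(weekly_changes, dtype=float)))

def portfolio_moments(w, mean_r, cov_r):
    """Equations (16) and (17) in matrix form: w'mu and w'Cw."""
    w = np.asarray(w, dtype=float)
    return float(w @ mean_r), float(w @ cov_r @ w)

# Toy inputs (not estimates from the paper).
mean_r = np.array([1.08, 1.15, 1.03])
cov_r = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.01]])
w = np.array([0.5, 0.2, 0.3])
exp_r, var_r = portfolio_moments(w, mean_r, cov_r)

# Term-by-term version of (17): variance terms plus all l != k covariances.
K = len(w)
var_sum = sum(w[k] ** 2 * cov_r[k, k] for k in range(K)) + sum(
    w[k] * w[l] * cov_r[k, l]
    for k in range(K) for l in range(K) if l != k)
```

The matrix form is what makes the later optimization steps cheap, since only the mean vector and covariance matrix of the $R_{k}$ are needed once they have been estimated.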

The goal of portfolio selection in this paper is to find the optimal allocation $\boldsymbol{w}=(w_{1},\ldots,w_{K})$ with high reward $E(R)$ and relatively low risk $V(R)$ based on the results above. Ideally, the optimal $\boldsymbol{w}$ would maximize $E(R)$ while minimizing $V(R)$. However, empirical evidence suggests that there is a trade-off between $E(R)$ and $V(R)$ (Malkiel, 2019, p. 200). The most conservative approach would be to choose $\boldsymbol{w}$ such that $V(R)$ is minimized, i.e.,

\boldsymbol{w}_{v}=\arg\min_{\boldsymbol{w}}V(R)\text{, subject to }E(R)>0\text{ and }\sum^{K}_{k=1}w_{k}=1.

(18)
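For intuition, if only the budget constraint $\sum_{k}w_{k}=1$ is imposed, the Lagrange multiplier method gives the closed form $\boldsymbol{w}=C^{-1}\mathbf{1}/(\mathbf{1}^{\top}C^{-1}\mathbf{1})$, where $C$ is the covariance matrix of the $R_{k}$. The sketch below evaluates this and checks the condition $E(R)>0$ afterwards. It is a simplified illustration with made-up inputs: it allows negative weights (short positions) and is not necessarily the solver used for the results in this paper.

```python
import numpy as np

def min_variance_weights(cov_r):
    """Minimize w'Cw subject to sum(w) = 1 (no sign constraints on w)."""
    ones = np.ones(cov_r.shape[0])
    x = np.linalg.solve(cov_r, ones)      # C^{-1} 1 without forming the inverse
    return x / x.sum()

# Toy inputs (not estimates from the paper).
mean_r = np.array([1.08, 1.15, 1.03])
cov_r = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.01]])

w_v = min_variance_weights(cov_r)
var_v = float(w_v @ cov_r @ w_v)
exp_v = float(w_v @ mean_r)               # should be checked against E(R) > 0

# Variance of the equally weighted portfolio, for comparison.
w_eq = np.full(3, 1.0 / 3.0)
var_eq = float(w_eq @ cov_r @ w_eq)
```

In this toy example the second weight is negative, which shows why additional constraints such as $w_{k}\geq 0$ generally require a numerical optimizer rather than the closed form.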

Alternatively, Malkiel (2019) suggested that an optimal weight vector $\boldsymbol{w}^{*}$ should maximize $E(R)$ for a given level of variance $V(R)=v$,

\boldsymbol{w}^{*}(v)=\arg\max_{\boldsymbol{w}}E(R)\text{, subject to }V(R)=v\text{ and }\sum^{K}_{k=1}w_{k}=1.

The Lagrange multiplier method is used to find optimal allocations. Markowitz (1952) refers to the pairs of $E\{R\left(\boldsymbol{w}^{*}(v)\right)\}$ and $v$ as the efficient $(R,V)$ combinations. He claimed that a portfolio created based on an efficient combination is efficient, but did not suggest a specific combination to balance the reward and the risk. Ji and Neerchal (2019) suggested the following approach to find a vector of weights $\boldsymbol{w}_{b}$ for a balanced portfolio,

\boldsymbol{w}_{b}=\arg\max_{\boldsymbol{w}}\left\{E(R)-q\sqrt{V(R)}\right\}\text{, subject to }\sum^{K}_{k=1}w_{k}=1.

In this expression, $q$ is a tuning parameter which controls the trade-off between reward and risk. A similar technique was also implemented in Elliott and van der Hoek (1997). Assuming $R$ is approximately Normal, choosing $q\approx 2$ maximizes the lower bound of an approximate $95\%$ confidence interval for the return of a portfolio. As $q$ increases, the resulting portfolio accepts less risk and prioritizes more stable stocks. As $q$ decreases towards $0$ (with $q\geq 0$), the portfolio selects stocks with higher returns despite their higher volatility. The choice of $q$ is based on an investor’s willingness to take risk. For the rest of the paper, we will assume $q=2$.
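The role of $q$ can be seen on a small example. The sketch below evaluates the balanced objective $E(R)-q\sqrt{V(R)}$ on a grid of weights for three stocks. The restriction to non-negative weights, the grid search itself, and all numerical inputs are assumptions made for this illustration; the function names are hypothetical, and a practical implementation for $K=447$ would use a constrained numerical optimizer instead.

```python
import itertools
import numpy as np

def balanced_objective(w, mean_r, cov_r, q=2.0):
    """E(R) - q * sqrt(V(R)) for a given weight vector."""
    w = np.asarray(w, dtype=float)
    return float(w @ mean_r - q * np.sqrt(w @ cov_r @ w))

def grid_search_balanced(mean_r, cov_r, q=2.0, steps=100):
    """Search long-only weights on a grid over the 3-stock simplex."""
    best_w, best_val = None, -np.inf
    for i, j in itertools.product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue                       # weights must sum to one
        w = np.array([i, j, steps - i - j]) / steps
        val = balanced_objective(w, mean_r, cov_r, q)
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val

# Toy inputs (not estimates from the paper).
mean_r = np.array([1.08, 1.15, 1.03])
cov_r = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.01]])

w_b, val_b = grid_search_balanced(mean_r, cov_r, q=2.0)
w_0, val_0 = grid_search_balanced(mean_r, cov_r, q=0.0)
var_b = float(w_b @ cov_r @ w_b)
var_0 = float(w_0 @ cov_r @ w_0)
```

With $q=0$ the search puts all weight on the stock with the highest expected return, while $q=2$ selects a portfolio whose variance is no larger, in line with the discussion above.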

In practice, the stock price changes $Y_{k}$, for $k=1,\ldots,K$, will not be Normally distributed. We can transform them to approximately Normal variates $Y^{*}_{k}$ using the Yeo-Johnson power transformation (Yeo and Johnson, 2000), and fit HMMs on the transformed variables $Y_{k}^{*}$. While analytical expressions analogous to $E(R)$ and $V(R)$ can be constructed based on $Y_{k}^{*}$, the resulting quantities $E(R^{*})$ and $V(R^{*})$ do not have any meaningful interpretation. However, since we can generate data from the fitted LHMM, a simulation-based approach allows us to recover data in the original scale.
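For reference, the Yeo-Johnson transformation and its inverse can be written in a few lines for a fixed power parameter $\lambda$. The sketch below is a scalar illustration only: the function names are hypothetical, and in practice $\lambda$ would be estimated from the data (for example by maximum likelihood), which is not shown here.

```python
import math

def yeo_johnson(y, lam):
    """Yeo-Johnson transform of a scalar y for a fixed power parameter lam."""
    if y >= 0:
        if lam == 0:
            return math.log(y + 1.0)
        return ((y + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log(1.0 - y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)

def yeo_johnson_inverse(x, lam):
    """Inverse transform; the sign of x matches the sign of the original y."""
    if x >= 0:
        if lam == 0:
            return math.exp(x) - 1.0
        return (lam * x + 1.0) ** (1.0 / lam) - 1.0
    if lam == 2:
        return 1.0 - math.exp(-x)
    return 1.0 - (1.0 - (2.0 - lam) * x) ** (1.0 / (2.0 - lam))

# Round trip on a few weekly price changes (made-up values).
changes = [-0.20, -0.01, 0.0, 0.03, 0.50]
round_trip = [yeo_johnson_inverse(yeo_johnson(y, 0.5), 0.5) for y in changes]
```

Because the transform is monotone and sign-preserving, simulated values on the transformed scale can be mapped back to price changes one at a time.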

Consider an LHMM fitted to the transformed stock returns $\mathbf{Y}^{*}=(Y_{1}^{*},\ldots,Y_{K}^{*})$ using the methodology described in Algorithm 1. We can simulate data from this model; we denote the simulated data by $\widehat{\mathbf{Y}}^{*}=(\widehat{Y}_{1}^{*},\ldots,\widehat{Y}_{K}^{*})$. Since $\widehat{Y}_{k}^{*}$ has the same distribution as $Y_{k}^{*}$, we use the inverse of the Yeo-Johnson transform to recover $\widehat{Y}_{k}$ for $k=1,\ldots,K$. The $\widehat{Y}_{k}$ are simulated stock price changes, and thus we can compute $\widehat{R}_{k}$ from this data using (14). If we simulate a large number $N$ of independent datasets from the fitted model, we obtain $N$ independent annual return samples $R_{k}^{1},\ldots,R_{k}^{N}$ for the $k$th stock. The vector of expectations $E(R)$ and the covariance matrix $V(R)$ can be estimated from these samples and used for portfolio optimization.
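The simulation step can be sketched as follows. As a stand-in for the fitted LHMM, the example uses a single two-state Gaussian HMM shared by three stocks, with made-up transition probabilities and emission parameters; the emissions are generated directly on the original scale, so the inverse Yeo-Johnson step is omitted. All names and values are hypothetical.

```python
import numpy as np

def simulate_returns(n_sets, n_weeks, trans, means, sds, rng):
    """Return an (n_sets, K) array whose row i holds the returns R_k^i."""
    n_states, n_stocks = means.shape
    out = np.empty((n_sets, n_stocks))
    for i in range(n_sets):
        state = rng.integers(n_states)               # uniform initial state (assumed)
        growth = np.ones(n_stocks)
        for _ in range(n_weeks):
            # Independent Normal emissions for each stock, given the state.
            y = rng.normal(means[state], sds[state])
            growth *= 1.0 + y                        # running product in (14)
            state = rng.choice(n_states, p=trans[state])
        out[i] = growth
    return out

# Hypothetical parameters: state 0 plays the role of the bear state,
# state 1 the bull state.
trans = np.array([[0.90, 0.10],
                  [0.05, 0.95]])
means = np.array([[-0.004, -0.006, -0.002],
                  [0.003, 0.005, 0.002]])
sds = np.array([[0.04, 0.05, 0.03],
                [0.02, 0.03, 0.015]])

rng = np.random.default_rng(0)
sim_r = simulate_returns(500, 52, trans, means, sds, rng)
mean_r = sim_r.mean(axis=0)               # estimate of the vector of E(R_k)
cov_r = np.cov(sim_r, rowvar=False)       # estimate of the covariance matrix
```

Even though the emissions are independent given the state, the simulated returns are correlated across stocks because they share the same state sequence.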

4 Building a Portfolio for 2016–17 from S&P 500 Data

Table 1: S&P 500 stocks per sector used to fit an LHMM using data from 2011-10-01 to 2016-09-30.

Sector	Number of Stocks
Communication Services	8
Consumer Discretionary	69
Consumer Staples	32
Energy	28
Financials	67
Health Care	43
Industrials	69
Information Technology	45
Materials	40
Real Estate	10
Telecommunications	7
Utilities	29
Total	447

We fit an LHMM to historical S&P 500 data to create a portfolio for 2016–17 and evaluate its performance against the S&P 500 index changes. As described in Section 3, the parameters of the LHMM are used to identify efficient $(R,V)$ combinations, and the associated weights are used to create a portfolio. Historical data for S&P 500 stocks from 2011-10-01 to 2016-09-30 is used to build an LHMM with 12 Markov chains corresponding to the 12 sectors represented in the data. Stocks with records of fewer than 5 years are ignored; this leaves us with 447 stocks for the study, i.e., $K=447$.

Table 1 shows the number of stocks available per sector that were used to fit the LHMM. The weekly stock price changes $\mathbf{Y}$ were transformed using the Yeo-Johnson power transformation; the resulting variable $\mathbf{Y}^{*}$ is approximately Normally distributed and thus meets the distributional assumptions for the LHMM. The HMMs were fitted using the packages depmixS4 (Visser and Speekenbrink, 2010) and hmmr (Visser and Speekenbrink, 2019) in R 4.0.x. For each sector, the B-W algorithm was restarted 20 times with random starting values. Parameter estimates from each of the 20 random restarts were compared on the basis of their Bayesian information criterion (BIC), with lower BIC values corresponding to higher likelihoods. The model which provided the lowest BIC value was chosen as the final model for each sector.
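The comparison of restarts can be illustrated with the usual definition $\mathrm{BIC}=-2\log L+p\log n$. Since every restart fits the same model to the same data, $p$ and $n$ are constant and the lowest BIC coincides with the highest log-likelihood. The log-likelihood values, parameter count and sample size below are made up for illustration.

```python
import math

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p log n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical log-likelihoods from random restarts of one sector's HMM.
restart_loglik = [1510.2, 1498.7, 1523.9, 1523.1]
n_params, n_obs = 20, 260     # illustrative values, identical for every restart

scores = [bic(ll, n_params, n_obs) for ll in restart_loglik]
best = min(range(len(scores)), key=scores.__getitem__)   # index of lowest BIC
```

The penalty term only matters when models of different sizes are compared, for example HMMs with different numbers of states.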

Once HMMs had been fitted to each sector’s data, the most likely sequence of states was obtained using the Viterbi algorithm. The states were labeled such that State 1 is the bear state for each HMM and State 2 is the bull state, and the target Spearman correlation matrix was computed based on each pair of state processes. Next, a Gaussian copula which can generate synthetic states with the same Spearman correlation was constructed using Algorithm 1. Synthetic state sequences from the copula were used to re-estimate $\boldsymbol{\alpha}_{d}$, $\mathbf{\Pi}_{d}$, and $\boldsymbol{\theta}_{d}$ for $d=1,\ldots,D$; these estimates now take into account the correlation structure between $Z_{1},\ldots,Z_{D}$.
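As a small illustration of the target correlation, the sketch below computes the Spearman correlation between two decoded state sequences by ranking each sequence (with average ranks for ties) and taking the Pearson correlation of the ranks. The two sequences are made up, and the helper names are hypothetical.

```python
def average_ranks(x):
    """1-based ranks of x, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and x[order[end + 1]] == x[order[start]]:
            end += 1
        avg = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical Viterbi paths for two sectors (1 = bear, 2 = bull).
states_a = [1, 1, 2, 2, 2, 1, 2, 1]
states_b = [1, 2, 2, 2, 1, 1, 2, 1]
rho = spearman(states_a, states_b)
```

For two-state sequences the ranks are an affine function of the state labels, so the Spearman correlation equals the Pearson correlation of the labels themselves.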

Once all LHMM parameters had been estimated, 10000 datasets of 5 years ( $n=260$ weeks) each were simulated from this fitted model, and the emissions $\hat{\mathbf{Y}}^{*}$ of the simulated data were transformed back to their original scale $\hat{\mathbf{Y}}$ using the inverse of the Yeo-Johnson transformation. The gains $R_{k}^{i}$ were computed for the $k=1,\ldots,447$ stocks and the $i=1,\ldots,10000$ datasets. This gives us a $10000\times 447$ matrix of $R_{k}^{i}$ values, from which $E(R)$ and $V(R)$ can be computed. These simulated values of $E(R)$ and $V(R)$ were used for constrained optimization to obtain the optimum weight vector $\boldsymbol{w}_{v}$, which minimizes $V(R)$ subject to $E(R)>0$, and $\boldsymbol{w}_{b}$, which maximizes $E(R)-2\sqrt{V(R)}$. We refer to the latter as a balanced portfolio assignment since it balances the expected return with the uncertainty surrounding it. Portfolios for the period 2016-10-01 to 2017-09-30 can be built based on $\boldsymbol{w}_{b}$ and $\boldsymbol{w}_{v}$. We repeated this entire process 100 times to get 100 estimates of the weights, $\boldsymbol{w}_{b}^{(1)},\ldots,\boldsymbol{w}_{b}^{(100)}$ and $\boldsymbol{w}_{v}^{(1)},\ldots,\boldsymbol{w}_{v}^{(100)}$. These are used to construct confidence intervals for the percentage gains based on our method, and we evaluated their performance against the capital gains from 2016-10-01 to 2017-09-30.
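One way to turn the 100 replicated gains into an interval is a percentile bootstrap of their mean, sketched below. The text does not specify which bootstrap variant was used, so the percentile method, the number of resamples and the function name are assumptions, and the gains are randomly generated placeholders rather than results from the study.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, record the mean of each resample, and sort.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    tail = (1.0 - level) / 2.0
    lo_idx = round(tail * n_boot)
    hi_idx = round((1.0 - tail) * n_boot) - 1
    return means[lo_idx], means[hi_idx]

# Placeholder percentage gains for 100 replicated portfolios.
toy_rng = random.Random(1)
gains = [toy_rng.gauss(12.0, 4.0) for _ in range(100)]
lower, upper = bootstrap_mean_ci(gains)
```

The interval describes the variability of the mean gain across replications of the simulation and optimization steps, not the uncertainty of the market outcome itself.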

Table 2: Actual percentage gains in the one-year period from 2016-10-01 to 2017-09-30 based on four different portfolios. 95% bootstrap confidence intervals are provided in parentheses. The corresponding S&P 500 gain for this time period is 18%.

Sector	% gain from HMMs, Min V(R)	% gain from HMMs, Balanced	% gain from LHMM, Min V(R)	% gain from LHMM, Balanced
Communication Services	-0.10			
	(-0.17,-0.03)	(0,0)		
Consumer Discretionary	1.52	0.19	1.05	0.34
	(1.37,1.67)	(0.14,0.23)	(1.83,1.28)	(0.27,0.48)
Consumer Staples	0.38	0.60	0.07	1.96
	(0.27,0.50)	(0.21,0.97)	(0,0.12)	(1.56,2.33)
Energy	0.58		0.70	
	(0.47,0.70)	(0,0)	(0.51,0.90)	(0,0)
Financials	0.70	3.63	0.27	2.03
	(0.49,0.88)	(3.25,4.08)	(0.18,0.45)	(1.55,2.56)
Health Care	2.52	-0.33	4.48	-0.99
	(2.26,2.73)	(-0.49,-0.13)	(4.04,4.93)	(-1.27,-0.68)
Industrials	0.33	0.38	0.43	0.16
	(0.18,0.50)	(0.34,0.44)	(0.22,0.57)	(0.09,0.23)
Information Technology	0.69	7.18	0.35	9.29
	(0.57,0.78)	(6.59,7.87)	(0.26,0.44)	(8.54,9.95)
Materials	1.21		0.37	
	(1.06,1.39)	(0,0)	(0.20,0.55)	(0,0)
Real Estate	0.07		0.16	
	(-0.03,0.17)	(0,0)	(-0.03,0.37)	(0,0)
Telecommunications	0.54		0.40	
	(0.45,0.65)	(0,0)	(0.20,0.49)	(0,0)
Utilities	0.51	0.49	-0.12	-0.06
	(0.39,0.66)	(0.41,0.58)	(-0.21,0)	(-0.09,-0.05)
Total	8.97	12.14	8.11	12.72
	(8.58,9.31)	(11.29,13.23)	(7.57,8.67)	(11.60,13.58)

We also considered a second model consisting of the 12 marginal HMMs without the Gaussian copula that links them into an LHMM. While we also wanted to consider a baseline model where all 447 stocks were modeled using a single state process, numerical issues prevented the model from converging consistently when using random restarts. Table 2 shows the performance of the two portfolios each for the HMMs and the LHMM compared with the S&P 500 capital gains. For each sector, the first row provides the mean percentage gains during the one-year test period, and the second row provides the corresponding 95% bootstrap confidence interval. If our aim is just to minimize risk, the LHMM does not provide better returns than the individual HMMs. This approach results in a diversified portfolio for both models, where nearly every sector contributes to the annual gains. On the other hand, trying to balance expected return and risk leads to portfolios concentrated around a few sectors. In particular, Information Technology stocks were the single largest contributor to the annual gains for both the HMMs and the LHMM in our study. The balanced portfolios have higher annual gains than the portfolios which minimize the variance, and the one based on the LHMM has the highest gain among all portfolios constructed, with a mean of 12.72% and a confidence interval of (11.60%, 13.58%). If our primary goal is to balance return and risk, the LHMM, which better encapsulates market dynamics by allowing the different state processes to evolve jointly, provides better overall returns.

Table 3: Mean and standard deviation (SD) of the number of transactions for each type of portfolio based on 100 independent estimates of portfolio weights.

Number of transactions	HMM portfolios, Min V(R)	HMM portfolios, Balanced	LHMM portfolios, Min V(R)	LHMM portfolios, Balanced
Mean	134.45	37.47	48.84	33.43
SD	3.44	1.00	1.56	1.03

Since we are demonstrating portfolio construction for a single year (2016–2017), the number of non-zero weights in our allocations corresponds to the number of transactions for the entire year. This is another important metric to consider when comparing algorithms for portfolio construction. The 100 different sets of weights in our case study thus correspond to 100 estimates of the number of transactions for each of the 4 approaches to portfolio selection considered here. Table 3 lists the mean and standard deviation of the number of transactions. We note that the LHMM-based portfolios require fewer transactions than the corresponding portfolios constructed from independent HMMs. In particular, for the portfolio which minimizes risk, the LHMM portfolio requires fewer than half as many transactions as the independent HMMs portfolio. If we are constrained by the number of allowed transactions, the LHMM portfolio is more likely to produce higher returns based on our empirical studies with S&P 500 data.
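Counting transactions from a set of weight vectors is straightforward; the sketch below also reports the mean and standard deviation across replicates. The tolerance used to decide whether a weight is non-zero is an assumption, as are the toy weight vectors and the function name.

```python
import statistics

def n_transactions(weights, tol=1e-8):
    """Number of stocks that receive a non-negligible weight."""
    return sum(1 for w in weights if abs(w) > tol)

# Three hypothetical replicates of a four-stock allocation.
weight_sets = [
    [0.6, 0.0, 0.4, 0.0],
    [0.5, 0.1, 0.4, 0.0],
    [0.7, 0.0, 0.3, 1e-12],   # the last weight is numerical noise
]
counts = [n_transactions(w) for w in weight_sets]
mean_count = statistics.mean(counts)
sd_count = statistics.stdev(counts)       # sample standard deviation
```

A tolerance is needed because numerical optimizers rarely return exact zeros, and the reported counts can be sensitive to its value.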

5 Discussion

One of the key numerical challenges in fitting HMMs to large datasets using the B-W algorithm is that the algorithm often has trouble converging, even under repeated random restarts. Using an LHMM allowed us to sidestep this issue to a large extent, since we went from trying to fit a 447-dimensional emission process to fitting emission processes of at most 69 dimensions. The LHMM also allows the market dynamics for each sector to evolve in a dependent manner without needing every stock to be in the same state at every time point. A similar form of heterogeneity can also be induced if we increase the number of states, but interpreting a larger number of states can be difficult. Increasing the number of states also increases the number of emission distribution parameters significantly. Extending to a multivariate state process, however, does not increase the number of emission distribution parameters and results in only a relatively modest increase in the number of state process parameters.

One of the assumptions that is made in this paper is that the stock price changes for different stocks within a sector are distributed as independent Normal variables given the state, as shown in (4). This rarely holds in practice, and something akin to a power transform is necessary to meet the assumption. However, even if the emission distribution of each stock’s price changes is individually Normal, it still fails to adequately capture the correlation within the emission process. Ideally, we would want to model the emissions for each sector (either in its original scale of measurement or in a power-transformed scale so as to ensure Normality) as a multivariate Normal distribution, which would allow us to explicitly parameterize the correlation between the weekly gains for different stocks. We were actually able to do this for sectors with a small number of stocks, but faced computational issues for some of the larger sectors. It might be possible to estimate multivariate Normal parameters for the larger sectors if our data is extended to be longer than 260 weeks. However, the market dynamics do change over time and extending the length of the data might have other negative consequences. This is one aspect that we want to address in future work. In particular, a variational Bayes approach (McGrory and Titterington, 2009) where we can assign priors could potentially alleviate many of the numerical issues associated with B-W parameter estimation.

Acknowledgements

The hardware used in the computational studies is part of the UMBC High Performance Computing Facility (HPCF). The facility is supported by the U.S. National Science Foundation through the MRI program (grant nos. CNS–0821258, CNS–1228778, and OAC–1726023) and the SCREMS program (grant no. DMS–0821311), with additional substantial support from the University of Maryland, Baltimore County (UMBC). See hpcf.umbc.edu for more information on HPCF and the projects using its resources. Reetam Majumder was supported by the Joint Center for Earth Systems Technology and by the HPCF as a Research Assistant.

References

Baum and Petrie (1966) Baum, L. E. and Petrie, T. (1966) Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37(6), 1554–1563.
Brechmann et al. (2012) Brechmann, E. C., Czado, C. and Aas, K. (2012) Truncated regular vines in high dimensions with application to financial data. Canadian Journal of Statistics, 40, 68–85.
Dempster et al. (1977) Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39.
Elliott and van der Hoek (1997) Elliott, R. and van der Hoek, J. (1997) An application of hidden Markov models to asset allocation problems (*). Finance and Stochastics, 1, 229–238.
Elliott et al. (2010) Elliott, R., Siu, T. K. and Alex, B. (2010) On mean-variance portfolio selection under a hidden Markovian regime-switching model. Economic Modelling, 27, 678–686.
Ensor and Koev (2014) Ensor, K. B. and Koev, G. M. (2014) Computational finance: correlation, volatility, and markets. WIREs Computational Statistics, 6, 326–340. URL https://doi.org/10.1002/wics.1323.
Fiecas et al. (2017) Fiecas, M., Franke, J., von Sachs, R. and Tadjuidje, J. (2017) Shrinkage estimation for multivariate hidden Markov models. Journal of the American Statistical Association, 112, 326–340. URL https://doi.org/10.1080/01621459.2016.1148608.
Hamilton (1989) Hamilton, J. D. (1989) A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57, 357–384.
Hassan and Nath (2005) Hassan, M. R. and Nath, B. (2005) Stock market forecasting using hidden Markov model: a new approach. Proceedings of the IEEE fifth International Conference on Intelligent Systems Design and Applications, 192–96.
Ji (2019) Ji, Q. (2019) Computational methods for hidden Markov models with applications. Ph.D. Thesis, Department of Mathematics and Statistics, University of Maryland, Baltimore County.
Ji and Neerchal (2019) Ji, Q. and Neerchal, N. K. (2019) Creating stock portfolios using hidden Markov models. In JSM Proceedings, Business and Economic Statistics Section, 2105–2118.
Joe and Xu (1996) Joe, H. and Xu, J. J. (1996) The estimation method of inference functions for margins for multivariate models. Tech. Rep. No. 166, Department of Statistics, University of British Columbia, Vancouver.
Kole and Dijk (2016) Kole, E. and Dijk, v. D. (2016) How to identify and forecast bull and bear markets? Journal of Applied Econometrics, 32.
Kruskal (1958) Kruskal, W. H. (1958) Ordinal measures of association. Journal of the American Statistical Association, 53, 814–861.
Majumder (2021) Majumder, R. (2021) Hidden Markov models for high dimensional data with geostatistical applications. Ph.D. Thesis, Department of Mathematics and Statistics, University of Maryland, Baltimore County.
Malkiel (2019) Malkiel, B. G. (2019) A Random Walk Down Wall Street: Including A Life-Cycle Guide To Personal Investing. W.W. Norton & Company, 12th edn.
Markowitz (1952) Markowitz, H. (1952) Portfolio selection. The Journal of Finance, 7, 77–91.
McGrory and Titterington (2009) McGrory, C. A. and Titterington, D. M. (2009) Variational Bayesian analysis for hidden Markov models. Australian and New Zealand Journal of Statistics, 51, 227–244.
Mhanna and Bauwens (2012) Mhanna, M. and Bauwens, W. (2012) A stochastic space-time model for the generation of daily rainfall in the Gaza Strip. International Journal of Climatology, 32, 1098–1112.
Nelsen (2006) Nelsen, R. B. (2006) An Introduction to Copulas. Springer, 2 edn.
Nguyen (2018) Nguyen, N. (2018) Hidden Markov model for stock trading. International Journal of Financial Studies, 36, 192–96.
Nguyen and Nguyen (2015) Nguyen, N. and Nguyen, D. (2015) Hidden Markov model for stock selection. Risks, 3, 455–473.
Rabiner (1989) Rabiner, L. R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77.
Rizzo (2019) Rizzo, M. L. (2019) Statistical Computing with R. Chapman & Hall/CRC, 2 edn.
Serfozo (2009) Serfozo, R. (2009) Basics of Applied Stochastic Processes. Springer.
Varin et al. (2011) Varin, C., Reid, N. and Firth, D. (2011) An overview of composite likelihood methods. Statistica Sinica, 21, 5–42.
Visser and Speekenbrink (2010) Visser, I. and Speekenbrink, M. (2010) depmixS4: An R package for hidden Markov models. Journal of Statistical Software, 36, 1–21. URL http://www.jstatsoft.org/v36/i07/.
Visser and Speekenbrink (2019) Visser, I. and Speekenbrink, M. (2019) Hidden Markov Models with R. Springer.
Viterbi (1967) Viterbi, A. (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13, 260–269.
Xu and Cao (2021) Xu, J. and Cao, L. (2021) High-dimensional cross-market dependence modeling and portfolio forecasting by copula variational LSTM. Available at SSRN. URL https://dx.doi.org/10.2139/ssrn.3881474.
Yeo and Johnson (2000) Yeo, I.-K. and Johnson, R. A. (2000) A new family of power transformations to improve normality or symmetry. Biometrika, 87, 954–959.