Optimal Stock Portfolio Selection with a Multivariate Hidden Markov Model
Reetam Majumder (University of Maryland, Baltimore County), Qing Ji (Procter & Gamble) and Nagaraj K. Neerchal (University of Maryland, Baltimore County)
Abstract
The underlying market trends that drive stock price fluctuations are often referred to in terms of bull and bear markets. Optimal stock portfolio selection methods need to take into account these market trends; however, the bull and bear market states tend to be unobserved and can only be assigned retrospectively. We fit a linked hidden Markov model (LHMM) to relative stock price changes for S&P 500 stocks from 2011–2016 based on weekly closing values. The LHMM consists of a multivariate state process whose individual components correspond to HMMs for each of the 12 sectors of the S&P 500 stocks. The state processes are linked using a Gaussian copula so that the states of the component chains are correlated at any given time point. The LHMM allows us to capture more heterogeneity in the underlying market dynamics for each sector. In this study, stock performances are evaluated in terms of capital gains using the LHMM by utilizing historical stock price data. Based on the fitted LHMM, optimal stock portfolios are constructed to maximize capital gain while balancing reward and risk. Under out-of-sample testing, the annual capital gains for the portfolios for 2016–2017 are calculated. Portfolios constructed using the LHMM are able to generate returns comparable to the S&P 500 index.
Key words: Linked hidden Markov model, Multivariate Markov chain, Stochastic simulations, Portfolio allocation, Gaussian copula
1 Introduction
A stock portfolio refers to a collection of stocks selected and owned by an investor, and stock portfolio selection has been at the center of investment methodology research for many years. Depending on the investment goals, various methods have been developed by researchers for selecting stocks and allocating assets. The modern portfolio selection methodology developed by Markowitz (1952) has guided a large section of portfolio research. There are two essential components to the portfolio selection procedure, namely the evaluation of stocks and the allocation of portfolio assets. A good introductory reference for the topic of portfolio selection is Malkiel (2019). In this paper, we follow the groundwork laid out by Ji and Neerchal (2019) of connecting portfolio selection to the estimation of the underlying statistical model. We first build statistical models using past data on stock prices. Then, optimal stock portfolios are constructed based on the technique in Markowitz (1952) to maximize the capital gain while balancing reward and risk. The performance of the optimal portfolios is evaluated by comparing annual gains based on the portfolios against the S&P 500 gains for the same time period.
Stock markets around the world use the terms bull and bear to describe market trends. Stock prices are relatively stable and generally increasing in a bull market. A bear market, on the other hand, indicates strong market volatility with decreasing stock prices. A bull to bear market switch or vice versa is recognized after an increase/decrease of 20% or more in multiple stock indices (Kole and Dijk, 2016). While bull and bear markets cannot be directly observed, the behaviour of individual stocks points to the state of the market. The current state of a stock can be estimated by analysts, but the true state is unknown unless stocks are evaluated retrospectively. Therefore, the state of a stock can be treated as an unobserved (latent) random variable, and the prices of the stock are the observed values. In addition, the market conditions can switch states at any time point. Given these characteristics, a hidden Markov model (HMM) is well suited for modeling the bull/bear trend of the market.
An HMM is a discrete-time stochastic process that is controlled through a Markov chain with latent (hidden) states. A Markov chain (MC) is a well-known stochastic model that describes a sequence of discrete events. Let a sequence of random variables $\{S_t\}_{t \geq 1}$ form a Markov chain. The characterizing property of a first-order Markov chain states that

$$\Pr(S_{t+1} = s_{t+1} \mid S_t = s_t, \ldots, S_1 = s_1) = \Pr(S_{t+1} = s_{t+1} \mid S_t = s_t). \tag{1}$$

Assuming the Markov chain is stationary and has $K$ states, the transition probabilities can be arranged into a $K$ by $K$ matrix known as the transition probability matrix,

$$\Gamma = \begin{pmatrix} \gamma_{11} & \cdots & \gamma_{1K} \\ \vdots & \ddots & \vdots \\ \gamma_{K1} & \cdots & \gamma_{KK} \end{pmatrix},$$

where $\gamma_{jk} = \Pr(S_{t+1} = k \mid S_t = j)$ is the probability of transitioning from state $j$ to state $k$ at any $t$, and $\sum_{k=1}^{K} \gamma_{jk} = 1$ given any $j$. In an HMM, observations are assumed to be drawn from one of several sub-distributions determined by the unobserved variable $S_t$. Formally, an HMM consists of a pair of random processes $\{(S_t, Y_t)\}_{t \geq 1}$, where $\{S_t\}$ is a Markov chain with $K$ states. Conditional on $\{S_t\}$, $\{Y_t\}$ is a sequence of independent random variables such that the distribution of $Y_t$ depends only on $S_t$. The conditional distribution of $Y_t$ given $S_t$ is given by

$$Y_t \mid S_t = k \sim f_k(\,\cdot\,; \theta_k), \quad k = 1, \ldots, K, \tag{2}$$

where $f_1, \ldots, f_K$ are the different sub-distributions with parameters $\theta = (\theta_1, \ldots, \theta_K)$. $\{S_t\}$ is known as the state process of the HMM, and $\{Y_t\}$ is known as the emission process. Note that at any time point $t$, $Y_t$ could be distributed as a univariate or as a multivariate distribution. Figure 1 depicts the graphical representation of an HMM. The variables in this representation are denoted as the nodes of the graph, and the arrows connecting them are denoted as the edges and represent the dependence among the nodes.
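As a minimal sketch of the structure just described, the following Python code simulates a two-state HMM with Normal emissions. The parameter values, the function name `simulate_hmm`, and the choice of Normal sub-distributions are illustrative assumptions and are not taken from the fitted models in this paper. States are coded 0 and 1 rather than 1 and 2.

```python
import numpy as np

def simulate_hmm(delta, gamma, means, sds, n_steps, rng):
    """Simulate a Normal-emission HMM; returns (states, observations)."""
    delta = np.asarray(delta, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    n_states = len(delta)
    states = np.empty(n_steps, dtype=int)
    # Draw the first state from the initial distribution.
    states[0] = rng.choice(n_states, p=delta)
    for t in range(1, n_steps):
        # The next state depends only on the current state (Markov property).
        states[t] = rng.choice(n_states, p=gamma[states[t - 1]])
    # Each observation depends only on the state at the same time point.
    obs = rng.normal(np.asarray(means)[states], np.asarray(sds)[states])
    return states, obs

# Hypothetical values: state 0 is calm with a positive mean, state 1 is volatile.
rng = np.random.default_rng(0)
gamma = [[0.95, 0.05], [0.10, 0.90]]
states, obs = simulate_hmm([0.5, 0.5], gamma, means=[0.004, -0.006],
                           sds=[0.01, 0.04], n_steps=500, rng=rng)
```

Only `obs` would be available in practice; `states` plays the role of the latent bull/bear sequence.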
Previous work on predicting stock prices based on HMMs includes Hassan and Nath (2005) and Nguyen (2018). Their models were trained directly using the stock closing values and were used to predict stock prices in the near future; there is, however, no extension to portfolio selection in their work. Hamilton (1989) combined the HMM structure with autoregressive models in order to capture the market trend, where parameters of an autoregressive model were considered to arise from an HMM. Elliott and van der Hoek (1997) and Elliott et al. (2010) further extended the work of Hamilton (1989) to include a portfolio selection procedure.
Although we may largely expect the bull and bear states of the market to be consistently reflected in weekly stock price changes, it is likely that stocks in different sectors will have different underlying dynamics of the state process. This paper considers a model where each sector is driven by a different state process with its own bull and bear states. This results in a multivariate state process, whose individual components are Markov chains. The dependence between the individual Markov chains of the multivariate Markov chain (MMC) can propagate in different ways, and Majumder (2021) discusses some of the common ways an MMC has been specified in previous studies. Additional work in the area of HMMs and correlations among the prices of different assets or markets includes Ensor and Koev (2014), who investigated the correlations between different stock sectors while applying a regime-switching model to the correlation matrix, and Fiecas et al. (2017), who address the estimation of a multivariate HMM using shrinkage estimators. More recently, Xu and Cao (2021) have incorporated a vine copula into an artificial neural network so that the inter-market correlations were considered while estimating the return of a portfolio. We thank one of the anonymous referees for bringing these references to our attention.
In this paper, we assume that the state processes of the MMC evolve in lockstep, i.e., the nodes of two Markov chains are connected by an edge if and only if the nodes are at the same time point. The resulting multivariate HMM is known as a linked HMM (LHMM). Figure 2 represents our approach through an example where $\mathbf{S}_t = (S_{1,t}, S_{2,t})$ is a bivariate state process corresponding to bull/bear states for 2 sectors of the stock market, and $\mathbf{Y}_{1,t}$ and $\mathbf{Y}_{2,t}$ are the stock returns for all stocks within the two sectors. This is a modification of the default LHMM specification; the state processes of the different sectors can be considered to evolve in lockstep, and each state process affects the stock price changes for that sector's stocks. Partitioning the stocks by sector allows for more heterogeneity in the market dynamics while still using two-state latent processes with an intuitive bull/bear labeling. We can extend this idea to an LHMM with $D$ clusters corresponding to $D$ sectors. We specify the dependency structure for the $D$-variate LHMM using a Gaussian copula, which allows us to generate correlated states from the MMC at every time point (Majumder, 2021). To demonstrate our LHMM, we propose a stock portfolio selection method based on the work of Ji and Neerchal (2019).
The rest of this paper is structured as follows. In Section 2, we describe an LHMM with its dependency structure specified using a Gaussian copula which can be used to model weekly stock price changes. Section 3 introduces the methods to evaluate stocks and portfolios and explains the portfolio selection methodology. In Section 4, we validate the portfolio selection method using historical S&P 500 stock data. Finally, Section 5 discusses our results and proposes ways that our approach can be improved.
2 Parameter Estimation for a Linked Hidden Markov Model
2.1 Parameterizing an LHMM using a Gaussian copula
Suppose a stock portfolio consists of $N$ stocks. For the $i$th stock, $i = 1, \ldots, N$, let $P_{i,t}$ be the closing price at the end of the $t$th week, $t = 0, 1, \ldots, T$. The relative price changes (percentage changes expressed as fractions) are given by

$$Y_{i,t} = \frac{P_{i,t} - P_{i,t-1}}{P_{i,t-1}}, \quad t = 1, \ldots, T.$$

Let $\mathbf{Y}_t = (Y_{1,t}, \ldots, Y_{N,t})$ be the vector of stock price changes at the end of the $t$th week, and $S_t \in \{1, 2\}$ be the binary latent state at that time point. The HMM for the stock price changes is given by

$$Y_{i,t} \mid S_t = k \;\overset{\text{ind}}{\sim}\; N(\mu_{i,k}, \sigma^2_{i,k}), \quad i = 1, \ldots, N, \quad k = 1, 2, \tag{3}$$

where $\{S_t\}$ is a Markov chain with initial distribution $\delta$ and a transition matrix $\Gamma$. The latent states represent the bull or bear state of the market, which is observed in the emission process as the buy or sell trend for each of the stocks. The emission distribution assumes a conditional independence structure at each time point, i.e., the price change of any stock is independent of those of the remaining stocks conditional on the state. Following notation established in (2), we use $\theta$ to denote all the parameters of the emission distribution. Parameter estimation for an HMM of this form is carried out using the Baum-Welch (B-W) algorithm (Baum and Petrie, 1966), which is a special case of the expectation-maximization (EM) algorithm (Dempster et al., 1977). A comprehensive tutorial of parameter estimation in HMMs is provided by Rabiner (1989).
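The likelihood that the B-W algorithm climbs can be evaluated with a scaled forward recursion. The sketch below does this for an HMM whose emissions are independent Normal variables given the state, which is the structure described above. The function name `hmm_log_likelihood`, the array layout, and the parameter values are assumptions made for illustration. It covers only likelihood evaluation, not the full B-W update.

```python
import numpy as np

def hmm_log_likelihood(y, delta, gamma, mu, sigma):
    """Log-likelihood of a (T, N) array y under a K-state HMM whose N
    emissions are independent Normals given the state (scaled forward pass)."""
    y = np.asarray(y, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    mu = np.asarray(mu, dtype=float)        # shape (K, N)
    sigma = np.asarray(sigma, dtype=float)  # shape (K, N)
    # log f(y_t | state k): sum of N univariate Normal log-densities.
    z = (y[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    log_b = (-0.5 * z**2 - np.log(sigma)[None, :, :]
             - 0.5 * np.log(2.0 * np.pi)).sum(axis=2)  # shape (T, K)
    log_lik = 0.0
    alpha = np.asarray(delta, dtype=float)
    for t in range(y.shape[0]):
        if t > 0:
            alpha = alpha @ gamma  # propagate one step through the chain
        # Subtract the largest log-density before exponentiating to avoid underflow.
        shift = log_b[t].max()
        alpha = alpha * np.exp(log_b[t] - shift)
        norm = alpha.sum()
        log_lik += np.log(norm) + shift
        alpha = alpha / norm
    return log_lik

# Hypothetical check: when both states share the same emission parameters,
# the HMM reduces to independent Normal observations.
rng = np.random.default_rng(1)
y = rng.normal(size=(50, 3))
ll = hmm_log_likelihood(y, [0.3, 0.7], [[0.9, 0.1], [0.2, 0.8]],
                        mu=np.zeros((2, 3)), sigma=np.ones((2, 3)))
iid_ll = float((-0.5 * y**2 - 0.5 * np.log(2.0 * np.pi)).sum())
```

The rescaling at each step matters here because a product of many Normal densities over dozens of stocks underflows quickly in double precision.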
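Now, let us consider the case where the $N$ stocks belong to $D$ different sectors. If we assign each sector its own underlying state process, we can denote the state of the LHMM at the end of the $t$th week as $\mathbf{S}_t = (S_{1,t}, \ldots, S_{D,t})$. If the $d$th sector consists of $N_d$ stocks with price changes $\mathbf{Y}_{d,t} = (Y_{d,1,t}, \ldots, Y_{d,N_d,t})$, the HMM for price changes in the $d$th sector is given by,

$$Y_{d,i,t} \mid S_{d,t} = k \;\overset{\text{ind}}{\sim}\; N(\mu_{d,i,k}, \sigma^2_{d,i,k}), \quad i = 1, \ldots, N_d, \quad k = 1, 2, \tag{4}$$

with $\sum_{d=1}^{D} N_d = N$. As before, $\{S_{d,t}\}$ is the latent state process. The HMM for sector $d$ is parameterized by the initial distribution $\delta_d$, transition matrix $\Gamma_d$, and emission distribution parameters $\theta_d$. Furthermore, $\{\mathbf{S}_t\}$ is a $D$-component MMC, and $\mathbf{Y}_{d,t}$ given $S_{d,t}$ is independent of $\mathbf{Y}_{d',t}$ and $S_{d',t}$ for any $d' \neq d$. The full likelihood of the LHMM at time $t$ can be written as,

$$f(\mathbf{Y}_t, \mathbf{S}_t) = f(\mathbf{S}_t) \prod_{d=1}^{D} f(\mathbf{Y}_{d,t} \mid S_{d,t}), \tag{5}$$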
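where the parameter dependencies have been suppressed for convenience. We want to parameterize the association between the component Markov chains of the MMC at every time point, and our approach to that end is to construct a Gaussian copula for the state processes. Let $F$ be the $D$-dimensional joint CDF of $\mathbf{S}_t = (S_{1,t}, \ldots, S_{D,t})$, and let $F_1, \ldots, F_D$ be the marginal CDFs of $S_{1,t}, \ldots, S_{D,t}$ respectively. We define a Gaussian copula over the state processes as:

$$F(s_1, \ldots, s_D) = C(u_1, \ldots, u_D; \Sigma) = \Phi_\Sigma(z_1, \ldots, z_D), \tag{6}$$

where $u_d = F_d(s_d)$ are Uniform$(0,1)$ variates and $z_d = \Phi^{-1}(u_d)$ are standard Normal variates. $\Phi_\Sigma$ is a $D$-dimensional multivariate Normal CDF with correlation matrix $\Sigma$, while $\Phi^{-1}$ is the inverse CDF of a univariate standard Normal distribution. Note that $z_d = \Phi^{-1}(F_d(s_d))$, and therefore the joint distribution of the state processes can be obtained by using the chain rule as,

$$f(s_1, \ldots, s_D) = \frac{\phi_\Sigma(z_1, \ldots, z_D)}{\prod_{d=1}^{D} \phi(z_d)} \prod_{d=1}^{D} f_d(s_d) = c(u_1, \ldots, u_D; \Sigma) \prod_{d=1}^{D} f_d(s_d), \tag{7}$$

where $\phi_\Sigma$ and $\phi$ are the density functions corresponding to $\Phi_\Sigma$ and $\Phi$, $f_d$ denotes the distribution of $S_{d,t}$, and $c$ denotes the copula density. The likelihood in (5) can thus be simplified to

$$f(\mathbf{Y}_t, \mathbf{S}_t) = c(u_1, \ldots, u_D; \Sigma) \prod_{d=1}^{D} f_d(S_{d,t}) \, f(\mathbf{Y}_{d,t} \mid S_{d,t}), \quad u_d = F_d(S_{d,t}). \tag{8}$$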
The copula augmented model has an LHMM structure similar to Figure 2. A discussion of the fundamentals and theoretical properties of copulas can be found in Nelsen (2006). Copulas of continuous variables are well defined and have been extensively studied, but constructing a copula for discrete variables is not as straightforward. Since a Markov chain is either a nominal or an ordinal random variable, finding an appropriate measure of association between latent state processes to construct a copula can be challenging. To address this, we will take advantage of a unique relationship which exists between the Spearman and Pearson correlations of a bivariate Normal distribution. Since the Spearman correlation can be used as a measure of association for ordinal data, we choose a Gaussian copula parameterized by a correlation matrix $\Sigma$. We assume that the state processes evolve in lockstep, and correlated $D$-vectors from $C(\,\cdot\,; \Sigma)$ can be linked to an MMC with $D$ correlated Markov chains by means of an appropriate transformation. One such method is described below.
2.2 Constructing an MMC from Uniform random variates
Since the copula is a $D$-dimensional CDF with Uniform marginals, we discuss a method to generate an MMC from a Gaussian copula in this section. This will be relevant to our method of estimating the copula parameters.
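Let us first review a method to generate a univariate Markov chain; Serfozo (2009) describes how to construct a Markov chain from Uniform variables. Suppose that the desired Markov chain has the initial distribution vector $\delta = (\delta_1, \ldots, \delta_K)$ and the transition probability matrix $\Gamma = (\gamma_{jk})$. Let $g$ and $h$ be functions transforming continuous values in $(0, 1]$ into categorical values in $\{1, \ldots, K\}$. They are given by

$$g(u) = k \quad \text{if } u \in (a_{k-1}, a_k], \tag{9}$$

where $a_0 = 0$ and $a_k = \sum_{l=1}^{k} \delta_l$ for any $k \in \{1, \ldots, K\}$, and

$$h(j, u) = k \quad \text{if } u \in (b_{j,k-1}, b_{j,k}], \tag{10}$$

where $b_{j,0} = 0$ and $b_{j,k} = \sum_{l=1}^{k} \gamma_{jl}$ for any $j, k \in \{1, \ldots, K\}$.

Let $U = (U_1, \ldots, U_T)$ be a vector of independent random variables where each $U_t$ has a uniform distribution on $(0, 1)$. We will denote $g(U_1)$ as $S_1$ and $h(S_{t-1}, U_t)$ as $S_t$ for any $t \geq 2$. Serfozo (2009) showed that $\{S_t\}$ is a Markov chain with the initial distribution $\delta$ and the transition probability matrix $\Gamma$. Ji (2019) modified this method in order to generate correlated Markov chains. In a univariate Markov chain, the random value at the $t$th time point, $S_t$, is generated from a single random variable $U_t$. To create an MMC with $D$ Markov chains, we need a vector of $D$ possibly correlated random variables at each time point $t$. Therefore, we will use a $D$-dimensional Normal distribution to generate correlated random values. Suppose that an MMC has a length of $T$ with $D$ sequences. Let the stationary distribution and the transition probability matrix of the $d$th sequence be $\delta_d$ and $\Gamma_d$ respectively. We use the inverse transform method to create Uniform variables from Normal variables (Rizzo, 2019). For the $t$th time step, $t = 1, \ldots, T$, let $\mathbf{Z}_t = (Z_{1,t}, \ldots, Z_{D,t})$ and $\mathbf{Z}_t \sim N_D(\mathbf{0}, \Sigma)$, where $\Sigma$ is a correlation matrix. For each $d$ and $t$, a Uniform random variable $U_{d,t}$ is created using the inverse transform method, namely $U_{d,t} = \Phi(Z_{d,t})$. Each $U_{d,t}$ is thus uniformly distributed on $(0, 1)$. The correlations among $U_{1,t}, \ldots, U_{D,t}$ stem from the correlations among $Z_{1,t}, \ldots, Z_{D,t}$.

Now let us apply (9) and (10) to $U_{d,t}$. A random variable $S_{d,t}$ is created for each pair $(d, t)$ where $d = 1, \ldots, D$ and $t = 1, \ldots, T$. Thus, we have an MMC $\{\mathbf{S}_t\}$ where $\mathbf{S}_t = (S_{1,t}, \ldots, S_{D,t})$ and $\{S_{d,t}\}$ is a Markov chain marginally. In addition, $S_{1,t}, \ldots, S_{D,t}$ are correlated at the $t$th time step. The functions in (9) and (10) are collectively referred to as $\mathcal{M}$ going forward, and $\mathcal{M}$ describes the overall process of transforming marginally Uniform random vectors into an MMC.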
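The construction in this section can be sketched in a few lines of Python. The code below draws correlated Normal vectors, converts them to Uniform variates with the standard Normal CDF, and maps each margin to a Markov chain through cumulative-probability cut points. The function names, the two-sector setup, and all numerical values are hypothetical. States are coded from 0 instead of 1.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard Normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z) / math.sqrt(2.0)))

def uniforms_to_chain(u, delta, gamma):
    """Map Uniform(0,1) draws to a Markov chain via cumulative cut points."""
    cum_init = np.cumsum(delta)
    cum_rows = np.cumsum(np.asarray(gamma, dtype=float), axis=1)
    last = len(cum_init) - 1  # guards against rounding in the cumulative sums
    chain = np.empty(len(u), dtype=int)
    # First state: invert the cumulative initial distribution.
    chain[0] = min(np.searchsorted(cum_init, u[0]), last)
    for t in range(1, len(u)):
        # Later states: invert the cumulative row of the previous state.
        chain[t] = min(np.searchsorted(cum_rows[chain[t - 1]], u[t]), last)
    return chain

def simulate_mmc(sigma, deltas, gammas, n_steps, rng):
    """Simulate D Markov chains whose states are correlated at each time point."""
    n_chains = len(deltas)
    z = rng.multivariate_normal(np.zeros(n_chains), sigma, size=n_steps)
    u = normal_cdf(z)  # each column is marginally Uniform(0,1)
    return np.column_stack([uniforms_to_chain(u[:, d], deltas[d], gammas[d])
                            for d in range(n_chains)])

# Hypothetical example: two sectors sharing the same marginal chain.
rng = np.random.default_rng(0)
gamma = [[0.9, 0.1], [0.2, 0.8]]
sigma = [[1.0, 0.9], [0.9, 1.0]]
chains = simulate_mmc(sigma, [[2 / 3, 1 / 3]] * 2, [gamma] * 2, n_steps=2000, rng=rng)
agreement = float(np.mean(chains[:, 0] == chains[:, 1]))
```

Here `agreement` is the fraction of weeks in which the two simulated sectors occupy the same state; it should move toward the value expected under independence as the off-diagonal entry of `sigma` approaches zero.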
2.3 Two-stage parameter estimation for the LHMM
The construction of a copula for the state processes requires knowledge of the states that give rise to the data. This is usually obtained as the most likely sequence of states using the Viterbi algorithm (Viterbi, 1967). The Viterbi algorithm is applied after the model parameters have been estimated; this means that we need to resort to a two-stage estimation process. In the first stage, the parameters for each sector's HMM are estimated independently using the B-W algorithm. The Viterbi algorithm then provides us with the most likely sequence of states to have generated the data, which is used to estimate the copula correlation matrix $\Sigma$. Afterwards, the marginal parameters can be re-estimated conditioned on the correlation structure.
Estimating $\Sigma$ in (6) is challenging using conventional approaches like the inversion method (Nelsen, 2006) or the inference functions for margins method (Joe and Xu, 1996), since neither the CDF $F$ nor its associated probability mass function $f$ that appear in (6) and (7) can be evaluated easily. Instead, we choose $\Sigma$ in a manner such that states generated from the Gaussian copula using the methodology discussed in Section 2.2 can be used to reproduce a desired measure of association for the MMC. For each stock, we assumed that the HMM has two hidden states, the bear state and the bull state. However, the B-W algorithm produces two states, State 1 and State 2, without labels identifying them as bear/bull. So without loss of generality, we relabel the states for the $d$th Markov chain such that State 1 is the state with the larger ratio of emission mean to emission standard deviation, where the means $\mu$ and standard deviations $\sigma$ are as defined in (4). State 1 has a higher return to volatility ratio and can be considered a good stock to buy (Nguyen and Nguyen, 2015). It would thus correspond to a bull market, and State 2 can be considered to be the bear market state. Since the states are now ordinal in nature, the pairwise Spearman correlation for Markov chains in the MMC is chosen as the desired measure of association. However, there is no obvious way to estimate a matrix $\Sigma$ whose pairwise correlations are functions of the Spearman correlations between the Markov chains. We make a simplifying assumption for the copula and rewrite (6) as:

$$C(u_1, \ldots, u_D; \Sigma) = \prod_{1 \leq d < d' \leq D} C(u_d, u_{d'}; \rho_{dd'}), \tag{11}$$

where $\rho_{dd'}$ denotes the Pearson correlation between the Normal variates $z_d$ and $z_{d'}$, and corresponds to the $(d, d')$th element of $\Sigma$. This formulation can be interpreted in a manner similar to a pairwise simplified regular vine (R-vine) copula (Brechmann et al., 2012), with all pair-copula terms involving a conditioning set replaced by bivariate Gaussian copulas. We refer to this as the pair-copula approximation, and it consists of $D(D-1)/2$ terms. The copula density associated with (11) can also be interpreted as a composite likelihood (Varin et al., 2011). In practice, this will allow us to estimate the individual elements of $\Sigma$ using the right hand side of (11), but simulate data from the copula using the left hand side of (11), as long as we can ensure that $\Sigma$ is a positive-definite matrix. Kruskal (1958) provided a relationship between the Pearson correlation and the Spearman correlation for bivariate Normal variables that we will use to estimate $\Sigma$:

$$\rho_S(z_d, z_{d'}) = \frac{6}{\pi} \arcsin\left(\frac{\rho_{dd'}}{2}\right). \tag{12}$$
Note that since the Spearman correlation coefficient is invariant under monotone transforms. Recall that we defined in Section 2.2 as the function which transforms Uniform variates into a Markov chain. Let and be similar functions such that and , with if . The relationship in (12) and the assumption made in (11) together means that it is sufficient to estimate to obtain an estimate of . If we denote the corresponding estimators as and respectively, the estimate can be obtained as the numerical solution to
(13) | ||||
where is the sample Spearman correlation between states of the LHMM which is fixed given the data, and is its population version. Note that it is not possible to invert the relationship in (13) and obtain an analytical expression for as one perhaps would in a method of moments approach. However, given any value of it is straightforward to generate data from the MMC and obtain a large sample estimate of . The Spearman correlation is not preserved by this transformation and except in trivial cases. Majumder (2021) has empirically shown that a monotonically increasing relationship exists between , and that . The inequality is a consequence of and discretizing continuous variables and into and which are ordinal variables with 2 levels and possible ties. This attenuates the maximum and minimum values that the Spearman correlation between the 2 state processes can take. Mhanna and Bauwens (2012) have also demonstrated similar behaviour using empirical studies when Uniform variables are discretized to Bernoulli variables. The monotone relationship between means that for a given target value of , it is possible to use a line search to identify the value of which generates states with a sample Spearman correlation of arbitrarily close to . The unique elements of can thus be estimated using pairs of state sequences.
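For bivariate Normal variables with Pearson correlation $\rho$, the relationship attributed to Kruskal (1958) is $\rho_S = \frac{6}{\pi}\arcsin(\rho/2)$, which inverts to $\rho = 2\sin(\pi\rho_S/6)$. The short sketch below codes both directions; the function names are hypothetical. It applies to the continuous Normal (or Uniform) variates only, since the discretized states do not preserve the Spearman correlation, as discussed above.

```python
import math

def spearman_from_pearson(rho):
    """Spearman correlation of a bivariate Normal pair with Pearson correlation rho."""
    return 6.0 / math.pi * math.asin(rho / 2.0)

def pearson_from_spearman(rho_s):
    """Inverse map: Pearson correlation implied by a Spearman correlation."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)

# The two measures coincide at -1, 0 and 1; in between, the Spearman
# correlation is slightly smaller in magnitude than the Pearson correlation.
round_trip = pearson_from_spearman(spearman_from_pearson(0.3))
```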
2.4 Algorithm to estimate Gaussian copula parameters
Recall that for the LHMM, The states for each sectorโs HMM are obtained using the Viterbi algorithm once the marginal parameters have been estimated. Let denote the observed Spearman correlations between the states of each pair of the component HMMs. This value is fixed given the marginal models and the data. Given the matrix of states, the initial distribution , and the transition matrix for each , we want to construct a Gaussian copula that can generate an MMC with pairwise Spearman correlations coinciding with . Let be the estimate of the copula correlation in (11) between , and let be the corresponding estimate of the Spearman correlation using (12). Since (13) cannot be rewritten as a function of , we resort to a simulation approach to compute and . We initialize with for each pair of Markov chains and simulate an MMC from the Gaussian copula. We compute the pairwise Spearman correlations between the Markov chains in the MMC and denote them by . If , we increment by a step size and repeat the process. We stop when , for some predefined tolerance . The procedure is formalized in Algorithm 1 below.
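The search described above can be sketched for a single pair of two-state chains. This is an illustration under stated assumptions and not a transcription of Algorithm 1: it handles only a nonnegative target, reuses one random seed so that every candidate correlation is evaluated on the same random draws, and caps the search at 0.999. All function names and numerical values are hypothetical.

```python
import math
import numpy as np

def _chain(u, delta, gamma):
    """Map Uniform draws to a Markov chain (states 0..K-1) via cumulative cut points."""
    cum0, cum = np.cumsum(delta), np.cumsum(gamma, axis=1)
    last = len(cum0) - 1
    s = np.empty(len(u), dtype=int)
    s[0] = min(np.searchsorted(cum0, u[0]), last)
    for t in range(1, len(u)):
        s[t] = min(np.searchsorted(cum[s[t - 1]], u[t]), last)
    return s

def simulated_spearman(rho, margins, n_steps, seed):
    """Spearman correlation of two chains simulated with copula correlation rho."""
    rng = np.random.default_rng(seed)  # same draws for every rho
    z1 = rng.standard_normal(n_steps)
    z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.standard_normal(n_steps)
    cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    s1 = _chain(cdf(z1), *margins[0])
    s2 = _chain(cdf(z2), *margins[1])
    # With two states, midranks are an increasing linear function of the 0/1
    # labels, so the Spearman correlation equals the Pearson correlation of the labels.
    return float(np.corrcoef(s1, s2)[0, 1])

def match_spearman(target, margins, step=0.01, tol=0.01, n_steps=5000, seed=1):
    """Increase rho from the target until the simulated value is within tol of it."""
    rho = target
    while rho < 0.999:
        if simulated_spearman(rho, margins, n_steps, seed) >= target - tol:
            break
        rho = min(rho + step, 0.999)
    return rho

# Hypothetical margins: (initial distribution, transition matrix) per chain.
margins = [([2 / 3, 1 / 3], [[0.9, 0.1], [0.2, 0.8]]),
           ([0.5, 0.5], [[0.85, 0.15], [0.15, 0.85]])]
rho_hat = match_spearman(0.4, margins)
```

Because discretization attenuates the correlation, the returned `rho_hat` is never smaller than the target, which is why the search only moves upward from its starting value.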
Since the entries of $\hat{\Sigma}$ are constructed independently, the resultant matrix is not guaranteed to be positive definite. The final steps of our algorithm ensure the positive-definiteness of $\hat{\Sigma}$. An alternative approach suggested by one of the anonymous referees is to add a similar small positive quantity to all diagonal elements of $\hat{\Sigma}$. In cases when $\hat{\Sigma}$ is high-dimensional and the eigendecomposition is computationally expensive, this would be a much faster way of ensuring the positive definiteness of $\hat{\Sigma}$.
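Both repairs can be sketched with NumPy. The first function clips eigenvalues and rescales to a unit diagonal, which is one common eigendecomposition-based repair and is assumed here rather than taken from Algorithm 1. The second function follows the diagonal-loading idea, using a Cholesky attempt instead of an eigendecomposition to decide when the matrix is positive definite. Function names, tolerances, and the test matrix are hypothetical.

```python
import numpy as np

def clip_eigenvalues(sigma, eps=1e-6):
    """Clip eigenvalues at eps, then rescale so the diagonal is exactly 1."""
    sigma = np.asarray(sigma, dtype=float)
    sigma = (sigma + sigma.T) / 2.0  # enforce symmetry
    vals, vecs = np.linalg.eigh(sigma)
    fixed = (vecs * np.maximum(vals, eps)) @ vecs.T
    scale = 1.0 / np.sqrt(np.diag(fixed))
    return fixed * np.outer(scale, scale)

def diagonal_loading(sigma, eps=1e-3):
    """Add a growing constant to the diagonal until Cholesky succeeds, then rescale."""
    sigma = np.asarray(sigma, dtype=float)
    identity = np.eye(len(sigma))
    shift = 0.0
    while True:
        try:
            np.linalg.cholesky(sigma + shift * identity)
            break
        except np.linalg.LinAlgError:
            shift = eps if shift == 0.0 else 2.0 * shift
    # Dividing by (1 + shift) restores the unit diagonal of a correlation matrix.
    return (sigma + shift * identity) / (1.0 + shift)

# A pairwise-constructed matrix that is not positive definite.
bad = np.array([[1.0, 0.9, -0.9],
                [0.9, 1.0, 0.9],
                [-0.9, 0.9, 1.0]])
clipped = clip_eigenvalues(bad)
loaded = diagonal_loading(bad)
```

Diagonal loading shrinks every off-diagonal correlation by the same factor, whereas eigenvalue clipping changes the entries unevenly, so the two repairs generally return different matrices.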
After $\hat{\Sigma}$ has been estimated, we can use the correlation structure to re-estimate the marginal parameters $\delta_d$, $\Gamma_d$, and $\theta_d$. One way of doing so is to generate a sequence of states from the MMC and use the states as initial values in the Baum-Welch algorithm. Alternatively, we can re-estimate $\delta_d$ and $\Gamma_d$ for all sectors from synthetic states generated from the MMC, and use the estimates as the initial distributions in the Baum-Welch algorithm to re-estimate all marginal parameters. For this study, we have followed the second approach.
3 Stock Portfolio Selection using an LHMM
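The return of the $i$th stock over $T$ weeks is defined as follows,

$$R_i = \frac{P_{i,T} - P_{i,0}}{P_{i,0}} = \prod_{t=1}^{T} (1 + Y_{i,t}) - 1, \tag{14}$$

where $P_{i,t}$ is the closing price and $Y_{i,t}$ the relative weekly price change from Section 2.1. Our desired portfolio generates a high return with a low risk over a period of time, so we seek stocks with these characteristics as well. To evaluate each of the stocks, we use the random variable $R_i$. Given a portfolio of $N$ stocks with allocations $\mathbf{w} = (w_1, \ldots, w_N)$, its return over $T$ weeks is defined as,

$$R = \sum_{i=1}^{N} w_i R_i, \tag{15}$$

where the weight $w_i$ represents the proportion of the portfolio wealth invested in the $i$th stock. Thus, the expected return of a portfolio is given by

$$E(R) = \sum_{i=1}^{N} w_i E(R_i). \tag{16}$$

The variance of return is given by,

$$V(R) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \,\mathrm{Cov}(R_i, R_j). \tag{17}$$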
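In matrix form, the expected return of a portfolio is the weighted sum of the expected stock returns and its variance is the quadratic form of the weights in the covariance matrix of the returns. A small numerical sketch follows; the function name and all numbers are made up for illustration.

```python
import numpy as np

def portfolio_moments(weights, mean_returns, cov_returns):
    """Expected value and variance of a weighted sum of stock returns."""
    w = np.asarray(weights, dtype=float)
    expected = float(w @ np.asarray(mean_returns, dtype=float))
    variance = float(w @ np.asarray(cov_returns, dtype=float) @ w)
    return expected, variance

# Two uncorrelated stocks held in equal proportions (hypothetical numbers).
expected, variance = portfolio_moments(
    [0.5, 0.5],
    [0.10, 0.20],
    [[0.04, 0.00], [0.00, 0.16]],
)
```

With these inputs the expected return is 0.15 and the variance is 0.05, which is lower than the average of the two individual variances (0.10) because the stocks are uncorrelated.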
The goal of portfolio selection in this paper is to find the optimal allocation $\mathbf{w}$ with high reward and relatively low risk based on the results above. The optimal $\mathbf{w}$ would maximize $E(R)$ while minimizing $V(R)$. However, empirical evidence suggests that there exists a trade-off between $E(R)$ and $V(R)$ (Malkiel, 2019, p. 200). The most conservative approach would be to choose $\mathbf{w}$ such that $V(R)$ is minimized, i.e.,

$$\mathbf{w}_{V} = \arg\min_{\mathbf{w}} V(R). \tag{18}$$

Alternatively, Malkiel (2019) suggested that an optimal weight vector should maximize $E(R)$ while $V(R)$ is kept within a tolerated level $v_0$,

$$\arg\max_{\mathbf{w}} E(R) \quad \text{subject to } V(R) \leq v_0.$$

The Lagrange multiplier method is used to find optimal allocations. The pairs of $E(R)$ and $V(R)$ are referred to as the efficient ($E$, $V$) combinations by Markowitz (1952). He claimed that a portfolio created based on an efficient combination is efficient, but did not suggest a specific combination to balance the reward and the risk. Ji and Neerchal (2019) suggested the following approach to find a vector of weights for a balanced portfolio,

$$\mathbf{w}_{B} = \arg\max_{\mathbf{w}} \left\{ E(R) - c \sqrt{V(R)} \right\}.$$

In this expression, $c$ functions as a tuning parameter which controls the trade-off between reward and risk. A similar technique was also implemented in Elliott and van der Hoek (1997). Assuming $R$ is approximately Normal, this technique maximizes the lower bound of a confidence interval of the return of a portfolio. As $c$ gets higher in value, the resulting portfolio would accept less risk and prioritize more stable stocks. As $c$ gets lower in value ($c \to 0$), the portfolio would select stocks with higher return despite their higher volatility. The choice of $c$ is based on an investor's willingness to take risk. For the rest of the paper, $c$ is held at a single fixed value.
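A criterion that trades expected return against the standard deviation of the return can be maximized numerically. The sketch below uses projected gradient ascent over weights that are nonnegative and sum to one. This is a swapped-in numerical technique, not the Lagrange multiplier solution mentioned above, and the constraint set, the value `c=1.0`, the step size, and all inputs are assumptions for illustration.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def balanced_objective(w, mu, cov, c):
    """Expected return minus c times the standard deviation of the return."""
    return float(w @ mu - c * np.sqrt(w @ cov @ w))

def balanced_weights(mu, cov, c=1.0, step=0.01, n_iter=2000):
    """Projected gradient ascent; returns the best iterate found."""
    mu, cov = np.asarray(mu, dtype=float), np.asarray(cov, dtype=float)
    w = np.full(len(mu), 1.0 / len(mu))  # start from equal weights
    best_w, best_val = w, balanced_objective(w, mu, cov, c)
    for _ in range(n_iter):
        sd = np.sqrt(w @ cov @ w)
        grad = mu - c * (cov @ w) / sd
        w = project_to_simplex(w + step * grad)
        val = balanced_objective(w, mu, cov, c)
        if val > best_val:
            best_w, best_val = w, val
    return best_w

# Hypothetical three-stock example.
mu = np.array([0.08, 0.12, 0.10])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.05]])
w_bal = balanced_weights(mu, cov, c=1.0)
```

Raising `c` in this sketch pushes the weights toward the low-variance stocks, and lowering it toward zero concentrates them on the stock with the highest expected return, which mirrors the behaviour described in the text.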
In practice, the stock price changes $Y_{i,t}$, for $i = 1, \ldots, N$, will not be Normally distributed. We can transform them to Normal variates using the Yeo-Johnson power transformation (Yeo and Johnson, 2000), and fit HMMs on the transformed variables $\tilde{Y}_{i,t}$. While analytical expressions analogous to $E(R)$ and $V(R)$ can be constructed based on $\tilde{Y}_{i,t}$, the resulting quantities do not have any meaningful interpretations. However, since we can generate data from the fitted LHMM, a simulation based approach allows us to recover data in the original scale.
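The Yeo-Johnson transformation and its inverse have closed forms, so moving between the original and transformed scales is straightforward once the power parameter is known. The sketch below takes the parameter `lam` as given; in practice it would be estimated from the data, and that estimation step is not shown. The function names and the sample values are hypothetical.

```python
import numpy as np

def yeo_johnson(y, lam):
    """Yeo-Johnson power transformation for a fixed parameter lam."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    if abs(lam) > 1e-12:
        out[pos] = ((y[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(y[pos])
    if abs(lam - 2.0) > 1e-12:
        out[~pos] = -((1.0 - y[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out

def yeo_johnson_inverse(x, lam):
    """Inverse transformation; the sign of x identifies the branch to invert."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lam) > 1e-12:
        out[pos] = (lam * x[pos] + 1.0) ** (1.0 / lam) - 1.0
    else:
        out[pos] = np.expm1(x[pos])
    if abs(lam - 2.0) > 1e-12:
        out[~pos] = 1.0 - (1.0 - (2.0 - lam) * x[~pos]) ** (1.0 / (2.0 - lam))
    else:
        out[~pos] = 1.0 - np.exp(-x[~pos])
    return out

# Hypothetical weekly price changes and a round trip through the transform.
changes = np.array([-0.20, -0.05, 0.0, 0.03, 0.15])
recovered = yeo_johnson_inverse(yeo_johnson(changes, 0.5), 0.5)
```

The transformation preserves the sign of its input, which is what allows the inverse to pick the correct branch from the transformed value alone.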
Consider an LHMM fitted to the transformed stock returns $\tilde{Y}_{i,t}$ using the methodology described in Algorithm 1. We can simulate data from this model; let us denote this simulated data by $\tilde{Y}^{*}_{i,t}$. Since $\tilde{Y}^{*}_{i,t}$ has the same distribution as $\tilde{Y}_{i,t}$, we use the inverse of the Yeo-Johnson transform to recover $Y^{*}_{i,t}$ for $i = 1, \ldots, N$. The $Y^{*}_{i,t}$ are simulated stock price changes, and thus we can estimate $R_i$ from this data using (14). If we simulate a large number of independent datasets from the fitted model, say $B$, we have for the $i$th stock $B$ independent annual return samples $R_i^{(1)}, \ldots, R_i^{(B)}$. The vector of expectations and the covariance matrix can be computed from this data, and can be used for portfolio optimization.
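The Monte Carlo step can be sketched as follows, assuming the simulated price changes have already been mapped back to the original scale and that returns compound as the product of one plus each weekly change. The array layout, the function names, and the stand-in data (independent Normal draws instead of draws from a fitted LHMM) are assumptions for illustration.

```python
import numpy as np

def returns_from_changes(changes):
    """changes: (B, T, N) relative weekly changes; returns (B, N) compounded returns."""
    return np.prod(1.0 + changes, axis=1) - 1.0

def monte_carlo_moments(changes):
    """Sample mean vector and covariance matrix of the simulated stock returns."""
    r = returns_from_changes(changes)
    return r.mean(axis=0), np.cov(r, rowvar=False)

# Stand-in for simulated output: B = 200 datasets, T = 52 weeks, N = 3 stocks.
rng = np.random.default_rng(0)
sim_changes = rng.normal(0.002, 0.02, size=(200, 52, 3))
mean_vec, cov_mat = monte_carlo_moments(sim_changes)

# Two weeks of +1% compound to a 2.01% return.
check = returns_from_changes(np.full((1, 2, 1), 0.01))
```

The resulting `mean_vec` and `cov_mat` are the inputs a portfolio optimizer would consume in place of analytical moments.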
4 Building a Portfolio for 2016–17 from S&P 500 Data
Sector | Number of Stocks |
---|---|
Communication Services | 8 |
Consumer Discretionary | 69 |
Consumer Staples | 32 |
Energy | 28 |
Financials | 67 |
Health Care | 43 |
Industrials | 69 |
Information Technology | 45 |
Materials | 40 |
Real Estate | 10 |
Telecommunications | 7 |
Utilities | 29 |
Total | 447 |
We fit an LHMM to historical S&P 500 data to create a portfolio for 2016–17 and evaluate its performance against the S&P 500 index changes. As described in Section 3, the parameters of the LHMM are used to identify efficient combinations, and the associated weights are used to create a portfolio. Historical data for S&P 500 stocks from 2011-10-01 to 2016-09-30 is used to build an LHMM with 12 Markov chains corresponding to the 12 sectors represented in the data. Stocks with records of fewer than 5 years are ignored; this leaves us with 447 stocks for the study, i.e., $N = 447$.
Table 1 shows the number of stocks available per sector that were used to fit the LHMM. The weekly stock price changes were transformed using the Yeo-Johnson power transformation; the resulting variable is Normally distributed and thus meets the distributional assumptions for the LHMM. The HMMs were fitted using the packages depmixS4 (Visser and Speekenbrink, 2010) and hmmr (Visser and Speekenbrink, 2019) in R 4.0.x. For each sector, the B-W algorithm was restarted 20 times with random starting values. Parameter estimates from each of the 20 random restarts were compared on the basis of their Bayesian information criterion (BIC), with lower BIC values corresponding to higher likelihoods. The model which provided the lowest BIC value was chosen as the final model for each sector.
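Choosing among random restarts by BIC can be sketched as below. Since every restart fits the same model, the penalty term is identical across restarts and the lowest BIC coincides with the highest log-likelihood. The function names, the log-likelihood values, and the parameter count are hypothetical, and the code does not call depmixS4 or hmmr.

```python
import math

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def best_restart(log_liks, n_params, n_obs):
    """Index of the restart with the lowest BIC."""
    scores = [bic(ll, n_params, n_obs) for ll in log_liks]
    return min(range(len(scores)), key=scores.__getitem__)

# Three hypothetical restarts of the same model on 260 weekly observations.
restart_log_liks = [-1200.0, -1150.5, -1180.2]
chosen = best_restart(restart_log_liks, n_params=10, n_obs=260)
```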
Once HMMs had been fitted to each sector's data, the most likely sequence of states was obtained using the Viterbi algorithm. The states were labeled as in Section 2.3, such that State 1 is the bull state for each HMM and State 2 is the bear state, and the target Spearman correlation matrix was computed based on each pair of state processes. Next, a Gaussian copula which can generate synthetic states with the same Spearman correlation was constructed using Algorithm 1. Synthetic state sequences from the copula are used to re-estimate $\delta_d$, $\Gamma_d$, and $\theta_d$ for $d = 1, \ldots, 12$; these estimates now take into account the correlation structure between the sector state processes.
Once all LHMM parameters have been estimated, 10000 datasets of 5 years ($T = 260$ weeks) each were simulated from this fitted model, and the emissions of the simulated data were transformed back to their original scale using the inverse of the Yeo-Johnson transformation. The gains are computed for the 447 stocks for each of the 10000 datasets. This gives us a $10000 \times 447$ matrix of values; the vector of expected returns and the covariance matrix of the returns can be computed from this matrix. These simulated values were used for constrained optimization to obtain the weight vector which minimizes $V(R)$ as in (18), and the weight vector which maximizes the balanced criterion of Section 3. We denote the latter as a balanced portfolio assignment since it balances the expected return with the uncertainty surrounding it, and portfolios for the period 2016-10-01 to 2017-09-30 can be built based on the two weight vectors. We repeated this entire process 100 times to get 100 estimates of both weight vectors. These are used to construct confidence intervals for the percentage gains based on our method, and we evaluated their performance against the capital gains from 2016-10-01 to 2017-09-30.
Sector | % gain from HMMs: Min V(R) | % gain from HMMs: Balanced | % gain from LHMM: Min V(R) | % gain from LHMM: Balanced |
---|---|---|---|---|
Communication Services | | | | |
Consumer Discretionary | | | | |
Consumer Staples | | | | |
Energy | | | | |
Financials | | | | |
Health Care | | | | |
Industrials | | | | |
Information Technology | | | | |
Materials | | | | |
Real Estate | | | | |
Telecommunications | | | | |
Utilities | | | | |
Total | | | | |
A second model was also considered, where we had the 12 marginal HMMs but did not have the Gaussian copula to specify an LHMM. While we also wanted to consider a baseline model where all 447 stocks were modeled using a single state process, numerical issues prevented the model from converging consistently when using random restarts. Table 2 shows the performance of the two portfolios each for the HMMs and the LHMM compared with the S&P 500 capital gains. For each sector, the first row provides the mean percentage gains during the one year test period, and the second row provides the corresponding 95% bootstrap confidence interval. If our aim is to just minimize risk, the LHMM does not provide better returns compared to individual HMMs. This approach results in a diversified portfolio for both models, where nearly every sector contributes to the annual gains. On the other hand, trying to balance expected return and risk leads to portfolios concentrated around a few sectors. In particular, Information Technology stocks were the single largest contributor to the annual gains for both the HMMs and the LHMM in our study. The balanced portfolios have higher annual gains compared to the portfolio which minimizes the variance, and the one based on the LHMM has the highest gain among all portfolios constructed, with a mean of 12.72% and a confidence interval of (11.60%, 13.58%). If our primary goal is to balance return and risk, the LHMM, which better encapsulates market dynamics by allowing the different state processes to evolve jointly, provides better overall returns.
 | HMMs: Min V(R) | HMMs: Balanced | LHMM: Min V(R) | LHMM: Balanced |
---|---|---|---|---|
Mean | 134.45 | 37.47 | 48.84 | 33.43 |
SD | 3.44 | 1.00 | 1.56 | 1.03 |
Since we are demonstrating portfolio construction for a single year (2016–2017), the number of non-zero weights in our allocations corresponds to the number of transactions for the entire year. This is another important metric to consider when comparing algorithms for portfolio construction. The 100 different sets of weights in our case study thus correspond to 100 estimates of the number of transactions for each of the 4 approaches to portfolio selection considered here. Table 3 lists the mean and standard deviation for the number of transactions. We note that the LHMM based portfolios require fewer transactions than corresponding portfolios constructed from independent HMMs. In particular, for the portfolio which minimizes risk, the LHMM portfolio requires fewer than half the number of transactions as the independent HMMs portfolio. If we are constrained by the number of allowed transactions, the LHMM portfolio is more likely to produce higher returns based on our empirical studies with S&P 500 data.
5 Discussion
One of the key numerical challenges in fitting HMMs to large datasets using the B-W algorithm is that the algorithm often has trouble converging, even under repeated random restarts. Using an LHMM allowed us to sidestep this issue to a large extent, since we went from trying to fit a 447-dimensional emission process to at most a 69-dimensional emission process. The LHMM also allows the market dynamics for each sector to evolve in a dependent manner without needing every stock to be in the same state at every time point. A similar form of heterogeneity can also be induced if we increase the number of states, but interpreting a larger number of states can be difficult. Increasing the number of states also increases the number of emission distribution parameters significantly. Extending to a multivariate state process, however, does not increase the number of emission distribution parameters and leads to only a relatively modest increase in the number of state process parameters.
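The parameter-count argument above can be made concrete with a small sketch. It assumes independent Normal emissions (one mean and one variance per stock per state), 2 states per chain, free initial-state probabilities, and an unstructured correlation matrix for the Gaussian copula; these counting conventions and the function names are illustrative assumptions, while the values 447 (stocks) and 12 (sectors) come from the study.

```python
def emission_params(n_states, n_stocks):
    # Independent Normal emissions: a mean and a variance per stock per state
    return 2 * n_states * n_stocks

def chain_params(n_states):
    # Free transition probabilities plus free initial-state probabilities
    return n_states * (n_states - 1) + (n_states - 1)

def lhmm_state_params(n_states, n_chains):
    # Assumption: the Gaussian copula uses an unstructured correlation matrix
    copula_params = n_chains * (n_chains - 1) // 2
    return n_chains * chain_params(n_states) + copula_params

n_stocks, n_sectors = 447, 12

# Single 2-state HMM vs. single 4-state HMM vs. LHMM with 12 two-state chains
print(emission_params(2, n_stocks), chain_params(2))                  # 1788 3
print(emission_params(4, n_stocks), chain_params(4))                  # 3576 15
print(emission_params(2, n_stocks), lhmm_state_params(2, n_sectors))  # 1788 102
```

Under these assumptions, doubling the number of states doubles the emission parameters, whereas the multivariate state process leaves them unchanged and adds on the order of a hundred state process parameters.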
One of the assumptions made in this paper is that the price changes for different stocks within a sector are distributed as independent Normal variables given the state, as shown in (4). This rarely holds in practice, and something akin to a power transform is necessary to meet the assumption. However, even if the emission distribution of each stock's price changes is individually Normal, it still fails to adequately capture the correlation within the emission process. Ideally, we would want to model the emissions for each sector (either in the original scale of measurement or in a power-transformed scale so as to ensure Normality) as a multivariate Normal distribution, which would allow us to explicitly parameterize the correlation between the weekly gains for different stocks. We were able to do this for sectors with a small number of stocks, but faced computational issues for some of the larger sectors. It might be possible to estimate multivariate Normal parameters for the larger sectors if our data were extended beyond 260 weeks. However, market dynamics change over time, and extending the length of the data might have other negative consequences. This is one aspect that we want to address in future work. In particular, a variational Bayes approach (McGrory and Titterington, 2009), where we can assign priors, could potentially alleviate many of the numerical issues associated with B-W parameter estimation.
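To give a sense of scale for the multivariate Normal extension discussed above, the sketch below compares per-state emission parameter counts for the largest (69-stock) sector under independent Normal emissions and under a full covariance matrix. The function name is hypothetical, and the count is only a back-of-the-envelope illustration; it does not identify the specific computational issues encountered in the study.

```python
def normal_emission_params(n_stocks, full_covariance):
    """Per-state parameter count for Normal emissions over `n_stocks` stocks."""
    n_means = n_stocks
    if full_covariance:
        # Symmetric covariance matrix: variances plus pairwise covariances
        n_cov = n_stocks * (n_stocks + 1) // 2
    else:
        # Independent components: variances only
        n_cov = n_stocks
    return n_means + n_cov

largest_sector = 69   # largest emission dimension mentioned in the text
n_weeks = 260         # length of the training data in weeks

independent = normal_emission_params(largest_sector, full_covariance=False)  # 138
multivariate = normal_emission_params(largest_sector, full_covariance=True)  # 2484

print(independent, multivariate, round(multivariate / n_weeks, 2))
```

With a full covariance matrix, each state would need several times more parameters than there are weeks of data, which is consistent with the suggestion that a longer series or prior information might be required.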
Acknowledgements
The hardware used in the computational studies is part of the UMBC High Performance Computing Facility (HPCF). The facility is supported by the U.S. National Science Foundation through the MRI program (grant nos. CNS–0821258, CNS–1228778, and OAC–1726023) and the SCREMS program (grant no. DMS–0821311), with additional substantial support from the University of Maryland, Baltimore County (UMBC). See hpcf.umbc.edu for more information on HPCF and the projects using its resources. Reetam Majumder was supported by the Joint Center for Earth Systems Technology and by the HPCF as a Research Assistant.
References
- Baum and Petrie (1966) Baum, L. E. and Petrie, T. (1966) Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37(6), 1554–1563.
- Brechmann et al. (2012) Brechmann, E. C., Czado, C. and Aas, K. (2012) Truncated regular vines in high dimensions with application to financial data. Canadian Journal of Statistics, 40, 68–85.
- Dempster et al. (1977) Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39.
- Elliott and van der Hoek (1997) Elliott, R. and van der Hoek, J. (1997) An application of hidden Markov models to asset allocation problems (*). Finance and Stochastics, 1, 229–238.
- Elliott et al. (2010) Elliott, R., Siu, T. K. and Alex, B. (2010) On mean-variance portfolio selection under a hidden Markovian regime-switching model. Economic Modelling, 27, 678–686.
- Ensor and Koev (2014) Ensor, K. B. and Koev, G. M. (2014) Computational finance: correlation, volatility, and markets. WIREs Computational Statistics, 6, 326–340. URL https://doi:10.1002/wics.1323.
- Fiecas et al. (2017) Fiecas, M., Franke, J., von Sachs, R. and Tadjuidje, J. (2017) Shrinkage estimation for multivariate hidden Markov models. Journal of the American Statistical Association, 112, 326–340. URL https://doi.org/10.1080/01621459.2016.1148608.
- Hamilton (1989) Hamilton, J. D. (1989) A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57, 357–384.
- Hassan and Nath (2005) Hassan, M. R. and Nath, B. (2005) Stock market forecasting using hidden Markov model: a new approach. Proceedings of the IEEE fifth International Conference on Intelligent Systems Design and Applications, 192–96.
- Ji (2019) Ji, Q. (2019) Computational methods for hidden Markov models with applications. Ph.D. Thesis, Department of Mathematics and Statistics, University of Maryland, Baltimore County.
- Ji and Neerchal (2019) Ji, Q. and Neerchal, N. K. (2019) Creating stock portfolios using hidden Markov models. In JSM Proceedings, Business and Economic Statistics Section, 2105–2118.
- Joe and Xu (1996) Joe, H. and Xu, J. J. (1996) The estimation method of inference functions for margins for multivariate models. Tech. Rep. No. 166, Department of Statistics, University of British Columbia, Vancouver.
- Kole and Dijk (2016) Kole, E. and van Dijk, D. (2016) How to identify and forecast bull and bear markets? Journal of Applied Econometrics, 32.
- Kruskal (1958) Kruskal, W. H. (1958) Ordinal measures of association. Journal of the American Statistical Association, 53, 814–861.
- Majumder (2021) Majumder, R. (2021) Hidden Markov models for high dimensional data with geostatistical applications. Ph.D. Thesis, Department of Mathematics and Statistics, University of Maryland, Baltimore County.
- Malkiel (2019) Malkiel, B. G. (2019) A Random Walk Down Wall Street: Including A Life-Cycle Guide To Personal Investing. W.W. Norton & Company, 12th edn.
- Markowitz (1952) Markowitz, H. (1952) Portfolio selection. The Journal of Finance, 7, 77–91.
- McGrory and Titterington (2009) McGrory, C. A. and Titterington, D. M. (2009) Variational Bayesian analysis for hidden Markov models. Australian and New Zealand Journal of Statistics, 51, 227–244.
- Mhanna and Bauwens (2012) Mhanna, M. and Bauwens, W. (2012) A stochastic space-time model for the generation of daily rainfall in the Gaza Strip. International Journal of Climatology, 32, 1098–1112.
- Nelsen (2006) Nelsen, R. B. (2006) An Introduction to Copulas. Springer, 2 edn.
- Nguyen (2018) Nguyen, N. (2018) Hidden Markov model for stock trading. International Journal of Financial Studies, 36, 192–96.
- Nguyen and Nguyen (2015) Nguyen, N. and Nguyen, D. (2015) Hidden Markov model for stock selection. Risks, 3, 455–473.
- Rabiner (1989) Rabiner, L. R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77.
- Rizzo (2019) Rizzo, M. L. (2019) Statistical Computing with R. Chapman & Hall/CRC, 2 edn.
- Serfozo (2009) Serfozo, R. (2009) Basics of Applied Stochastic Processes. Springer.
- Varin et al. (2011) Varin, C., Reid, N. and Firth, D. (2011) An overview of composite likelihood methods. Statistica Sinica, 21, 5–42.
- Visser and Speekenbrink (2010) Visser, I. and Speekenbrink, M. (2010) depmixS4: An R package for hidden Markov models. Journal of Statistical Software, 36, 1–21. URL http://www.jstatsoft.org/v36/i07/.
- Visser and Speekenbrink (2019) Visser, I. and Speekenbrink, M. (2019) Hidden Markov Models with R. Springer.
- Viterbi (1967) Viterbi, A. (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13, 260–269.
- Xu and Cao (2021) Xu, J. and Cao, L. (2021) High-dimensional cross-market dependence modeling and portfolio forecasting by copula variational LSTM. Available at SSRN. URL https://dx.doi.org/10.2139/ssrn.3881474.
- Yeo and Johnson (2000) Yeo, I.-K. and Johnson, R. A. (2000) A new family of power transformations to improve normality or symmetry. Biometrika, 87, 954–959.