Temporal distribution of clusters of investors and their application in prediction with expert advice

\NameWojciech Wisniewski \Email[email protected]
\NameYuri Kalnishkan \Email[email protected]
\addrDepartment of Computer Science
Royal Holloway University of London
Egham United Kingdom
\NameDavid Lindsay \Email[email protected]
\NameSiân Lindsay \Email[email protected]
\addrAlgoLabs
Bracknell United Kingdom

Abstract

Financial organisations such as brokers face a significant challenge in servicing the investment needs of thousands of their traders worldwide. This task is further compounded since individual traders have their own risk appetite and investment goals. Traders may look to capture short-term trends in the market which last only seconds to minutes, or they may have longer-term views which last several days to months. To reduce the complexity of this task, client trades can be clustered. By examining such clusters, we would likely observe many traders following common patterns of investment, but how do these patterns vary through time? Knowledge regarding the temporal distributions of such clusters may help financial institutions manage the overall portfolio of risk that accumulates from underlying trader positions. This study contributes to the field by demonstrating that the distribution of clusters derived from the real-world trades of 20k Foreign Exchange (FX) traders (from 2015 to 2017) is well described by Ewens’ Sampling Distribution. Further, we show that the Aggregating Algorithm (AA), an on-line prediction with expert advice algorithm, can be applied to the aforementioned real-world data in order to improve the returns of portfolios of trader risk. However, we found that the AA “struggles” when presented with too many trader “experts”, especially when there are many trades with similar overall patterns. To help overcome this challenge, we have applied and compared the use of Statistically Validated Networks (SVN) with a hierarchical clustering approach on a subset of the data, demonstrating that both approaches can be used to significantly improve results of the AA in terms of profitability and smoothness of returns.

Keywords: statistically validated networks, Ewens sampling distribution, foreign exchange, behavioural finance, clusters of investors, aggregating algorithm

1 Introduction

In recent years, published research has highlighted a growing interest in methods to cluster traders based on their strategic and behavioural features, as well as in studying how traders influence each other. Below we summarise the key contributions in this area since 2012. Tumminello et al. (2011a) used Statistically Validated Networks (SVN) and the Infomap method to detect clusters of investors with similar trading decisions. Similarly, Bohlin and Rosvall (2014) identified clusters by studying the relationship between portfolio and trading decisions of Swedish investors. Musciotto et al. (2016) used hierarchical clustering and SVN to detect clusters of investors with similar trading decisions. Challet et al. (2018) extended the SVN methodology and inferred a lead-lag network of clusters of investors trading in FX, which showed that most of the trading activity has a market endogenous origin. Sueshige et al. (2018) detected and clustered traders with respect to limit-order and market-order strategies, and classified trading strategies based on their response pattern to historical price changes. Gutiérrez-Roig et al. (2019) used mutual information and transfer entropy to identify a network of synchronization and anticipation relationships between financial traders. Baltakiene et al. (2019) constructed multilink networks covering 2 years after IPOs and obtained clusters of investors characterized by synchronization in the timing of trading decisions. Cordi et al. (2020) proposed a method to detect lead-lag networks between the states of traders determined at different timescales and observed that institutional and retail traders have different causality structures of lead-lag networks. Barreau et al. (2020) proposed a deep learning architecture, ExNet, for both investor clustering and modeling. Finally, Baltakys et al. (2021) analyzed the structure of investor networks during a financial crisis and showed changes in investor trading behavior and mutual interactions in the stock market.

In his analysis of the topic, entitled “Clusters of Traders in Financial Markets”, Mantegna (2020) describes a working hypothesis for analysing the dynamics of clusters of investors trading in financial markets, remarking that the empirical results of Musciotto et al. (2018) are fully consistent with the modelling hypothesis of Aoki (2000). Aoki proposed using the framework of the sampling formula of Ewens (1972) to characterise clusters of economic agents that have reached a dynamical equilibrium in a specific market.

Using a proprietary dataset derived from traders investing in the Foreign Exchange (FX) market, we set out to contribute to the body of research concerning the clustering of financial market traders by focusing on the clusters’ temporal distributions. For this purpose we constructed sliding window investor networks (Statistically Validated Networks) based on statistically significant trade time synchronisation and showed that the Ewens’ Sampling Distribution is a good fit. Having greater insights into the variations of trader clusters throughout time will likely assist financial institutions as they manage the overall portfolio of risk that builds up from underlying trader positions.

Whilst the literature concerning clustering in portfolio selection is very broad, there is no analysis of how clusters may be leveraged by prediction with expert advice techniques such as the Aggregating Algorithm (AA). This may be because the theoretical dependency of the AA on the number of experts is mild (under the uniform initial distribution, the dependency is logarithmic). This suggests that, even with a large number of experts, the algorithm will over time work out which of them are needed. Several researchers have contributed significantly to prediction with expert advice in financial contexts. Vovk (1990) and Vovk (1998) developed the general problem of prediction with expert advice. Specifically, Vovk and Watkins (1998) proposed a portfolio selection method using prediction with expert advice, considering realistic trading scenarios. V’yugin (2013) constructed a universal trading strategy based on well-calibrated forecasts using prediction with expert advice methods, namely, AdaHedge-type algorithms. Zhang and Yang (2017) considered constant rebalanced portfolios as expert advice and constructed a universal portfolio strategy using the weak aggregating algorithm. Recent papers such as Al-Baghdadi et al. (2020) have demonstrated how the AA can be applied to FX data to improve the returns of portfolios of trader risk. However, in this practical situation the number of experts presents a problem: the many experts overwhelm the advantage of the best expert. We therefore hypothesise that clustering the trading experts in our dataset should make prediction more reliable and robust by reducing noise.

The paper is split into two major parts. First, we demonstrate how the cluster distributions of trading activity can be described by Ewens’ sampling distribution. We also investigate whether these distributions are stationary and whether they depend on the way the clusters are detected. Second, we show how clusters of traders over time can be used to improve the profitability of the AA.

2 Clustering of retail traders by their synchronicity

We largely follow the methods developed in Tumminello et al. (2011b), i.e. introduce a behavioural synchronicity measure between traders and then construct a statistically validated network on which an unsupervised clustering method is used.

2.1 Synchronicity between traders

A simple way to infer whether two traders have similar trading behaviours is to compare their scaled trading volume (referred to as the imbalance ratio) in a specific time frame. In order to do so, we partition the time line into disjoint intervals $\cup[t,t+\delta t[$ and let $r(i,t)$ be the imbalance ratio of trader $i$ in interval $[t,t+\delta t[$:

\begin{equation}
r(i,t)=\frac{b(i,t)-s(i,t)}{b(i,t)+s(i,t)}, \tag{1}
\end{equation}

where $b(.,.)$ and $s(.,.)$ denote the total volumes bought and sold by the trader in a given time frame.

For a given threshold $a$ we define a trader state as follows:

\begin{equation}
\mathrm{state}(i,t)=\begin{cases}\text{buying state},&\text{if}\ r(i,t)>a\\ \text{selling state},&\text{if}\ r(i,t)<-a\\ \text{neutral state},&\text{if}\ -a\leq r(i,t)\leq a\\ \text{inactive state},&\text{if}\ b(i,t)+s(i,t)=0\end{cases} \tag{2}
\end{equation}
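As an illustration of Equations (1) and (2), the following is a minimal Python sketch. The function names and the example threshold are our own illustrative choices and are not taken from the study's code base.

```python
from typing import Optional


def imbalance_ratio(bought: float, sold: float) -> Optional[float]:
    """Return r = (b - s) / (b + s), or None when nothing was traded."""
    total = bought + sold
    if total == 0:
        return None  # the ratio is undefined for an inactive trader
    return (bought - sold) / total


def trader_state(bought: float, sold: float, a: float) -> str:
    """Map the traded volumes in one interval to a state, given threshold a."""
    r = imbalance_ratio(bought, sold)
    if r is None:
        return "inactive"
    if r > a:
        return "buying"
    if r < -a:
        return "selling"
    return "neutral"
```

The inactive case is checked first because the imbalance ratio is undefined when no volume is traded in the interval.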

The synchronicity of a pair of traders is measured by counting the co-occurrences in the time series of their states, and attributing a $p$-value that reflects the statistical significance of this synchronicity under the null hypothesis of pure randomness. The hypergeometric distribution is used to calculate the $p$-value, which is the probability that in a series of $n$ time intervals, where one trader was $n_{p}$ times in a state $p$ and the other was $n_{q}$ times in a state $q$, these occurrences overlapped $n_{p,q}$ times or more:

\[
p(n_{p,q})=1-\sum_{i=0}^{n_{p,q}-1}H(i\mid n,n_{p},n_{q}),
\]

where

\[
H(i\mid n,n_{p},n_{q})=\frac{\binom{n_{p}}{i}\binom{n-n_{p}}{n_{q}-i}}{\binom{n}{n_{q}}}.
\]

This method is often used to measure similarity in genetic sequences. Many alternative methods exist in the literature; however, the hypergeometric test can be used with sparse data and is not sensitive to outliers, so it suits our needs.
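The $p$-value above can be computed exactly with integer binomial coefficients. The sketch below is illustrative only: the function names are hypothetical and it uses only the Python standard library.

```python
from math import comb


def hypergeom_pmf(i: int, n: int, n_p: int, n_q: int) -> float:
    """H(i | n, n_p, n_q): probability of exactly i overlaps."""
    if i < 0 or i > n_p or n_q - i < 0 or n_q - i > n - n_p:
        return 0.0  # outside the support of the distribution
    return comb(n_p, i) * comb(n - n_p, n_q - i) / comb(n, n_q)


def cooccurrence_p_value(n_pq: int, n: int, n_p: int, n_q: int) -> float:
    """Probability of observing n_pq or more overlaps under pure randomness."""
    return 1.0 - sum(hypergeom_pmf(i, n, n_p, n_q) for i in range(n_pq))
```

For very small $p$-values the subtraction from one loses precision, so in practice one may prefer to sum the upper tail directly.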

Testing all pairs of traders and all types of co-occurrences requires a multiple hypothesis testing correction. For this purpose we use the Bonferroni correction, in which the significance level (0.05) is divided by the number of tests. Alternatively, one could control the false discovery rate.

2.2 Statistically validated network

A statistically validated network is built by validating a link between a pair of traders if the $p$-value of their synchronisation is smaller than the corrected threshold. Traders without any links are dropped. In the resulting network we exclude links between opposite actions (buy-sell), links between neutral states and links between inactive states, because we are mostly interested in active traders that exhibit the same kind of behaviour (buy-buy and sell-sell) and high trading activity.
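The validation step can be sketched as follows. This is a simplified illustration: the layout of the `p_values` dictionary and the state labels are assumptions made for the example, not the data structures used in the study.

```python
def validated_links(p_values, n_tests, alpha=0.05):
    """Keep buy-buy and sell-sell links whose p-value passes Bonferroni.

    p_values maps (trader_a, trader_b, state_a, state_b) to a p-value;
    this layout is a hypothetical choice made for the example.
    """
    threshold = alpha / n_tests  # Bonferroni-corrected threshold
    kept_states = {("buying", "buying"), ("selling", "selling")}
    links = set()
    for (a, b, state_a, state_b), p in p_values.items():
        if (state_a, state_b) in kept_states and p < threshold:
            links.add(frozenset((a, b)))  # undirected link
    return links
```

Traders that do not appear in any returned link are the ones dropped from the network.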

3 Clustering distribution

This section will define Ewens’ sampling formula, which is the backbone of Aoki’s modeling hypothesis concerning the dynamics of trading behaviour.

3.1 Partition vector

We will now introduce a partition vector which will be useful for our modelling purposes.

Let $c_{i}$ be the number of clusters with exactly $i$ traders. Then $K_{n}=\sum_{i=1}^{n}c_{i}$ is the total number of clusters formed by $n$ traders and $\sum_{i=1}^{n}ic_{i}=n$ . We will call the vector $c=(c_{1},c_{2},\ldots,c_{n})$ a partition vector.
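For concreteness, the partition vector can be obtained from a list of cluster sizes as in the short sketch below (the function name is ours).

```python
from collections import Counter


def partition_vector(cluster_sizes):
    """Return c = (c_1, ..., c_n), where c_i counts clusters of size i."""
    n = sum(cluster_sizes)  # total number of traders
    counts = Counter(cluster_sizes)
    return [counts.get(i, 0) for i in range(1, n + 1)]
```

For example, cluster sizes 3, 2, 2 and 1 give $n=8$, $K_{n}=4$ and $c=(1,2,1,0,0,0,0,0)$.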

3.2 The Ewens Sampling distribution

Ewens’ sampling formula gives the probability of a partition of a positive integer $n$ into parts. It was introduced by Ewens (1972) as the probability of the partition of a sample of $n$ selectively equivalent genes into a number of different gene types (alleles). For non-negative integers $c_{1},c_{2},\ldots,c_{n}$ satisfying $\sum_{j}jc_{j}=n$, viewed as a realisation of a random partition vector $({C}_{1}(n),{C}_{2}(n),\ldots,{C}_{n}(n))$, we have:

\begin{equation}
\mathbb{P}_{\theta}({C}_{1}(n)=c_{1},\ldots,{C}_{n}(n)=c_{n})=\frac{n!}{\theta_{(n)}}\,\prod_{j=1}^{n}\left(\frac{\theta}{j}\right)^{c_{j}}\frac{1}{c_{j}!}, \tag{3}
\end{equation}

for $\theta\in(0,\infty)$, where $\theta_{(n)}:=\theta(\theta+1)\cdots(\theta+n-1)=\Gamma(n+\theta)/\Gamma(\theta)$ for $n\geq 1$ and $\theta_{(0)}=1$.\footnote{We define $\theta_{(-k)}=0$ for $k\in\mathbb{N}$.}
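Equation (3) is straightforward to evaluate numerically for small $n$. The following sketch uses hypothetical function names and plain floating-point arithmetic, which is adequate only for moderate $n$.

```python
from math import factorial, prod


def rising_factorial(theta: float, n: int) -> float:
    """theta_(n) = theta (theta + 1) ... (theta + n - 1), with theta_(0) = 1."""
    return prod(theta + k for k in range(n))


def ewens_probability(c, theta: float) -> float:
    """Probability of the partition vector c = (c_1, ..., c_n) under Ewens."""
    n = sum(j * cj for j, cj in enumerate(c, start=1))
    weight = prod(
        (theta / j) ** cj / factorial(cj) for j, cj in enumerate(c, start=1)
    )
    return factorial(n) / rising_factorial(theta, n) * weight
```

As a sanity check, the probabilities of the three partitions of $n=3$ sum to one for any $\theta>0$.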

We are interested in groups of traders with strategies manifesting similar synchronisation, and since the SVN clustering process builds a network with strong links between pairs of traders we conjecture there should be no (or only a small number of) mono-communities. Thus it is natural to fit a distribution conditional on the event: $c_{1}=0$ .

Let us define the conditional distribution $(\tilde{C}_{2}(n),\ldots,\tilde{C}_{n}(n))$ (see da Silva et al. (2020)) in the following way:

\begin{equation}
\mathcal{L}(\tilde{C}_{2}(n),\ldots,\tilde{C}_{n}(n))=\mathcal{L}(C_{2}(n),\ldots,C_{n}(n)\mid C_{1}(n)=0). \tag{4}
\end{equation}

The probability of the conditioning event is given by:

\begin{equation}
\lambda_{n}(\theta):=\mathbb{P}(C_{1}(n)=0)=\frac{1}{\theta_{(n)}}\sum_{k=1}^{n}\theta^{k}D(n,k), \tag{5}
\end{equation}

(with $\lambda_{0}(\theta)=1$ and $\lambda_{1}(\theta)=0$), where $D(n,k)$ is the number of derangements of size $n$ having $k$ cycles:

\begin{equation}
D(n,k):=\sum_{l=0}^{k}(-1)^{l}\binom{n}{l}\genfrac{[}{]}{0pt}{}{n-l}{k-l}; \tag{6}
\end{equation}

here $\genfrac{[}{]}{0pt}{}{n}{k}$ is the unsigned Stirling number of the first kind.

Accurate computation of the alternating series present in $D(n,k)$ is a well-known hard problem, so it is useful to give the recursive relation satisfied by $\lambda_{n}(\theta)$:

\begin{equation}
\lambda_{n+1}(\theta)=\frac{n}{n+\theta}\left[\lambda_{n}(\theta)+\frac{\theta}{n+\theta-1}\lambda_{n-1}(\theta)\right]. \tag{7}
\end{equation}
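The recursion (7) avoids the alternating sum entirely and can be sketched in a few lines. The function name is illustrative; the initial values $\lambda_{0}(\theta)=1$ and $\lambda_{1}(\theta)=0$ are those given above.

```python
def lambda_no_singletons(n: int, theta: float) -> float:
    """P(C_1(n) = 0) computed with recursion (7)."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, 0.0  # lambda_0 and lambda_1
    for m in range(1, n):
        # lambda_{m+1} from lambda_m (curr) and lambda_{m-1} (prev)
        nxt = m / (m + theta) * (curr + theta / (m + theta - 1) * prev)
        prev, curr = curr, nxt
    return curr
```

For $\theta=1$ the recursion reproduces the classical derangement probabilities, for instance $\lambda_{4}(1)=9/24$.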

Table 1 summarises relevant characteristics of the conditional and non-conditional Ewens distributions.

\begin{tabular}{lll}
Feature & Ewens distribution & Conditional Ewens distribution \\
Probability & $\frac{n!}{\theta_{(n)}}\prod_{j=1}^{n}\left(\frac{\theta}{j}\right)^{c_{j}}\frac{1}{c_{j}!}$ & $\frac{n!}{\theta_{(n)}\lambda_{n}(\theta)}\prod_{j=2}^{n}\left(\frac{\theta}{j}\right)^{c_{j}}\frac{1}{c_{j}!}$ \\
Expected cycle counts & $\mathbb{E}C_{j}(n)=\frac{n!}{\theta_{(n)}}\frac{\theta_{(n-j)}}{(n-j)!}\,\frac{\theta}{j}$ & $\frac{\lambda_{n-j}(\theta)}{\lambda_{n}(\theta)}\mathbb{E}C_{j}(n)$ \\
Expected number of cycles & $\mathbb{E}K_{n}=\sum_{i=0}^{n-1}\frac{\theta}{\theta+i}$ & $\sum_{j=2}^{n}\frac{\lambda_{n-j}(\theta)}{\lambda_{n}(\theta)}\mathbb{E}C_{j}(n)$ \\
\end{tabular}

Table 1: Comparison of Ewens conditional and non-conditional distributions

4 Experiments

In this section we describe the proprietary dataset and the experiments.

4.1 The dataset description

We consider financial data gathered from the client trades of a retail Foreign Exchange (FX) broker. A typical retail broker will provide their clients with an online trading platform software such as MetaTrader 4 (MT4) where they can place trades, monitor positions, track both historic and live movements in prices, and access the latest world economic news. Online trading platforms often operate under the stipulation that once an order is placed (opened), it must be closed in its entirety. Source data is essentially stored in a temporal table with each row representing a client order that provides the opening and closing time, as well as the currency traded (symbol), amount traded and side (buy or sell) of the order.

The proprietary dataset comprises the trades made by over 20k clients during 2015-2017. Each client was allowed to buy or sell any of the available currency pairs and could place trades as many times as they wanted, at any time of day, provided they stayed within the confines of their leveraged funds. The dataset contains only the features necessary for further investigation, namely an investor’s anonymised ID, opening and closing trade times, number of lots traded, sign (long or short position), and the traded symbol.

4.2 Experimental protocol

It is convenient to use sliding windows in order to track the temporal evolution of clustering. For each in-sample time window, we filtered out traders with fewer than 100, 500 or 1000 trades (referred to as the cut-off). We observe that the number of traders grows approximately linearly over time, which is directly related to business growth. We focus our investigation on trading activity that occurs during standard business days within the most active hours (6am - 6pm). Investigations are conducted solely on the EUR/USD currency pair. We construct a sliding window of size 6 months and shift it every 2 weeks. We then build an SVN at every step using the imbalance ratio time series for $\delta t$ equal to 10, 15, 30, 60, 120, 180, 360 and 1440 minutes (referred to as deltas).
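The window construction can be sketched as below. This is only an illustration under stated assumptions: the exact start and end dates, and the way a calendar month is added to a date (here the day is clamped to 28 to keep the date valid), are our choices and may differ from the protocol actually used.

```python
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to 28 to stay valid."""
    month_index = d.year * 12 + (d.month - 1) + months
    return date(month_index // 12, month_index % 12 + 1, min(d.day, 28))


def sliding_windows(start: date, end: date, window_months=6, step_days=14):
    """Return (left, right) pairs of 6-month windows shifted every 2 weeks."""
    windows = []
    left = start
    while add_months(left, window_months) <= end:
        windows.append((left, add_months(left, window_months)))
        left += timedelta(days=step_days)
    return windows
```

Each returned pair delimits one in-sample window on which a network would be built for every value of $\delta t$.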

4.3 SVN clustering and its descriptive statistics over time

To categorise traders into distinct groups, we used the Infomap clustering algorithm of Rosvall and Bergstrom (2008), which is popular owing to its information-theoretic approach, scalability, high-quality clusters, flexibility, and statistical significance. According to Lancichinetti and Fortunato (2009), Infomap empirically gave the best results when applied to different benchmarks on community detection methods. Our empirical findings indicate that the evolution of the proportion vector (the partition vector in its normalised version) is consistent with our conjecture of a sparse number of mono-communities (see Figure 2). From the figures, we notice a smooth evolution of proportions, and also the appearance of new and larger clusters; this is to be expected since the number of traders is growing over time. Moreover, we observe a pattern of having fewer clusters of significant cardinality. The existence of a very big cluster (and many very small ones) would negate the heterogeneity of trading strategies. We observe that Infomap is consistent across resolution scales and trade-count cut-offs. We calculated several pertinent statistics to evaluate how the SVNs are affected by different time resolutions sampled throughout the lifespan of the entire dataset (i.e. from 2015 to 2017), as illustrated in Figure 2. As previously stated, the number of traders in the dataset increases over time; however, we notice a sudden increase in the number of links and clusters in the sliding networks from July 2016.

We observe stability over time in the ratio of the number of traders to the number of clusters. At each slide an SVN is built and some traders are never taken into consideration; the proportion of retained traders increases slightly with the resolution delta. The modularity decreases slowly and is low except for deltas of 360 and 1440 minutes, which indicates rather weak connections between clusters.

4.4 Goodness of fit

In order to assess the goodness of fit to the data we use a classical $\chi^{2}$ test. The parameter $\theta$ was estimated for every sliding window and, since the formula for $\mathbb{E}K_{n}$ is not explicit (see Table 1), we approximate it to the closest integer. It is worth noting that for a non-conditional Ewens distribution one can readily find an explicit formula for $\theta$ using $\mathbb{E}K_{n}$.

Taking the example of $\delta$ equal to 10 minutes and a cut-off of at least 100 trades, we apply the $\chi^{2}$ test to 50 sliding windows at a significance level of 0.05. We find a $95\%$ pass rate, which confirms that most of the time the conditional Ewens distribution is a good fit.

Figure 6 shows that the pass rate is high in most cases for all studied scenarios. In general, for a cut-off of 100 the pass rate is above $85\%$; for the other cut-offs it seems to increase with delta. Figure 4 illustrates a typical comparison between the empirical and theoretical fit on a given sliding window, which is satisfactory. Figure 4 shows the evolution of the fitted parameter of the Ewens distribution. It is more or less stationary for bigger deltas and increasing for smaller ones. A larger estimated parameter $\hat{\theta}$ indicates a higher so-called mutation rate and therefore the existence of more clusters.

4.5 Temporal cluster evolution and consistent grouping identification issue

In some cases we require consistent grouping identification, and the main difficulty comes from the lack of consistent naming of clusters across subsequent time frames. Consistent naming allows us to, amongst other things, produce meaningful visualisations. The technique used relies on a total consistency measure which is closely related to the Jaccard index (for more details see Liechti and Bonhoeffer (2020)).
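To give an idea of how labels can be propagated between consecutive time frames, the sketch below greedily matches each current cluster to the unused previous cluster with the highest Jaccard index. This is a simplified stand-in and not the total consistency measure of Liechti and Bonhoeffer (2020); all names are hypothetical.

```python
def jaccard(a, b) -> float:
    """Jaccard index |a & b| / |a | b| of two collections of traders."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relabel(previous, current, min_overlap=0.0):
    """Greedily carry labels over from the previous to the current frame.

    previous: dict mapping an integer label to a set of traders.
    current: list of sets of traders. Returns dict index -> label.
    """
    labels, used = {}, set()
    next_label = max(previous, default=-1) + 1
    for idx, cluster in enumerate(current):
        best, best_score = None, min_overlap
        for label, members in previous.items():
            score = jaccard(cluster, members)
            if label not in used and score > best_score:
                best, best_score = label, score
        if best is None:  # no sufficiently similar predecessor: new group
            best = next_label
            next_label += 1
        used.add(best)
        labels[idx] = best
    return labels
```

A greedy pass depends on the order in which clusters are visited, which is one reason a global consistency measure is preferable in practice.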

In Figure 6 we see a so-called alluvial plot where at a given time, traders belonging to the same group are stacked together to form a continuous flow. The stability of group composition is shown when the same colouring persists between two time steps. However a group can split, merge, die out, appear suddenly or persist throughout time. These changes in groups are to be expected as traders’ investment strategies evolve over time, and existing traders leave and new traders join. Overall we remark some stability, however as expected eventually there are die outs, merges, splits and new appearances. When we considered different deltas (results not shown), we found that larger groups were more prevalent for smaller time frames.

5 Clusterised Aggregating Algorithm

We wish to study the temporal evolution of clusters of trading activity and investigate how they can be used for practical purposes. Clustering evolution could be used in prediction problems since grouping has the advantage of simplifying the description of the system state by reducing the dimensionality of the prediction problem. In the literature there are numerous examples of the latter set in a financial context. For example, Challet et al. (2018) used SVNs to demonstrate improvement in predicting both the sign of the order flow and the direction of the average transaction price for a retail trader dataset. In this study we apply the clustering evolution to an on-line prediction with expert advice method, namely the Aggregating Algorithm (AA) of Vovk (1990) and Vovk (1998). The AA is given a series of online predictions from a pool of experts (in our case the traders). At each time epoch, the loss of each expert’s prediction (in our case a trader’s investment decision) is fed back into the AA, which over time adjusts its trust in each expert to make future predictions. In the next subsections we introduce the framework of the AA and the games of investment with expert advice.

5.1 Aggregating Algorithm

Suppose that the learner $L$ is tasked with predicting elements of a sequence $\omega_{1},\omega_{2},\ldots$ called outcomes. The outcomes occur in discrete time. Before seeing outcome $\omega_{t}$, the learner outputs a prediction $\gamma_{t}$. The quality of the prediction is measured by a loss function $\lambda(\cdot,\cdot)$. The learner aims to suffer low cumulative loss:

\[
\mathop{\mathrm{Loss}}\nolimits_{T}(L)=\sum_{t=1}^{T}\lambda(\gamma_{t},\omega_{t}).
\]

We assume that the set of all possible outcomes (outcome space) $\Omega$ is known to us in advance and we are allowed to draw predictions from a known prediction space $\Gamma$ , which may or may not be the same as $\Omega$ . The function $\lambda$ is also known and maps $\Gamma\times\Omega$ to a subset of the extended real line, typically $[0,+\infty]$ . The choice of a triple $G=\langle\Omega,\Gamma,\lambda\rangle$ , is referred to as a game.

Suppose that the learner gets help from experts. The experts predict the same sequence and their predictions are made available to the learner before it commits to its own predictions. We are not concerned with their internal mechanics, which may well be inaccessible to us (e.g., the experts may rely on some sources of information unavailable or even unknown to us). The interaction with experts may be described by the following protocol. Here we assume that experts are parameterised by $\theta\in\Theta$ .

Expert $E_{\theta}$ suffers loss $\mathop{\mathrm{Loss}}\nolimits_{T}(E_{\theta})=\sum_{t=1}^{T}\lambda(\gamma^{\theta}_{t},\omega_{t})$. The goal of the learner is to merge the experts’ predictions $\gamma^{\theta}_{t}$ into its own prediction $\gamma_{t}$ in such a way that the learner’s loss $\mathop{\mathrm{Loss}}\nolimits_{T}(L)$ is low compared to that of the retrospectively best experts. It may use information about past outcomes and predictions. Formally, we are seeking a merging strategy:

\[
S:(\Gamma^{\Theta}\times\Omega)^{*}\times\Gamma^{\Theta}\rightarrow\Gamma.
\]

We typically want $S$ to guarantee an upper bound on $\mathop{\mathrm{Loss}}\nolimits_{T}(L)$ in terms of $\inf_{\theta\in\Theta}\mathop{\mathrm{Loss}}\nolimits_{T}(E_{\theta})$; we want $\mathop{\mathrm{Loss}}\nolimits_{T}(L)$ to be low whenever $\mathop{\mathrm{Loss}}\nolimits_{T}(E_{\theta})$ is low for some $\theta$. We assume that the pool of experts is finite, i.e., $|\Theta|=N<+\infty$.

Consider a game $G=\langle\Omega,\Gamma,\lambda\rangle$. A constant $C>0$ is admissible for a learning rate $\eta>0$ if for every $N=1,2,\ldots$, every set of predictions $\gamma_{1},\ldots,\gamma_{N}\in\Gamma$, and every distribution $(p_{1},p_{2},\ldots,p_{N})\in\Delta_{N-1}$, there is $\gamma\in\Gamma$ ensuring for all outcomes $\omega\in\Omega$ the inequality:

\[
\lambda(\gamma,\omega)\leq-\frac{C}{\eta}\ln\sum_{i=1}^{N}p_{i}e^{-\eta\lambda(\gamma_{i},\omega)}.
\]

The mixability constant $C_{\eta}$ is the infimum of all $C>0$ admissible for $\eta$ . This infimum is usually achieved. The admissibility is required to ensure the learner’s predictions exist and belong to $\Gamma$ since for example the learner’s prediction of the form $\gamma_{t}=\sum_{i=1}^{N}p_{i}\gamma^{i}_{t}$ is a linear combination and $\Gamma$ may not be convex. The AA takes as parameters a set of prior experts’ weights $(q_{1},\ldots,q_{N})\in\Delta_{N-1}$ , a learning rate $\eta>0$ and an admissible $C>0$ . The algorithm works as shown in the pseudocode below.

\begin{algorithm}
\caption{Aggregating Algorithm}
\KwIn{$\eta,C,q,N$}
initialise the weights $\omega_{0}^{i}=q_{i}$ for $i=1,\ldots,N$\;
choose the loss $\lambda(\cdot,\cdot)$\;
\For{$t=1,2,\dots$}{
read the experts’ predictions $\gamma_{t}^{i}$\;
normalise the weights $p_{t}^{i}=\frac{\omega_{t-1}^{i}}{\sum_{j}\omega_{t-1}^{j}}$\;
output $\gamma_{t}\in\Gamma$ satisfying, for all $\omega\in\Omega$, $\lambda(\gamma_{t},\omega)\leq-\frac{C}{\eta}\ln\sum_{i=1}^{N}p_{t}^{i}e^{-\eta\lambda(\gamma_{t}^{i},\omega)}$\;
observe the outcome $\omega_{t}$\;
update the weights $\omega_{t}^{i}=\omega_{t-1}^{i}\cdot e^{-\eta\cdot\lambda(\gamma_{t}^{i},\omega_{t})}$\;
}
\end{algorithm}

Under some mild regularity assumptions on the game and assuming the uniform initial distribution, the AA satisfies the bound (8) for every expert $E_{i}$, and it can be shown that the constants in this inequality are optimal:

\begin{equation}
\mathop{\mathrm{Loss}}\nolimits_{T}(L)\leq C\mathop{\mathrm{Loss}}\nolimits_{T}(E_{i})+\frac{C}{\eta}\ln N. \tag{8}
\end{equation}

5.2 Long Short Game

The problem of portfolio selection is a natural special case of prediction with expert advice. Vovk and Watkins (1998) considered realistic trading scenarios, i.e. the Long-Short game.

The Long-Short game aims to represent a realistic trading scenario. A trader is allowed to open positions, both long and short, within certain limits based on their deposit and the money they have earned previously. The limits aim to minimise the chance of bankruptcy. Given the wealth $W_{t-1}$ at time $t-1$, trader $i$ opens a position of size $W_{t-1}\gamma^{i}_{t}$. When the return $\omega_{t}$ becomes known, the trader’s wealth changes accordingly:

\[
W_{t}=W_{t-1}\cdot e^{-\lambda(\gamma_{t}^{i},\omega_{t})}=W_{t-1}\cdot(1+\gamma_{t}^{i}\cdot\omega_{t}),
\]

so that the loss is $\lambda(\gamma,\omega)=-\ln(1+\gamma\cdot\omega)$.

In this framework one can apply the AA with $\eta=1$, $C=1$ and the substitution rule given by $\gamma_{t}=\sum_{i=1}^{N}p_{i}\gamma^{i}_{t}$ to the general Long-Short game. If $1+\gamma_{t}\cdot\omega_{t}>0$ for $t=1,\ldots,T$, i.e., the learner does not go bankrupt along the way, the bound (8) will hold.
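A minimal sketch of the AA in the Long-Short game with $\eta=1$ and $C=1$ is given below. It assumes the logarithmic loss $-\ln(1+\gamma\omega)$, so that the weight update multiplies each weight by the expert's wealth factor; the function name and the data layout are illustrative.

```python
import math


def aa_long_short(expert_positions, returns, prior=None):
    """Run the AA with eta = 1 and the weighted-average substitution rule.

    expert_positions[t][i] is gamma_t^i and returns[t] is omega_t.
    Returns the learner's positions and its final log wealth.
    """
    n_experts = len(expert_positions[0])
    weights = list(prior) if prior else [1.0 / n_experts] * n_experts
    learner, log_wealth = [], 0.0
    for gammas, omega in zip(expert_positions, returns):
        total = sum(weights)
        probs = [w / total for w in weights]
        gamma = sum(p * g for p, g in zip(probs, gammas))
        learner.append(gamma)
        log_wealth += math.log(1.0 + gamma * omega)  # assumes no bankruptcy
        # exp(-loss) equals the wealth factor 1 + gamma_i * omega
        weights = [w * (1.0 + g * omega) for w, g in zip(weights, gammas)]
    return learner, log_wealth
```

With this loss the learner's wealth equals the prior-weighted average of the experts' wealths, which is exactly what the bound (8) expresses for $C=1$ and $\eta=1$.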

5.3 AA with Sleeping Experts

In Al-Baghdadi et al. (2020), an evaluation of the performance of the AA was made using a real-life trading dataset. Some modifications of the AA were proposed in order to improve the practical performance of the resulting portfolio. In particular, a downside loss and weighted average between the latter and the long short loss were introduced. Downside loss, in contrast to long short loss (originally used in Vovk and Watkins (1998)), penalises financial losses but does not reward gains since a strategy not to lose money may be more important than the ability to earn money.

\begin{align}
\lambda_{\mathrm{Long~Short~Loss}}(\rho,\gamma,r)&=-\log[\max(1+\rho\cdot\gamma\cdot r,0)] \nonumber\\
\lambda_{\mathrm{Downside~Loss}}(\rho,\gamma,r)&=-\log\{\max[1+\rho\cdot\min(\gamma\cdot r,0),0]\} \tag{9}
\end{align}

where $\rho$ is a scaling factor, $\gamma\in[-1,1]$ is the investment decision and $r$ is the return.
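The two losses in (9) can be written as the following small functions. Returning infinity when the wealth factor reaches zero is our convention for the logarithm of zero; the function names are illustrative.

```python
import math


def long_short_loss(rho: float, gamma: float, r: float) -> float:
    """-log of the wealth factor 1 + rho * gamma * r, floored at zero."""
    factor = max(1.0 + rho * gamma * r, 0.0)
    return -math.log(factor) if factor > 0 else math.inf


def downside_loss(rho: float, gamma: float, r: float) -> float:
    """As above, but gains are ignored: only gamma * r < 0 contributes."""
    factor = max(1.0 + rho * min(gamma * r, 0.0), 0.0)
    return -math.log(factor) if factor > 0 else math.inf
```

On a losing step the two losses coincide, while on a winning step the downside loss is zero and the long short loss is negative.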

In our research we faced one particular challenge with our dataset: the pool of traders constantly changes through time. For example, traders may choose to cease trading with the broker at any time, they may take breaks from trading, new ones may join, or traders may close their account entirely. The AA requires experts to provide predictions continually through time; a natural way to encode such activity is to use the so-called “sleeping” experts extension.

\begin{algorithm}
\caption{Aggregating Algorithm With Sleeping Experts}
\KwIn{$\eta,\rho,n$}
initialise the weights $\omega_{0}^{i}=1$ for $i=1,\ldots,n$\;
choose the loss $\lambda(\gamma,r)$\;
\For{$t=1,2,\dots$}{
get the set of awake experts $A_{t}$ and the set of sleeping experts $S_{t}$\;
read the investments of awake experts $\gamma_{t}^{i}$ for $i\in A_{t}$\;
normalise the weights of awake experts $p_{t}^{i}=\frac{\omega_{t-1}^{i}}{\sum_{j\in A_{t}}\omega_{t-1}^{j}}$\;
calculate the investment prediction $\gamma_{t}=\sum_{j\in A_{t}}p_{t}^{j}\cdot\gamma_{t-1}^{j}$\;
observe the return $r_{t}$\;
update the weights for $i\in A_{t}$: $\omega_{t}^{i}=\omega_{t-1}^{i}\cdot\exp[-\eta\cdot\lambda(\gamma_{t}^{i},r_{t})]$\;
update the weights for $i\in S_{t}$: $\omega_{t}^{i}=\omega_{t-1}^{i}\cdot\exp[-\eta\cdot\lambda(\gamma_{t},r_{t})]$\;
}
\end{algorithm}
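The sleeping-experts update can be sketched in Python as follows. The sketch is illustrative only: `positions[t]` is assumed to hold the decisions of the awake experts that the learner can read at step `t`, the loss is passed in as a function of the decision and the return, and all names are hypothetical.

```python
import math


def aa_sleeping(positions, returns, loss, eta=1.0):
    """AA with sleeping experts.

    positions[t] maps each awake expert to its decision; experts absent
    from positions[t] are treated as asleep at step t.
    """
    experts = sorted({i for step in positions for i in step})
    weights = {i: 1.0 for i in experts}
    predictions = []
    for awake, r in zip(positions, returns):
        total = sum(weights[i] for i in awake)
        gamma = (
            sum(weights[i] / total * g for i, g in awake.items())
            if awake
            else 0.0  # nobody is awake: stay out of the market
        )
        predictions.append(gamma)
        learner_loss = loss(gamma, r)
        for i in experts:
            # awake experts pay their own loss, sleeping ones the learner's
            step_loss = loss(awake[i], r) if i in awake else learner_loss
            weights[i] *= math.exp(-eta * step_loss)
    return predictions, weights
```

Charging sleeping experts the learner's loss leaves their weight relative to the learner unchanged while they are inactive.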

5.4 Clusterised Aggregating Algorithm (CAA) and decision rules

The classical AA learner prediction is

\begin{equation}
\gamma_{t}=\sum_{k}p^{k}_{t}\gamma^{k}_{t-1}, \tag{10}
\end{equation}

which is a weighted average of the experts’ predictions. For the clusterised aggregating algorithm (CAA) we introduce two decision rules:

\begin{align*}
\gamma_{t}^{\mathrm{MEAN}}&=\sum_{i}\sum_{j=1}^{n_{i}}p^{i,j}_{t}\cdot\sum_{k=1}^{n_{i}}\frac{\gamma^{i,k}_{t-1}}{n_{i}} &&\text{(take the mean of experts’ predictions in a cluster)}\\
\gamma_{t}^{\mathrm{PEN}}&=\sum_{i}\sum_{j=1}^{n_{i}}p^{i,j}_{t}\frac{\gamma^{i,j}_{t-1}}{n_{i}} &&\text{(penalise by dividing by the cardinality of a cluster)}
\end{align*}

where $n_{i}$ is the cardinality of the $i$-th cluster, $p^{i,j}_{t}$ is the weight of the $j$-th expert in the $i$-th cluster, and $p^{i}=\sum_{j}p^{i,j}_{t}$ is the sum of probabilities of the $i$-th cluster.
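Both decision rules can be computed in a single pass over the clusters, as in the sketch below. The dictionaries and the function name are illustrative assumptions; `decisions` is taken to hold the experts' decisions used in the formulas above.

```python
def clustered_predictions(weights, decisions, clusters):
    """Return (gamma_MEAN, gamma_PEN) for one time step.

    weights and decisions map each expert to a value; clusters is a list
    of lists of experts that together cover every expert exactly once.
    """
    total = sum(weights.values())
    probs = {e: w / total for e, w in weights.items()}
    mean_rule = pen_rule = 0.0
    for members in clusters:
        n_i = len(members)
        cluster_mean = sum(decisions[e] for e in members) / n_i
        # MEAN: cluster probability times the mean decision of the cluster
        mean_rule += sum(probs[e] for e in members) * cluster_mean
        # PEN: weighted decisions divided by the cluster cardinality
        pen_rule += sum(probs[e] * decisions[e] for e in members) / n_i
    return mean_rule, pen_rule
```

When every cluster contains a single expert, both rules reduce to the classical weighted average.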

The decision rule ${\gamma_{t}}^{\mathrm{MEAN}}$ is of interest in the trivial scenario where every cluster consists of duplicates of the same expert. Suppose that we have $m$ identical experts in a pool of $N$. It appears desirable to collate them into one. However, the AA does this automatically: it behaves as if a single expert with the combined weight were present in the pool. Assuming the uniform distribution on the initial experts, the weight of the combined expert is $m/N$ and the loss bound for the duplicated experts $E_{i}$ (again assuming the mixable case $C=1$) turns into:

\mathop{\mathrm{Loss}}\nolimits_{T}(L)\leq\mathop{\mathrm{Loss}}\nolimits_{T}(E_{i})+\frac{1}{\eta}\ln\frac{N}{m}

However, if the duplicated experts are bad, this creates a problem: needlessly increasing $N$ worsens the bound for good experts. Conversely, if there were two clusters, each consisting of different duplicated experts, and the bigger cluster had the better-performing experts, then the AA bound would be improved.

The second decision rule, ${\gamma_{t}}^{\mathrm{PEN}}$, can be interpreted in terms of partially awake experts if the penalising factor is normalised, i.e. $\frac{\frac{1}{n_{i}}}{\sum_{k\in\mathrm{Clusters}}\frac{1}{n_{k}}}$. This idea was generalised in V’yugin and Trunov (2022). Apart from a prediction $\gamma_{t}$, such an expert produces a confidence value $c_{t}\in[0,1]$ (a fully sleeping expert would output a confidence of 0 and a fully awake expert a confidence of 1). Here the confidence would be inversely proportional to the cardinality of the cluster. This is similar to inverse-variance weighting in portfolio selection problems, in particular the equal risk contributions portfolio (Maillard et al., 2010).

5.5 Experts as Clusters approach to AA (ECAA)

Up until now we clusterised only via the decision rules, and the experts were identified with the traders. It seems natural to also consider treating clusters of traders as meta-experts. We averaged the experts’ investment decisions per cluster in order to obtain the meta-experts’ predictions. In appendix 11, we derive a condition under which these extensions to the AA would outperform the original set-up of the AA with duplicated experts. In practice we identified the flow of meta-experts according to the alluvial plot (see Figure 6). There are several things to consider in this scenario, especially the splitting and merging of clusters at every epoch. We suggested the following approach:

• If a cluster is split, each child inherits the parent’s weight divided by the number of splits.

• If clusters are merged, the resulting weight is the sum of the parents’ weights.
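The two inheritance rules above can be combined into a single weight-propagation step, sketched below. The encoding of the alluvial flow as a list of (parent, child) pairs and the function name `propagate_weights` are hypothetical; the treatment of a cluster that both splits and merges in the same epoch (each parent shares its weight equally among its children, and each child sums what it receives) is an assumption that simply applies both rules at once.

```python
from collections import defaultdict


def propagate_weights(parent_weights, flows):
    """Carry meta-expert weights from one epoch to the next.

    parent_weights -- dict: parent cluster -> weight at the previous epoch
    flows          -- iterable of (parent, child) pairs read off the
                      alluvial plot (hypothetical encoding)
    """
    children = defaultdict(set)
    for parent, child in flows:
        children[parent].add(child)

    child_weights = defaultdict(float)
    for parent, kids in children.items():
        # Split rule: the parent's weight is divided equally among its children.
        share = parent_weights[parent] / len(kids)
        for child in kids:
            # Merge rule: a child sums the weight received from all its parents.
            child_weights[child] += share
    return dict(child_weights)
```

Because every parent distributes exactly its own weight, the total weight is preserved from one epoch to the next.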

5.6 Experiments

First we applied a data staging technique known as DAPRA (see Al-baghdadi et al. (2019)) which, when applied to data streams of trades and prices, allows one to sample the data at regular time intervals, as required for this study. We then compared the performance of the AA with its clusterised counterparts (CAA and ECAA), with the expectation that these extensions would improve scalability and reduce noise. The CAA extension either takes the mean of the investments $\gamma$ of awake experts in a given cluster (MEAN) or divides their decisions by the cardinality of the cluster (PEN). As a benchmark we used the equally weighted portfolio strategy.

For both the CAA and the ECAA we compared the SVN-infomap approach with hierarchical clustering based on correlations of the traders’ net positions (i.e. the difference between total open long (buy) and open short (sell) positions in USD), using the distance metric $1-|\mathrm{correlation}|$. The latter approach allows the construction of clusters to be adjusted by changing the dissimilarity threshold. The rationale behind clustering on net position correlation is that net position is a measure of risk and is therefore a desirable feature for the broker. The SVN approach focuses on trading synchronicity, so we have less control over the quality of the clustering with regard to the net position. Ideally all traders would trade all the time or have long overlapping trading periods, but since this is not the case one can end up with “noisy” clusters.
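To make the correlation-based distance and the role of the dissimilarity threshold concrete, the sketch below clusters traders by $1-|\mathrm{correlation}|$ of their net positions. The linkage criterion is not specified in the text, so single linkage is assumed here purely for illustration; cutting a single-linkage dendrogram at a threshold is equivalent to taking connected components of the graph that links every pair of traders whose distance does not exceed the threshold. The function names and the dictionary input are hypothetical, and every net position series is assumed to be non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def cluster_by_correlation(positions, threshold):
    """Single-linkage clusters of traders at the given dissimilarity threshold.

    positions -- dict: trader -> list of net positions sampled at regular times
    threshold -- cut-off applied to the distance 1 - |correlation|
    """
    traders = list(positions)
    parent = {t: t for t in traders}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(traders):
        for b in traders[i + 1:]:
            distance = 1.0 - abs(pearson(positions[a], positions[b]))
            if distance <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for t in traders:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())
```

Because the absolute value of the correlation is used, two traders holding exactly opposite positions are placed in the same cluster.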

Table 2: Summary of the experimental results for CAA.

\begin{tabular}{lrrrrr}
\hline
Strategy type & Scaling factor & Return & Sharpe ratio & Max drawdown & Calmar ratio \\
\hline
EW Benchmark & - & 1.4\% & 0.6 & 1.2\% & 1.2 \\
AA Sleeping Experts & 70 & 2.8\% & 1.1 & 1.8\% & 1.5 \\
CAA MEAN/SVN & 70 & 3\% & 1.2 & 1.85\% & 1.8 \\
CAA MEAN/Hierarchical & 70 & 4.8\% & 2 & 1.15\% & 4 \\
CAA PEN/SVN & 70 & 2.5\% & 1.35 & 0.9\% & 2.5 \\
CAA PEN/Hierarchical & 70 & 2.5\% & 1.4 & 0.9\% & 2.6 \\
ECAA Hierarchical 80 & 200 & 1\% & 1.65 & 0.3\% & 3.5 \\
ECAA SVN & 1 & 0.5\% & 0.4 & 0.8\% & 0.6 \\
\hline
\end{tabular}

We obtained encouraging results, especially for the downside loss (see (9)), which is more appropriate in this framework. We evaluated the performance using four well-established portfolio risk measures: the return of the portfolio; the Sharpe ratio, which is the amount of return an investor receives per unit of risk; the maximum drawdown, which is the maximum observed loss from a peak to a trough of a portfolio before a new peak is attained; and the Calmar ratio, which measures the risk-adjusted performance of a portfolio by comparing the return to the maximum drawdown.
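The four risk measures can be computed from a series of per-period portfolio returns as in the sketch below. Several conventions are assumptions made here and are not taken from the experiments: returns are compounded, the risk-free rate is zero, the Sharpe ratio is not annualised, and the drawdown is measured relative to the running peak of the cumulative wealth. At least two return observations with non-zero variance are required.

```python
import math


def risk_measures(returns):
    """Return (total return, Sharpe ratio, max drawdown, Calmar ratio).

    returns -- list of per-period portfolio returns as fractions (0.01 = 1%)
    """
    # Cumulative wealth starting from 1, with compounding (an assumption).
    wealth = [1.0]
    for r in returns:
        wealth.append(wealth[-1] * (1.0 + r))
    total_return = wealth[-1] - 1.0

    # Per-period Sharpe ratio with a zero risk-free rate and no annualisation.
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    sharpe = mean / math.sqrt(variance)

    # Largest relative fall from a running peak to a subsequent trough.
    peak, max_drawdown = wealth[0], 0.0
    for w in wealth:
        peak = max(peak, w)
        max_drawdown = max(max_drawdown, (peak - w) / peak)

    # Calmar ratio: return compared to the maximum drawdown.
    calmar = total_return / max_drawdown if max_drawdown > 0 else math.inf
    return total_return, sharpe, max_drawdown, calmar
```

A strategy with the same total return but a smaller maximum drawdown obtains a higher Calmar ratio, which is how smoothness of returns is rewarded by this measure.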

The distribution of traders’ returns is close to symmetric and the mean is approximately zero. With the MEAN clustering decision rule, the performance of the CAA is on the whole comparable with that of the AA for the clusters constructed with the SVN-infomap method. However, the results using hierarchical clustering are significantly better across all risk measures. The best performing cutoff for the distance metric is around $70\%$. The results for the PEN clustering decision rule, on the other hand, are comparable in terms of the return on investment, but for the other metrics we noticed significantly better results for both clustering techniques. Figures LABEL:fig:outsamplepen and LABEL:fig:outsamplemean show the comparison among all results for a return scaling factor up to 400.

Figure 7: Comparison of results across all four considered measures of risk in the out-of-sample scenario, where the CAA learner prediction is the experts’ predictions divided by the cardinality of each cluster. The return to maximum drawdown ratio, Sharpe ratio, 1 + return and maximum drawdown are shown for different return scaling factors. The green, blue and pink dotted lines denote the performances of the equal weights portfolio, the AA and the CAA with SVN-infomap, respectively. The other curves represent the CAA using clusters obtained by hierarchical clustering with different thresholds.

For ECAA we consider the scenario of treating clusters as meta-experts. Since the clusters are unlabelled, they cannot be matched across time epochs directly; the alluvial chart lets us readily identify the flow of clusters over time. The overall performance of the ECAA using SVN-infomap clusters is poor, manifesting the lowest return, Sharpe ratio and Calmar ratio. For hierarchical clustering, however, all risk measures other than the return are significantly better than for the standard AA (see Figure LABEL:fig:outsampleaasquared). Moreover, the ECAA has a smoother PnL, as seen from its much smaller drawdown compared with the CAA, the AA and the benchmark.

Table 2 summarises the experimental results for near-optimal variations of all algorithms. Figures 11 and 11 show the evolution of their returns and drawdowns through time. It is worth mentioning that when the scaling factor becomes large (greater than $100$), more and more traders go bankrupt because of the nature of the loss (9). Moreover, the algorithm can suddenly stop investing when the scaling factor gets too big; therefore one must be cautious when interpreting the results.

6 Conclusion

In this paper our findings confirm that the clustering of traders’ investments can be described by the Ewens sampling distribution. The temporal clustering distribution depends on many parameters and market conditions; however, the clustering could be leveraged to make better investment decisions. We adjusted the aggregating algorithm with sleeping experts to test the latter hypothesis using two clustering techniques, namely SVN-infomap and hierarchical clustering. In this framework the latter approach gives better results and more meaningful clusters, since it is based on correlations of the investors’ net positions and not on their trading synchronicity. In particular we compared the CAA (which uses aggregated traders’ decisions per cluster to calculate the investment prediction) and the ECAA (where clusters play the role of experts) with the AA and the equally weighted portfolio strategy. In our experimental results, the modifications we introduced to the AA show clear performance benefits in terms of four well-established portfolio risk measures: return, Sharpe ratio, maximum drawdown and Calmar ratio.

\acks

The authors acknowledge the support of Algorithmic Laboratories Ltd (AlgoLabs) and their parent company Equiti Group in establishing and developing this research. Special thanks go to Xudong Li, Tzyy Tong and Samuel Manoharan for setting up the servers necessary to run our experiments. Further thanks go to Simon Tavaré for useful insights.

References

Al-baghdadi et al. (2019) Najim Al-baghdadi, Wojciech Wisniewski, Yuri Kalnishkan, Christopher Watkins, Siân Lindsay, and David Lindsay. Structuring time series data to gain insight into agent behaviour. 12 2019. 10.1109/BigData47090.2019.9006346.
Al-Baghdadi et al. (2020) Najim Al-Baghdadi, David Lindsay, Yuri Kalnishkan, and Sian Lindsay. Practical investment with the long-short game. In Proceedings of the Ninth Symposium on Conformal and Probabilistic Prediction and Applications, volume 128 of Proceedings of Machine Learning Research, pages 209–228, Verona, Italy, 09–11 Sep 2020. PMLR. URL http://proceedings.mlr.press/v128/al-baghdadi20a.html.
Aoki (2000) Masanao Aoki. Cluster size distributions of economic agents of many types in a market. Journal of Mathematical Analysis and Applications, 249:32–52, 09 2000. 10.1006/jmaa.2000.6935.
Baltakiene et al. (2019) Margarita Baltakiene, Kestutis Baltakys, Juho Kanniainen, Dino Pedreschi, and Fabrizio Lillo. Clusters of investors around initial public offering. Palgrave Communications, 5, 12 2019. 10.1057/s41599-019-0342-6.
Baltakys et al. (2021) Kestutis Baltakys, Hung Le Viet, and Juho Kanniainen. Structure of investor networks and financial crises. Entropy, 23(4), 2021. ISSN 1099-4300. 10.3390/e23040381.
Barreau et al. (2020) Baptiste Barreau, Laurent Carlier, and Damien Challet. Deep prediction of investor interest: A supervised clustering approach. Algorithmic Finance, pages 1–13, 06 2020. 10.3233/AF-200296.
Bohlin and Rosvall (2014) Ludvig Bohlin and Martin Rosvall. Stock portfolio structure of individual investors infers future trading behavior. PLoS ONE, 9(7), Jul 2014. ISSN 1932-6203. 10.1371/journal.pone.0103006. URL http://dx.doi.org/10.1371/journal.pone.0103006.
Challet et al. (2018) Damien Challet, Rémy Chicheportiche, Mehdi Lallouache, and Serge Kassibrakis. Statistically validated lead-lag networks and inventory prediction in the foreign exchange market. Advances in Complex Systems, December 2018. 10.1142/S0219525918500194.
Cordi et al. (2020) Marcus Cordi, Damien Challet, and Serge Kassibrakis. The market nanostructure origin of asset price time reversal asymmetry, 2020.
da Silva et al. (2020) Poly H. da Silva, Arash Jamshidpey, and Simon Tavaré. Random derangements and the Ewens sampling formula, 2020.
Ewens (1972) W.J. Ewens. The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3(1):87 – 112, 1972. ISSN 0040-5809. https://doi.org/10.1016/0040-5809(72)90035-4.
Gutiérrez-Roig et al. (2019) Mario Gutiérrez-Roig, Javier Borge-Holthoefer, Alex Arenas, and Josep Perelló. Mapping individual behavior in financial markets: synchronization and anticipation. EPJ Data Science, 8, 03 2019. 10.1140/epjds/s13688-019-0188-6.
Lancichinetti and Fortunato (2009) Andrea Lancichinetti and Santo Fortunato. Community detection algorithms: A comparative analysis. Phys. Rev. E, 80:056117, Nov 2009. 10.1103/PhysRevE.80.056117.
Liechti and Bonhoeffer (2020) Jonas I. Liechti and Sebastian Bonhoeffer. A time resolved clustering method revealing longterm structures and their short-term internal dynamics, 2020.
Maillard et al. (2010) Sébastien Maillard, Thierry Roncalli, and Jérôme Teïletche. The properties of equally weighted risk contribution portfolios. The Journal of Portfolio Management, 36(4):60–70, 2010. ISSN 0095-4918. 10.3905/jpm.2010.36.4.060.
Mantegna (2020) Rosario N. Mantegna. Clusters of Traders in Financial Markets, pages 203–212. Springer Singapore, 2020. ISBN 978-981-15-4806-2. 10.1007/978-981-15-4806-2_10.
Musciotto et al. (2016) Federico Musciotto, Luca Marotta, Salvatore Micciche, Jyrki Piilo, and Rosario N. Mantegna. Patterns of trading profiles at the Nordic stock exchange. A correlation-based approach. Chaos, Solitons & Fractals, 88:267–278, 2016. ISSN 0960-0779. https://doi.org/10.1016/j.chaos.2016.02.027.
Musciotto et al. (2018) Federico Musciotto, Luca Marotta, Jyrki Piilo, and Rosario Mantegna. Long-term ecology of investors in a financial market. Palgrave Communications, 4:92, 07 2018. 10.1057/s41599-018-0145-1.
Rosvall and Bergstrom (2008) M. Rosvall and C. T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4):1118–1123, Jan 2008. ISSN 1091-6490. 10.1073/pnas.0706851105.
Sueshige et al. (2018) Takumi Sueshige, Kiyoshi Kanazawa, Hideki Takayasu, and Misako Takayasu. Ecology of trading strategies in a forex market for limit and market orders. PLOS ONE, 13(12):1–14, 12 2018. 10.1371/journal.pone.0208332.
Tumminello et al. (2011a) Michele Tumminello, Fabrizio Lillo, Jyrki Piilo, and Rosario Mantegna. Identification of clusters of investors from their real trading activity in a financial market. New Journal of Physics, 14, 07 2011a. 10.2139/ssrn.1890584.
Tumminello et al. (2011b) Michele Tumminello, Salvatore Micciche, Fabrizio Lillo, Jyrki Piilo, and Rosario Mantegna. Statistically validated networks in bipartite complex systems. PloS one, 6:e17994, 03 2011b. 10.1371/journal.pone.0017994.
Vovk (1998) V Vovk. A game of prediction with expert advice. Journal of Computer and System Sciences, 56(2):153–173, 1998. ISSN 0022-0000. https://doi.org/10.1006/jcss.1997.1556.
Vovk and Watkins (1998) V. Vovk and C. Watkins. Universal portfolio selection. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT ’98, pages 12–23, New York, NY, USA, 1998. Association for Computing Machinery. ISBN 1581130570. 10.1145/279943.279947.
Vovk (1990) V. G. Vovk. Aggregating strategies. Proc. of Computational Learning Theory, 1990, 1990. URL https://ci.nii.ac.jp/naid/10021342782/en/.
V’yugin (2013) Vladimir V’yugin. Universal algorithm for trading in stock market based on the method of calibration. In Sanjay Jain, Rémi Munos, Frank Stephan, and Thomas Zeugmann, editors, Algorithmic Learning Theory, pages 53–67, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. ISBN 978-3-642-40935-6.
V’yugin and Trunov (2022) Vladimir V’yugin and Vladimir Trunov. Online aggregation of probability forecasts with confidence. Pattern Recognition, 121(C), jan 2022. ISSN 0031-3203. 10.1016/j.patcog.2021.108193. URL https://doi.org/10.1016/j.patcog.2021.108193.
Zhang and Yang (2017) Yong Zhang and Xingyu Yang. Online portfolio selection strategy based on combining experts’ advice. Comput. Econ., 50(1):141–159, jun 2017. ISSN 0927-7099. 10.1007/s10614-016-9585-0. URL https://doi.org/10.1007/s10614-016-9585-0.

Appendix: Clusterised AA bound

In this section, we will discuss when it is beneficial to run AA on (equally weighted) cluster experts rather than the original experts and connect this with our intuition about the performance of traders. The analysis will be done on an artificial example but the conclusion is instructive.

Suppose that we have $m$ identical experts in a pool of $N$ . One may want to collate them into one; there is no need though as this is done by the AA automatically. The behaviour of the AA would be the same as if one expert with the combined weight is present in the pool. Assuming the uniform distribution on $N$ original experts, the weight of the combined expert will be $m/N$ and the loss bound for the duplicated experts $E_{i}$ (assuming the mixable case $C=1$ ) turns into

\mathop{\mathrm{Loss}}\nolimits_{T}(L)\leq\mathop{\mathrm{Loss}}\nolimits_{T}(E_{i})+\frac{1}{\eta}\ln\frac{N}{m}.

This is a stronger bound and if the performance of the expert is actually good, it leads to lower $\mathop{\mathrm{Loss}}\nolimits_{T}(L)$ . However, if duplicate experts perform badly, they create a problem: increasing $N$ worsens the bound for good experts.

Suppose that we have $M$ clusters of experts of cardinalities $c_{1},\ldots,c_{M}$. Let all experts in each cluster be identical and suffer the same cumulative loss. Applying the AA to cluster meta-experts (with equal initial weights) gives the loss bound $U_{-}$, and applying the AA to the original experts gives the loss bound $U_{*}$:

	$\displaystyle U_{-}$	$\displaystyle=$	$\displaystyle\min_{i=1,2,\ldots,M}\Big\{\mathop{\mathrm{Loss}}\nolimits_{T}(E_{C_{i}})+\frac{1}{\eta}\ln M\Big\}=\mathop{\mathrm{Loss}}\nolimits_{T}(E_{*})+\frac{1}{\eta}\ln M,$
	$\displaystyle U_{*}$	$\displaystyle=$	$\displaystyle\min_{i=1,2,\ldots,M}\Big\{\mathop{\mathrm{Loss}}\nolimits_{T}(E_{C_{i}})+\frac{1}{\eta}\ln\frac{N}{c_{i}}\Big\}=\mathop{\mathrm{Loss}}\nolimits_{T}(E_{C_{i_{0}}})+\frac{1}{\eta}\ln\frac{N}{c_{i_{0}}},$

where $E_{C_{i}}$ is an expert from cluster $i$ , $E_{*}$ is the best expert overall, and $i_{0}$ is the number of the cluster where the minimum in $U_{*}$ is achieved.
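The two bounds are easy to evaluate numerically. The sketch below is an illustration under the assumptions of this appendix (identical experts within each cluster, equal initial weights, mixable case $C=1$); the function name `cluster_bounds` and its argument layout are hypothetical.

```python
import math


def cluster_bounds(cluster_losses, cluster_sizes, eta):
    """Return (U_minus, U_star) for clusters of identical experts.

    cluster_losses -- cumulative loss Loss_T(E_{C_i}) shared by cluster i
    cluster_sizes  -- cardinalities c_1, ..., c_M
    eta            -- learning rate (positive)
    """
    M = len(cluster_losses)
    N = sum(cluster_sizes)
    # AA run on M equally weighted cluster meta-experts.
    u_minus = min(cluster_losses) + math.log(M) / eta
    # AA run on the N original experts: cluster i has combined weight c_i / N.
    u_star = min(
        loss + math.log(N / c) / eta
        for loss, c in zip(cluster_losses, cluster_sizes)
    )
    return u_minus, u_star
```

For example, with losses `[1.0, 5.0]`, `eta = 1.0` and sizes `[1, 9]` the good expert sits in the small cluster and the meta-expert bound is the smaller of the two, whereas with sizes `[9, 1]` the bound for the original experts is smaller.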

We get that

U_{-}\leq U_{*}\iff c_{i_{0}}\leq\frac{N}{M}e^{\eta[\mathop{\mathrm{Loss}}\nolimits_{T}(E_{C_{i_{0}}})-\mathop{\mathrm{Loss}}\nolimits_{T}(E_{*})]},

(11)

where $\mathop{\mathrm{Loss}}\nolimits_{T}(E_{C_{i_{0}}})-\mathop{\mathrm{Loss}}\nolimits_{T}(E_{*})\geq 0$. This means that the bound with cluster meta-experts is better when there are no good experts in large clusters.
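For completeness, the equivalence (11) can be checked directly from the definitions of $U_{-}$ and $U_{*}$. The following is a sketch of the algebra for $\eta>0$; the symbol $\Delta$ is introduced here only for brevity and is not used elsewhere in the paper.

```latex
\begin{align*}
U_{-}\leq U_{*}
 &\iff \mathrm{Loss}_{T}(E_{*})+\frac{1}{\eta}\ln M
       \leq \mathrm{Loss}_{T}(E_{C_{i_{0}}})+\frac{1}{\eta}\ln\frac{N}{c_{i_{0}}}\\
 &\iff \ln\frac{M c_{i_{0}}}{N}\leq\eta\Delta,
       \qquad \Delta=\mathrm{Loss}_{T}(E_{C_{i_{0}}})-\mathrm{Loss}_{T}(E_{*})\\
 &\iff c_{i_{0}}\leq\frac{N}{M}\,e^{\eta\Delta}.
\end{align*}
```

In particular, if the minimum in $U_{*}$ is attained by the cluster containing the best expert, then $\Delta=0$ and the condition reduces to $c_{i_{0}}\leq N/M$, i.e. that cluster is no larger than the average cluster.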

As the practice of trading shows, good trades are usually few and form a minority, which is one of the justifications for the cluster AA. The cluster AA gives an advantage to smaller clusters.