
Temporal distribution of clusters of investors and their application in prediction with expert advice

Wojciech Wisniewski, Yuri Kalnishkan
Department of Computer Science, Royal Holloway, University of London, Egham, United Kingdom

David Lindsay, Siân Lindsay
AlgoLabs, Bracknell, United Kingdom

Abstract

Financial organisations such as brokers face a significant challenge in servicing the investment needs of thousands of their traders worldwide. This task is further compounded since individual traders will have their own risk appetite and investment goals. Traders may look to capture short-term trends in the market which last only seconds to minutes, or they may have longer-term views which last several days to months. To reduce the complexity of this task, client trades can be clustered. By examining such clusters, we would likely observe many traders following common patterns of investment, but how do these patterns vary through time? Knowledge regarding the temporal distributions of such clusters may help financial institutions manage the overall portfolio of risk that accumulates from underlying trader positions. This study contributes to the field by demonstrating that the distribution of clusters derived from the real-world trades of 20k Foreign Exchange (FX) traders (from 2015 to 2017) is well described by Ewens’ Sampling Distribution. Further, we show that the Aggregating Algorithm (AA), an on-line prediction with expert advice algorithm, can be applied to the aforementioned real-world data in order to improve the returns of portfolios of trader risk. However, we found that the AA “struggles” when presented with too many trader “experts”, especially when there are many trades with similar overall patterns. To help overcome this challenge, we have applied and compared the use of Statistically Validated Networks (SVN) with a hierarchical clustering approach on a subset of the data, demonstrating that both approaches can be used to significantly improve results of the AA in terms of profitability and smoothness of returns.

keywords:
statistically validated networks, Ewens sampling distribution, foreign exchange, behavioural finance, clusters of investors, aggregating algorithm

1 Introduction

In recent years, published research has highlighted a growing interest in methods to cluster traders based on their strategic and behavioural features, as well as in studying how they influence each other. We summarise the key contributions in this area since 2012. Tumminello et al. (2011a) used Statistically Validated Networks (SVN) and the Infomap method to detect clusters of investors with similar trading decisions. Similarly, Bohlin and Rosvall (2014) identified clusters by studying the relationship between portfolio and trading decisions of Swedish investors. Musciotto et al. (2016) used hierarchical clustering and SVN to detect clusters of investors with similar trading decisions. Challet et al. (2018) extended the SVN methodology and inferred a lead-lag network of clusters of investors trading in FX, which showed that most of the trading activity has a market endogenous origin. Sueshige et al. (2018) detected and clustered traders with respect to limit-order and market-order strategies, and classified trading strategies based on their response pattern to historical price changes. Gutiérrez-Roig et al. (2019) used mutual information and transfer entropy to identify a network of synchronisation and anticipation relationships between financial traders. Baltakiene et al. (2019) constructed multilink networks covering 2 years after IPOs and obtained clusters of investors characterised by synchronisation in the timing of trading decisions. Cordi et al. (2020) proposed a method to detect lead-lag networks between the states of traders determined at different timescales and observed that institutional and retail traders have different causality structures of lead-lag networks. Barreau et al. (2020) proposed a deep learning architecture, ExNet, for both investor clustering and modelling. Finally, Baltakys et al. (2021) analysed the structure of investor networks during a financial crisis and showed changes in investor trading behaviour and mutual interactions in the stock market.

In his analysis of the topic, entitled “Clusters of Traders in Financial Markets”, Mantegna (2020) describes a working hypothesis for analysing the dynamics of clusters of investors trading in financial markets, remarking that the empirical results of Musciotto et al. (2018) are fully consistent with Aoki’s modelling hypothesis (Aoki, 2000). Aoki proposed using the framework of Ewens’ sampling formula (Ewens, 1972) to characterise clusters of economic agents that have reached a dynamical equilibrium in a specific market.

Using a proprietary dataset derived from traders investing in the Foreign Exchange (FX) market, we set out to contribute to the body of research concerning the clustering of financial market traders by focusing on the clusters’ temporal distributions. For this purpose we constructed sliding window investor networks (Statistically Validated Networks) based on statistically significant trade time synchronisation and showed that the Ewens’ Sampling Distribution is a good fit. Having greater insights into the variations of trader clusters throughout time will likely assist financial institutions as they manage the overall portfolio of risk that builds up from underlying trader positions.

Whilst the literature concerning clustering in portfolio selection is very broad, there is no such analysis of how clusters may be leveraged by prediction with expert advice techniques such as the Aggregating Algorithm (AA). This may be because the theoretical dependency of the AA on the number of experts is mild (under a uniform initial distribution, the dependency is logarithmic). This suggests that, even with a larger number of experts, the algorithm will over time work out which of them are needed. Several researchers have contributed significantly to prediction with expert advice in financial contexts. Vovk (1990, 1998) developed the general problem of prediction with expert advice. Vovk and Watkins (1998) proposed a portfolio selection method using prediction with expert advice, considering realistic trading scenarios. V’yugin (2013) constructed a universal trading strategy based on well-calibrated forecasts using prediction with expert advice methods, namely AdaHedge-type algorithms. Zhang and Yang (2017) considered constant rebalanced portfolios as expert advice and constructed a universal portfolio strategy using the weak aggregating algorithm. Recent papers such as Al-Baghdadi et al. (2020) have demonstrated how the AA can be applied to FX data to improve the returns of portfolios of trader risk. However, in this practical situation the number of experts presents a problem: the many experts overwhelm the advantage of the best expert. We therefore hypothesise that clustering the trading experts in our dataset should make prediction more reliable and robust by reducing noise.

The paper is split into two major parts. First, we demonstrate how the cluster distributions of trading activity can be described by Ewens’ sampling distribution. We also investigate whether these distributions are stationary and whether they depend on the way the clusters are detected. Second, we show how clusters of traders over time can be used to improve the profitability of the AA.

2 Clustering of retail traders by their synchronicity

We largely follow the methods developed in Tumminello et al. (2011b): we introduce a behavioural synchronicity measure between traders and then construct a statistically validated network, to which an unsupervised clustering method is applied.

2.1 Synchronicity between traders

A simple way to infer whether two traders have similar trading behaviours is to compare their scaled trading volume (referred to as the imbalance ratio) in a specific time frame. To do so, we partition the time line into disjoint intervals $\cup\,[t, t+\delta t[$ and let $r(i,t)$ be the imbalance ratio of trader $i$ in the interval $[t, t+\delta t[$:

\[
r(i,t) = \frac{b(i,t) - s(i,t)}{b(i,t) + s(i,t)}, \tag{1}
\]

where $b(\cdot,\cdot)$ and $s(\cdot,\cdot)$ denote the total volumes bought and sold by the trader in a given time frame.

For a given threshold $a$ we define a trader state as follows:

\[
\mathrm{state}(i,t) =
\begin{cases}
\text{buying state}, & \text{if}\ r(i,t) > a \\
\text{selling state}, & \text{if}\ r(i,t) < -a \\
\text{neutral state}, & \text{if}\ -a \leq r(i,t) \leq a \\
\text{inactive state}, & \text{if}\ b(i,t) + s(i,t) = 0
\end{cases} \tag{2}
\]
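As an illustration of equations (1) and (2), the following is a minimal Python sketch of the state assignment. The function names, the `State` enumeration and the threshold value used in the usage note are assumptions made for this example; the paper does not specify an implementation or a value of $a$.

```python
from enum import Enum


class State(Enum):
    BUYING = "buying"
    SELLING = "selling"
    NEUTRAL = "neutral"
    INACTIVE = "inactive"


def imbalance_ratio(bought: float, sold: float) -> float:
    """Equation (1): (b - s) / (b + s) for one trader in one interval."""
    total = bought + sold
    if total == 0:
        raise ValueError("imbalance ratio is undefined when nothing was traded")
    return (bought - sold) / total


def trader_state(bought: float, sold: float, a: float) -> State:
    """Equation (2): map the traded volumes of one interval to a state.

    The inactive case is checked first because r(i, t) is undefined
    when b(i, t) + s(i, t) = 0.
    """
    if bought + sold == 0:
        return State.INACTIVE
    r = imbalance_ratio(bought, sold)
    if r > a:
        return State.BUYING
    if r < -a:
        return State.SELLING
    return State.NEUTRAL
```

For example, with a hypothetical threshold `a = 0.01`, a trader who bought 3 lots and sold 1 lot in an interval has $r = 0.5$ and is in the buying state, while a trader with equal bought and sold volumes is in the neutral state.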

The synchronicity of a pair of traders is measured by counting the co-occurrences in the time series of their states, and attributing a $p$-value that reflects the statistical significance of this synchronicity assuming pure randomness. The hypergeometric distribution is used to calculate the $p$-value. The $p$-value is the probability that in a series of $n$ trades, where one trader was $n_p$ times in a state $p$ and the other was $n_q$ times in a state $q$, these occurrences overlapped $n_{p,q}$ times or more:

\[
p(n_{p,q}) = 1 - \sum_{i=0}^{n_{p,q}-1} H(i \mid n, n_p, n_q),
\]

where

\[
H(i \mid n, n_p, n_q) = \frac{\binom{n_p}{i}\binom{n-n_p}{n_q-i}}{\binom{n}{n_q}}.
\]

This method is often used to measure similarity in genetic sequences. Many alternative methods exist in the literature; however, the hypergeometric test can be used with sparse data and is not sensitive to outliers, so it suits our needs.

Because we test all pairs of traders and all types of co-occurrence, a multiple hypothesis testing correction is needed. For this purpose we use the Bonferroni correction, in which the significance level (0.05) is divided by the number of tests. Alternatively, one could control the false discovery rate.
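The hypergeometric $p$-value and the Bonferroni threshold can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch rather than the authors' code: the function names are assumptions, and the upper tail is summed directly, which is algebraically the same as one minus the lower sum in the formula above but loses less precision when the $p$-value is very small.

```python
from math import comb


def hypergeom_pmf(i: int, n: int, n_p: int, n_q: int) -> float:
    """H(i | n, n_p, n_q): probability of exactly i co-occurrences."""
    if i < 0 or i > min(n_p, n_q):
        return 0.0
    return comb(n_p, i) * comb(n - n_p, n_q - i) / comb(n, n_q)


def synchronicity_p_value(n_pq: int, n: int, n_p: int, n_q: int) -> float:
    """Probability of n_pq or more co-occurrences under pure randomness.

    Equal to 1 - sum_{i=0}^{n_pq - 1} H(i | n, n_p, n_q), computed as the
    upper-tail sum to avoid cancellation for tiny p-values.
    """
    upper = min(n_p, n_q)
    return sum(hypergeom_pmf(i, n, n_p, n_q) for i in range(n_pq, upper + 1))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after the Bonferroni correction."""
    return alpha / n_tests
```

As a small check, if two traders were each in state $p$ and $q$ in 5 of 10 periods and all 5 occurrences coincided, the $p$-value is $1/\binom{10}{5} = 1/252$. A link would be validated only if this value falls below `bonferroni_threshold(0.05, n_tests)` for the number of tests actually performed.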

2.2 Statistically validated network

A statistically validated network is built by validating a link between a pair of traders if the $p$-value of their synchronisation is smaller than the corrected threshold. Traders without any links are dropped. In the resulting network we exclude links between opposite actions (buy-sell), links between neutral states and links between inactive states, because we are mostly interested in active traders who exhibit the same kind of behaviour (buy-buy and sell-sell) and high trading activity.
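The construction just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: states are encoded as the hypothetical strings `"buy"`, `"sell"`, `"neutral"` and `"inactive"`, all traders share the same sequence of time intervals, and the number of tests is taken to be the number of trader pairs times `n_state_pairs` (default 9, an assumed count of co-occurrence types).

```python
from itertools import combinations
from math import comb


def tail_p_value(n_pq, n, n_p, n_q):
    """Hypergeometric probability of n_pq or more co-occurrences."""
    numerator = sum(
        comb(n_p, i) * comb(n - n_p, n_q - i)
        for i in range(n_pq, min(n_p, n_q) + 1)
    )
    return numerator / comb(n, n_q)


def build_svn(states, alpha=0.05, n_state_pairs=9, kept_states=("buy", "sell")):
    """Return the set of validated links (u, v) between traders.

    states: dict mapping a trader id to a list of state labels, one per
    time interval. Only same-state co-occurrences listed in kept_states
    (buy-buy and sell-sell) can create a link. Traders without links do
    not appear in the result, which corresponds to dropping them.
    """
    traders = sorted(states)
    pairs = list(combinations(traders, 2))
    if not pairs:
        return set()
    threshold = alpha / (len(pairs) * n_state_pairs)  # Bonferroni correction
    links = set()
    for u, v in pairs:
        n = len(states[u])
        for s in kept_states:
            n_p = states[u].count(s)
            n_q = states[v].count(s)
            n_pq = sum(x == s and y == s for x, y in zip(states[u], states[v]))
            if n_pq > 0 and tail_p_value(n_pq, n, n_p, n_q) < threshold:
                links.add((u, v))
    return links
```

The resulting edge set can then be passed to any community detection routine; the clustering step itself is not shown here.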

3 Clustering distribution

This section will define Ewens’ sampling formula, which is the backbone of Aoki’s modeling hypothesis concerning the dynamics of trading behaviour.

3.1 Partition vector

We will now introduce a partition vector which will be useful for our modelling purposes.

Let $c_i$ be the number of clusters with exactly $i$ traders. Then $K_n = \sum_{i=1}^{n} c_i$ is the total number of clusters formed by $n$ traders and $\sum_{i=1}^{n} i c_i = n$. We will call the vector $c = (c_1, c_2, \ldots, c_n)$ a partition vector.
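For concreteness, a partition vector can be computed from a list of cluster sizes as in the short Python sketch below. The function name and the list-based representation (index 0 holds $c_1$) are choices made for this example.

```python
from collections import Counter


def partition_vector(cluster_sizes):
    """Return [c_1, ..., c_n], where c_i counts clusters with exactly i traders."""
    if any(size < 1 for size in cluster_sizes):
        raise ValueError("every cluster must contain at least one trader")
    n = sum(cluster_sizes)
    counts = Counter(cluster_sizes)
    return [counts.get(i, 0) for i in range(1, n + 1)]
```

For instance, three clusters of sizes 2, 2 and 3 give $n = 7$ traders, $c_2 = 2$, $c_3 = 1$ and all other entries zero, so that $K_7 = 3$ and $\sum_i i c_i = 7$.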

3.2 The Ewens Sampling distribution

Ewens’ sampling formula describes a specific probability distribution over the partitions of a positive integer into parts. It was introduced by Ewens (1972) as the probability of the partition of a sample of $n$ selectively equivalent genes into a number of different gene types (alleles). For non-negative integers $c_1, c_2, \ldots, c_n$ satisfying $\sum_j j c_j = n$ and being a realisation of a random partition vector $(C_1(n), C_2(n), \ldots, C_n(n))$, we have:

\[
\mathbb{P}_{\theta}(C_1(n) = c_1, \ldots, C_n(n) = c_n) = \frac{n!}{\theta_{(n)}} \prod_{j=1}^{n} \left(\frac{\theta}{j}\right)^{c_j} \frac{1}{c_j!}, \tag{3}
\]

for $\theta \in (0, \infty)$, where $\theta_{(n)} := \theta(\theta+1)\cdots(\theta+n-1) = \Gamma(n+\theta)/\Gamma(\theta)$ for $n \geq 1$ and $\theta_{(0)} = 1$. We also define $\theta_{(-k)} = 0$ for $k \in \mathbb{N}$.
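Equation (3) can be evaluated numerically in log space to avoid overflow in the factorials and the rising factorial $\theta_{(n)}$. The following Python sketch is one way to do this with the standard library; the function name and the list representation of the partition vector are assumptions for illustration.

```python
from math import exp, lgamma, log


def ewens_log_prob(c, theta):
    """Log-probability of partition vector c = [c_1, ..., c_n] under equation (3)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    n = sum(j * cj for j, cj in enumerate(c, start=1))
    # log(n!) - log(theta_(n)), using theta_(n) = Gamma(n + theta) / Gamma(theta)
    log_p = lgamma(n + 1) - (lgamma(n + theta) - lgamma(theta))
    for j, cj in enumerate(c, start=1):
        # contribution of (theta / j)^{c_j} / c_j!
        log_p += cj * (log(theta) - log(j)) - lgamma(cj + 1)
    return log_p
```

As a sanity check, for $n = 2$ there are two partition vectors, $(2, 0)$ and $(0, 1)$, with probabilities $\theta/(\theta+1)$ and $1/(\theta+1)$ respectively, so `exp(ewens_log_prob([0, 1], 1.5))` evaluates to $0.4$ up to floating-point error.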

We are interested in groups of traders whose strategies manifest similar synchronisation, and since the SVN clustering process builds a network with strong links between pairs of traders, we conjecture that there should be no (or only a small number of) mono-communities, i.e. clusters containing a single trader. Thus it is natural to fit a distribution conditional on the event $c_1 = 0$.

Let us define the conditional distribution of $(\tilde{C}_2(n), \ldots, \tilde{C}_n(n))$ (see da Silva et al. (2020)) in the following way:

\[
\mathcal{L}(\tilde{C}_2(n), \ldots, \tilde{C}_n(n)) = \mathcal{L}(C_2(n), \ldots, C_n(n) \mid C_1(n) = 0). \tag{4}
\]

The probability of the condition is given by:

\[
\lambda_n(\theta) := \mathbb{P}(C_1(n) = 0) = \frac{1}{\theta_{(n)}} \sum_{k=1}^{n} \theta^{k} D(n,k), \tag{5}
\]

(with $\lambda_0(\theta) = 1$ and $\lambda_1(\theta) = 0$), where $D(n,k)$ is the number of derangements of size $n$ having $k$ cycles:

\[
D(n,k) := \sum_{l=0}^{k} (-1)^{l} \binom{n}{l} \genfrac{[}{]}{0pt}{}{n-l}{k-l}; \tag{6}
\]

here $\genfrac{[}{]}{0pt}{}{n}{k}$ is the unsigned Stirling number of the first kind.

Accurate computation of alternating series, such as the one in $D(n,k)$, is well known to be numerically difficult. It is therefore useful to give the recurrence relation satisfied by $\lambda_n(\theta)$:

\[
\lambda_{n+1}(\theta) = \frac{n}{n+\theta} \left[ \lambda_n(\theta) + \frac{\theta}{n+\theta-1} \lambda_{n-1}(\theta) \right]. \tag{7}
\]
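The recurrence (7) translates directly into a short loop. The Python sketch below (function name assumed for illustration) returns $\lambda_0(\theta), \ldots, \lambda_{n_{\max}}(\theta)$ starting from $\lambda_0(\theta) = 1$ and $\lambda_1(\theta) = 0$.

```python
def lambda_sequence(n_max, theta):
    """Return [lambda_0(theta), ..., lambda_{n_max}(theta)] via recurrence (7)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    lam = [1.0, 0.0]  # lambda_0 = 1, lambda_1 = 0
    for n in range(1, n_max):
        nxt = n / (n + theta) * (lam[n] + theta / (n + theta - 1) * lam[n - 1])
        lam.append(nxt)
    return lam[: n_max + 1]
```

For $\theta = 1$ the Ewens distribution reduces to the cycle structure of a uniform random permutation, so $\lambda_n(1)$ should equal the fraction of derangements $D_n / n!$; the recurrence indeed gives $\lambda_2 = 1/2$, $\lambda_3 = 1/3$ and $\lambda_4 = 9/24$. For general $\theta$, $\lambda_2(\theta) = 1/(1+\theta)$, which agrees with equation (3) evaluated at the single partition with $c_2 = 1$.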

Table 1 summarises relevant characteristics of the Ewens distribution and of its conditional version.

| Feature | Ewens distribution | Conditional Ewens distribution |
|---|---|---|
| Probability | $\frac{n!}{\theta_{(n)}} \prod_{j=1}^{n} \left(\frac{\theta}{j}\right)^{c_j} \frac{1}{c_j!}$ | $\frac{n!}{\theta_{(n)} \lambda_n(\theta)} \prod_{j=2}^{n} \left(\frac{\theta}{j}\right)^{c_j} \frac{1}{c_j!}$ |
| Expected cycle counts | $\mathbb{E}C_j(n) = \frac{n!}{\theta_{(n)}} \frac{\theta_{(n-j)}}{(n-j)!} \frac{\theta}{j}$ | $\frac{\lambda_{n-j}(\theta)}{\lambda_n(\theta)} \mathbb{E}C_j(n)$ |
| Expected number of cycles | $\mathbb{E}K_n = \sum_{i=0}^{n-1} \frac{\theta}{\theta+i}$ | $\sum_{j=2}^{n} \frac{\lambda_{n-j}(\theta)}{\lambda_n(\theta)} \mathbb{E}C_j(n)$ |

Table 1: Comparison of Ewens conditional and non-conditional distributions

4 Experiments

In this section we describe the proprietary dataset and the experiments.

Figure 1: Evolution of some statistics (number of clusters, number of links, number of traders in clusters vs active traders ratio, number of clusters vs number of traders, mean cluster size, modularity) over time for a network of traders at deltas of 10, 15, 30, 60, 120, 180, 360 and 1440 minutes for the EUR/USD currency pair.

Figure 2: Proportion vector and normalised proportion vector of temporal evolution for clustering on EUR/USD for a 10-minute delta and a cut-off of 100. The Infomap algorithm was used to identify clusters after the SVN network was constructed.

4.1 The dataset description

We consider financial data gathered from the client trades of a retail Foreign Exchange (FX) broker. A typical retail broker will provide their clients with an online trading platform software such as MetaTrader 4 (MT4) where they can place trades, monitor positions, track both historic and live movements in prices, and access the latest world economic news. Online trading platforms often operate under the stipulation that once an order is placed (opened), it must be closed in its entirety. Source data is essentially stored in a temporal table with each row representing a client order that provides the opening and closing time, as well as the currency traded (symbol), amount traded and side (buy or sell) of the order.

The proprietary dataset comprises the trades made by over 20k clients during 2015-2017. Each client was allowed to buy or sell any of the available currency pairs and could place trades as many times as they wanted, at any time of day, provided they stayed within the confines of their leveraged funds. The dataset contains only the features necessary for further investigation, namely an investor’s anonymised ID, opening and closing trade times, number of lots traded, sign (long or short position), and the traded symbol.

4.2 Experimental protocol

It is convenient to use sliding windows in order to track the temporal evolution of clustering. For each in-sample time window, we filtered out traders with fewer than 100, 500 or 1000 trades (referred to as the cut-off). We observe that the number of traders grows approximately linearly through time, which is directly related to business growth. We focus our investigation on trading activity that occurs during standard business days within the most active hours (6am - 6pm). Investigations are conducted solely on the EUR/USD currency pair. We construct a sliding window of 6 months and shift it every 2 weeks. At every step we then build an SVN using the imbalance ratio time series for $\delta t$ equal to 10, 15, 30, 60, 120, 180, 360 and 1440 minutes (referred to as deltas).
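The windowing scheme (a 6-month window shifted every 2 weeks) can be generated as in the Python sketch below. The helper names are assumptions, calendar months are used for the window length, and the start and end dates in the usage note are hypothetical rather than the actual dataset boundaries.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day if needed."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def sliding_windows(start: date, end: date, window_months: int = 6, step_days: int = 14):
    """Return (window_start, window_end) pairs that fit entirely before `end`."""
    windows = []
    left = start
    while add_months(left, window_months) <= end:
        windows.append((left, add_months(left, window_months)))
        left += timedelta(days=step_days)
    return windows
```

For example, `sliding_windows(date(2015, 1, 1), date(2015, 8, 1))` yields three windows starting on 1, 15 and 29 January 2015. Each window would then be filtered by the trade-count cut-off before the SVN is built.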

4.3 SVN clustering and its descriptive statistics over time

To categorise traders into distinct groups, we used the Infomap clustering algorithm (Rosvall and Bergstrom, 2008), whose popularity can be attributed to its information-theoretic approach, scalability, high-quality clusters, flexibility, and statistical significance. According to Lancichinetti and Fortunato (2009), the Infomap clustering algorithm empirically gave the best results when applied to different benchmarks for community detection methods. Our empirical findings indicate that the evolution of the proportion vector (and of its normalised version) is consistent with our conjecture of a sparse number of mono-communities (see Figure 2). From the figures, we notice a smooth evolution of proportions and the appearance of new and larger clusters; this is to be expected since the number of traders grows over time. Moreover, we observe a pattern of fewer clusters of significant cardinality. The existence of one very big cluster (and many very small ones) would negate the heterogeneity of trading strategies. We observe that Infomap is consistent across resolution scales and trade-count cut-offs. We calculated several pertinent statistics to evaluate how the SVNs are affected by different time resolutions sampled throughout the lifespan of the entire dataset (i.e. from 2015 to 2017), as illustrated in Figure 1. As previously stated, the number of traders in the dataset increases over time; however, we notice a sudden increase in the number of links and clusters from July 2016.

This results in an increase in the number of clusters and links in the sliding networks. The ratio of the number of traders to the number of clusters remains stable over time. At each slide an SVN is built; some traders are never taken into consideration, and the ratio of traders present in the network increases slightly as the resolution delta increases. The modularity decreases slowly and is low except for deltas of 360 and 1440 minutes, which indicates rather weak connections between clusters.

4.4 Goodness of fit

To assess the goodness of fit to the data we use the conventional choice: a classical $\chi^2$ test. The parameter $\theta$ was estimated for every sliding window; since the formula for $\mathbb{E}K_n$ is not explicit (see Table 1), we approximate it to the closest integer. It is worth noting that for a non-conditional Ewens distribution one can readily find an explicit formula for $\theta$ using $\mathbb{E}K_n$.
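To illustrate how $\theta$ can be tied to the number of clusters, the Python sketch below handles the simpler non-conditional case: it matches the observed number of clusters to $\mathbb{E}K_n = \sum_{i=0}^{n-1} \theta/(\theta+i)$ from Table 1 by bisection, which works because each term of the sum is non-decreasing in $\theta$. This is a moment-matching sketch with assumed function names and search bounds, and it is not claimed to be the estimation procedure used for the conditional distribution in the experiments.

```python
def expected_clusters(n: int, theta: float) -> float:
    """E[K_n] under the non-conditional Ewens distribution."""
    return sum(theta / (theta + i) for i in range(n))


def estimate_theta(n: int, k_observed: float, lo: float = 1e-9, hi: float = 1e6,
                   iterations: int = 200) -> float:
    """Solve E[K_n](theta) = k_observed for theta by bisection.

    The bounds lo and hi are illustrative assumptions; a solution inside
    them is required.
    """
    if not expected_clusters(n, lo) < k_observed < expected_clusters(n, hi):
        raise ValueError("k_observed is outside the range reachable within [lo, hi]")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if expected_clusters(n, mid) < k_observed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A round trip such as `estimate_theta(100, expected_clusters(100, 2.5))` recovers a value close to 2.5. The same bisection idea could in principle be applied to the conditional expectation in Table 1, provided that expression is monotone in $\theta$ over the search interval, which is not verified here.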

Taking the example of $\delta$ equal to 10 minutes and a cut-off of at least 100 trades, we apply the $\chi^2$ test to 50 sliding windows at a significance level of 0.05. We find a $95\%$ pass rate, which confirms that most of the time the conditional Ewens distribution is a good fit.

Figure 3: Evolution of the $\theta$ parameter for all $\delta$ time slices and a cutoff of 1000, for the fixed set of the 200 most active traders. Other scenarios bear similarities in the shape of the curves, i.e. for deltas of 360 and 1440 the parameter $\theta$ stays more or less stationary, while for the others it increases suddenly at some point.

Figure 4: A comparison of the empirical and theoretical fit on the last sliding window. The plots were obtained using EURUSD data for the 10-minute scale and a cutoff of 100.

Figure 5 shows that, for all studied scenarios, the pass rate is high in most cases. In general, for a cut-off of 100 the pass rate is above $85\%$; for the others it seems to increase with delta. Figure 4 illustrates a typical comparison between the empirical and theoretical fit on a given sliding window, which is satisfactory. Figure 3 shows the evolution of the fitted parameter of the Ewens distribution. It is more or less stationary for bigger deltas and increasing for smaller ones. A larger estimated parameter $\hat{\theta}$ indicates a higher so-called mutation rate and therefore the existence of more clusters.

4.5 Temporal cluster evolution and consistent grouping identification issue

In some cases we require consistent grouping identification, and the main difficulty comes from the lack of consistent naming of clusters across subsequent time frames. Consistent naming allows us, amongst other things, to produce meaningful visualisations. The technique used relies on a total consistency measure which is closely related to the Jaccard index (for more details see Liechti and Bonhoeffer (2020)).
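As a rough illustration of the naming problem, the sketch below carries cluster labels from one time frame to the next by greedy matching on the Jaccard index. It is a simplified heuristic written for this example, not the total consistency measure of Liechti and Bonhoeffer (2020); the function names, the integer labels and the `threshold` parameter are assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relabel(previous: dict, current: list, threshold: float = 0.0) -> dict:
    """Give each current cluster the label of its most similar previous cluster.

    previous: integer label -> set of trader ids at the previous time step.
    current:  list of sets of trader ids at the current time step.
    Clusters are processed from largest to smallest; each previous label is
    reused at most once, and unmatched clusters receive fresh labels.
    """
    labelled = {}
    free = dict(previous)                      # labels not yet reused
    next_label = max(previous, default=-1) + 1
    for members in sorted(current, key=len, reverse=True):
        best = max(free, key=lambda lab: jaccard(free[lab], members),
                   default=None)
        if best is not None and jaccard(free[best], members) > threshold:
            labelled[best] = members           # cluster persists under its old name
            del free[best]
        else:
            labelled[next_label] = members     # newly appearing cluster
            next_label += 1
    return labelled
```

With a one-step history, as in the alluvial plot, `previous` would be the labelled clusters of the preceding time frame only.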

In Figure 6 we see a so-called alluvial plot where, at a given time, traders belonging to the same group are stacked together to form a continuous flow. The stability of group composition is shown when the same colouring persists between two time steps. However, a group can split, merge, die out, appear suddenly or persist throughout time. These changes in groups are to be expected as traders’ investment strategies evolve over time, existing traders leave and new traders join. Overall we observe some stability, although, as expected, groups eventually die out, merge, split or newly appear. When we considered different deltas (results not shown), we found that larger groups were more prevalent for smaller time frames.

Figure 5: Pass rate in percent for all $\delta$ time slices and cutoffs of 100, 500 and 1000. This rate represents the proportion of sliding windows for which the null hypothesis of the $\chi^{2}$ test is not rejected.

Figure 6: Alluvial plot with 1-step history (see Liechti and Bonhoeffer (2020) for more details) for the 200 most active traders with a cutoff of 1000 and a delta of one day.

5 Clusterised Aggregating Algorithm

We wish to study the temporal evolution of clusters of trading activity and investigate how they can be used for practical purposes. Clustering evolution could be used in prediction problems, since grouping simplifies the description of the system state by reducing the dimensionality of the prediction problem. In the literature there are numerous examples of the latter set in a financial context. For example, in Challet et al. (2018) the authors used SVNs to demonstrate improvement in predicting both the sign of the order flow and the direction of the average transaction price for a retail trader dataset. In this study we have applied the clustering evolution to an online prediction with expert advice model, namely the Aggregating Algorithm (AA) Vovk (1990) and Vovk (1998). The AA is given a series of online predictions from a pool of experts (in our case the traders). At each time epoch, the loss of each expert’s prediction (in our case a trader’s investment decision) is fed back into the AA, which over time adjusts its trust in each expert to make future predictions. In the next subsections we introduce the framework of the AA and the games of investment with expert advice.

5.1 Aggregating Algorithm

Suppose that the learner $L$ is tasked with predicting elements of a sequence $\omega_{1},\omega_{2},\ldots$ called outcomes. The outcomes occur in discrete time. Before seeing outcome $\omega_{t}$, the learner outputs a prediction $\gamma_{t}$. The quality of the prediction is measured by a loss function $\lambda(\cdot,\cdot)$. The learner aims to suffer low cumulative loss:

\[\mathrm{Loss}_{T}(L)=\sum_{t=1}^{T}\lambda(\gamma_{t},\omega_{t})\]

We assume that the set of all possible outcomes (the outcome space) $\Omega$ is known to us in advance and we are allowed to draw predictions from a known prediction space $\Gamma$, which may or may not be the same as $\Omega$. The function $\lambda$ is also known and maps $\Gamma\times\Omega$ to a subset of the extended real line, typically $[0,+\infty]$. The choice of a triple $G=\langle\Omega,\Gamma,\lambda\rangle$ is referred to as a game.

Suppose that the learner gets help from experts. The experts predict the same sequence and their predictions are made available to the learner before it commits to its own predictions. We are not concerned with their internal mechanics, which may well be inaccessible to us (e.g., the experts may rely on some sources of information unavailable or even unknown to us). The interaction with experts may be described by the following protocol. Here we assume that experts are parameterised by $\theta\in\Theta$.

Expert $E_{\theta}$ suffers loss $\mathrm{Loss}_{T}(E_{\theta})=\sum_{t=1}^{T}\lambda(\gamma^{\theta}_{t},\omega_{t})$. The goal of the learner is to merge the experts’ predictions $\gamma^{\theta}_{t}$ into its own prediction $\gamma_{t}$ in such a way that the learner’s loss $\mathrm{Loss}_{T}(L)$ is low compared to that of the retrospectively best experts. It may use information about past outcomes and predictions. Formally, we are seeking a merging strategy:

\[S:(\Gamma^{\Theta}\times\Omega)^{*}\times\Gamma^{\Theta}\rightarrow\Gamma\]

We typically want $S$ to guarantee an upper bound on $\mathrm{Loss}_{T}(L)$ in terms of $\inf_{\theta\in\Theta}\mathrm{Loss}_{T}(E_{\theta})$; we want $\mathrm{Loss}_{T}(L)$ to be low whenever $\mathrm{Loss}_{T}(E_{\theta})$ is low for some $\theta$. We assume that the pool of experts is finite, i.e., $|\Theta|=N<+\infty$.

Consider a game $G=\langle\Omega,\Gamma,\lambda\rangle$. A constant $C>0$ is admissible for a learning rate $\eta>0$ if for every $N=1,2,\ldots$, every set of predictions $\gamma_{1},\ldots,\gamma_{N}\in\Gamma$, and every distribution $(p_{1},p_{2},\ldots,p_{N})\in\Delta_{N-1}$, there is $\gamma\in\Gamma$ ensuring for all outcomes $\omega\in\Omega$ the inequality:

\[\lambda(\gamma,\omega)\leq-\frac{C}{\eta}\ln\sum_{i=1}^{N}p_{i}e^{-\eta\lambda(\gamma_{i},\omega)}\]

The mixability constant $C_{\eta}$ is the infimum of all $C>0$ admissible for $\eta$. This infimum is usually achieved. Admissibility is required to ensure that the learner’s predictions exist and belong to $\Gamma$ since, for example, a learner’s prediction of the form $\gamma_{t}=\sum_{i=1}^{N}p_{i}\gamma^{i}_{t}$ is a linear combination and $\Gamma$ may not be convex. The AA takes as parameters a set of prior experts’ weights $(q_{1},\ldots,q_{N})\in\Delta_{N-1}$, a learning rate $\eta>0$ and an admissible $C>0$. The algorithm works as shown in the pseudocode below.

Input: $\eta,C,q,N$
initialization of weights $\omega_{0}^{i}\sim q_{i}$ for $i=1,\ldots,N$
choice of loss $\lambda(\cdot,\cdot)$
for $t=1,2,\dots$ do
      read experts’ predictions $\gamma_{t}^{i}$
      normalise the weights $p_{t}^{i}=\frac{\omega_{t-1}^{i}}{\sum_{j}\omega_{t-1}^{j}}$
      output $\gamma_{t}\in\Gamma$ satisfying, for all $\omega\in\Omega$, $\lambda(\gamma_{t},\omega)\leq-\frac{C}{\eta}\ln\sum_{i=1}^{N}p_{t}^{i}e^{-\eta\lambda(\gamma_{t}^{i},\omega)}$
      observe outcome $\omega_{t}$
      update the weights $\omega_{t}^{i}=\omega_{t-1}^{i}\cdot e^{-\eta\cdot\lambda(\gamma_{t}^{i},\omega_{t})}$
end for
Algorithm 1 Aggregating Algorithm

The AA is valid under some mild regularity assumptions on the game. Assuming the uniform initial distribution, it satisfies the following inequality (Equation 8) for every expert $E_{i}$, $i=1,\ldots,N$, and it can be shown that the constants in this inequality are optimal:

\[\mathrm{Loss}_{T}(L)\leq C\,\mathrm{Loss}_{T}(E_{i})+\frac{C}{\eta}\ln N \tag{8}\]

5.2 Long Short Game

The problem of portfolio selection is a natural special case of prediction with expert advice. Vovk and Watkins (1998) considered realistic trading scenarios, i.e. the Long-Short game.

The Long-Short game aims to represent a realistic trading scenario. A trader is allowed to open positions, both long and short, within certain limits based on their deposit and the money they have earned previously. The limits aim to minimise the chance of bankruptcy. Given the wealth $W_{t-1}$ at time $t-1$, trader $i$ opens a position of size $W_{t-1}\gamma^{i}_{t}$. When the return $\omega_{t}$ is known, the trader’s wealth changes accordingly:

\[W_{t}=W_{t-1}\cdot(1+\gamma_{t}^{i}\cdot\omega_{t})=W_{t-1}\cdot e^{-\lambda(\gamma_{t}^{i},\omega_{t})},\qquad\lambda(\gamma,\omega)=-\ln(1+\gamma\cdot\omega)\]

In this framework one can apply the AA with $\eta=1$, $C=1$ and the substitution rule given by $\gamma_{t}=\sum_{i=1}^{N}p_{i}\gamma^{i}_{t}$ to the general long-short game. If $1+\gamma_{t}\cdot\omega_{t}>0$ for $t=1,\ldots,T$, i.e., the learner does not go bankrupt along the way, the bound (8) will hold.
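The following minimal sketch runs the AA on the long-short game with $\eta=1$, $C=1$, a uniform prior and the substitution rule above, taking the loss to be $-\ln(1+\gamma\cdot\omega)$. The function name and the input layout are assumptions made for the example, and returns are assumed to satisfy $|\omega_{t}|<1$ so that nobody goes bankrupt.

```python
import math


def aggregating_algorithm(decisions, returns):
    """Run the AA (eta = 1, C = 1) on the long-short game.

    decisions[t][i] is expert i's position gamma_t^i in [-1, 1];
    returns[t] is the realised return omega_t, assumed to satisfy |omega_t| < 1.
    Returns the learner's cumulative loss and each expert's cumulative loss.
    """
    n = len(decisions[0])
    weights = [1.0 / n] * n                    # uniform prior q_i = 1/N
    learner_loss = 0.0
    expert_loss = [0.0] * n
    for gammas, omega in zip(decisions, returns):
        total = sum(weights)
        p = [w / total for w in weights]       # normalised weights
        gamma = sum(pi * gi for pi, gi in zip(p, gammas))  # substitution rule
        learner_loss += -math.log(1.0 + gamma * omega)
        for i, gi in enumerate(gammas):
            loss = -math.log(1.0 + gi * omega)
            expert_loss[i] += loss
            weights[i] *= math.exp(-loss)      # eta = 1
    return learner_loss, expert_loss
```

With these settings the learner’s wealth equals the average of the experts’ wealths, so the learner’s loss never exceeds that of the best expert by more than $\ln N$, which is bound (8) with $C=1$ and $\eta=1$.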

5.3 AA with Sleeping Experts

In Al-Baghdadi et al. (2020), the performance of the AA was evaluated using a real-life trading dataset. Some modifications of the AA were proposed in order to improve the practical performance of the resulting portfolio. In particular, a downside loss was introduced, together with a weighted average of the downside loss and the long-short loss. Downside loss, in contrast to long-short loss (originally used in Vovk and Watkins (1998)), penalises financial losses but does not reward gains, since a strategy that avoids losing money may be more important than the ability to earn money.

\begin{align*}
\lambda_{\mathrm{Long~Short~Loss}}(\rho,\gamma,r)&=-\log[\max(1+\rho\cdot\gamma\cdot r,0)] \tag{9}\\
\lambda_{\mathrm{Downside~Loss}}(\rho,\gamma,r)&=-\log\{\max[1+\rho\cdot\min(\gamma\cdot r,0),0]\}
\end{align*}

where:

$\rho$ - scaling factor
$\gamma$ - investment decision $\in[-1,1]$
$r$ - return
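The two losses can be written directly from the formulas above. In this sketch the function names are illustrative, and the convention that $-\log 0=+\infty$ (the loss on bankruptcy) is an assumption made so that the functions are defined everywhere.

```python
import math


def long_short_loss(rho: float, gamma: float, r: float) -> float:
    """Long-short loss: -log(max(1 + rho * gamma * r, 0))."""
    wealth_factor = max(1.0 + rho * gamma * r, 0.0)
    # Assumed convention: a wealth factor of zero gives an infinite loss.
    return -math.log(wealth_factor) if wealth_factor > 0 else math.inf


def downside_loss(rho: float, gamma: float, r: float) -> float:
    """Downside loss: only adverse moves (gamma * r < 0) are counted."""
    wealth_factor = max(1.0 + rho * min(gamma * r, 0.0), 0.0)
    return -math.log(wealth_factor) if wealth_factor > 0 else math.inf
```

A profitable decision gives a negative long-short loss but a downside loss of zero, while an unprofitable decision gives the same value under both.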

In our research we faced one particular challenge with our dataset: the pool of traders constantly changes through time. For example, traders may take breaks from trading, new traders may join, or traders may cease trading with the broker and close their account entirely. The AA requires its experts to provide predictions continually through time; a natural way to encode such activity is to use the so-called “sleeping” experts extension.

Input: $\eta,\rho,n$
Initialization of weights $\omega_{0}^{i}=1$ for $i=1,\ldots,n$
Choice of loss $\lambda(\gamma,r)$
for $t=1,2,\dots$ do
      Get set of awake experts $A_{t}$ and sleeping experts $S_{t}$
      Read investment of awake experts $\gamma_{t}^{i}$ for $i\in A_{t}$
      Normalise the weights of awake experts $p_{t}^{i}=\frac{\omega_{t-1}^{i}}{\sum_{j\in A_{t}}\omega_{t-1}^{j}}$
      Calculate investment prediction $\gamma_{t}=\sum_{j\in A_{t}}p_{t}^{j}\cdot\gamma_{t-1}^{j}$
      Observe return $r_{t}$
      Update for $i\in A_{t}$ the weights $\omega_{t}^{i}=\omega_{t-1}^{i}\cdot\exp[-\eta\cdot\lambda(\gamma_{t}^{i},r_{t})]$
      Update for $i\in S_{t}$ the weights $\omega_{t}^{i}=\omega_{t-1}^{i}\cdot\exp[-\eta\cdot\lambda(\gamma_{t},r_{t})]$
end for
Algorithm 2 Aggregating Algorithm With Sleeping Experts
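A compact sketch of Algorithm 2 is given below. It makes two simplifying assumptions that are not taken from the source: awake experts are identified by the keys of a dictionary per time step, and the learner combines the decisions read at the current step rather than those of the previous step. The function name and argument layout are illustrative.

```python
import math


def sleeping_aa(decisions, returns, loss, eta=1.0):
    """Sketch of the AA with sleeping experts.

    decisions[t] maps the index of each awake expert to its investment;
    experts missing from decisions[t] are asleep at step t.
    loss(gamma, r) is the chosen loss function.
    Returns the learner's investment at each step and the final weights.
    """
    n = 1 + max(i for step in decisions for i in step)
    weights = [1.0] * n
    predictions = []
    for awake, r in zip(decisions, returns):
        total = sum(weights[i] for i in awake)
        # Weighted average over awake experts (zero if nobody is awake).
        gamma = sum(weights[i] / total * g for i, g in awake.items())
        predictions.append(gamma)
        learner_factor = math.exp(-eta * loss(gamma, r))
        for i in range(n):
            if i in awake:
                weights[i] *= math.exp(-eta * loss(awake[i], r))
            else:
                # A sleeping expert is charged the learner's own loss.
                weights[i] *= learner_factor
    return predictions, weights
```

Charging sleeping experts the learner’s loss leaves their weight relative to the learner unchanged, so an expert is neither rewarded nor penalised for periods in which it does not trade.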

5.4 Clusterised Aggregating Algorithm (CAA) and decision rules

The classical AA learner prediction is:

\[\gamma_{t}=\sum_{k}p^{k}_{t}\gamma^{k}_{t-1} \tag{10}\]

which is a weighted average of the experts’ predictions. For the clusterised aggregating algorithm (CAA) we introduced two decision rules:

\begin{align*}
\gamma_{t}^{\mathrm{MEAN}}&=\sum_{i}\Bigl(\sum_{j}^{n_{i}}p^{i,j}_{t}\Bigr)\cdot\Bigl(\sum_{k}^{n_{i}}\frac{\gamma^{i,k}_{t-1}}{n_{i}}\Bigr)&&\text{take the mean of experts' predictions in a cluster}\\
\gamma_{t}^{\mathrm{PEN}}&=\sum_{i}\sum_{j}^{n_{i}}p^{i,j}_{t}\frac{\gamma^{i,j}_{t-1}}{n_{i}}&&\text{penalise by dividing by the cardinality of a cluster}
\end{align*}

where $n_{i}$ is the cardinality of the $i$-th cluster, $p^{i,j}_{t}$ is the normalised weight of the $j$-th expert in the $i$-th cluster, and $p^{i}=\sum_{j}^{n_{i}}p^{i,j}_{t}$ is the sum of probabilities of the $i$-th cluster.
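The two decision rules can be sketched as follows. The data layout (a list of clusters, each a non-empty list of weight and investment pairs) and the function names are assumptions made for this example.

```python
def caa_mean(clusters):
    """MEAN rule: each cluster's total weight times its mean investment.

    clusters is a list of non-empty clusters; each cluster is a list of
    (p, gamma) pairs, with p the normalised weight of an expert and gamma
    its investment decision.
    """
    return sum(
        sum(p for p, _ in cluster) * sum(g for _, g in cluster) / len(cluster)
        for cluster in clusters
    )


def caa_pen(clusters):
    """PEN rule: each weighted investment is divided by its cluster size."""
    return sum(p * g / len(cluster) for cluster in clusters for p, g in cluster)
```

For clusters of size one both rules reduce to the classical weighted average of Equation 10.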

The decision rule of $\gamma_{t}^{\mathrm{MEAN}}$ is interesting in a trivial case scenario, i.e. having the same duplicated experts in every cluster. Let’s suppose that we have $m$ identical experts in the pool. It appears desirable to collate them into one. However, this is done by the AA automatically. The behaviour of the AA would be the same as if one expert with the combined weight is present in the pool. Assuming the uniform distribution on the initial experts, the weight of the combined expert will be $m/N$ and the loss bound for the duplicated experts $E_{i}$ (again assuming the mixable case $C=1$) turns into:

\[\mathrm{Loss}_{T}(L)\leq\mathrm{Loss}_{T}(E_{i})+\frac{1}{\eta}\ln\frac{N}{m}\]
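As a short worked step, the bound can be obtained by assuming the standard prior-weighted form of the AA guarantee, $\mathrm{Loss}_{T}(L)\leq C\,\mathrm{Loss}_{T}(E)+\frac{C}{\eta}\ln\frac{1}{q_{E}}$, of which Equation 8 is the uniform case $q_{E}=1/N$. The symbol $q_{\mathrm{combined}}$ is introduced here only for illustration.

```latex
\begin{align*}
q_{\mathrm{combined}} &= \sum_{j=1}^{m}\frac{1}{N}=\frac{m}{N},\\
\mathrm{Loss}_{T}(L) &\leq \mathrm{Loss}_{T}(E_{i})+\frac{1}{\eta}\ln\frac{1}{q_{\mathrm{combined}}}
 = \mathrm{Loss}_{T}(E_{i})+\frac{1}{\eta}\ln\frac{N}{m}.
\end{align*}
```

For instance, with $N=100$, $m=10$ and $\eta=1$ the additive term is $\ln 10\approx 2.30$ for the duplicated experts, compared with $\ln 100\approx 4.61$ for an expert that appears only once.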

However, if duplicate experts are bad, this creates a problem: needlessly increasing $N$ worsens the bound for good experts. Conversely, if there were two clusters, each consisting of different duplicated experts, and the bigger cluster had the better-performing experts, then the AA bound would be improved.

The second decision rule, i.e. $\gamma_{t}^{\mathrm{PEN}}$, can be interpreted in terms of partially awake experts if the penalising factor is normalised, i.e. $\frac{1/n_{i}}{\sum_{k\in\mathrm{Clusters}}1/n_{k}}$. This idea was generalised in V’yugin and Trunov (2022). Apart from a prediction $\gamma_{t}$, such an expert produces a confidence value $c_{t}\in[0,1]$, which quantifies its confidence (a fully sleeping expert would output a confidence of 0 and a fully awake expert would output a confidence of 1). Here the confidence would be inversely proportional to the cardinality of the cluster. This is similar to inverse-variance weighting in portfolio selection problems, in particular the equal risk contributions portfolio Maillard et al. (2010).

5.5 Experts as Clusters approach to AA (ECAA)

Up until now we only clusterised via the decision rules, and the experts were identified with the traders. It seems natural to consider treating clusters of traders as meta-experts. We averaged the experts’ investment decisions per cluster in order to obtain the meta-experts’ predictions. In appendix 11, we derive a condition under which these extensions to the AA would outperform the original set-up of the AA with duplicated experts. In practice we identified the flow of meta-experts according to the alluvial plot (see Figure 6). There are several things to consider in this scenario, especially the splitting and merging of clusters at every epoch. We suggested the following approach:

  • If a cluster is split, then each child inherits the parent’s weight divided by the number of children.

  • If clusters are merged, then the resultant weight is the sum of the parents’ weights.
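The two rules above can be sketched as a pair of helper functions; the names are hypothetical and the weights are plain floats.

```python
def split_weight(parent_weight: float, n_children: int) -> list:
    """A split cluster passes its weight to its children in equal shares."""
    return [parent_weight / n_children] * n_children


def merge_weight(parent_weights: list) -> float:
    """A merged cluster receives the sum of its parents' weights."""
    return sum(parent_weights)
```

Both rules conserve the total weight of the pool of meta-experts, so the weights remain comparable before and after a change in the cluster structure.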

5.6 Experiments

First we applied a data staging technique known as DAPRA (see Al-baghdadi et al. (2019)) which, when applied to data streams pertaining to trades and prices, allows one to sample the data at regular time intervals (required for this study). We then compared the performance of the AA with its clusterised counterparts (CAA and ECAA), with the expectation that these extensions would improve scalability and reduce noise. The CAA extension simply takes the mean of the investments $\gamma$ of awake experts in a given cluster (MEAN), or divides their decisions by the cardinality of the cluster (PEN). As a benchmark we used the equally weighted portfolio strategy. We compared the CAA and the ECAA using the SVN-Infomap approach against the same algorithms using hierarchical clustering based on correlations of the traders’ net positions (i.e. the difference between total open long (buy) and open short (sell) positions in USD), with the chosen distance metric $1-|\mathrm{correlation}|$. The latter approach makes it possible to adjust the construction of clusters by changing the dissimilarity threshold. The rationale behind clustering based on net position correlation is that it is a desirable feature for the broker, since it is a measure of risk. The SVN approach is focused on trading synchronicity; therefore we have less control over the quality of clustering with regard to the net position. Ideally all traders would trade all the time or have a long period of overlapping trading activity, but since this is not the case one can end up with “noisy” clusters.
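To make the role of the dissimilarity threshold concrete, the sketch below clusters traders on the distance $1-|\mathrm{correlation}|$ of their net positions. The linkage criterion is not stated here, so the sketch assumes single linkage, for which cutting the dendrogram at a threshold is equivalent to taking the connected components of the graph that links every pair of traders whose distance does not exceed the threshold. Function names and the input layout are illustrative, and every net-position series is assumed to be non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def single_linkage_clusters(positions, threshold):
    """Group traders whose distance 1 - |correlation| is at most threshold.

    positions maps a trader id to its net-position time series.
    Returns a list of sets of trader ids (assumed single linkage).
    """
    ids = list(positions)
    parent = {i: i for i in ids}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a_idx, a in enumerate(ids):
        for b in ids[a_idx + 1:]:
            if 1.0 - abs(pearson(positions[a], positions[b])) <= threshold:
                parent[find(a)] = find(b)      # merge the two components

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())
```

Because the distance uses the absolute correlation, traders holding strongly opposite positions fall into the same cluster; raising the threshold produces fewer and larger clusters.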

Table 2: Table summarising the experimental results for CAA.

| Strategy | Type | Scaling factor | Return | Sharpe Ratio | Max Drawdown | Calmar Ratio |
|---|---|---|---|---|---|---|
| EW | Benchmark | - | 1.4% | 0.6 | 1.2% | 1.2 |
| AA | Sleeping Experts | 70 | 2.8% | 1.1 | 1.8% | 1.5 |
| CAA | MEAN/SVN | 70 | 3% | 1.2 | 1.85% | 1.8 |
| CAA | MEAN/Hierarchical | 70 | 4.8% | 2 | 1.15% | 4 |
| CAA | PEN/SVN | 70 | 2.5% | 1.35 | 0.9% | 2.5 |
| CAA | PEN/Hierarchical | 70 | 2.5% | 1.4 | 0.9% | 2.6 |
| ECAA | Hierarchical 80 | 200 | 1% | 1.65 | 0.3% | 3.5 |
| ECAA | SVN | 1 | 0.5% | 0.4 | 0.8% | 0.6 |

We obtained promising results, especially for the downside loss (see (9)), which is more appropriate in this framework. We evaluated the performance using four well-established portfolio risk measures: the return of the portfolio; the Sharpe ratio, which is the amount of return an investor receives per unit of risk; the maximum drawdown, which is the maximum observed loss from a peak to a trough of a portfolio before a new peak is attained; and the Calmar ratio, which measures the risk-adjusted performance of a portfolio by comparing the return to the maximum drawdown.
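The four measures can be computed from a series of per-period portfolio returns as in the sketch below. The conventions are assumptions made for illustration only: returns are compounded, the Sharpe ratio is the mean return over its sample standard deviation with a zero risk-free rate and no annualisation, and the Calmar ratio is the total return over the maximum drawdown. The function names and the example returns are hypothetical.

```python
from statistics import mean, stdev


def equity_curve(returns):
    """Portfolio value over time, starting from 1 (i.e. 1 + return)."""
    value, curve = 1.0, [1.0]
    for r in returns:
        value *= 1.0 + r
        curve.append(value)
    return curve


def max_drawdown(curve):
    """Largest relative loss from a running peak to a later trough."""
    peak, worst = curve[0], 0.0
    for v in curve:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


def risk_measures(returns):
    curve = equity_curve(returns)
    total_return = curve[-1] - 1.0
    mdd = max_drawdown(curve)
    return {
        "return": total_return,
        # Assumption: zero risk-free rate, no annualisation.
        "sharpe": mean(returns) / stdev(returns),
        "max_drawdown": mdd,
        "calmar": total_return / mdd if mdd > 0 else float("inf"),
    }


# Hypothetical per-period returns: up 50%, down 50%, up 100%.
measures = risk_measures([0.5, -0.5, 1.0])
```

In this toy example the portfolio value goes 1, 1.5, 0.75, 1.5, so the total return is 50%, the maximum drawdown is 50% and the Calmar ratio is 1.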

The distribution of traders’ returns is close to symmetric and the mean is approximately zero. With the MEAN clustering decision rule and clusters constructed with the SVN-infomap method, the performance of CAA is on the whole comparable with that of the AA. However, the results using hierarchical clustering are significantly better across all risk measures. The best performing cutoff for the distance metric is around $70\%$. On the other hand, with the PEN clustering decision rule the return on investment is comparable, but for the other metrics we observed significantly better results for both clustering techniques. Figures 7 and 8 compare all results for return scaling factors up to 400.

Figure 7: Comparison of results for all four considered measures of risk in the out-of-sample scenario where the CAA learner prediction is the experts’ predictions divided by the cardinality of each cluster. The return to maximum drawdown ratio, Sharpe ratio, 1 + return and maximum drawdown are shown for different return scaling factors. The green, blue and pink dotted lines denote the performance of the equal weights portfolio, AA and CAA with SVN-infomap, respectively. The other curves represent CAA using hierarchical clustering with different thresholds.

Figure 8: Comparison of results for all four considered measures of risk in the out-of-sample scenario where the CAA learner prediction is the mean of the experts’ predictions for each cluster. The return to maximum drawdown ratio, Sharpe ratio, 1 + return and maximum drawdown are shown for different return scaling factors. The green, blue and pink dotted lines denote the performance of the equal weights portfolio, AA and CAA with SVN-infomap, respectively. The other curves represent CAA using hierarchical clustering with different thresholds.

For ECAA we consider the scenario of treating clusters as meta-experts. The alluvial chart lets us readily identify the flow of clusters over time; without it we could not match clusters across time epochs, since they are unlabelled. The overall performance of the ECAA using SVN-infomap clusters is poor, showing the lowest return, Sharpe ratio and Calmar ratio. However, for hierarchical clustering all risk measures other than the return are significantly better than for the standard AA (see Figure 9). Moreover, ECAA has a smoother PnL, as seen from its much smaller drawdown compared with CAA, AA and the benchmark.

Figure 9: Comparison of results for all four considered measures of risk in the out-of-sample scenario where the ECAA learner prediction is the mean of the experts’ predictions for each cluster. The return to maximum drawdown ratio, Sharpe ratio, 1 + return and maximum drawdown are shown for different return scaling factors. The green, blue and pink dotted lines denote the performance of the equal weights portfolio, AA and ECAA with SVN-infomap, respectively. The other curves represent ECAA using hierarchical clustering with different thresholds.

Table 2 summarises the experimental results for near-optimal variations of all algorithms. Figures 10 and 11 show the evolution of their returns and drawdowns over time. It is worth mentioning that when the scaling factor becomes large (larger than $100$), more and more traders go bankrupt because of the nature of the loss (9). Moreover, the algorithm may suddenly stop investing when the scaling factor gets too large, so one must be cautious when interpreting the results.

Figure 10: Comparison of returns for the equal weight portfolio, AA and good alternatives for CAA and ECAA.

Figure 11: Comparison of relative drawdowns for the equal weight portfolio, AA and good alternatives for CAA and ECAA.

6 Conclusion

In this paper, our findings confirm that the clustering of traders’ investments can be described by the Ewens distribution. The temporal clustering distribution depends on many parameters and market conditions; however, the clustering could be leveraged to make better investment decisions. We adapted the aggregating algorithm with sleeping experts to test the latter hypothesis using two clustering techniques, namely SVN-infomap and hierarchical clustering. In this framework the latter approach gives better results and more meaningful clusters, since it is based on correlations of the investors’ net positions rather than on their trading synchronicity. In particular, we compared CAA (which uses aggregated traders’ decisions per cluster to calculate the investment prediction) and ECAA (in which clusters play the role of experts) with AA and the equally weighted portfolio strategy. In our experimental results, the modifications we introduced to the AA show clear performance benefits in terms of four well-established portfolio risk measures: return, Sharpe ratio, maximum drawdown and Calmar ratio.

Acknowledgements

The authors acknowledge the support of Algorithmic Laboratories Ltd (AlgoLabs) and their parent company Equiti Group in establishing and developing this research. Special thanks go to Xudong Li, Tzyy Tong and Samuel Manoharan for setting up the servers necessary to run our experiments. Further thanks go to Simon Tavaré for useful insights.

References

  • Al-baghdadi et al. (2019) Najim Al-baghdadi, Wojciech Wisniewski, Yuri Kalnishkan, Christopher Watkins, Siân Lindsay, and David Lindsay. Structuring time series data to gain insight into agent behaviour. 12 2019. 10.1109/BigData47090.2019.9006346.
  • Al-Baghdadi et al. (2020) Najim Al-Baghdadi, David Lindsay, Yuri Kalnishkan, and Sian Lindsay. Practical investment with the long-short game. In Proceedings of the Ninth Symposium on Conformal and Probabilistic Prediction and Applications, volume 128 of Proceedings of Machine Learning Research, pages 209–228, Verona, Italy, 09–11 Sep 2020. PMLR. URL http://proceedings.mlr.press/v128/al-baghdadi20a.html.
  • Aoki (2000) Masanao Aoki. Cluster size distributions of economic agents of many types in a market. Journal of Mathematical Analysis and Applications, 249:32–52, 09 2000. 10.1006/jmaa.2000.6935.
  • Baltakiene et al. (2019) Margarita Baltakiene, Kestutis Baltakys, Juho Kanniainen, Dino Pedreschi, and Fabrizio Lillo. Clusters of investors around initial public offering. Palgrave Communications, 5, 12 2019. 10.1057/s41599-019-0342-6.
  • Baltakys et al. (2021) Kestutis Baltakys, Hung Le Viet, and Juho Kanniainen. Structure of investor networks and financial crises. Entropy, 23(4), 2021. ISSN 1099-4300. 10.3390/e23040381.
  • Barreau et al. (2020) Baptiste Barreau, Laurent Carlier, and Damien Challet. Deep prediction of investor interest: A supervised clustering approach. Algorithmic Finance, pages 1–13, 06 2020. 10.3233/AF-200296.
  • Bohlin and Rosvall (2014) Ludvig Bohlin and Martin Rosvall. Stock portfolio structure of individual investors infers future trading behavior. PLoS ONE, 9(7), Jul 2014. ISSN 1932-6203. 10.1371/journal.pone.0103006. URL http://dx.doi.org/10.1371/journal.pone.0103006.
  • Challet et al. (2018) Damien Challet, Rémy Chicheportiche, Mehdi Lallouache, and Serge Kassibrakis. Statistically validated lead-lag networks and inventory prediction in the foreign exchange market. Advances in Complex Systems, December 2018. 10.1142/S0219525918500194.
  • Cordi et al. (2020) Marcus Cordi, Damien Challet, and Serge Kassibrakis. The market nanostructure origin of asset price time reversal asymmetry, 2020.
  • da Silva et al. (2020) Poly H. da Silva, Arash Jamshidpey, and Simon Tavaré. Random derangements and the ewens sampling formula, 2020.
  • Ewens (1972) W.J. Ewens. The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3(1):87 – 112, 1972. ISSN 0040-5809. https://doi.org/10.1016/0040-5809(72)90035-4.
  • Gutiérrez-Roig et al. (2019) Mario Gutiérrez-Roig, Javier Borge-Holthoefer, Alex Arenas, and Josep Perelló. Mapping individual behavior in financial markets: synchronization and anticipation. EPJ Data Science, 8, 03 2019. 10.1140/epjds/s13688-019-0188-6.
  • Lancichinetti and Fortunato (2009) Andrea Lancichinetti and Santo Fortunato. Community detection algorithms: A comparative analysis. Phys. Rev. E, 80:056117, Nov 2009. 10.1103/PhysRevE.80.056117.
  • Liechti and Bonhoeffer (2020) Jonas I. Liechti and Sebastian Bonhoeffer. A time resolved clustering method revealing longterm structures and their short-term internal dynamics, 2020.
  • Maillard et al. (2010) Sébastien Maillard, Thierry Roncalli, and Jérôme Teïletche. The properties of equally weighted risk contribution portfolios. The Journal of Portfolio Management, 36(4):60–70, 2010. ISSN 0095-4918. 10.3905/jpm.2010.36.4.060.
  • Mantegna (2020) Rosario N. Mantegna. Clusters of Traders in Financial Markets, pages 203–212. Springer Singapore, 2020. ISBN 978-981-15-4806-2. 10.1007/978-981-15-4806-2_10.
  • Musciotto et al. (2016) Federico Musciotto, Luca Marotta, Salvatore Micciche, Jyrki Piilo, and Rosario N. Mantegna. Patterns of trading profiles at the nordic stock exchange. a correlation-based approach. Chaos, Solitons & Fractals, 88:267 – 278, 2016. ISSN 0960-0779. https://doi.org/10.1016/j.chaos.2016.02.027.
  • Musciotto et al. (2018) Federico Musciotto, Luca Marotta, Jyrki Piilo, and Rosario Mantegna. Long-term ecology of investors in a financial market. Palgrave Communications, 4:92, 07 2018. 10.1057/s41599-018-0145-1.
  • Rosvall and Bergstrom (2008) M. Rosvall and C. T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4):1118–1123, Jan 2008. ISSN 1091-6490. 10.1073/pnas.0706851105.
  • Sueshige et al. (2018) Takumi Sueshige, Kiyoshi Kanazawa, Hideki Takayasu, and Misako Takayasu. Ecology of trading strategies in a forex market for limit and market orders. PLOS ONE, 13(12):1–14, 12 2018. 10.1371/journal.pone.0208332.
  • Tumminello et al. (2011a) Michele Tumminello, Fabrizio Lillo, Jyrki Piilo, and Rosario Mantegna. Identification of clusters of investors from their real trading activity in a financial market. New Journal of Physics, 14, 07 2011a. 10.2139/ssrn.1890584.
  • Tumminello et al. (2011b) Michele Tumminello, Salvatore Micciche, Fabrizio Lillo, Jyrki Piilo, and Rosario Mantegna. Statistically validated networks in bipartite complex systems. PloS one, 6:e17994, 03 2011b. 10.1371/journal.pone.0017994.
  • Vovk (1998) V Vovk. A game of prediction with expert advice. Journal of Computer and System Sciences, 56(2):153–173, 1998. ISSN 0022-0000. https://doi.org/10.1006/jcss.1997.1556.
  • Vovk and Watkins (1998) V. Vovk and C. Watkins. Universal portfolio selection. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT’ 98, page 12–23, New York, NY, USA, 1998. Association for Computing Machinery. ISBN 1581130570. 10.1145/279943.279947.
  • Vovk (1990) V. G. Vovk. Aggregating strategies. Proc. of Computational Learning Theory, 1990, 1990. URL https://ci.nii.ac.jp/naid/10021342782/en/.
  • V’yugin (2013) Vladimir V’yugin. Universal algorithm for trading in stock market based on the method of calibration. In Sanjay Jain, Rémi Munos, Frank Stephan, and Thomas Zeugmann, editors, Algorithmic Learning Theory, pages 53–67, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. ISBN 978-3-642-40935-6.
  • V’yugin and Trunov (2022) Vladimir V’yugin and Vladimir Trunov. Online aggregation of probability forecasts with confidence. Pattern Recognition, 121(C), jan 2022. ISSN 0031-3203. 10.1016/j.patcog.2021.108193. URL https://doi.org/10.1016/j.patcog.2021.108193.
  • Zhang and Yang (2017) Yong Zhang and Xingyu Yang. Online portfolio selection strategy based on combining experts’ advice. Comput. Econ., 50(1):141–159, jun 2017. ISSN 0927-7099. 10.1007/s10614-016-9585-0. URL https://doi.org/10.1007/s10614-016-9585-0.

Appendix: Clusterised AA bound

In this section, we will discuss when it is beneficial to run AA on (equally weighted) cluster experts rather than the original experts and connect this with our intuition about the performance of traders. The analysis will be done on an artificial example but the conclusion is instructive.

Suppose that we have $m$ identical experts in a pool of $N$. One may want to collate them into one; there is no need, though, as this is done by the AA automatically. The behaviour of the AA would be the same as if one expert with the combined weight were present in the pool. Assuming the uniform distribution on the $N$ original experts, the weight of the combined expert will be $m/N$ and the loss bound for the duplicated experts $E_i$ (assuming the mixable case $C=1$) turns into

$$\mathrm{Loss}_T(L) \leq \mathrm{Loss}_T(E_i) + \frac{1}{\eta}\ln\frac{N}{m}.$$

This is a stronger bound and, if the performance of the expert is actually good, it leads to a lower $\mathrm{Loss}_T(L)$. However, if duplicate experts perform badly, they create a problem: increasing $N$ worsens the bound for good experts.
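The claim that duplicated experts behave like a single expert with the combined weight can be checked numerically on the weight update alone. The sketch below assumes the standard exponential weight update of the AA, where each weight is multiplied by $e^{-\eta\lambda}$ for the expert's loss $\lambda$ and the weights are then normalised; the learning rate and the loss values are made up for illustration.

```python
from math import exp


def update(weights, losses, eta):
    """One exponential-weights step followed by normalisation."""
    new = [w * exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(new)
    return [w / total for w in new]


eta = 2.0  # hypothetical learning rate

# Pool of N = 4 experts: experts 0..2 are identical copies, expert 3 differs.
dup_w = [0.25] * 4
# Same pool with the m = 3 copies collated into one expert of prior weight m/N.
col_w = [0.75, 0.25]

# Hypothetical per-round losses: (loss of the copies, loss of the other expert).
rounds = [(0.1, 0.4), (0.3, 0.2), (0.0, 0.5)]
for l_copy, l_other in rounds:
    dup_w = update(dup_w, [l_copy] * 3 + [l_other], eta)
    col_w = update(col_w, [l_copy, l_other], eta)

combined = sum(dup_w[:3])  # total weight held by the three copies
```

After every round the total weight of the copies coincides with the weight of the collated expert, so any prediction built from the weighted experts is the same in both pools.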

Suppose that we have $M$ clusters of experts of cardinalities $c_1,\ldots,c_M$. Let all experts in each cluster be identical and suffer the same cumulative loss. Applying AA to cluster meta experts (with equal initial weights) will give us the loss bound $U_-$ and applying AA to the original experts will give us the loss bound $U_*$:

$$\begin{aligned}
U_- &= \min_{i=1,2,\ldots,M}\Big\{\mathrm{Loss}_T(E_{C_i}) + \frac{1}{\eta}\ln M\Big\} = \mathrm{Loss}_T(E_*) + \frac{1}{\eta}\ln M,\\
U_* &= \min_{i=1,2,\ldots,M}\Big\{\mathrm{Loss}_T(E_{C_i}) + \frac{1}{\eta}\ln\frac{N}{c_i}\Big\} = \mathrm{Loss}_T(E_{C_{i_0}}) + \frac{1}{\eta}\ln\frac{N}{c_{i_0}},
\end{aligned}$$

where $E_{C_i}$ is an expert from cluster $i$, $E_*$ is the best expert overall, and $i_0$ is the number of the cluster where the minimum in $U_*$ is achieved.

We get that

$$U_- \leq U_* \iff c_{i_0} \leq \frac{N}{M}\, e^{\eta[\mathrm{Loss}_T(E_{C_{i_0}}) - \mathrm{Loss}_T(E_*)]}, \tag{11}$$

where $\mathrm{Loss}_T(E_{C_{i_0}}) - \mathrm{Loss}_T(E_*) \geq 0$. This means that the bound with cluster meta experts is better when there are no good experts in large clusters.
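Condition (11) can be illustrated numerically by computing both bounds for a toy pool. The sketch below evaluates $U_-$, $U_*$ and the right-hand side of (11) from cluster cardinalities and per-cluster cumulative losses; the function name `bounds`, the learning rate and the example numbers are hypothetical.

```python
from math import exp, log


def bounds(cluster_sizes, cluster_losses, eta):
    """Return (U_minus, U_star, c_i0, rhs), where rhs is the right-hand
    side of condition (11)."""
    N = sum(cluster_sizes)   # number of original experts
    M = len(cluster_sizes)   # number of clusters
    best_loss = min(cluster_losses)  # Loss_T(E_*)

    # Bound for AA on cluster meta experts with equal initial weights.
    u_minus = best_loss + log(M) / eta

    # Bound for AA on the original experts: cluster i has total weight c_i / N.
    terms = [loss + log(N / c) / eta
             for c, loss in zip(cluster_sizes, cluster_losses)]
    i0 = min(range(M), key=terms.__getitem__)
    u_star = terms[i0]

    rhs = (N / M) * exp(eta * (cluster_losses[i0] - best_loss))
    return u_minus, u_star, cluster_sizes[i0], rhs


# Good expert alone, bad experts in a large cluster: clustering helps.
small_good = bounds([1, 9], [1.0, 3.0], eta=1.0)
# Good experts in the large cluster: the original pool has the better bound.
large_good = bounds([9, 1], [1.0, 3.0], eta=1.0)
```

In the first case $U_- \approx 1.69$ is below $U_* \approx 3.11$ and $c_{i_0} = 9$ is below the right-hand side of (11); in the second case both inequalities fail, in line with the equivalence.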

As the practice of trading shows, good trades are usually few and form a minority, which is one of the justifications for the cluster AA. Cluster AA gives an advantage to smaller clusters.