Vector Autoregression in Cryptocurrency Markets: Unraveling Complex Causal Networks

Authors: Cameron Cornell, Lewis Mitchell, Matthew Roughan

Abstract: Methodologies to infer financial networks from the price series of speculative assets vary; however, they generally involve bivariate or multivariate predictive modelling to reveal causal and correlational structures within the time series data. The required model complexity intimately relates to the underlying market efficiency, where one expects a highly developed and efficient market to display very few simple relationships in price data. This has spurred research into the applications of complex nonlinear models for developed markets. However, it remains unclear if simple models can provide meaningful and insightful descriptions of the dependency and interconnectedness of the rapidly developed cryptocurrency market. Here we show that multivariate linear models can create informative cryptocurrency networks that reflect economic intuition, and demonstrate the importance of high-influence nodes. The resulting network confirms that node degree, a measure of influence, is significantly correlated to the market capitalisation of each coin ($ρ=0.193$). However, there remains a proportion of nodes whose influence extends beyond what their market capitalisation would imply. We demonstrate that simple linear model structure reveals an inherent complexity associated with the interconnected nature of the data, supporting the use of multivariate modelling to prevent surrogate effects and achieve accurate causal representation. In a reductive experiment we show that most of the network structure is contained within a small portion of the network, consistent with the Pareto principle, whereby a fraction of the inputs generates a large proportion of the effects. Our results demonstrate that simple multivariate models provide nontrivial information about cryptocurrency market dynamics, and that these dynamics largely depend upon a few key high-influence coins.

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2211.05350 [pdf, other]

The entropy rate of Linear Additive Markov Processes

Authors: Bridget Smart, Matthew Roughan, Lewis Mitchell

Abstract: This work derives a theoretical value for the entropy of a Linear Additive Markov Process (LAMP), an expressive model able to generate sequences with a given autocorrelation structure. While a first-order Markov Chain model generates new values by conditioning on the current state, the LAMP model takes the transition state from the sequence's history according to some distribution which does not have to be bounded. The LAMP model captures complex relationships and long-range dependencies in data with similar expressibility to a higher-order Markov process. While a higher-order Markov process has a polynomial parameter space, a LAMP model is characterised only by a probability distribution and the transition matrix of an underlying first-order Markov Chain. We prove that the theoretical entropy rate of a LAMP is equivalent to the theoretical entropy rate of the underlying first-order Markov Chain. This surprising result is explained by the randomness introduced by the random process which selects the LAMP transitioning state, and provides a tool to model complex dependencies in data while retaining useful theoretical results. We use the LAMP model to estimate the entropy rate of the LastFM, BrightKite, Wikispeedia and Reuters-21578 datasets. We compare estimates calculated using frequency probability estimates, a first-order Markov model and the LAMP model, and consider two approaches to ensuring the transition matrix is irreducible. In most cases the LAMP entropy rates are lower than those of the alternatives, suggesting that the LAMP model is better at accommodating structural dependencies in the processes.

Submitted 9 January, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: 9 pages, code available on Github
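
The abstract above states that the entropy rate of a LAMP equals that of its underlying first-order Markov chain. As a rough illustration of that quantity only, here is a minimal sketch of the standard entropy rate of a first-order chain, H = -Σ_i π_i Σ_j P_ij log2 P_ij. It is not the authors' code: the function names are hypothetical, and the stationary distribution is approximated by power iteration, which assumes an irreducible, aperiodic transition matrix.

```python
import math


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of row-stochastic P by power iteration.

    Assumption: the chain is irreducible and aperiodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of pi <- pi P
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def markov_entropy_rate(P):
    """Entropy rate, in bits per step, of a first-order Markov chain with transition matrix P."""
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * p * math.log2(p)
        for i, row in enumerate(P)
        for p in row
        if p > 0  # terms with p == 0 contribute nothing
    )
```

For example, `markov_entropy_rate([[0.5, 0.5], [0.5, 0.5]])` describes a fair coin flipped at every step, so the rate is one bit per step.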

arXiv:2208.07038 [pdf, other]

doi 10.1007/978-3-031-19097-1_3

#IStandWithPutin versus #IStandWithUkraine: The interaction of bots and humans in discussion of the Russia/Ukraine war

Authors: Bridget Smart, Joshua Watt, Sara Benedetti, Lewis Mitchell, Matthew Roughan

Abstract: The 2022 Russian invasion of Ukraine emphasises the role social media plays in modern-day warfare, with conflict occurring in both the physical and information environments. There is a large body of work on identifying malicious cyber-activity, but less focusing on the effect this activity has on the overall conversation, especially with regard to the Russia/Ukraine conflict. Here, we employ a variety of techniques including information theoretic measures, sentiment and linguistic analysis, and time series techniques to understand how bot activity influences wider online discourse. By aggregating account groups we find significant information flows from bot-like accounts to non-bot accounts with behaviour differing between sides. Pro-Russian non-bot accounts are most influential overall, with information flows to a variety of other account groups. No significant outward flows exist from pro-Ukrainian non-bot accounts, with significant flows from pro-Ukrainian bot accounts into pro-Ukrainian non-bot accounts. We find that bot activity drives an increase in conversations surrounding angst (with p = 2.450 x 1e-4) as well as those surrounding work/governance (with p = 3.803 x 1e-18). Bot activity also shows a significant relationship with non-bot sentiment (with p = 3.76 x 1e-4), where we find the relationship holds in both directions. This work extends and combines existing techniques to quantify how bots are influencing people in the online conversation around the Russia/Ukraine invasion. It opens up avenues for researchers to understand quantitatively how these malicious campaigns operate, and what makes them impactful.

Submitted 19 August, 2022; v1 submitted 15 August, 2022; originally announced August 2022.

Comments: 12 pages, 7 figures, to be published in SocInfo 2022. Dataset available at https://figshare.com/articles/dataset/Tweet_IDs_Botometer_results/20486910

arXiv:2205.06029 [pdf]

doi 10.1016/j.osnem.2022.100231

Information flow estimation: a study of news on Twitter

Authors: Tobin South, Bridget Smart, Matthew Roughan, Lewis Mitchell

Abstract: News media has long been an ecosystem of creation, reproduction, and critique, where news outlets report on current events and add commentary to ongoing stories. Understanding the dynamics of news information creation and dispersion is important to accurately ascribe credit to influential work and understand how societal narratives develop. These dynamics can be modelled through a combination of information-theoretic natural language processing and networks, and can be parameterised using large quantities of textual data. However, it is challenging to see "the wood for the trees", i.e., to detect small but important flows of information in a sea of noise. Here we develop new comparative techniques to estimate temporal information flow between pairs of text producers. Using both simulated and real text data we compare the reliability and sensitivity of methods for estimating textual information flow, showing that a metric that normalises by local neighbourhood structure provides a robust estimate of information flow in large networks. We apply this metric to a large corpus of news organisations on Twitter and demonstrate its usefulness in identifying influence within an information ecosystem, finding that average information contribution to the network is not correlated with the number of followers or the number of tweets. This suggests that small local organisations and right-wing organisations which have lower average follower counts still contribute significant information to the ecosystem. Further, the methods are applied to smaller full-text datasets of specific news events across news sites and Russian troll accounts on Twitter. The information flow estimation reveals and quantifies features of how these events develop and the role of groups of trolls in setting disinformation narratives.

Submitted 28 September, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

Journal ref: Online Social Networks and Media, Volume 31, September 2022, 100231

arXiv:1908.03318 [pdf, other]

Bayesian inference of network structure from information cascades

Authors: Caitlin Gray, Lewis Mitchell, Matthew Roughan

Abstract: Contagion processes are strongly linked to the network structures on which they propagate, and learning these structures is essential for understanding and intervention on complex network processes such as epidemics and (mis)information propagation. However, using contagion data to infer network structure is a challenging inverse problem. In particular, it is imperative to have appropriate measures of uncertainty in network structure estimates; however, these are largely ignored in most machine-learning approaches. We present a probabilistic framework that uses samples from the distribution of networks that are compatible with the dynamics observed to produce network and uncertainty estimates. We demonstrate the method using the well known independent cascade model to sample from the distribution of networks P(G) conditioned on the observation of a set of infections C. We evaluate the accuracy of the method by using the marginal probabilities of each edge in the distribution, and show the benefits of quantifying uncertainty to improve estimates and understanding, particularly with small amounts of data.

Submitted 9 August, 2019; originally announced August 2019.

arXiv:1906.08403 [pdf, other]

How the Avengers assemble: Ecological modelling of effective cast sizes for movies

Authors: Matthew Roughan, Lewis Mitchell, Tobin South

Abstract: The number of characters in a movie is an interesting feature. However, it is non-trivial to measure directly. Naive metrics such as the number of credited characters vary wildly. Here, we show that a metric based on the notion of "ecological diversity" as expressed through a Shannon-entropy based metric can characterise the number of characters in a movie, and is useful in taxonomic classification. We also show how the metric can be generalised using Jensen-Shannon divergence to provide a measure of the similarity of characters appearing in different movies, for instance of use in recommender systems, e.g., Netflix. We apply our measures to the Marvel Cinematic Universe (MCU), and show what they teach us about this highly successful franchise of movies. In particular, these measures provide a useful predictor of "success" for films in the MCU, as well as a natural means to understand the relationships between the stories in the overall film arc.

Submitted 19 June, 2019; originally announced June 2019.
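
To make the idea of a Shannon-entropy based "effective" count concrete, here is a minimal sketch using one common ecological diversity measure, the exponential of the Shannon entropy of the characters' shares. That this matches the paper's exact definition is an assumption, and both the function name `effective_size` and the per-character weights (for instance line counts or screen time) are hypothetical.

```python
import math


def effective_size(weights):
    """Effective number of characters as exp(Shannon entropy) of their shares.

    `weights` holds one non-negative number per character (hypothetical input,
    e.g. lines spoken). Equal shares among k characters give exactly k;
    a cast dominated by one character gives a value close to 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must contain at least one positive value")
    probs = [w / total for w in weights if w > 0]  # zero-weight characters are ignored
    entropy = -sum(p * math.log(p) for p in probs)  # natural-log Shannon entropy
    return math.exp(entropy)
```

A film with four equally prominent characters has an effective size of 4, while a credited cast of three in which one character holds 80% of the weight has an effective size well below 3.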

arXiv:1811.01467 [pdf, other]

The one comparing narrative social network extraction techniques

Authors: Michelle Edwards, Lewis Mitchell, Jonathan Tuke, Matthew Roughan

Abstract: Analysing narratives through their social networks is an expanding field in quantitative literary studies. Manually extracting a social network from any narrative can be time consuming, so automatic extraction methods of varying complexity have been developed. However, the effect of different extraction methods on the analysis is unknown. Here we model and compare three extraction methods for social networks in narratives: manual extraction, co-occurrence automated extraction and automated extraction using machine learning. Although the manual extraction method produces more precise results in the network analysis, it is much more time consuming and the automatic extraction methods yield comparable conclusions for density, centrality measures and edge weights. Our results provide evidence that social networks extracted automatically are reliable for many analyses. We also describe which aspects of analysis are not reliable with such a social network. We anticipate that our findings will make it easier to analyse more narratives, which will help us improve our understanding of how stories are written and evolve, and how people interact with each other.

Submitted 4 November, 2018; originally announced November 2018.

arXiv:1802.05039 [pdf, other]

Super-blockers and the effect of network structure on information cascades

Authors: Caitlin Gray, Lewis Mitchell, Matthew Roughan

Abstract: Modelling information cascades over online social networks is important in fields from marketing to civil unrest prediction; however, the underlying network structure strongly affects the probability and nature of such cascades. Even with simple cascade dynamics the probability of large cascades is almost entirely dictated by network properties, with well-known networks such as Erdos-Renyi and Barabasi-Albert producing wildly different cascades from the same model. Indeed, the notion of 'super-spreaders' has arisen to describe highly influential nodes promoting global cascades in a social network. Here we use a simple model of global cascades to show that the presence of locality in the network increases the probability of a global cascade due to the increased vulnerability of connecting nodes. Rather than 'super-spreaders', we find that the presence of these highly connected 'super-blockers' in heavy-tailed networks in fact reduces the probability of global cascades, while promoting information spread when targeted as the initial spreader.

Submitted 21 March, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

Showing 1–8 of 8 results for author: Roughan, M