License: arXiv.org perpetual non-exclusive license
arXiv:2403.03504v1 [cs.DS] 06 Mar 2024

Graph Visualization for Blockchain Data

Marcell Dietl, frontmark GmbH, Taunusstraße 63, 65183 Wiesbaden, Germany
Andre Gemünd, Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Schloss Birlinghoven, 53757 Sankt Augustin, Germany
Daniel Oeltz, Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Schloss Birlinghoven, 53757 Sankt Augustin, Germany
Felix M. Thiele, Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Schloss Birlinghoven, 53757 Sankt Augustin, Germany
Christian Werner, frontmark GmbH, Taunusstraße 63, 65183 Wiesbaden, Germany
(February 2024)
Abstract

In this report, we introduce a novel approach to visualize extremely large graphs efficiently. Our method combines two force-directed algorithms, Kamada-Kawai and ForceAtlas2, and chooses between them for each graph component based on its node count. Additionally, we suggest using the Fast Multipole method to speed up ForceAtlas2. Although initially designed for analyzing bitcoin transaction graphs, for which we present results here, this algorithm can also be applied to other cryptocurrency transaction graphs or graphs from diverse domains.

1 Introduction

Blockchain technology is gaining increasing importance across various fields such as healthcare [8], supply chain management [18], finance [17], energy [1], voting systems [11] and more [22]. As the significance of blockchain technology continues to expand, there is a corresponding rise in the demand for methodologies to analyze blockchain data. A crucial aspect of such methodologies is the development of algorithms for data visualization that enable users to discern underlying patterns and structures with greater clarity. In this report, we discuss an approach to visualize so-called transaction graphs that typically arise in the context of cryptocurrency data. In contrast to other approaches, we put special focus on the efficiency of our method with respect to the number of nodes and edges in the graph, in order to handle the massive amount of transaction data. It is worth noting that the algorithm proposed herein is not limited to transaction graphs but can be employed wherever large graphs require visualization. Such scenarios may also occur within the domains mentioned earlier.

2 Bitcoin Transaction Graphs

Figure 1: Number of bitcoin transactions and addresses on a monthly basis.

Bitcoin is a digital currency that has grown significantly in use and popularity since its launch in 2009. The number of bitcoin transactions and new addresses has increased substantially over time, as Figure 1 shows on a monthly basis. Bitcoin is not issued by any central organization. Instead, it operates with a public ledger, also known as the blockchain [16]. Because this ledger is public, bitcoin, like other cryptocurrencies, provides a great framework for studying transaction behavior. The transaction graphs that arise typically become very large and are thus hard to process. Bitcoin processes around seven transactions per second, which adds up to roughly 600,000 transactions a day. To make matters worse, each transaction can involve many participating entities.

The bitcoin transaction data is encoded in the blockchain. The blockchain consists of many blocks organized in a linear ordering over time, see Figure 2 for a schematic illustration of the chain.

Each block has two main parts: the header section and the list of transactions. The header section contains general information about the block, such as the time it was created and a reference to the previous block. Each transaction in the list, in turn, is composed of inputs and outputs. Inputs refer to entities that send value, while outputs refer to entities that receive value. Each output contains a value to be received and a script that poses a problem that must be solved to authorize the spending of the value. Each input, on the other hand, consists of the hash of a previous transaction and a script that solves the problem posed by one of that transaction's outputs, thereby authorizing the spending of its value. It is worth noting that an input does not need to state a value, since it always spends the entire amount of the previous output.
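The block layout described above can be sketched as a small set of Python data classes. This is only an illustrative model, not the actual bitcoin wire format: all class and field names are our own, and the field `prev_index`, which selects one output of the referenced transaction, is an assumption that is not mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TxOutput:
    value: int   # amount to be received (here: an integer number of units)
    script: str  # locking script, i.e. the "problem" to be solved


@dataclass
class TxInput:
    prev_tx_hash: str  # hash of the transaction whose output is spent
    prev_index: int    # assumed field: which output of that transaction
    script: str        # unlocking script that solves the output's problem


@dataclass
class Transaction:
    tx_hash: str
    inputs: List[TxInput] = field(default_factory=list)
    outputs: List[TxOutput] = field(default_factory=list)


@dataclass
class Block:
    timestamp: int        # header: creation time
    prev_block_hash: str  # header: reference to the previous block
    transactions: List[Transaction] = field(default_factory=list)


def input_value(tx_input: TxInput, tx_by_hash: Dict[str, Transaction]) -> int:
    """An input stores no value; look it up in the output it spends."""
    return tx_by_hash[tx_input.prev_tx_hash].outputs[tx_input.prev_index].value


# Toy example: tx1 spends the single output of tx0 in full.
funding = Transaction("tx0", outputs=[TxOutput(50, "lock-A")])
spending = Transaction(
    "tx1",
    inputs=[TxInput("tx0", 0, "unlock-A")],
    outputs=[TxOutput(30, "lock-B"), TxOutput(20, "lock-A2")],
)
block = Block(timestamp=0, prev_block_hash="genesis", transactions=[funding, spending])
lookup = {tx.tx_hash: tx for tx in block.transactions}
spent = input_value(spending.inputs[0], lookup)
```

In the toy example, the value of the input is recovered from the referenced output, which illustrates why the input itself does not need to carry a value.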

Figure 2: Schematic illustration of a blockchain.

In most cases, both the input and output scripts follow one of a few standardized formats, in which the output problem contains a public key and can easily be solved with knowledge of the corresponding private key. The public keys can be interpreted as addresses. From this, we can build a first transaction graph, the vertices being the addresses and the transactions being directed hyperedges.

Working with hyperedges, albeit structurally encoded into the bitcoin transaction format, is not very practical. Following [3], we will define the bitcoin transaction graph as follows. The transaction graph is a bipartite graph composed of two sets of vertices: addresses and transactions. An edge exists between an address and a transaction if the former served as an input in the transaction. Conversely, an edge exists between a transaction and an address if the latter was an output in the transaction.
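A minimal sketch of this construction, assuming each transaction has already been reduced to a triple of an identifier, a list of input addresses and a list of output addresses (this input format and the function name are hypothetical, chosen only for illustration):

```python
def build_transaction_graph(transactions):
    """Return the directed edges of the bipartite address/transaction graph.

    transactions: iterable of (tx_id, input_addresses, output_addresses).
    Nodes are tagged tuples so that addresses and transactions never clash.
    """
    edges = []
    for tx_id, input_addresses, output_addresses in transactions:
        tx_node = ("tx", tx_id)
        for address in input_addresses:
            edges.append((("addr", address), tx_node))  # address -> transaction
        for address in output_addresses:
            edges.append((tx_node, ("addr", address)))  # transaction -> address
    return edges


toy_transactions = [
    ("t1", ["A", "B"], ["C"]),
    ("t2", ["C"], ["D", "A"]),
]
edges = build_transaction_graph(toy_transactions)
```

Every edge connects one address node and one transaction node, so the resulting graph is bipartite by construction.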

Note that a single user usually corresponds to multiple addresses in the resulting graph. The work of [21] suggests clustering addresses that are used as inputs to the same transaction. Another heuristic in [21] exploits the fact that, by definition, an input always spends its entire value in a transaction, so a user wanting to spend only part of the value held by an address might create a new address to receive the remainder.
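The first heuristic can be illustrated with a union-find structure that merges all addresses appearing together as inputs of a transaction. This is a simplified sketch of the idea only, not the procedure of [21]; the function name and the input format (one list of input addresses per transaction) are assumptions.

```python
def cluster_by_common_input(transactions):
    """Group addresses that appear together as inputs of some transaction.

    transactions: iterable of lists of input addresses, one list per transaction.
    Returns a sorted list of sorted address clusters.
    """
    parent = {}

    def find(address):
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]  # path halving
            address = parent[address]
        return address

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_addresses in transactions:
        if not input_addresses:
            continue
        first = input_addresses[0]
        find(first)  # register addresses that never share a transaction
        for other in input_addresses[1:]:
            union(first, other)

    clusters = {}
    for address in list(parent):
        clusters.setdefault(find(address), set()).add(address)
    return sorted(sorted(cluster) for cluster in clusters.values())


clusters = cluster_by_common_input([["A", "B"], ["B", "C"], ["D"]])
```

In this toy input, addresses A, B and C end up in one cluster because B links the first two transactions, while D remains on its own.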

Better clustering is closely related to the anonymity of the blockchain [19], [6], [20].

Our goal is to visualize this bitcoin transaction graph in certain time frames, for example, the visualization of transactions of one block or one day.

3 Force-Directed Algorithms

In this section, we describe the two force-directed algorithms we use as building blocks for the method described in this report. Force-directed algorithms are inspired by physical particle systems where the minimization of a certain energy functional w.r.t. the node locations leads to the layout. Here, the energy functional is induced by one or multiple forces defined between the nodes of the graph.

3.1 Algorithm from Kamada and Kawai

In this section, we describe the algorithm from Kamada and Kawai [13] to visualize graphs. The algorithm computes coordinates $p_{i}\in\mathbb{R}^{2}$ for each vertex $v_{i}$, $1\leq i\leq N$, in the graph by minimizing the energy

$$\sum_{i=1}^{N}\sum_{j=i+1}^{N}\frac{1}{d_{ij}^{2}}\left(|p_{i}-p_{j}|-l_{ij}\right)^{2}.$$

Here, $d_{ij}$ denotes the length of the shortest path between vertex $i$ and $j$, and $l_{ij}:=l\cdot d_{ij}$ for some rescaling constant $l>0$. To minimize the energy, we use the Newton-Raphson method. Note that to find the shortest path between each pair of vertices, we use the Floyd–Warshall algorithm [4], which has a runtime of $O(|V|^{3})$ and needs $O(|V|^{2})$ storage for the pairwise distances. For our sparse transaction graphs, one may reduce the runtime to $O(|V|^{2}\log|V|+|E||V|)$ by using Johnson’s algorithm [12]. However, in both cases, typical daily bitcoin transaction graphs, with the number of nodes in the range of hundreds of thousands or even millions, are too large to be handled directly by this approach.
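To make the energy concrete, the following sketch computes the pairwise shortest-path distances with a vectorized Floyd–Warshall iteration and then evaluates the energy for a given layout. It assumes a connected, unweighted, undirected graph with vertices numbered from 0 and uses NumPy; the function names are our own, and the Newton-Raphson minimization itself is not shown.

```python
import numpy as np


def floyd_warshall(n, edges):
    """All-pairs shortest-path lengths of an unweighted, undirected graph."""
    d = np.full((n, n), np.inf)
    np.fill_diagonal(d, 0.0)
    for i, j in edges:
        d[i, j] = d[j, i] = 1.0
    for k in range(n):
        # Allow vertex k as an intermediate stop on every path i -> j.
        d = np.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d


def kamada_kawai_energy(positions, d, l=1.0):
    """Sum over i < j of (|p_i - p_j| - l * d_ij)^2 / d_ij^2."""
    n = len(positions)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(positions[i] - positions[j])
            energy += (dist - l * d[i, j]) ** 2 / d[i, j] ** 2
    return energy


# Path graph 0 - 1 - 2: a straight line with unit spacing is a perfect layout.
d = floyd_warshall(3, [(0, 1), (1, 2)])
straight = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
folded = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.5]])
energy_straight = kamada_kawai_energy(straight, d)
energy_folded = kamada_kawai_energy(folded, d)
```

The straight layout reproduces all graph distances exactly and therefore has zero energy, whereas the folded layout is penalized. The dense distance matrix in this sketch also makes the quadratic storage requirement visible.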

One can further speed up this algorithm by clustering portions of the graph to reduce the number of vertices, rendering the layout, and finally, declustering. Examples of such clustering/hierarchical methods are:

  1. Contracting nodes with only one edge into their neighbour. To decluster, such a node is placed at the given edge distance from its neighbour, on the side facing away from the center of gravity of the layout.

  2. Removing nodes with only two edges and replacing both edges with a single edge. To decluster, simply replace each new edge with two edges joined by a node at its center.
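As an illustration of the first reduction, the sketch below removes all nodes with exactly one neighbour in a single pass and remembers the neighbour to which each of them was attached, so that they can be re-inserted after the layout. The function name and return format are hypothetical, and the declustering step is not shown.

```python
from collections import defaultdict


def contract_leaves(edges):
    """Remove degree-one nodes; return the reduced edges and a leaf -> neighbour map."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    attached_to = {}
    for node, neighbours in adjacency.items():
        if len(neighbours) == 1:
            (neighbour,) = neighbours
            # For an isolated edge, keep one of its two end points.
            if neighbour not in attached_to:
                attached_to[node] = neighbour

    reduced = [
        (u, v) for u, v in edges if u not in attached_to and v not in attached_to
    ]
    return reduced, attached_to


# Triangle 0-4-5 with three extra leaves hanging off node 0.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 0)]
reduced_edges, attached_to = contract_leaves(edges)
```

Only the triangle remains in the reduced graph, which is then the input of the layout algorithm.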

Compare also [14] for a discussion of how to handle large graphs to speed up the algorithm. However, for components with many edges or vertices, we apply a different method, which leads to good results in many examples and which we describe in the following section.

3.2 ForceAtlas2 with Fast Multipole

The approach presented here belongs to a line of approaches that simulate a physical system in which nodes repulse each other and edges attract their incident nodes [5, 15, 9]. It is closely related to the ForceAtlas2 algorithm [10], except that we replace the Barnes-Hut algorithm with the Fast Multipole method, see subsection 3.2.1.

In each iteration, we calculate several forces that apply to the different nodes. For each edge, we calculate an attraction force on the incident nodes, which is proportional to the distance between incident nodes. We further have a gravity force pulling all nodes to the center, proportional to the distance to the center. Finally, we have repulsion forces between all node pairs proportional to one over their distance.
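The three force types can be written down directly. The sketch below is a brute-force reference with quadratic cost for the repulsion, not the Fast Multipole variant; the constants `k_repulsion` and `k_gravity`, the choice of the origin as center, and the function name are assumptions made for illustration. Node positions are assumed to be pairwise distinct.

```python
import numpy as np


def compute_forces(positions, edges, k_repulsion=1.0, k_gravity=0.1):
    """Aggregate attraction, gravity and repulsion forces for every node.

    positions: float array of shape (n, 2); edges: iterable of index pairs.
    """
    forces = np.zeros_like(positions)

    # Attraction along each edge, proportional to the distance of its end points.
    for i, j in edges:
        delta = positions[j] - positions[i]
        forces[i] += delta
        forces[j] -= delta

    # Gravity toward the center (here: the origin), proportional to the distance.
    forces -= k_gravity * positions

    # Repulsion between all pairs, with magnitude 1 / distance.
    for i in range(len(positions)):
        delta = positions[i] - positions            # vectors pointing toward node i
        dist_sq = np.sum(delta ** 2, axis=1)
        dist_sq[i] = 1.0                            # delta[i] is zero anyway
        forces[i] += k_repulsion * np.sum(delta / dist_sq[:, None], axis=0)
    return forces


positions = np.array([[-1.0, 0.0], [1.0, 0.0]])
forces = compute_forces(positions, [(0, 1)])
```

For the two connected nodes in the example, attraction (2) and gravity (0.1) pull each node inward while repulsion (0.5) pushes it outward, giving a net inward force of 1.6 on each node.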

The next step is to apply the calculated forces to each node. We follow [10] on how to choose the step size. For this, in each iteration $t$, we define two values, the swing:

$$swg=\sum_{n}|F_{t}(n)-F_{t-1}(n)|$$

and traction:

$$trc=\sum_{n}|F_{t}(n)+F_{t-1}(n)|$$

where $F_{t}(n)$ are the aggregated forces of a node $n$ in iteration $t$. Note that a big swing signals a large variance in the forces between iterations and, therefore, much more erratic movement. On the other hand, if the traction is large, this signals progress in some sense as we continue to push nodes in the same direction. Now, we iteratively adjust the step size to keep the ratio $\frac{trc}{swg}$ in some tolerated interval.
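The two quantities are straightforward to compute from the force arrays of two consecutive iterations. In the sketch below, `swing_and_traction` follows the definitions above, whereas `adapt_step_size` is only a hypothetical update rule with made-up interval bounds and scaling factor; the actual rule of [10] is more refined.

```python
import numpy as np


def swing_and_traction(forces_now, forces_prev):
    """Swing and traction, summed over all nodes, for two force arrays of shape (n, 2)."""
    swing = np.sum(np.linalg.norm(forces_now - forces_prev, axis=1))
    traction = np.sum(np.linalg.norm(forces_now + forces_prev, axis=1))
    return swing, traction


def adapt_step_size(step, swing, traction, low=1.0, high=10.0, factor=1.5):
    """Hypothetical rule: keep traction / swing inside the interval [low, high]."""
    ratio = traction / swing if swing > 0 else float("inf")
    if ratio < low:    # erratic movement: slow down
        return step / factor
    if ratio > high:   # steady progress: speed up
        return step * factor
    return step


previous = np.array([[1.0, 0.0], [0.0, 2.0]])
steady = swing_and_traction(previous, previous)      # forces unchanged
flipped = swing_and_traction(-previous, previous)    # forces reversed
```

Unchanged forces yield zero swing and maximal traction, so the step size may grow; reversed forces yield the opposite, so the step size should shrink.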

3.2.1 Fast Multipole Algorithm

In each iteration of the force-directed algorithm, the forces on each node need to be calculated. As the bitcoin transaction graphs are not very dense, we can efficiently calculate the forces generated by the edge attraction and also the gravity attraction. However, computing the repulsion forces between each pair of nodes in a straightforward way is of quadratic complexity in the number of nodes, which may be very time-consuming considering the number of nodes of a typical transaction graph. The Barnes-Hut algorithm [2], developed initially to speed up the computation for physical systems, reduces the complexity to $O(N\log(N))$ and has been successfully applied in the context of graph visualization, see [10]. However, we can further improve the complexity to $O(N)$ by using the Fast Multipole algorithm [7], which we will briefly describe in the following.

In a first step, the algorithm computes a quad tree structure of the given particles. Here, we start with a square that covers all the particles. If a square contains more than a certain number of particles, we divide it into four new squares. Each of these squares is called a cell. This generates a tree structure of cells, a cell being the parent of another if it was generated by one subdivision of the parent. For each cell, we define the neighbouring cells to be those adjacent cells that have minimal size but are not smaller than the original cell. Further, we define interacting cells of a cell to be the minimal cells that are neighbouring the parent of the cell or are children of a cell neighbouring the parent.
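The subdivision step can be sketched as follows. The class and function names as well as the parameters `max_per_cell` and `max_depth` are illustrative assumptions, and the neighbour and interaction lists described above are not constructed here.

```python
import numpy as np


class Cell:
    def __init__(self, center, half_size, indices, depth):
        self.center = center        # center of the square
        self.half_size = half_size  # half of the side length
        self.indices = indices      # indices of the particles inside the square
        self.depth = depth
        self.children = []


def build_quadtree(points, max_per_cell=4, max_depth=20):
    """Subdivide a covering square until each cell holds at most max_per_cell points."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    half = max(float(np.max(hi - lo)) / 2.0, 1e-12)
    root = Cell((lo + hi) / 2.0, half, np.arange(len(points)), 0)
    stack = [root]
    while stack:
        cell = stack.pop()
        if len(cell.indices) <= max_per_cell or cell.depth >= max_depth:
            continue  # this cell stays a leaf
        pts = points[cell.indices]
        right = pts[:, 0] >= cell.center[0]
        top = pts[:, 1] >= cell.center[1]
        for dx, dy in ((-1, -1), (1, -1), (-1, 1), (1, 1)):
            mask = (right == (dx > 0)) & (top == (dy > 0))
            child = Cell(
                cell.center + 0.5 * cell.half_size * np.array([dx, dy]),
                cell.half_size / 2.0,
                cell.indices[mask],
                cell.depth + 1,
            )
            cell.children.append(child)
            stack.append(child)
    return root


def leaf_cells(cell):
    """Collect all cells without children."""
    if not cell.children:
        return [cell]
    return [leaf for child in cell.children for leaf in leaf_cells(child)]


rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(200, 2))
root = build_quadtree(points)
leaves = leaf_cells(root)
```

Every particle ends up in exactly one leaf cell, which is the property the final force evaluation relies on.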

The trick of the Fast Multipole method is to calculate two Taylor expansions of a chosen degree for each cell around its center. The first approximates the force that the nodes in the cell exert on particles that are far away and is called the outgoing expansion. The second approximates the force that the nodes in the cell feel from particles that are far away and is called the incoming expansion. Here, the term “far away” refers to nodes not located in neighbouring cells.
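The effect of an outgoing expansion can be checked numerically for a single cell. The sketch below assumes a repulsion of magnitude one over the distance and identifies points in the plane with complex numbers, so that the force of a node at $z_q$ on a target $z$ is $1/\overline{(z-z_q)}$; expanding $1/(z-z_q)$ in powers of $1/(z-c)$ around the cell center $c$ gives the coefficients $a_k=\sum_q (z_q-c)^k$. The complex formulation and all function names are our own choices for illustration; the shifting and conversion steps of the full method are not shown.

```python
import numpy as np


def outgoing_coefficients(sources, center, degree):
    """Coefficients a_k = sum_q (z_q - c)^k for k = 0, ..., degree."""
    shifted = sources - center
    return np.array([np.sum(shifted ** k) for k in range(degree + 1)])


def evaluate_outgoing(coefficients, center, target):
    """Approximate force on a far-away target from the truncated expansion."""
    w = target - center
    conj_force = sum(a / w ** (k + 1) for k, a in enumerate(coefficients))
    return np.conj(conj_force)


def direct_force(sources, target):
    """Exact repulsion on the target: sum of (z - z_q) / |z - z_q|^2."""
    delta = target - sources
    return np.sum(delta / np.abs(delta) ** 2)


rng = np.random.default_rng(1)
# 50 nodes inside the unit square centered at the origin.
sources = rng.uniform(-0.5, 0.5, 50) + 1j * rng.uniform(-0.5, 0.5, 50)
center = 0.0 + 0.0j
target = 4.0 + 3.0j  # far away: distance 5 from the cell center

coefficients = outgoing_coefficients(sources, center, degree=10)
approx = evaluate_outgoing(coefficients, center, target)
exact = direct_force(sources, target)
error = abs(approx - exact)
```

Once the eleven coefficients are known, the force on any far-away target is evaluated without touching the 50 individual nodes again, which is where the savings of the method come from.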

The idea is to calculate these expansions recursively, starting with the outgoing expansions. The method first computes the outgoing expansion for each leaf cell. Subsequently, the quad tree is traversed upwards, and the outgoing expansions of the four children of a cell are used to calculate its outgoing expansion. This is accomplished by shifting the centers of the children’s outgoing expansions to the center of the parent cell and adding them up. The quad tree is then traversed in reverse order, from the root downwards, to calculate the incoming expansions. For each cell, this is accomplished by shifting the center of the parent’s incoming expansion to the center of the cell and adding to it the outgoing expansions of all interacting cells, reformulated as incoming expansions of the cell.

Finally, we can calculate the force that applies to each single node. For each node, let the corresponding leaf cell be the minimal cell containing the node. Now, for every node, we apply the incoming expansion of its leaf cell to the node to get the force from far away nodes and additionally add all the forces from nodes in neighbouring cells of the leaf cell and the leaf cell itself to the node.

4 Final Algorithm

Figure 3: Illustration of the overall algorithm for blockchain data visualization.

The final algorithm that we apply to visualize the transaction data now combines the algorithms above to optimize the tradeoff between quality and computational time.

Here, we first split the given graph into its connected components (ignoring the directions of the edges). Note that the typical transaction graph for a certain time period most often consists of many components, most of which contain only a few nodes. Now, for each component, we calculate a separate layout. For components with only a few nodes, we opt for the Kamada-Kawai algorithm, which typically generates more expressive layouts. For components containing a high number of nodes, we employ the force-directed algorithm based on the Fast Multipole method described above.
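The splitting and dispatch step might look as follows. The breadth-first search is standard; the size threshold of 1000 nodes and the layout labels are hypothetical placeholders, since no concrete threshold is stated here.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Connected components of a graph, ignoring edge directions."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def choose_layout(component, threshold=1000):
    """Pick a layout algorithm by component size (threshold is an assumption)."""
    return "kamada-kawai" if len(component) <= threshold else "forceatlas2-fmm"


nodes = list(range(7))
edges = [(0, 1), (1, 2), (3, 4)]
components = connected_components(nodes, edges)
layouts = [choose_layout(component) for component in components]
```

In the toy graph, all four components are small, so each of them would be laid out with the Kamada-Kawai algorithm.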

These component layouts are then rescaled to have a similar density. Next, we assemble the separate component layouts into a single figure. To accomplish this, we construct a new graph with the components represented as nodes, connecting them with edges to form a tree structure. These edges are determined by selecting a random order and sequentially linking the next component to the largest previous component. We choose the length of each edge to be half of the sum of the diameters of its two components plus some constant. If this resulting graph is small enough, we apply the Kamada-Kawai algorithm. Otherwise, we divide it into several components, apply the Kamada-Kawai algorithm individually to each of them, and reassemble them. Figure 3 shows an illustration of the overall algorithm.
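The construction of the component tree can be sketched as below. We interpret "largest" as having the largest layout diameter, which is an assumption, as are the function name, the fixed random seed and the value of the additive constant.

```python
import random


def build_component_tree(diameters, constant=1.0, seed=0):
    """Link components into a tree; returns (parent, child, edge_length) triples.

    diameters: list with the layout diameter of each component.
    """
    order = list(range(len(diameters)))
    random.Random(seed).shuffle(order)  # random processing order

    tree_edges, placed = [], []
    for component in order:
        if placed:
            # Attach to the largest component processed so far.
            largest = max(placed, key=lambda c: diameters[c])
            length = 0.5 * (diameters[component] + diameters[largest]) + constant
            tree_edges.append((largest, component, length))
        placed.append(component)
    return tree_edges


diameters = [4.0, 2.0, 6.0, 1.0]
tree_edges = build_component_tree(diameters)
```

With this edge length, two linked components have enough room to be drawn side by side without overlapping, and the constant controls the extra gap between them.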

Figures 4 and 5 show the resulting visualization of the transaction graphs at two different points in time.

Figure 4: A visualization of the transaction graph on April 19, 2011. Transaction nodes are red, address nodes are blue. Contains around $1.5\cdot 10^{4}$ nodes.
Figure 5: A visualization of the transaction graph on December 12, 2013. Transaction nodes are red, address nodes are blue. Contains around $2\cdot 10^{5}$ nodes.

5 Summary

In this report, we described an algorithm to efficiently visualize graphs with a very high number of nodes and edges. In our experiments, this algorithm produced meaningful layouts within a reasonable amount of computing time, while other algorithms were not applicable due to their computational complexity.

Although developed for analyzing bitcoin blockchain data, it may be worth applying this algorithm to other blockchain transaction graphs or large graphs from other domains.

We provide the source code of this algorithm in the public GitHub repository at https://github.com/frontmark/research.

References

  • [1] Jiabin Bao, Debiao He, Min Luo, and Kim-Kwang Raymond Choo. A survey of blockchain applications in the energy sector. IEEE Systems Journal, 15(3):3370–3381, sep 2021.
  • [2] Josh Barnes and Piet Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324(6096):446–449, dec 1986.
  • [3] Michael Fleder, Michael S. Kester, and Sudeep Pillai. Bitcoin transaction graph analysis. CoRR, February 2015.
  • [4] Robert W. Floyd. Algorithm 97: Shortest path. Communications of the ACM, 5(6):345, jun 1962.
  • [5] Thomas M. J. Fruchterman and Edward M. Reingold. Graph drawing by force-directed placement. Software: Practice and Experience, 21(11):1129–1164, nov 1991.
  • [6] Steven Goldfeder, Harry Kalodner, Dillon Reisman, and Arvind Narayanan. When the cookie meets the blockchain: Privacy risks of web payments via cryptocurrencies. Proceedings on Privacy Enhancing Technologies, 2018, 08 2017.
  • [7] L Greengard and V Rokhlin. A fast algorithm for particle simulations. Journal of Computational Physics, 73(2):325–348, dec 1987.
  • [8] Anton Hasselgren, Katina Kralevska, Danilo Gligoroski, Sindre A. Pedersen, and Arild Faxvaag. Blockchain in healthcare and health sciences—a scoping review. International Journal of Medical Informatics, 134:104040, feb 2020.
  • [9] Yifan Hu. Efficient and high quality force-directed graph drawing. Mathematica Journal, 10:37–71, 01 2005.
  • [10] Mathieu Jacomy, Tommaso Venturini, Sebastien Heymann, and Mathieu Bastian. ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the gephi software. PLoS ONE, 9(6):e98679, jun 2014.
  • [11] Uzma Jafar, Mohd Juzaiddin Ab Aziz, and Zarina Shukur. Blockchain for electronic voting system—review and open research challenges. Sensors, 21(17):5874, aug 2021.
  • [12] Donald B. Johnson. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM, 24(1):1–13, January 1977.
  • [13] Tomihisa Kamada and Satoru Kawai. An algorithm for drawing general undirected graphs. Information Processing Letters, 31(1):7–15, apr 1989.
  • [14] Stephen Kobourov. Handbook of Graph Drawing and Visualization, chapter Force-Directed Algorithms, pages 383–408. Chapman & Hall, 2016.
  • [15] Shawn Martin, W. Michael Brown, Richard Klavans, and Kevin W. Boyack. OpenOrd: an open-source toolbox for large graph layout. In Visualization and Data Analysis 2011. SPIE, jan 2011.
  • [16] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. Decentralized business review, 2008.
  • [17] Ritesh Patel, Milena Migliavacca, and Marco E. Oriani. Blockchain in banking and finance: A bibliometric review. Research in International Business and Finance, 62:101718, dec 2022.
  • [18] Maciel M Queiroz, Renato Telles, and Silvia H Bonilla. Blockchain and supply chain management integration: a systematic review of the literature. Supply chain management: An international journal, 25(2):241–254, 2020.
  • [19] Fergal Reid and Martin Harrigan. An analysis of anonymity in the bitcoin system. In Security and Privacy in Social Networks, pages 197–223. Springer New York, jul 2012.
  • [20] QingChun ShenTu and JianPing Yu. Research on anonymization and de-anonymization in the bitcoin system. CoRR, October 2015.
  • [21] Yuhang Zhang, Jun Wang, and Jie Luo. Heuristic-based address clustering in bitcoin. IEEE Access, 8:210582–210591, 2020.
  • [22] Zibin Zheng, Shaoan Xie, Hong-Ning Dai, Xiangping Chen, and Huaimin Wang. Blockchain challenges and opportunities: a survey. International Journal of Web and Grid Services, 14(4):352, 2018.