-
The rich still get richer: Empirical comparison of preferential attachment via linking statistics in Bitcoin and Ethereum
Authors:
Dániel Kondor,
Nikola Bulatovic,
József Stéger,
István Csabai,
Gábor Vattay
Abstract:
Bitcoin and Ethereum transactions present one of the largest real-world complex networks that are publicly available for study, including a detailed picture of their time evolution. As such, they have received a considerable amount of attention from the network science community, besides analysis from an economic or cryptography perspective. Among these studies, in an analysis of the early instance of the Bitcoin network, we have shown the clear presence of preferential attachment, or the "rich-get-richer" phenomenon. Now, we revisit this question, using a recent version of the Bitcoin network that has grown almost 100-fold since our original analysis. We additionally carry out a comparison with Ethereum, the second most important cryptocurrency. Our results show that preferential attachment continues to be a key factor in the evolution of both the Bitcoin and Ethereum transaction networks. To facilitate further analysis, we publish a recent version of both transaction networks, and an efficient software implementation that is able to evaluate the linking statistics necessary to learn about preferential attachment on networks with several hundred million edges.
Submitted 23 February, 2021;
originally announced February 2021.
-
Kooplex: collaborative data analytics portal for advancing sciences
Authors:
Dávid Visontai,
József Stéger,
János Márk Szalai-Gindl,
László Dobos,
László Oroszlány,
István Ervin Csabai
Abstract:
Research collaborations are continuously emerging, catalyzed by online platforms where people can share their code, calculations, data and results. These virtual research platforms are innovative, community oriented, flexible and secure, as required by modern scientific approaches. A wide range of open source and commercial solutions are available in this field, each emphasizing the relevant aspects of such a platform differently. In this paper we present our open source and modular platform, KOOPLEX, which combines the key concepts of dynamic collaboration, customizable research environment, data sharing, access to datahubs, reproducible research and reporting. It is easily deployable and scalable to serve more users or access large computational resources.
Submitted 21 November, 2019;
originally announced November 2019.
-
High Quality Queueing Information from Accelerated Active Network Tomography
Authors:
Tommaso Rizzo,
Jozsef Steger,
Péter Pollner,
Istvan Csabai,
Gabor Vattay
Abstract:
Monitoring network state can be crucial in Future Internet infrastructures. Passive monitoring of all the routers is prohibitively expensive. Storing, accessing and sharing the data is a technological challenge among networks with conflicting economic interests. Active monitoring methods can be attractive alternatives as they are free from most of these issues. Here we demonstrate that it is possible to improve the active network tomography methodology to such an extent that the quality of the extracted link or router level delay is comparable to the passively measurable information. We show that the temporal precision of the measurements and the performance of the data analysis should be simultaneously improved to achieve this goal. In this paper we not only introduce a new efficient message-passing based algorithm but also show that it is applicable to data collected by the ETOMIC high precision active measurement infrastructure. The measurements are conducted in the GEANT2 high speed academic network connecting the sites, which is an ideal test ground for such Future Internet applications.
Submitted 5 December, 2017; v1 submitted 25 July, 2017;
originally announced July 2017.
-
Video Pandemics: Worldwide Viral Spreading of Psy's Gangnam Style Video
Authors:
Zsofia Kallus,
Daniel Kondor,
Jozsef Steger,
Istvan Csabai,
Eszter Bokanyi,
Gabor Vattay
Abstract:
Viral videos can reach global penetration traveling through international channels of communication similarly to real diseases starting from a well-localized source. In past centuries, disease fronts propagated in a concentric spatial fashion from the source of the outbreak via the short range human contact network. The emergence of long-distance air-travel changed these ancient patterns. However, recently, Brockmann and Helbing have shown that concentric propagation waves can be reinstated if propagation time and distance are measured in the flight-time and travel volume weighted underlying air-travel network. Here, we adopt this method for the analysis of viral meme propagation in Twitter messages, and define a similar weighted network distance in the communication network connecting countries and states of the World. We recover a wave-like behavior on average and assess the randomizing effect of non-locality of spreading. We show that a similar result can be recovered from Google Trends data as well.
Submitted 14 July, 2017;
originally announced July 2017.
-
Audio-based performance evaluation of squash players
Authors:
Katalin Hajdu-Szucs,
Nora Fenyvesi,
Jozsef Steger,
Gabor Vattay
Abstract:
In competitive sports it is often very hard to quantify performance. Whether a player scores or overtakes may depend on only thousandths of a second or millimeters. In racquet sports like tennis, table tennis and squash many events occur in a short time, and recording and analyzing them can help reveal the differences in performance. In this paper we show that it is possible to architect a framework that utilizes the characteristic sound patterns to precisely classify the types and localize the positions of these events. From this basic information the shot types and the ball speed along the trajectories can be estimated. By comparing these estimates with the optimal speed and target, the precision of the shot can be defined. The detailed shot statistics and precision information significantly enrich and improve the data available today. Feeding them back to the players and the coaches makes it possible to describe playing performance objectively and to improve strategy skills. The framework is implemented, and its hardware and software components are installed and tested in a squash court.
Submitted 20 April, 2017;
originally announced April 2017.
-
A Bayesian Approach to Identify Bitcoin Users
Authors:
Péter L. Juhász,
József Stéger,
Dániel Kondor,
Gábor Vattay
Abstract:
Bitcoin is a digital currency and electronic payment system operating over a peer-to-peer network on the Internet. One of its most important properties is the high level of anonymity it provides for its users. The users are identified by their Bitcoin addresses, which are random strings in the public records of transactions, the blockchain. When a user initiates a Bitcoin transaction, his Bitcoin client program relays messages to other clients through the Bitcoin network. Monitoring the propagation of these messages and analyzing them carefully reveals hidden relations. In this paper, we develop a mathematical model using a probabilistic approach to link Bitcoin addresses and transactions to the originator IP address. To utilize our model, we carried out experiments by installing more than a hundred modified Bitcoin clients distributed in the network to observe as many messages as possible. During a two-month observation period we were able to identify several thousand Bitcoin clients and bind their transactions to geographical locations.
Submitted 9 March, 2017; v1 submitted 20 December, 2016;
originally announced December 2016.
-
Race, Religion and the City: Twitter Word Frequency Patterns Reveal Dominant Demographic Dimensions in the United States
Authors:
Eszter Bokányi,
Dániel Kondor,
László Dobos,
Tamás Sebők,
József Stéger,
István Csabai,
Gábor Vattay
Abstract:
Recently, numerous approaches have emerged in the social sciences to exploit the opportunities made possible by the vast amounts of data generated by online social networks (OSNs). Having access to information about users on such a scale opens up a range of possibilities, all without the limitations associated with often slow and expensive paper-based polls. A question that remains to be satisfactorily addressed, however, is how demography is represented in the OSN content. Here, we study language use in the US using a corpus of text compiled from over half a billion geo-tagged messages from the online microblogging platform Twitter. Our intention is to reveal the most important spatial patterns in language use in an unsupervised manner and relate them to demographics. Our approach is based on Latent Semantic Analysis (LSA) augmented with the Robust Principal Component Analysis (RPCA) methodology. We find spatially correlated patterns that can be interpreted based on the words associated with them. The main language features can be related to slang use, urbanization, travel, religion and ethnicity, the patterns of which are shown to correlate plausibly with traditional census data. Our findings thus validate the concept of demography being represented in OSN language use and show that the traits observed are inherently present in the word frequencies without any previous assumptions about the dataset. Thus, they could form the basis of further research focusing on the evaluation of demographic data estimation from other big data sources, or on the dynamical processes that result in the patterns found here.
Submitted 11 May, 2016; v1 submitted 10 May, 2016;
originally announced May 2016.
-
Regional properties of global communication as reflected in aggregated Twitter data
Authors:
Zsofia Kallus,
Norbert Barankai,
Daniel Kondor,
Laszlo Dobos,
Tamas Hanyecz,
Janos Szule,
Jozsef Steger,
Tamas Sebok,
Gabor Vattay,
Istvan Csabai
Abstract:
Twitter is a popular public conversation platform with a worldwide audience and diverse forms of connections between users. In this paper we introduce the concept of aggregated regional Twitter networks in order to characterize communication between geopolitical regions. We present the study of a follower and a mention graph created from an extensive data set collected during the second half of 2012. With a k-shell decomposition the global core-periphery structure is revealed, and by means of a modified Regional-SIR model we also consider basic information spreading properties.
Submitted 6 November, 2013;
originally announced November 2013.
-
A multi-terabyte relational database for geo-tagged social network data
Authors:
László Dobos,
János Szüle,
Tamás Bodnár,
Tamás Hanyecz,
Tamás Sebők,
Dániel Kondor,
Zsófia Kallus,
József Stéger,
István Csabai,
Gábor Vattay
Abstract:
Despite their relatively low sampling factor, the freely available, randomly sampled status streams of Twitter are very useful sources of geographically embedded social network data. To statistically analyze the information Twitter provides via these streams, we have collected a year's worth of data and built a multi-terabyte relational database from it. The database is designed for fast data loading and to support a wide range of studies focusing on the statistics and geographic features of social networks, as well as on the linguistic analysis of tweets. In this paper we present the method of data collection, the database design, the data loading procedure and special treatment of geo-tagged and multi-lingual data. We also provide some SQL recipes for computing network statistics.
Submitted 5 November, 2013; v1 submitted 4 November, 2013;
originally announced November 2013.
-
SONoMA: A Service Oriented Network Measurement Architecture
Authors:
Béla Hullár,
Sándor Laki,
József Stéger,
István Csabai,
Gábor Vattay
Abstract:
Characterizing the structure, dynamics and operational state of the Internet requires distributed measurements. Although in the last decades several systems capable of doing this have been created, easy access to these infrastructures and the orchestration of complex measurements remain unsolved. We propose a system architecture that combines the flexibility of mature network measurement infrastructures such as PlanetLab or ETOMIC with the general accessibility and popularity of public services like Web-based bandwidth measurement or traceroute servers. To realize these requirements we developed a multi-layer architecture based on Web Services and the basic principles of SOA, which is a very popular paradigm in distributed business application development. Our approach opens the door to performing complex network measurements, handles heterogeneous measurement devices, automatically stores the results in a public database and protects against malicious users as well. To demonstrate our concept we developed a public prototype system, called SONoMA.
Submitted 7 June, 2010;
originally announced June 2010.
-
Measuring the Dynamical State of the Internet: Large Scale Network Tomography via the ETOMIC Infrastructure
Authors:
Gabor Simon,
Jozsef Steger,
Peter Haga,
Istvan Csabai,
Gabor Vattay
Abstract:
In this paper we show how to go beyond the study of the topological properties of the Internet, by measuring its dynamical state using special active probing techniques and the methods of network tomography. We demonstrate this approach by measuring the key state parameters of Internet paths, the characteristics of queueing delay, in a part of the European Internet. In the paper we describe in detail the ETOMIC measurement platform that was used to conduct the experiments, and the applied method of queueing delay tomography. The main results of the paper are maps showing various spatial structures in the characteristics of queueing delay corresponding to the resolved part of the European Internet. These maps reveal that the average queueing delay of network segments spans more than two orders of magnitude, and that the distribution of this quantity is very well fitted by the log-normal distribution.
Submitted 27 January, 2008;
originally announced January 2008.