-
Detecting Financial Bots on the Ethereum Blockchain
Authors:
Thomas Niedermayer,
Pietro Saggese,
Bernhard Haslhofer
Abstract:
The integration of bots in Distributed Ledger Technologies (DLTs) fosters efficiency and automation. However, their use is also associated with predatory trading and market manipulation, and can pose threats to system integrity. It is therefore essential to understand the extent of bot deployment in DLTs; despite this, current detection systems are predominantly rule-based and lack flexibility. In this study, we present a novel approach that utilizes machine learning for the detection of financial bots on the Ethereum platform. First, we systematize existing scientific literature and collect anecdotal evidence to establish a taxonomy for financial bots, comprising 7 categories and 24 subcategories. Next, we create a ground-truth dataset consisting of 133 human and 137 bot addresses. Third, we employ both unsupervised and supervised machine learning algorithms to detect bots deployed on Ethereum. The highest-performing clustering algorithm is a Gaussian Mixture Model with an average cluster purity of 82.6%, while the highest-performing model for binary classification is a Random Forest with an accuracy of 83%. Our machine learning-based detection mechanism contributes to understanding the Ethereum ecosystem dynamics by providing additional insights into the current bot landscape.
Submitted 28 March, 2024;
originally announced March 2024.
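The cluster purity figure quoted in the abstract above can be illustrated with a minimal sketch. It assumes purity is the share of the majority ground-truth label within each cluster and that the average is an unweighted mean over clusters; the paper may define or weight it differently. The function names and the toy data are hypothetical.

```python
from collections import Counter


def cluster_purities(cluster_ids, labels):
    """Map each cluster id to the share of its most common ground-truth label."""
    members = {}
    for cid, label in zip(cluster_ids, labels):
        members.setdefault(cid, []).append(label)
    return {
        cid: Counter(ls).most_common(1)[0][1] / len(ls)
        for cid, ls in members.items()
    }


def average_cluster_purity(cluster_ids, labels):
    """Unweighted mean of per-cluster purities (an assumption, not the paper's stated formula)."""
    purities = cluster_purities(cluster_ids, labels)
    return sum(purities.values()) / len(purities)


# Toy data: cluster assignments and hypothetical ground-truth labels.
clusters = [0, 0, 0, 1, 1, 1, 1]
truth = ["bot", "bot", "human", "human", "human", "human", "bot"]
avg_purity = average_cluster_purity(clusters, truth)
```

In this toy case, cluster 0 is two-thirds bots and cluster 1 is three-quarters humans, so the average lies between those two values.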
-
Increasing the Efficiency of Cryptoasset Investigations by Connecting the Cases
Authors:
Bernhard Haslhofer,
Christiane Hanslbauer,
Michael Fröwis,
Thomas Goger
Abstract:
Law enforcement agencies are confronted with a rapidly growing number of cryptoasset-related cases, often redundantly investigating the same cases without mutual knowledge or shared insights. In this paper, we explore the hypothesis that recognizing and acting upon connections between these cases can significantly streamline investigative processes. Through an analysis of a dataset comprising 34 cyberfraud and 1793 sextortion spam cases, we discovered that 41% of the cyberfraud and 96.9% of the sextortion spam incidents can be interconnected. We introduce a straightforward yet effective tool, which is integrated into a broader cryptoasset forensics workflow and allows investigators to highlight and share case connections. Our research unequivocally demonstrates that recognizing case connections can lead to remarkable efficiencies, especially when extended across crime areas, international borders, and jurisdictions.
Submitted 14 November, 2023;
originally announced November 2023.
-
Mapping the DeFi Crime Landscape: An Evidence-based Picture
Authors:
Catherine Carpentier-Desjardins,
Masarah Paquet-Clouston,
Stefan Kitzler,
Bernhard Haslhofer
Abstract:
Over the past years, decentralized finance (DeFi) has been the target of numerous profit-driven crimes. However, until now, the full prevalence and cumulative impact of these crimes have not been assessed. This study provides a first comprehensive assessment of profit-driven crimes targeting the DeFi sector. To achieve this, we collected data on 1155 crime events from 2017 to 2022. Of these, 1050 were related to the DeFi industry and 105 to the centralized finance (CeFi) industry. Focusing on the former, a taxonomy was developed to clarify the similarities and differences among these crimes. All events were mapped onto the DeFi stack to assess the impacted technical layers, and the financial damages were quantified to gauge their scale. The findings show that the entire cryptoasset industry has suffered a minimum loss of US$30B, with two thirds related to centralized finance (CeFi) and one third to DeFi. Focusing solely on the latter, the results highlight that during an attack, a DeFi actor (an entity developing a DeFi technology) can serve as a direct target, as a perpetrator, or as an intermediary. The findings show that DeFi actors are the first victims of crimes targeting the DeFi industry: 52% of crime events targeted them, primarily due to technical vulnerabilities at the protocol layer, and these events accounted for 83% of all recorded financial damages. On the other hand, in 40% of crime events, DeFi actors were themselves malicious perpetrators, predominantly misusing contracts at the cryptoasset layer (e.g., rug pull scams). However, these events accounted for only 17% of all financial damages. The study's findings offer a preliminary assessment of the size and scope of crime events within the DeFi sector and highlight the vulnerable position of DeFi actors in the ecosystem.
Submitted 6 October, 2023;
originally announced October 2023.
-
Assessing the Solvency of Virtual Asset Service Providers: Are Current Standards Sufficient?
Authors:
Pietro Saggese,
Esther Segalla,
Michael Sigmund,
Burkhard Raunig,
Felix Zangerl,
Bernhard Haslhofer
Abstract:
Entities like centralized cryptocurrency exchanges fall under the business category of virtual asset service providers (VASPs). Like any other enterprise, they can become insolvent. VASPs enable the exchange, custody, and transfer of cryptoassets organized in wallets across distributed ledger technologies (DLTs). Despite the public availability of DLT transactions, the cryptoasset holdings of VASPs are not yet subject to systematic auditing procedures. In this paper, we propose an approach to assess the solvency of a VASP by cross-referencing data from three distinct sources: cryptoasset wallets, balance sheets from the commercial register, and data from supervisory entities. We investigate 24 VASPs registered with the Financial Market Authority in Austria and provide regulatory data insights, such as who their customers are and where they come from. Their yearly incoming and outgoing transaction volume amounts to 2 billion EUR for around 1.8 million users. We describe what financial services they provide and find that they are most similar to traditional intermediaries such as brokers, money exchanges, and funds, rather than banks. Next, we empirically measure DLT transaction flows of four VASPs and compare their cryptoasset holdings to balance sheet entries. Data are consistent for two VASPs only. This enables us to identify gaps in the data collection and propose strategies to address them. We remark that any entity in charge of auditing requires proof that a VASP actually controls the funds associated with its on-chain wallets. It is also important to report fiat and cryptoasset asset and liability positions broken down by asset types at a reasonable frequency.
Submitted 18 April, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
The Governance of Decentralized Autonomous Organizations: A Study of Contributors' Influence, Networks, and Shifts in Voting Power
Authors:
Stefan Kitzler,
Stefano Balietti,
Pietro Saggese,
Bernhard Haslhofer,
Markus Strohmaier
Abstract:
We present a study analyzing the voting behavior of contributors, or vested users, in Decentralized Autonomous Organizations (DAOs). We evaluate their involvement in decision-making processes, discovering that in at least 7.54% of all DAOs, contributors, on average, held the necessary majority to control governance decisions. Furthermore, contributors have singularly decided at least one proposal in 20.41% of DAOs. Notably, contributors tend to be centrally positioned within the DAO governance ecosystem, suggesting the presence of inner power circles. Additionally, we observed a tendency for shifts in governance token ownership shortly before governance polls take place in 1202 (14.81%) of 8116 evaluated proposals. Our findings highlight the central role of contributors across a spectrum of DAOs, including Decentralized Finance protocols. Our research also offers important empirical insights pertinent to ongoing regulatory activities aimed at increasing transparency to DAO governance frameworks.
Submitted 28 September, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Predictability and Comprehensibility in Post-Hoc XAI Methods: A User-Centered Analysis
Authors:
Anahid Jalali,
Bernhard Haslhofer,
Simone Kriglstein,
Andreas Rauber
Abstract:
Post-hoc explainability methods aim to clarify predictions of black-box machine learning models. However, it is still largely unclear how well users comprehend the provided explanations and whether these increase the users' ability to predict the model behavior. We approach this question by conducting a user study to evaluate comprehensibility and predictability in two widely used tools: LIME and SHAP. Moreover, we investigate the effect of counterfactual explanations and misclassifications on users' ability to understand and predict the model behavior. We find that the comprehensibility of SHAP is significantly reduced when explanations are provided for samples near a model's decision boundary. Furthermore, we find that counterfactual explanations and misclassifications can significantly increase the users' understanding of how a machine learning model is making decisions. Based on our findings, we also derive design recommendations for future post-hoc explainability methods with increased comprehensibility and predictability.
Submitted 21 September, 2023;
originally announced September 2023.
-
Autoencoder based Anomaly Detection and Explained Fault Localization in Industrial Cooling Systems
Authors:
Stephanie Holly,
Robin Heel,
Denis Katic,
Leopold Schoeffl,
Andreas Stiftinger,
Peter Holzner,
Thomas Kaufmann,
Bernhard Haslhofer,
Daniel Schall,
Clemens Heitzinger,
Jana Kemnitz
Abstract:
Anomaly detection in large industrial cooling systems is very challenging due to the high data dimensionality, inconsistent sensor recordings, and lack of labels. The state of the art for automated anomaly detection in these systems typically relies on expert knowledge and thresholds. However, data is viewed in isolation, and complex multivariate relationships are neglected. In this work, we present an autoencoder based end-to-end workflow for anomaly detection suitable for multivariate time series data in large industrial cooling systems, including explained fault localization and root cause analysis based on expert knowledge. We identify system failures using a threshold on the total reconstruction error (autoencoder reconstruction error including all sensor signals). For fault localization, we compute the individual reconstruction error (autoencoder reconstruction error for each sensor signal), allowing us to identify the signals that contribute most to the total reconstruction error. Expert knowledge is provided via a look-up table, enabling root-cause analysis and assignment to the affected subsystem. We demonstrated our findings in a cooling system unit including 34 sensors over an 8-month period using 4-fold cross validation approaches and automatically created labels based on thresholds provided by domain experts. Using 4-fold cross validation, we reached an F1-score of 0.56, whereas the autoencoder results showed a higher consistency score (CS of 0.92) compared to the automatically created labels (CS of 0.62) -- indicating that the anomaly is recognized in a very stable manner. The main anomaly was found by the autoencoder and automatically created labels and was also recorded in the log files. Further, the explained fault localization highlighted the most affected component for the main anomaly in a very consistent manner.
Submitted 14 October, 2022;
originally announced October 2022.
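The thresholding and fault-localization steps described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the per-sensor error is taken to be the squared difference, the total error is their sum, and the autoencoder itself is omitted (its reconstruction is passed in as plain numbers). Sensor names, values, and the threshold are hypothetical.

```python
def per_sensor_errors(original, reconstructed):
    """Squared reconstruction error for each sensor signal at one time step (assumed metric)."""
    return [(x - y) ** 2 for x, y in zip(original, reconstructed)]


def detect_and_localize(original, reconstructed, threshold, sensor_names):
    """Flag an anomaly via the total error and rank sensors by their individual error."""
    errors = per_sensor_errors(original, reconstructed)
    total = sum(errors)
    is_anomaly = total > threshold
    # Sensors contributing most to the total error come first.
    ranked = sorted(zip(sensor_names, errors), key=lambda pair: pair[1], reverse=True)
    return is_anomaly, total, ranked


# Hypothetical sensor readings and a hypothetical autoencoder reconstruction.
names = ["temp_in", "temp_out", "pressure"]
measured = [1.0, 2.0, 3.0]
rebuilt = [1.0, 2.5, 1.0]
is_anomaly, total_error, ranking = detect_and_localize(measured, rebuilt, 1.0, names)
```

In practice the top-ranked sensor would then be looked up in an expert-provided table to find the affected subsystem; that table is not modeled here.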
-
Machine Learning Methods for Health-Index Prediction in Coating Chambers
Authors:
Clemens Heistracher,
Anahid Jalali,
Jürgen Schneeweiss,
Klaudia Kovacs,
Catherine Laflamme,
Bernhard Haslhofer
Abstract:
Coating chambers create thin layers that improve the mechanical and optical surface properties in jewelry production using physical vapor deposition. In such a process, evaporated material condenses on the walls of such chambers and, over time, causes mechanical defects and unstable processes. As a result, manufacturers perform extensive maintenance procedures to reduce production loss. Current rule-based maintenance strategies neglect the impact of specific recipes and the actual condition of the vacuum chamber. Our overall goal is to predict the future condition of the coating chamber to allow cost and quality optimized maintenance of the equipment. This paper describes the derivation of a novel health indicator that serves as a step toward condition-based maintenance for coating chambers. We indirectly use gas emissions of the chamber's contamination to evaluate the machine's condition. Our approach relies on process data and does not require additional hardware installation. Further, we evaluated multiple machine learning algorithms for a condition-based forecast of the health indicator that also reflects production planning. Our results show that models based on decision trees are the most effective and outperform all three benchmarks, improving by at least $0.22$ in the mean average error. Our work paves the way for cost and quality optimized maintenance of coating applications.
Submitted 30 May, 2022;
originally announced May 2022.
-
How to Peel a Million: Validating and Expanding Bitcoin Clusters
Authors:
George Kappos,
Haaroon Yousaf,
Rainer Stütz,
Sofia Rollet,
Bernhard Haslhofer,
Sarah Meiklejohn
Abstract:
One of the defining features of Bitcoin and the thousands of cryptocurrencies that have been derived from it is a globally visible transaction ledger. While Bitcoin uses pseudonyms as a way to hide the identity of its participants, a long line of research has demonstrated that Bitcoin is not anonymous. This has been perhaps best exemplified by the development of clustering heuristics, which have in turn given rise to the ability to track the flow of bitcoins as they are sent from one entity to another.
In this paper, we design a new heuristic to track a certain type of flow, called a peel chain, that represents many transactions performed by the same entity; in doing this, we implicitly cluster these transactions and their associated pseudonyms together. We then use this heuristic to both validate and expand the results of existing clustering heuristics. We also develop a machine learning-based validation method and, using a ground-truth dataset, evaluate all our approaches and compare them with the state of the art. Ultimately, our goal is to not only enable more powerful tracking techniques but also call attention to the limits of anonymity in these systems.
Submitted 27 May, 2022;
originally announced May 2022.
-
Disentangling Decentralized Finance (DeFi) Compositions
Authors:
Stefan Kitzler,
Friedhelm Victor,
Pietro Saggese,
Bernhard Haslhofer
Abstract:
We present a measurement study on compositions of Decentralized Finance protocols, which aim to disrupt traditional finance and offer services on top of distributed ledgers, such as Ethereum. DeFi compositions may impact the development of ecosystem interoperability, are increasingly integrated with web technologies, and may introduce risks through complexity. Starting from a dataset of 23 labeled DeFi protocols and 10,663,881 associated Ethereum accounts, we study the interactions of protocols and associated smart contracts. From a network perspective, we find that decentralized exchanges and lending protocols have high degree and centrality values, that interactions among protocol nodes primarily occur in a strongly connected component, and that known community detection methods cannot disentangle DeFi protocols. Therefore, we propose an algorithm to decompose a protocol call into a nested set of building blocks that may be part of other DeFi protocols. With a ground truth dataset we have collected, we can demonstrate the algorithm's capability by finding that swaps are the most frequently used building blocks. As building blocks can be nested, i.e., contained in each other, we provide visualizations of composition trees for deeper inspections. We also present a broad picture of DeFi compositions by extracting and flattening the entire nested building block structure across multiple DeFi protocols. Finally, to demonstrate the practicality of our approach, we present a case study that is inspired by the recent collapse of the UST stablecoin in the Terra ecosystem. Under the hypothetical assumption that the stablecoin USD Tether would experience a similar fate, we study which building blocks and, thereby, DeFi protocols would be affected. Overall, our results and methods contribute to a better understanding of a new family of financial products.
Submitted 30 September, 2022; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Minimal-Configuration Anomaly Detection for IIoT Sensors
Authors:
Clemens Heistracher,
Anahid Jalali,
Axel Suendermann,
Sebastian Meixner,
Daniel Schall,
Bernhard Haslhofer,
Jana Kemnitz
Abstract:
The increasing deployment of low-cost IoT sensor platforms in industry boosts the demand for anomaly detection solutions that fulfill two key requirements: minimal configuration effort and easy transferability across equipment. Recent advances in deep learning, especially long-short-term memory (LSTM) and autoencoders, offer promising methods for detecting anomalies in sensor data recordings. We compared autoencoders with various architectures such as deep neural networks (DNN), LSTMs and convolutional neural networks (CNN) using a simple benchmark dataset, which we generated by operating a peristaltic pump under various operating conditions and inducing anomalies manually. Our preliminary results indicate that a single model can detect anomalies under various operating conditions on a four-dimensional data set without any specific feature engineering for each operating condition. We consider this work as being the first step towards a generic anomaly detection method, which is applicable for a wide range of industrial equipment.
Submitted 8 October, 2021;
originally announced October 2021.
-
Adoption and Actual Privacy of Decentralized CoinJoin Implementations in Bitcoin
Authors:
Rainer Stütz,
Johann Stockinger,
Bernhard Haslhofer,
Pedro Moreno-Sanchez,
Matteo Maffei
Abstract:
We present a first measurement study on the adoption and actual privacy of two popular decentralized CoinJoin implementations, Wasabi and Samourai, in the broader Bitcoin ecosystem. By applying highly accurate (> 99%) algorithms we can effectively detect 30,251 Wasabi and 223,597 Samourai transactions within the block range 530,500 to 725,348 (2018-07-05 to 2022-02-28). We also found a steady adoption of these services with a total value of mixed coins of ca. 4.74 B USD and average monthly mixing amounts of ca. 172.93 M USD for Wasabi and ca. 41.72 M USD for Samourai. Furthermore, we could trace ca. 322 M USD directly received by cryptoasset exchanges and ca. 1.16 B USD indirectly received via two hops. Our analysis further shows that the traceability of addresses during the pre-mixing and post-mixing narrows down the anonymity set provided by these coin mixing services. It also shows that the selection of addresses for the CoinJoin transaction can harm anonymity. Overall, this is the first paper to provide a comprehensive picture of the adoption and privacy of distributed CoinJoin transactions. Understanding this picture is particularly interesting in the light of ongoing regulatory efforts that will, on the one hand, affect compliance measures implemented in cryptocurrency exchanges and, on the other hand, the privacy of end-users.
Submitted 14 September, 2022; v1 submitted 21 September, 2021;
originally announced September 2021.
-
GraphSense: A General-Purpose Cryptoasset Analytics Platform
Authors:
Bernhard Haslhofer,
Rainer Stütz,
Matteo Romiti,
Ross King
Abstract:
There is currently an increasing demand for cryptoasset analysis tools among cryptoasset service providers, the financial industry in general, as well as across academic fields. At the moment, one can choose between commercial services or low-level open-source tools providing programmatic access. In this paper, we present the design and implementation of another option: the GraphSense Cryptoasset Analytics Platform, which can be used for interactive investigations of monetary flows and, more importantly, for executing advanced analytics tasks using a standard data science tool stack. By providing a growing set of open-source components, GraphSense could ultimately become an instrument for scientific investigations in academia and a possible response to emerging compliance and regulation challenges for businesses and organizations dealing with cryptoassets.
Submitted 26 February, 2021;
originally announced February 2021.
-
The Future is Big Graphs! A Community View on Graph Processing Systems
Authors:
Sherif Sakr,
Angela Bonifati,
Hannes Voigt,
Alexandru Iosup,
Khaled Ammar,
Renzo Angles,
Walid Aref,
Marcelo Arenas,
Maciej Besta,
Peter A. Boncz,
Khuzaima Daudjee,
Emanuele Della Valle,
Stefania Dumbrava,
Olaf Hartig,
Bernhard Haslhofer,
Tim Hegeman,
Jan Hidders,
Katja Hose,
Adriana Iamnitchi,
Vasiliki Kalavri,
Hugo Kapp,
Wim Martens,
M. Tamer Özsu,
Eric Peukert,
Stefan Plantikow, et al. (16 additional authors not shown)
Abstract:
Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed?
Submitted 11 December, 2020;
originally announced December 2020.
-
Exploring investor behavior in Bitcoin: a study of the disposition effect
Authors:
Jürgen E. Schatzmann,
Bernhard Haslhofer
Abstract:
Investors commonly exhibit the disposition effect - the irrational tendency to sell their winning investments and hold onto their losing ones. While this phenomenon has been observed in many traditional markets, it remains unclear whether it also applies to atypical markets like cryptoassets. This paper investigates the prevalence of the disposition effect in Bitcoin using transactions targeting cryptoasset exchanges as proxies for selling transactions. Our findings suggest that investors in Bitcoin were indeed subject to the disposition effect, with varying intensity. They also show that the disposition effect was not consistently present throughout the observation period. Its prevalence was more evident from the boom and bust year 2017 onwards, as confirmed by various technical indicators. Our study suggests irrational investor behavior is also present in atypical markets like Bitcoin.
Submitted 23 July, 2023; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Cross-Layer Deanonymization Methods in the Lightning Protocol
Authors:
Matteo Romiti,
Friedhelm Victor,
Pedro Moreno-Sanchez,
Peter Sebastian Nordholt,
Bernhard Haslhofer,
Matteo Maffei
Abstract:
Bitcoin (BTC) pseudonyms (layer 1) can effectively be deanonymized using heuristic clustering techniques. However, while performing transactions off-chain (layer 2) in the Lightning Network (LN) seems to enhance privacy, a systematic analysis of the anonymity and privacy leakages due to the interaction between the two layers is missing. We present clustering heuristics that group BTC addresses, based on their interaction with the LN, as well as LN nodes, based on shared naming and hosting information. We also present linking heuristics that link 45.97% of all LN nodes to 29.61% of the BTC addresses interacting with the LN. These links allow us to attribute information (e.g., aliases, IP addresses) to 21.19% of the BTC addresses, contributing to their deanonymization. Further, these deanonymization results suggest that the security and privacy of LN payments are weaker than commonly believed, with LN users being at the mercy of as few as five actors that control 36 nodes and over 33% of the total capacity. Overall, this is the first paper to present a method for linking LN nodes with BTC addresses across layers and to discuss privacy and security implications.
Submitted 10 February, 2021; v1 submitted 1 July, 2020;
originally announced July 2020.
-
All that Glitters is not Bitcoin -- Unveiling the Centralized Nature of the BTC (IP) Network
Authors:
Sami Ben Mariem,
Pedro Casas,
Matteo Romiti,
Benoit Donnet,
Rainer Stütz,
Bernhard Haslhofer
Abstract:
Blockchains are typically managed by peer-to-peer (P2P) networks providing the support and substrate to the so-called distributed ledger (DLT), a replicated, shared, and synchronized data structure, geographically spread across multiple nodes. The Bitcoin (BTC) blockchain is by far the most well known DLT, used to record transactions among peers, based on the BTC digital currency. In this paper, we focus on the network side of the BTC P2P network, analyzing its nodes from a purely network measurements-based approach. We present a BTC crawler able to discover and track the BTC P2P network through active measurements, and use it to analyze its main properties. Through the combined analysis of multiple snapshots of the BTC network as well as by using other publicly available data sources on the BTC network and DLT, we unveil the BTC P2P network, locate its active nodes, study their performance, and track the evolution of the network over the past two years. Among other relevant findings, we show that (i) the size of the BTC network has remained almost constant during the last 12 months - since the major BTC price drop in early 2018, (ii) most of the BTC P2P network resides in US and EU countries, and (iii) despite this western network locality, most of the mining activity and corresponding revenue is controlled by major mining pools located in China. By additionally analyzing the distribution of BTC coins among independent BTC entities (i.e., single BTC addresses or groups of BTC addresses controlled by the same actor), we also conclude that (iv) BTC is very far from being the decentralized and uncontrolled system it is so much advertised to be, with only 4.5% of all the BTC entities holding about 85% of all circulating BTC coins.
Submitted 19 February, 2020; v1 submitted 24 January, 2020;
originally announced January 2020.
-
Stake Shift in Major Cryptocurrencies: An Empirical Study
Authors:
Rainer Stütz,
Peter Gaži,
Bernhard Haslhofer,
Jacob Illum
Abstract:
In the proof-of-stake (PoS) paradigm for maintaining decentralized, permissionless cryptocurrencies, Sybil attacks are prevented by basing the distribution of roles in the protocol execution on the stake distribution recorded in the ledger itself. However, for various reasons this distribution cannot be completely up-to-date, introducing a gap between the present stake distribution, which determines the parties' current incentives, and the one used by the protocol. In this paper, we investigate this issue, and empirically quantify its effects. We survey existing provably secure PoS proposals to observe that the above time gap between the two stake distributions, which we call stake distribution lag, amounts to several days for each of these protocols. Based on this, we investigate the ledgers of four major cryptocurrencies (Bitcoin, Bitcoin Cash, Litecoin and Zcash) and compute the average stake shift (the statistical distance of the two distributions) for each value of stake distribution lag between 1 and 14 days, as well as related statistics. We also empirically quantify the sublinear growth of stake shift with the length of the considered lag interval. Finally, we turn our attention to unusual stake-shift spikes in these currencies: we observe that hard forks trigger major stake shifts and that single real-world actors, mostly exchanges, account for major stake shifts in established cryptocurrency ecosystems.
Submitted 13 January, 2020;
originally announced January 2020.
-
Identifying Historical Travelogues in Large Text Corpora Using Machine Learning
Authors:
Jan Rörden,
Doris Gruber,
Martin Krickl,
Bernhard Haslhofer
Abstract:
Travelogues represent an important and intensively studied source for scholars in the humanities, as they provide insights into people, cultures, and places of the past. However, existing studies rarely utilize more than a dozen primary sources, since the human capacities of working with a large number of historical sources are naturally limited. In this paper, we define the notion of travelogue and report upon an interdisciplinary method that, using machine learning as well as domain knowledge, can effectively identify German travelogues in the digitized inventory of the Austrian National Library with F1 scores between 0.94 and 1.00. We applied our method on a corpus of 161,522 German volumes and identified 345 travelogues that could not be identified using traditional search methods, resulting in the most extensive collection of early modern German travelogues ever created. To our knowledge, this is the first time such a method was implemented for the bibliographic indexing of a text corpus on this scale, improving and extending the traditional methods in the humanities. Overall, we consider our technique to be an important first step in a broader effort of developing a novel mixed-method approach for the large-scale serial analysis of travelogues.
Submitted 6 January, 2020;
originally announced January 2020.
-
Spams meet Cryptocurrencies: Sextortion in the Bitcoin Ecosystem
Authors:
Masarah Paquet-Clouston,
Matteo Romiti,
Bernhard Haslhofer,
Thomas Charvat
Abstract:
In the past year, a new spamming scheme has emerged: sexual extortion messages requiring payments in the cryptocurrency Bitcoin, also known as sextortion. This scheme represents a first integration of the use of cryptocurrencies by members of the spamming industry. Using a dataset of 4,340,736 sextortion spams, this research aims at understanding such new amalgamation by uncovering spammers' operations. To do so, a simple, yet effective method for projecting Bitcoin addresses mentioned in sextortion spams onto transaction graph abstractions is computed over the entire Bitcoin blockchain. This allows us to track and investigate monetary flows between involved actors and gain insights into the financial structure of sextortion campaigns. We find that sextortion spammers are somewhat sophisticated, following pricing strategies and benefiting from cost reductions as their operations cut the upper-tail of the spamming supply chain. We discover that one single entity is likely controlling the financial backbone of the majority of the sextortion campaigns and that the 11-month operation studied yielded a lower-bound revenue between $1,300,620 and $1,352,266. We conclude that sextortion spamming is a lucrative business and spammers will likely continue to send bulk emails that try to extort money through cryptocurrencies.
Submitted 2 August, 2019;
originally announced August 2019.
-
Safeguarding the Evidential Value of Forensic Cryptocurrency Investigations
Authors:
Michael Fröwis,
Thilo Gottschalk,
Bernhard Haslhofer,
Christian Rückert,
Paulina Pesch
Abstract:
Analyzing cryptocurrency payment flows has become a key forensic method in law enforcement and is nowadays used to investigate a wide spectrum of criminal activities. However, despite its widespread adoption, the evidential value of obtained findings in court is still largely unclear. In this paper, we focus on the key ingredients of modern cryptocurrency analytics techniques, which are clustering heuristics and attribution tags. We identify internationally accepted standards and rules for substantiating suspicions and providing evidence in court and project them onto current cryptocurrency forensics practices. By providing an empirical analysis of CoinJoin transactions, we illustrate possible sources of misinterpretation in algorithmic clustering heuristics. Eventually, we derive a set of legal key requirements and translate them into a technical data sharing framework that fosters compliance with existing legal and technical standards in the realm of cryptocurrency forensics. Integrating the proposed framework in modern cryptocurrency analytics tools could allow more efficient and effective investigations, while safeguarding the evidential value of the analysis and the fundamental rights of affected persons.
Submitted 2 August, 2021; v1 submitted 28 June, 2019;
originally announced June 2019.
-
A Deep Dive into Bitcoin Mining Pools: An Empirical Analysis of Mining Shares
Authors:
Matteo Romiti,
Aljosha Judmayer,
Alexei Zamyatin,
Bernhard Haslhofer
Abstract:
Miners play a key role in cryptocurrencies such as Bitcoin: they invest substantial computational resources in processing transactions and minting new currency units. It is well known that an attacker controlling more than half of the network's mining power could manipulate the state of the system at will. While the influence of large mining pools appears evenly split, the actual distribution of mining power within these pools and their economic relationships with other actors remain undisclosed. To this end, we conduct the first in-depth analysis of mining reward distribution within three of the four largest Bitcoin mining pools and examine their cross-pool economic relationships. Our results suggest that individual miners are simultaneously operating across all three pools and that in each analyzed pool a small number of actors (<= 20) receives over 50% of all BTC payouts. While the extent of an operator's control over the resources of a mining pool remains an open debate, our findings are in line with previous research, pointing out centralization tendencies in large mining pools and cryptocurrencies in general.
Submitted 15 May, 2019;
originally announced May 2019.
-
Predicting Time-to-Failure of Plasma Etching Equipment using Machine Learning
Authors:
Anahid Jalali,
Clemens Heistracher,
Alexander Schindler,
Bernhard Haslhofer,
Tanja Nemeth,
Robert Glawar,
Wilfried Sihn,
Peter De Boer
Abstract:
Predicting unscheduled breakdowns of plasma etching equipment can reduce maintenance costs and production losses in the semiconductor industry. However, plasma etching is a complex procedure and it is hard to capture all relevant equipment properties and behaviors in a single physical model. Machine learning offers an alternative for predicting upcoming machine failures based on relevant data points. In this paper, we describe three different machine learning tasks that can be used for that purpose: (i) predicting Time-To-Failure (TTF), (ii) predicting health state, and (iii) predicting TTF intervals of a piece of equipment. Our results show that trained machine learning models can outperform benchmarks resembling human judgments in all three tasks. This suggests that machine learning offers a viable alternative to currently deployed plasma etching equipment maintenance strategies and decision-making processes.
Submitted 16 April, 2019;
originally announced April 2019.
-
An Empirical Analysis of Monero Cross-Chain Traceability
Authors:
Abraham Hinteregger,
Bernhard Haslhofer
Abstract:
Monero is a privacy-centric cryptocurrency that makes payments untraceable by adding decoys to every real input spent in a transaction. Two studies from 2017 found methods to distinguish decoys from real inputs, which enabled traceability for a majority of transactions. Since then, a number of protocol changes have been introduced, but their effectiveness has not yet been reassessed. Furthermore, little is known about traceability of Monero transactions across hard fork chains. We formalize a new method for tracing Monero transactions, which is based on analyzing currency hard forks. We use that method to perform a (passive) traceability analysis on data from the Monero, MoneroV and Monero Original blockchains and find that only a small number of inputs are traceable. We then use the results to estimate the effectiveness of known heuristics for recent transactions and find that they do not significantly outperform random guessing. Our findings suggest that Monero is currently mostly immune to known passive attack vectors and resistant to tracking and tracing methods applied to other cryptocurrencies.
Submitted 4 January, 2019; v1 submitted 6 December, 2018;
originally announced December 2018.
-
Ransomware Payments in the Bitcoin Ecosystem
Authors:
Masarah Paquet-Clouston,
Bernhard Haslhofer,
Benoit Dupont
Abstract:
Ransomware can prevent a user from accessing a device and its files until a ransom is paid to the attacker, most frequently in Bitcoin. With over 500 known ransomware families, it has become one of the dominant cybercrime threats for law enforcement, security professionals and the public. However, a more comprehensive, evidence-based picture on the global direct financial impact of ransomware attacks is still missing. In this paper, we present a data-driven method for identifying and gathering information on Bitcoin transactions related to illicit activity based on footprints left on the public Bitcoin blockchain. We implement this method on top of the GraphSense open-source platform and apply it to empirically analyze transactions related to 35 ransomware families. We estimate the lower bound direct financial impact of each ransomware family and find that, from 2013 to mid-2017, the market for ransomware payments has a minimum worth of USD 12,768,536 (22,967.54 BTC). We also find that the market is highly skewed with only a small number of players responsible for the majority of the payments. Based on these research findings, policy-makers and law enforcement agencies can use the statistics provided to understand the size of the illicit market and make informed decisions on how best to address the threat.
Submitted 11 April, 2018;
originally announced April 2018.
-
Knowledge Graphs in the Libraries and Digital Humanities Domain
Authors:
Bernhard Haslhofer,
Antoine Isaac,
Rainer Simon
Abstract:
Knowledge graphs represent concepts (e.g., people, places, events) and their semantic relationships. As a data structure, they underpin a digital information system, support users in resource discovery and retrieval, and are useful for navigation and visualization purposes. Within the libraries and humanities domain, knowledge graphs are typically rooted in knowledge organization systems, which have a century-old tradition and have undergone their digital transformation with the advent of the Web and Linked Data. Being exposed to the Web, metadata and concept definitions are now forming an interconnected and decentralized global knowledge network that can be curated and enriched by community-driven editorial processes. In the future, knowledge graphs could be vehicles for formalizing and connecting findings and insights derived from the analysis of possibly large-scale corpora in the libraries and digital humanities domain.
Submitted 8 March, 2018;
originally announced March 2018.
-
Web Synchronization Simulations using the ResourceSync Framework
Authors:
Bernhard Haslhofer,
Simeon Warner,
Carl Lagoze,
Martin Klein,
Robert Sanderson,
Herbert van de Sompel,
Michael L. Nelson
Abstract:
Maintenance of multiple, distributed up-to-date copies of collections of changing Web resources is important in many application contexts and is often achieved using ad hoc or proprietary synchronization solutions. ResourceSync is a resource synchronization framework that integrates with the Web architecture and leverages XML sitemaps. We define a model for the ResourceSync framework as a basis for understanding its properties. We then describe experiments in which simulations of a variety of synchronization scenarios illustrate the effects of model configuration on consistency, latency, and data transfer efficiency. These results provide insight into which configurations are appropriate for various application scenarios.
Submitted 5 June, 2013;
originally announced June 2013.
-
ResourceSync: Leveraging Sitemaps for Resource Synchronization
Authors:
Bernhard Haslhofer,
Simeon Warner,
Carl Lagoze,
Martin Klein,
Robert Sanderson,
Michael L. Nelson,
Herbert van de Sompel
Abstract:
Many applications need up-to-date copies of collections of changing Web resources. Such synchronization is currently achieved using ad-hoc or proprietary solutions. We propose ResourceSync, a general Web resource synchronization protocol that leverages XML Sitemaps. It provides a set of capabilities that can be combined in a modular manner to meet local or community requirements. We report on work to implement this protocol for arXiv.org and also provide an experimental prototype for the English Wikipedia as well as a client API.
Submitted 7 May, 2013;
originally announced May 2013.
-
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
Authors:
Elizabeth L. Murnane,
Bernhard Haslhofer,
Carl Lagoze
Abstract:
We address the Named Entity Disambiguation (NED) problem for short, user-generated texts on the social Web. In such settings, the lack of linguistic features and sparse lexical context result in a high degree of ambiguity and sharp performance drops of nearly 50% in the accuracy of conventional NED systems. We handle these challenges by developing a model of user-interest with respect to a personal knowledge context; and Wikipedia, a particularly well-established and reliable knowledge base, is used to instantiate the procedure. We conduct systematic evaluations using individuals' posts from Twitter, YouTube, and Flickr and demonstrate that our novel technique is able to achieve substantial performance gains beyond state-of-the-art NED methods.
Submitted 8 April, 2013;
originally announced April 2013.
-
Semantic Tagging on Historical Maps
Authors:
Bernhard Haslhofer,
Werner Robitza,
Carl Lagoze,
Francois Guimbretiere
Abstract:
Tags assigned by users to shared content can be ambiguous. As a possible solution, we propose semantic tagging as a collaborative process in which a user selects and associates Web resources drawn from a knowledge context. We applied this general technique in the specific context of online historical maps and allowed users to annotate and tag them. To study the effects of semantic tagging on tag production, the types and categories of obtained tags, and user task load, we conducted an in-lab within-subject experiment with 24 participants who annotated and tagged two distinct maps. We found that the semantic tagging implementation does not affect these parameters, while providing tagging relationships to well-defined concept definitions. Compared to label-based tagging, our technique also gathers positive and negative tagging relationships. We believe that our findings carry implications for designers who want to adopt semantic tagging in other contexts and systems on the Web.
Submitted 5 April, 2013;
originally announced April 2013.
-
Finding Quality Issues in SKOS Vocabularies
Authors:
Christian Mader,
Bernhard Haslhofer,
Antoine Isaac
Abstract:
The Simple Knowledge Organization System (SKOS) is a standard model for controlled vocabularies on the Web. However, SKOS vocabularies often differ in terms of quality, which reduces their applicability across system boundaries. Here we investigate how we can support taxonomists in improving SKOS vocabularies by pointing out quality issues that go beyond the integrity constraints defined in the SKOS specification. We identified potential quantifiable quality issues and formalized them into computable quality checking functions that can find affected resources in a given SKOS vocabulary. We implemented these functions in the qSKOS quality assessment tool, analyzed 15 existing vocabularies, and found possible quality issues in all of them.
Submitted 6 June, 2012;
originally announced June 2012.
-
Open Annotations on Multimedia Web Resources
Authors:
Bernhard Haslhofer,
Robert Sanderson,
Rainer Simon,
Herbert van de Sompel
Abstract:
Many Web portals allow users to associate additional information with existing multimedia resources such as images, audio, and video. However, these portals are usually closed systems and user-generated annotations are almost always kept locked up and remain inaccessible to the Web of Data. We believe that an important step to take is the integration of multimedia annotations and the Linked Data principles. We present the current state of the Open Annotation Model, explain our design rationale, and describe how the model can represent user annotations on multimedia Web resources. Applying this model in Web portals and devices, which support user annotations, should allow clients to easily publish and consume, thus exchange annotations on multimedia Web resources via common Web standards.
Submitted 28 February, 2012;
originally announced February 2012.
-
The Open Annotation Collaboration (OAC) Model
Authors:
Bernhard Haslhofer,
Rainer Simon,
Robert Sanderson,
Herbert van de Sompel
Abstract:
Annotations allow users to associate additional information with existing resources. Using proprietary and closed systems on the Web, users are already able to annotate multimedia resources such as images, audio and video. So far, however, this information is almost always kept locked up and inaccessible to the Web of Data. We believe that an important step to take is the integration of multimedia annotations and the Linked Data principles. This should allow clients to easily publish, consume, and thus exchange annotations about resources via common Web standards. We first present the current status of the Open Annotation Collaboration, an international initiative that is currently working on annotation interoperability specifications based on best practices from the Linked Data effort. Then we present two use cases and early prototypes that make use of the proposed annotation model, report lessons learned, and discuss open technical issues.
Submitted 25 June, 2011;
originally announced June 2011.