-
Of Non-Linearity and Commutativity in BERT
Authors:
Sumu Zhao,
Damian Pascual,
Gino Brunner,
Roger Wattenhofer
Abstract:
In this work we provide new insights into the transformer architecture, and in particular, its best-known variant, BERT. First, we propose a method to measure the degree of non-linearity of different elements of transformers. Next, we focus our investigation on the feed-forward networks (FFN) inside transformers, which contain 2/3 of the model parameters and have so far not received much attention. We find that FFNs are an inefficient yet important architectural element and that they cannot simply be replaced by attention blocks without a degradation in performance. Moreover, we study the interactions between layers in BERT and show that, while the layers exhibit some hierarchical structure, they extract features in a fuzzy manner. Our results suggest that BERT has an inductive bias towards layer commutativity, which we find is mainly due to the skip connections. This provides a justification for the strong performance of recurrent and weight-shared transformer models.
Submitted 7 May, 2021; v1 submitted 12 January, 2021;
originally announced January 2021.
-
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation
Authors:
Yiran Xing,
Zai Shi,
Zhao Meng,
Gerhard Lakemeyer,
Yunpu Ma,
Roger Wattenhofer
Abstract:
We present Knowledge Enhanced Multimodal BART (KM-BART), which is a Transformer-based sequence-to-sequence model capable of reasoning about commonsense knowledge from multimodal inputs of images and texts. We adapt the generative BART architecture to a multimodal model with visual and textual inputs. We further develop novel pretraining tasks to improve the model performance on the Visual Commonsense Generation (VCG) task. In particular, our pretraining task of Knowledge-based Commonsense Generation (KCG) boosts model performance on the VCG task by leveraging commonsense knowledge from a large language model pretrained on external commonsense knowledge graphs. To the best of our knowledge, we are the first to propose a dedicated task for improving model performance on the VCG task. Experimental results show that our model reaches state-of-the-art performance on the VCG task by applying these novel pretraining tasks.
Submitted 15 July, 2021; v1 submitted 2 January, 2021;
originally announced January 2021.
-
Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation
Authors:
Damian Pascual,
Beni Egressy,
Florian Bolli,
Roger Wattenhofer
Abstract:
Large pre-trained language models are capable of generating realistic text. However, controlling these models so that the generated text satisfies lexical constraints, i.e., contains specific words, is a challenging problem. Given that state-of-the-art language models are too large to be trained from scratch in a manageable time, it is desirable to control these models without re-training them. Methods capable of doing this are called plug-and-play. Recent plug-and-play methods have been successful in constraining small bidirectional language models as well as forward models in tasks with a restricted search space, e.g., machine translation. However, controlling large transformer-based models to meet lexical constraints without re-training them remains a challenge. In this work, we propose Directed Beam Search (DBS), a plug-and-play method for lexically constrained language generation. Our method can be applied to any language model, is easy to implement and can be used for general language generation. In our experiments we use DBS to control GPT-2. We demonstrate its performance on keyword-to-phrase generation and we obtain comparable results as a state-of-the-art non-plug-and-play model for lexically constrained story generation.
Submitted 30 December, 2020;
originally announced December 2020.
-
Sequential Defaulting in Financial Networks
Authors:
Pál András Papp,
Roger Wattenhofer
Abstract:
We consider financial networks, where banks are connected by contracts such as debts or credit default swaps. We study the clearing problem in these systems: we want to know which banks end up in a default, and what portion of their liabilities can these defaulting banks fulfill. We analyze these networks in a sequential model where banks announce their default one at a time, and the system evolves in a step-by-step manner.
We first consider the reversible model of these systems, where banks may return from a default. We show that the stabilization time in this model can heavily depend on the ordering of announcements. However, we also show that there are systems where for any choice of ordering, the process lasts for an exponential number of steps before an eventual stabilization. We also show that finding the ordering with the smallest (or largest) number of banks ending up in default is an NP-hard problem. Furthermore, we prove that defaulting early can be an advantageous strategy for banks in some cases, and in general, finding the best time for a default announcement is NP-hard. Finally, we discuss how changing some properties of this setting affects the stabilization time of the process, and then use these techniques to devise a monotone model of the systems, which ensures that every network stabilizes eventually.
Submitted 20 November, 2020;
originally announced November 2020.
-
Space Complexity of Streaming Algorithms on Universal Quantum Computers
Authors:
Yanglin Hu,
Darya Melnyk,
Yuyi Wang,
Roger Wattenhofer
Abstract:
Universal quantum computers are the only general purpose quantum computers known that can be implemented as of today. These computers consist of a classical memory component which controls the quantum memory. In this paper, the space complexity of some data stream problems, such as PartialMOD and Equality, is investigated on universal quantum computers. The quantum algorithms for these problems are believed to outperform their classical counterparts. Universal quantum computers, however, need classical bits for controlling quantum gates in addition to qubits. Our analysis shows that the number of classical bits used in quantum algorithms is equal to or even larger than that of classical bits used in corresponding classical algorithms. These results suggest that there is no advantage of implementing certain data stream problems on universal quantum computers instead of classical computers when space complexity is considered.
Submitted 31 October, 2020;
originally announced November 2020.
-
Contrastive Graph Neural Network Explanation
Authors:
Lukas Faber,
Amin K. Moghaddam,
Roger Wattenhofer
Abstract:
Graph Neural Networks achieve remarkable results on problems with structured data but come as black-box predictors. Transferring existing explanation techniques, such as occlusion, fails as even removing a single node or edge can lead to drastic changes in the graph. The resulting graphs can differ from all training examples, causing model confusion and wrong explanations. Thus, we argue that explicability must use graphs compliant with the distribution underlying the training data. We coin this property Distribution Compliant Explanation (DCE) and present a novel Contrastive GNN Explanation (CoGE) technique following this paradigm. An experimental study supports the efficacy of CoGE.
Submitted 26 October, 2020;
originally announced October 2020.
-
A Geometry-Inspired Attack for Generating Natural Language Adversarial Examples
Authors:
Zhao Meng,
Roger Wattenhofer
Abstract:
Generating adversarial examples for natural language is hard, as natural language consists of discrete symbols, and examples are often of variable lengths. In this paper, we propose a geometry-inspired attack for generating natural language adversarial examples. Our attack generates adversarial examples by iteratively approximating the decision boundary of Deep Neural Networks (DNNs). Experiments on two datasets with two different models show that our attack fools natural language models with high success rates, while only replacing a few words. Human evaluation shows that adversarial examples generated by our attack are hard for humans to recognize. Further experiments show that adversarial training can improve model robustness against our attack.
Submitted 3 October, 2020;
originally announced October 2020.
-
Brain2Word: Decoding Brain Activity for Language Generation
Authors:
Nicolas Affolter,
Beni Egressy,
Damian Pascual,
Roger Wattenhofer
Abstract:
Brain decoding, understood as the process of mapping brain activities to the stimuli that generated them, has been an active research area in recent years. In the case of language stimuli, recent studies have shown that it is possible to decode fMRI scans into an embedding of the word a subject is reading. However, such word embeddings are designed for natural language processing tasks rather than for brain decoding. Therefore, they limit our ability to recover the precise stimulus. In this work, we propose to directly classify an fMRI scan, mapping it to the corresponding word within a fixed vocabulary. Unlike existing work, we evaluate on scans from previously unseen subjects. We argue that this is a more realistic setup and we present a model that can decode fMRI data from unseen subjects. Our model achieves 5.22% Top-1 and 13.59% Top-5 accuracy in this challenging task, significantly outperforming all the considered competitive baselines. Furthermore, we use the decoded words to guide language generation with the GPT-2 model. This way, we advance the quest for a system that translates brain activities into coherent text.
Submitted 11 November, 2020; v1 submitted 10 September, 2020;
originally announced September 2020.
-
FnF-BFT: Exploring Performance Limits of BFT Protocols
Authors:
Zeta Avarikioti,
Lioba Heimbach,
Roland Schmid,
Laurent Vanbever,
Roger Wattenhofer,
Patrick Wintermeyer
Abstract:
We introduce FnF-BFT, a parallel-leader byzantine fault-tolerant state-machine replication protocol for the partially synchronous model with theoretical performance bounds during synchrony. By allowing all replicas to act as leaders and propose requests independently, FnF-BFT parallelizes the execution of requests. Leader parallelization distributes the load over the entire network -- increasing throughput by overcoming the single-leader bottleneck. We further use historical data to ensure that well-performing replicas are in command. FnF-BFT's communication complexity is linear in the number of replicas during synchrony and thus competitive with state-of-the-art protocols. Finally, with FnF-BFT, we introduce a BFT protocol with performance guarantees in stable network conditions under truly byzantine attacks.
A prototype implementation of FnF-BFT outperforms (state-of-the-art) HotStuff's throughput, especially as replicas increase, showcasing FnF-BFT's significantly improved scaling capabilities.
Submitted 10 March, 2021; v1 submitted 4 September, 2020;
originally announced September 2020.
-
Medley2K: A Dataset of Medley Transitions
Authors:
Lukas Faber,
Sandro Luck,
Damian Pascual,
Andreas Roth,
Gino Brunner,
Roger Wattenhofer
Abstract:
The automatic generation of medleys, i.e., musical pieces formed by different songs concatenated via smooth transitions, is not well studied in the current literature. To facilitate research on this topic, we make available a dataset called Medley2K that consists of 2,000 medleys and 7,712 labeled transitions. Our dataset features a rich variety of song transitions across different music genres. We provide a detailed description of this dataset and validate it by training a state-of-the-art generative model in the task of generating transitions between songs.
Submitted 25 August, 2020;
originally announced August 2020.
-
Asynchronous Byzantine Agreement in Incomplete Networks [Technical Report]
Authors:
Ye Wang,
Roger Wattenhofer
Abstract:
The Byzantine agreement problem is considered to be a core problem in distributed systems. For example, Byzantine agreement is needed to build a blockchain, a totally ordered log of records. Blockchains are asynchronous distributed systems, fault-tolerant against Byzantine nodes.
In the literature, the asynchronous Byzantine agreement problem is studied in a fully connected network model where every node can directly send messages to every other node. This assumption is questionable in many real-world environments. In reality, nodes might need to communicate by means of an incomplete network, and Byzantine nodes might not forward messages. Furthermore, Byzantine nodes might not behave correctly and, for example, corrupt messages. Therefore, in order to truly understand Byzantine agreement, we need both ingredients: asynchrony and incomplete communication networks.
In this paper, we study the asynchronous Byzantine agreement problem in incomplete networks. A classic result by Danny Dolev proved that in a distributed system with n nodes in the presence of f Byzantine nodes, the vertex connectivity of the system communication graph should be at least (2f+1). While Dolev's result was for synchronous deterministic systems, we demonstrate that the same bound also holds for asynchronous randomized systems. We show that the bound is tight by presenting a randomized algorithm, and a matching lower bound.
Submitted 26 May, 2020;
originally announced May 2020.
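The connectivity condition quoted in the abstract above (vertex connectivity of at least 2f+1 in the presence of f Byzantine nodes) can be checked by brute force on small graphs. The following Python sketch is illustrative only and is not taken from the paper: the function names `vertex_connectivity` and `meets_connectivity_bound` and the adjacency-dictionary graph representation are assumptions made for this example, and the exhaustive search is only practical for a handful of nodes. It checks the connectivity condition alone, not any other requirement of the protocol.

```python
from itertools import combinations

def is_connected(graph, removed):
    """Return True if the nodes outside `removed` form a single component."""
    nodes = [v for v in graph if v not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        v = stack.pop()
        for u in graph[v]:
            if u not in removed and u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(nodes)

def vertex_connectivity(graph):
    """Smallest number of nodes whose removal disconnects the graph.

    By the usual convention, a complete graph on n nodes has connectivity n - 1.
    """
    n = len(graph)
    for k in range(n - 1):
        for subset in combinations(graph, k):
            if not is_connected(graph, set(subset)):
                return k
    return n - 1

def meets_connectivity_bound(graph, f):
    """Check only the condition 'vertex connectivity >= 2f + 1'."""
    return vertex_connectivity(graph) >= 2 * f + 1

# Hypothetical example graphs: a 5-cycle and a complete graph on 4 nodes.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
```

In this sketch the 5-cycle has vertex connectivity 2, so it falls short of the bound for f = 1, while the complete graph on 4 nodes has connectivity 3 and satisfies it.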
-
Normalized Attention Without Probability Cage
Authors:
Oliver Richter,
Roger Wattenhofer
Abstract:
Attention architectures are widely used; they recently gained renewed popularity with Transformers yielding a streak of state-of-the-art results. Yet, the geometrical implications of softmax-attention remain largely unexplored. In this work we highlight the limitations of constraining attention weights to the probability simplex and the resulting convex hull of value vectors. We show that Transformers exhibit a sequence-length-dependent bias towards token isolation at initialization, and we contrast Transformers with simple max- and sum-pooling, two strong baselines rarely reported. We propose to replace the softmax in self-attention with normalization, yielding a hyperparameter and data-bias robust, generally applicable architecture. We support our insights with empirical results from more than 25,000 trained models. All results and implementations are made available.
Submitted 19 May, 2020;
originally announced May 2020.
-
On the Hardness of Red-Blue Pebble Games
Authors:
Pál András Papp,
Roger Wattenhofer
Abstract:
Red-blue pebble games model the computation cost of a two-level memory hierarchy. We present various hardness results in different red-blue pebbling variants, with a focus on the oneshot model. We first study the relationship between previously introduced red-blue pebble models (base, oneshot, nodel). We also analyze a new variant (compcost) to obtain a more realistic model of computation. We then prove that red-blue pebbling is NP-hard in all of these model variants. Furthermore, we show that in the oneshot model, a $δ$-approximation algorithm for $δ<2$ is only possible if the unique games conjecture is false. Finally, we show that greedy algorithms are not good candidates for approximation, since they can return significantly worse solutions than the optimum.
Submitted 18 May, 2020;
originally announced May 2020.
-
A General Stabilization Bound for Influence Propagation in Graphs
Authors:
Pál András Papp,
Roger Wattenhofer
Abstract:
We study the stabilization time of a wide class of processes on graphs, in which each node can only switch its state if it is motivated to do so by at least a $\frac{1+λ}{2}$ fraction of its neighbors, for some $0 < λ < 1$. Two examples of such processes are well-studied dynamically changing colorings in graphs: in majority processes, nodes switch to the most frequent color in their neighborhood, while in minority processes, nodes switch to the least frequent color in their neighborhood. We describe a non-elementary function $f(λ)$, and we show that in the sequential model, the worst-case stabilization time of these processes can completely be characterized by $f(λ)$. More precisely, we prove that for any $ε>0$, $O(n^{1+f(λ)+ε})$ is an upper bound on the stabilization time of any proportional majority/minority process, and we also show that there are graph constructions where stabilization indeed takes $Ω(n^{1+f(λ)-ε})$ steps.
Submitted 20 April, 2020;
originally announced April 2020.
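The switching rule described in the abstract above can be illustrated with a small simulation. The Python sketch below is an assumption-laden toy, not the authors' construction: it restricts itself to two colors, models a proportional majority process in which a node may flip only when at least a (1+λ)/2 fraction of its neighbors hold the other color, and picks the lowest-numbered switchable node at each sequential step. The names `switchable` and `run_sequential_majority` are hypothetical.

```python
from fractions import Fraction

def switchable(graph, color, v, lam):
    """True if at least a (1 + lam) / 2 fraction of v's neighbors disagree with v."""
    neighbors = graph[v]
    if not neighbors:
        return False
    opposing = sum(1 for u in neighbors if color[u] != color[v])
    # Exact rational arithmetic avoids rounding issues at the threshold.
    return Fraction(opposing, len(neighbors)) >= (1 + lam) / 2

def run_sequential_majority(graph, color, lam):
    """Flip one switchable node per step until no node is switchable.

    Returns the final two-coloring and the number of steps taken.
    The choice of which switchable node moves next (here: the smallest)
    is an arbitrary assumption of this sketch.
    """
    color = dict(color)
    steps = 0
    while True:
        candidates = [v for v in sorted(graph) if switchable(graph, color, v, lam)]
        if not candidates:
            return color, steps
        v = candidates[0]
        color[v] = 1 - color[v]
        steps += 1

# Hypothetical example: a star with center 0 (color 0) and three leaves (color 1).
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
start = {0: 0, 1: 1, 2: 1, 3: 1}
final, steps = run_sequential_majority(star, start, Fraction(1, 3))
```

With λ = 1/3 the threshold is 2/3. The center sees all three neighbors in the other color and flips first, after which every node agrees and the process is stable.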
-
Neural Status Registers
Authors:
Lukas Faber,
Roger Wattenhofer
Abstract:
Standard Neural Networks can learn mathematical operations, but they do not extrapolate. Extrapolation means that the model can apply to larger numbers, well beyond those observed during training. Recent architectures tackle arithmetic operations and can extrapolate; however, the equally important problem of quantitative reasoning remains unaddressed. In this work, we propose a novel architectural element, the Neural Status Register (NSR), for quantitative reasoning over numbers. Our NSR relaxes the discrete bit logic of physical status registers to continuous numbers and allows end-to-end learning with gradient descent. Experiments show that the NSR achieves solutions that extrapolate to numbers many orders of magnitude larger than those in the training set. We successfully train the NSR on number comparisons, piecewise discontinuous functions, counting in sequences, recurrently finding minimums, finding shortest paths in graphs, and comparing digits in images.
Submitted 11 March, 2021; v1 submitted 15 April, 2020;
originally announced April 2020.
-
Telling BERT's full story: from Local Attention to Global Aggregation
Authors:
Damian Pascual,
Gino Brunner,
Roger Wattenhofer
Abstract:
We take a deep look into the behavior of self-attention heads in the transformer architecture. In light of recent work discouraging the use of attention distributions for explaining a model's behavior, we show that attention distributions can nevertheless provide insights into the local behavior of attention heads. This way, we propose a distinction between local patterns revealed by attention and global patterns that refer back to the input, and analyze BERT from both angles. We use gradient attribution to analyze how the output of an attention head depends on the input tokens, effectively extending the local attention-based analysis to account for the mixing of information throughout the transformer layers. We find that there is a significant discrepancy between attention and attribution distributions, caused by the mixing of context inside the model. We quantify this discrepancy and observe that, interestingly, there are some patterns that persist across all layers despite the mixing.
Submitted 13 January, 2021; v1 submitted 9 April, 2020;
originally announced April 2020.
-
Default Ambiguity: Finding the Best Solution to the Clearing Problem
Authors:
Pál András Papp,
Roger Wattenhofer
Abstract:
We study financial networks with debt contracts and credit default swaps between specific pairs of banks. Given such a financial system, we want to decide which of the banks are in default, and how much of their liabilities can these defaulting banks pay. There can easily be multiple different solutions to this problem, leading to a situation of default ambiguity, and a range of possible solutions to implement for a financial authority.
In this paper, we study the properties of the solution space of such financial systems, and analyze a wide range of reasonable objective functions for selecting from the set of solutions. Examples of such objective functions include minimizing the number of defaulting banks, minimizing the amount of unpaid debt, maximizing the number of satisfied banks, and many others. We show that for all of these objectives, it is NP-hard to approximate the optimal solution to an $n^{1-ε}$ factor for any $ε>0$, with $n$ denoting the number of banks. Furthermore, we show that this situation is rather difficult to avoid from a financial regulator's perspective: the same hardness results also hold if we apply strong restrictions on the weights of the debts, the structure of the network, or the amount of funds that banks must possess. However, if we restrict both the network structure and the amount of funds simultaneously, then the solution becomes unique, and it can be found efficiently.
Submitted 8 October, 2021; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Network-Aware Strategies in Financial Systems
Authors:
Pál András Papp,
Roger Wattenhofer
Abstract:
We study the incentives of banks in a financial network, where the network consists of debt contracts and credit default swaps (CDSs) between banks. One of the most important questions in such a system is the problem of deciding which of the banks are in default, and how much of their liabilities these banks can pay. We study the payoff and preferences of the banks in the different solutions to this problem. We also introduce a more refined model which allows assigning priorities to payment obligations; this provides a more expressive and realistic model of real-life financial systems, while it always ensures the existence of a solution.
The main focus of the paper is an analysis of the actions that a single bank can execute in a financial system in order to influence the outcome to its advantage. We show that removing an incoming debt, or donating funds to another bank can result in a single new solution that is strictly more favorable to the acting bank. We also show that increasing the bank's external funds or modifying the priorities of outgoing payments cannot introduce a more favorable new solution into the system, but may allow the bank to remove some unfavorable solutions, or to increase its recovery rate. Finally, we show how the actions of two banks in a simple financial system can result in classical game theoretic situations like the prisoner's dilemma or the dollar auction, demonstrating the wide expressive capability of the financial system model.
Submitted 18 February, 2020;
originally announced February 2020.
-
Ride the Lightning: The Game Theory of Payment Channels
Authors:
Zeta Avarikioti,
Lioba Heimbach,
Yuyi Wang,
Roger Wattenhofer
Abstract:
Payment channels were introduced to solve various eminent cryptocurrency scalability issues. Multiple payment channels build a network on top of a blockchain, the so-called layer 2. In this work, we analyze payment networks through the lens of network creation games. We identify betweenness and closeness centrality as central concepts regarding payment networks. We study the topologies that emerge when players act selfishly and determine the parameter space in which they constitute a Nash equilibrium. Moreover, we determine the social optima depending on the correlation of betweenness and closeness centrality. When possible, we bound the price of anarchy. We also briefly discuss the price of stability.
Submitted 10 December, 2019;
originally announced December 2019.
-
Divide and Scale: Formalization and Roadmap to Robust Sharding
Authors:
Georgia Avarikioti,
Antoine Desjardins,
Eleftherios Kokoris-Kogias,
Roger Wattenhofer
Abstract:
Sharding distributed ledgers is a promising on-chain solution for scaling blockchains but lacks formal grounds, nurturing skepticism on whether such complex systems can scale blockchains securely. We fill this gap by introducing the first formal framework as well as a roadmap to robust sharding. In particular, we first define the properties sharded distributed ledgers should fulfill. We build upon and extend the Bitcoin backbone protocol by defining consistency and scalability. Consistency encompasses the need for atomic execution of cross-shard transactions to preserve safety, whereas scalability encapsulates the speedup a sharded system can gain in comparison to a non-sharded system.
Using our model, we explore the limitations of sharding. We show that a sharded ledger with $n$ participants cannot scale under a fully adaptive adversary, but it can scale up to $m$ shards where $n=c'm\log m$, under an epoch-adaptive adversary; the constant $c'$ encompasses the trade-off between security and scalability. This is possible only if the sharded ledgers create succinct proofs of the valid state updates at every epoch. We leverage our results to identify the sufficient components for robust sharding, which we incorporate in a protocol abstraction termed Divide & Scale. To demonstrate the power of our framework, we analyze the most prominent sharded blockchains (Elastico, Monoxide, OmniLedger, RapidChain) and pinpoint where they fail to meet the desired properties.
Submitted 22 May, 2023; v1 submitted 23 October, 2019;
originally announced October 2019.
-
ABC: Proof-of-Stake without Consensus
Authors:
Jakub Sliwinski,
Roger Wattenhofer
Abstract:
We introduce a new permissionless blockchain architecture called ABC. ABC is completely asynchronous, and relies on neither randomness nor proof-of-work. ABC can be parallelized, and transactions have finality within one round trip of communication. However, ABC satisfies only a relaxed form of consensus by introducing a weaker termination property. Without full consensus, ABC cannot support certain applications, in particular ABC cannot support general smart contracts. However, many important applications do not need general smart contracts, and ABC is a better solution for these applications. In particular, ABC can implement the functionality of a cryptocurrency like Bitcoin, replacing Bitcoin's energy-hungry proof-of-work with a proof-of-stake validation.
Submitted 20 July, 2020; v1 submitted 24 September, 2019;
originally announced September 2019.
-
On Identifiability in Transformers
Authors:
Gino Brunner,
Yang Liu,
Damián Pascual,
Oliver Richter,
Massimiliano Ciaramita,
Roger Wattenhofer
Abstract:
In this paper we delve deep into the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we show that input tokens retain to a large degree their identity across the model. We also find evidence suggesting that identity information is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to better understand and further investigate Transformer models.
Submitted 7 February, 2020; v1 submitted 12 August, 2019;
originally announced August 2019.
-
Payment Networks as Creation Games
Authors:
Georgia Avarikioti,
Rolf Scheuner,
Roger Wattenhofer
Abstract:
Payment networks were introduced to address the limitation on the transaction throughput of popular blockchains. To open a payment channel one has to publish a transaction on-chain and pay the appropriate transaction fee. A transaction can be routed in the network, as long as there is a path of channels with the necessary capital. The intermediate nodes on this path can ask for a fee to forward the transaction. Hence, opening channels, although costly, can benefit a party, both by reducing the cost of the party for sending a transaction and by collecting the fees from forwarding transactions of other parties.
This trade-off spawns a network creation game between the channel parties. In this work, we introduce the first game theoretic model for analyzing the network creation game on blockchain payment channels. Further, we examine various network structures (path, star, complete bipartite graph and clique) and determine for each one of them the constraints (fee value) under which they constitute a Nash equilibrium, given a fixed fee policy. Last, we show that the star is a Nash equilibrium when each channel party can freely decide the channel fee. On the other hand, we prove the complete bipartite graph can never be a Nash equilibrium, given a free fee policy.
Submitted 5 August, 2019; v1 submitted 1 August, 2019;
originally announced August 2019.
-
Online Payment Network Design
Authors:
Georgia Avarikioti,
Kenan Besic,
Yuyi Wang,
Roger Wattenhofer
Abstract:
Payment channels allow transactions between participants of the blockchain to be executed securely off-chain, and thus provide a promising solution for the scalability problem of popular blockchains. We study the online network design problem for payment channels, assuming a central coordinator. We focus on a single channel, where the coordinator desires to maximize the number of accepted transactions under given capital constraints. Despite the simplicity of the problem, we present a flurry of impossibility results, both for deterministic and randomized algorithms against adaptive as well as oblivious adversaries.
Submitted 1 August, 2019;
originally announced August 2019.
-
Bitcoin Security under Temporary Dishonest Majority
Authors:
Georgia Avarikioti,
Lukas Kaeppeli,
Yuyi Wang,
Roger Wattenhofer
Abstract:
We prove Bitcoin is secure under temporary dishonest majority. We assume the adversary can corrupt a specific fraction of parties and also introduce crash failures, i.e., some honest participants are offline during the execution of the protocol. We demand a majority of honest online participants in expectation. We explore three different models and present the requirements for proving Bitcoin's security in all of them: we first examine a synchronous model, then extend to a bounded delay model and last we consider a synchronous model that allows message losses.
Submitted 1 August, 2019;
originally announced August 2019.
-
Synthetic Epileptic Brain Activities Using Generative Adversarial Networks
Authors:
Damian Pascual,
Amir Aminifar,
David Atienza,
Philippe Ryvlin,
Roger Wattenhofer
Abstract:
Epilepsy is a chronic neurological disorder affecting more than 65 million people worldwide and manifested by recurrent unprovoked seizures. The unpredictability of seizures not only degrades the quality of life of the patients, but it can also be life-threatening. Modern systems monitoring electroencephalography (EEG) signals are being currently developed with the view to detect epileptic seizures in order to alert caregivers and reduce the impact of seizures on patients' quality of life. Such seizure detection systems employ state-of-the-art machine learning algorithms that require a considerably large amount of labeled personal data for training. However, acquiring EEG signals of epileptic seizures is a costly and time-consuming process for medical experts and patients, currently requiring in-hospital recordings in specialized units. In this work, we generate synthetic seizure-like brain electrical activities, i.e., EEG signals, that can be used to train seizure detection algorithms, alleviating the need for recorded data. First, we train a Generative Adversarial Network (GAN) with data from 30 epilepsy patients. Then, we generate synthetic personalized training sets for new, unseen patients, which overall yield higher detection performance than the real-data training sets. We demonstrate our results using the datasets from the EPILEPSIAE Project, one of the world's largest public databases for seizure detection.
Submitted 12 November, 2019; v1 submitted 22 July, 2019;
originally announced July 2019.
-
Attentive Multi-Task Deep Reinforcement Learning
Authors:
Timo Bram,
Gino Brunner,
Oliver Richter,
Roger Wattenhofer
Abstract:
Sharing knowledge between tasks is vital for efficient learning in a multi-task setting. However, most research so far has focused on the easier case where knowledge transfer is not harmful, i.e., where knowledge from one task cannot negatively impact the performance on another task. In contrast, we present an approach to multi-task deep reinforcement learning based on attention that does not require any a-priori assumptions about the relationships between tasks. Our attention network automatically groups task knowledge into sub-networks on a state level granularity. It thereby achieves positive knowledge transfer if possible, and avoids negative transfer in cases where tasks interfere. We test our algorithm against two state-of-the-art multi-task/transfer learning approaches and show comparable or superior performance while requiring fewer network parameters.
Submitted 5 July, 2019;
originally announced July 2019.
-
Stabilization Time in Minority Processes
Authors:
Pál András Papp,
Roger Wattenhofer
Abstract:
We analyze the stabilization time of minority processes in graphs. A minority process is a dynamically changing coloring, where each node repeatedly changes its color to the color which is least frequent in its neighborhood. First, we present a simple $Ω(n^2)$ stabilization time lower bound in the sequential adversarial model. Our main contribution is a graph construction which proves an $Ω(n^{2-ε})$ stabilization time lower bound for any $ε>0$. This lower bound holds even if the order of nodes is chosen benevolently, not only in the sequential model, but also in any reasonable concurrent model of the process.
Submitted 3 July, 2019;
originally announced July 2019.
-
Learning Policies through Quantile Regression
Authors:
Oliver Richter,
Roger Wattenhofer
Abstract:
Policy gradient based reinforcement learning algorithms coupled with neural networks have shown success in learning complex policies in the model free continuous action space control setting. However, explicitly parameterized policies are limited by the scope of the chosen parametric probability distribution. We show that alternatively to the likelihood based policy gradient, a related objective can be optimized through advantage weighted quantile regression. Our approach models the policy implicitly in the network, which gives the agent the freedom to approximate any distribution in each action dimension, not limiting its capabilities to the commonly used unimodal Gaussian parameterization. This broader spectrum of policies makes our algorithm suitable for problems where Gaussian policies cannot fit the optimal policy. Moreover, our results on the MuJoCo physics simulator benchmarks are comparable or superior to state-of-the-art on-policy methods.
Submitted 27 September, 2019; v1 submitted 27 June, 2019;
originally announced June 2019.
-
PermitBFT: Exploring the Byzantine Fast-Path
Authors:
Roland Schmid,
Roger Wattenhofer
Abstract:
PermitBFT establishes a permissioned byzantine ledger in the partially synchronous networking model. For n replicas, PermitBFT tolerates up to f < n/3 byzantine replicas. It is the first BFT protocol to achieve a latency of just 2 message delays despite tolerating byzantine replicas throughout the "fast track", as long as they are not the leader. The design of PermitBFT relies on two fundamental concepts. First, in PermitBFT the participating nodes do not wait for a distinguished leader to act and subsequently confirm its actions, but send permits to the next leader proactively. Second, PermitBFT achieves a separation of the decision powers that are usually concentrated on a single leader node. A leader in PermitBFT controls which transactions to include in a new block, but not where to append the block in the block graph.
Submitted 30 October, 2020; v1 submitted 25 June, 2019;
originally announced June 2019.
-
Brick: Asynchronous Payment Channels
Authors:
Georgia Avarikioti,
Eleftherios Kokoris Kogias,
Roger Wattenhofer,
Dionysis Zindros
Abstract:
Off-chain protocols (channels) are a promising solution to the scalability and privacy challenges of blockchain payments. Current proposals, however, require synchrony assumptions to preserve the safety of a channel, leaking to an adversary the exact amount of time needed to control the network for a successful attack. In this paper, we introduce Brick, the first payment channel that remains secure under network asynchrony and concurrently provides correct incentives. The core idea is to incorporate the conflict resolution process within the channel by introducing a rational committee of external parties, called Wardens. Hence, if a party wants to close a channel unilaterally, it can only get the committee's approval for the last valid state. Brick provides sub-second latency because it does not employ heavy-weight consensus. Instead, Brick uses consistent broadcast to announce updates and close the channel, a light-weight abstraction that is powerful enough to preserve safety and liveness to any rational parties. Furthermore, we consider permissioned blockchains, where the additional property of auditability might be desired for regulatory purposes. We introduce Brick+, an off-chain construction that provides auditability on top of Brick without conflicting with its privacy guarantees. We formally define the properties our payment channel construction should fulfill, and prove that both Brick and Brick+ satisfy them. We also design incentives for Brick such that honest and rational behavior aligns. Finally, we provide a reference implementation of the smart contracts in Solidity.
Submitted 19 June, 2020; v1 submitted 27 May, 2019;
originally announced May 2019.
-
Online Graph Exploration on a Restricted Graph Class: Optimal Solutions for Tadpole Graphs
Authors:
Sebastian Brandt,
Klaus-Tycho Foerster,
Jonathan Maurer,
Roger Wattenhofer
Abstract:
We study the problem of online graph exploration on undirected graphs, where a searcher has to visit every vertex and return to the origin. Once a new vertex is visited, the searcher learns of all neighboring vertices and the connecting edge weights. The goal of such an exploration is to minimize its total cost, where each edge traversal incurs a cost of the corresponding edge weight. We investigate the problem on tadpole graphs (also known as dragons, kites), which consist of a cycle with an attached path. Miyazaki et al. (The online graph exploration problem on restricted graphs, IEICE Transactions 92-D (9), 2009) showed that every online algorithm on these graphs must have a competitive ratio of 2-epsilon, but did not provide upper bounds for non-unit edge weights. We show via amortized analysis that a greedy approach yields a matching competitive ratio of 2 on tadpole graphs, for arbitrary non-negative edge weights.
Submitted 18 April, 2020; v1 submitted 1 March, 2019;
originally announced March 2019.
-
Stabilization Time in Weighted Minority Processes
Authors:
Pál András Papp,
Roger Wattenhofer
Abstract:
A minority process in a weighted graph is a dynamically changing coloring. Each node repeatedly changes its color in order to minimize the sum of weighted conflicts with its neighbors. We study the number of steps until such a process stabilizes. Our main contribution is an exponential lower bound on stabilization time. We first present a construction showing this bound in the adversarial sequential model, and then we show how to extend the construction to establish the same bound in the benevolent sequential model, as well as in any reasonable concurrent model. Furthermore, we show that the stabilization time of our construction remains exponential even for very strict switching conditions, namely, if a node only changes color when almost all (i.e., any specific fraction) of its neighbors have the same color. Our lower bound works in a wide range of settings, both for node-weighted and edge-weighted graphs, or if we restrict minority processes to the class of sparse graphs.
Submitted 4 February, 2019;
originally announced February 2019.
-
Towards Secure and Efficient Payment Channels
Authors:
Georgia Avarikioti,
Felix Laufenberg,
Jakub Sliwinski,
Yuyi Wang,
Roger Wattenhofer
Abstract:
Micropayment channels are the most prominent solution to the limitation on transaction throughput in current blockchain systems. However, in practice channels are risky because participants have to be online constantly to avoid fraud, and inefficient because participants have to open multiple channels and lock funds in them. To address the security issue, we propose a novel mechanism that involves watchtowers incentivized to watch the channels and reveal a fraud. Our protocol does not require participants to be online constantly watching the blockchain. The protocol is secure, incentive compatible and lightweight in communication. Furthermore, we present an adaptation of our protocol implementable on the Lightning protocol. Towards efficiency, we examine specific topological structures in the blockchain transaction graph and generalize the construction of channels to enable topologies better suited to specific real-world needs. In these cases, our construction reduces the required amount of signatures for a transaction and the total amount of locked funds in the system.
Submitted 30 November, 2018;
originally announced November 2018.
-
High Dimensional Clustering with $r$-nets
Authors:
Georgia Avarikioti,
Alain Ryser,
Yuyi Wang,
Roger Wattenhofer
Abstract:
Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known structure, so-called $r$-nets, which rigorously captures the properties of clustering. We devise algorithms that improve the run-time of approximating $r$-nets in high-dimensional spaces with $\ell_1$ and $\ell_2$ metrics from $\tilde{O}(dn^{2-Θ(\sqrt{ε})})$ to $\tilde{O}(dn + n^{2-α})$, where $α = Ω(ε^{1/3}/\log(1/ε))$. These algorithms are also used to improve a framework that provides approximate solutions to other high dimensional distance problems. Using this framework, several important related problems can also be solved efficiently, e.g., $(1+ε)$-approximate $k$th-nearest neighbor distance, $(4+ε)$-approximate Min-Max clustering, $(4+ε)$-approximate $k$-center clustering. In addition, we build an algorithm that $(1+ε)$-approximates greedy permutations in time $\tilde{O}((dn + n^{2-α}) \cdot \log Φ)$ where $Φ$ is the spread of the input. This algorithm is used to $(2+ε)$-approximate $k$-center with the same time complexity.
Submitted 6 November, 2018;
originally announced November 2018.
-
Structure and Content of the Visible Darknet
Authors:
Georgia Avarikioti,
Roman Brunner,
Aggelos Kiayias,
Roger Wattenhofer,
Dionysis Zindros
Abstract:
In this paper, we analyze the topology and the content found on the "darknet", the set of websites accessible via Tor. We created a darknet spider and crawled the darknet starting from a bootstrap list by recursively following links. We explored the whole connected component of more than 34,000 hidden services, of which we found 10,000 to be online. Contrary to folklore belief, the visible part of the darknet is surprisingly well-connected through hub websites such as wikis and forums. We performed a comprehensive categorization of the content using supervised machine learning. We observe that about half of the visible dark web content is related to apparently licit activities based on our classifier. A significant amount of content pertains to software repositories, blogs, and activism-related websites. Among unlawful hidden services, most pertain to fraudulent websites, services selling counterfeit goods, and drug markets.
Submitted 7 November, 2018; v1 submitted 4 November, 2018;
originally announced November 2018.
-
Algorithmic Blockchain Channel Design
Authors:
Georgia Avarikioti,
Yuyi Wang,
Roger Wattenhofer
Abstract:
Payment networks, also known as channels, are a most promising solution to the throughput problem of cryptocurrencies. In this paper we study the design of capital-efficient payment networks, offline as well as online variants. We want to know how to compute an efficient payment network topology, how capital should be assigned to the individual edges, and how to decide which transactions to accept. Towards this end, we present a flurry of interesting results, basic but generally applicable insights on the one hand, and hardness results and approximation algorithms on the other hand.
Submitted 17 October, 2018;
originally announced October 2018.
-
Payment Network Design with Fees
Authors:
Georgia Avarikioti,
Gerrit Janssen,
Yuyi Wang,
Roger Wattenhofer
Abstract:
Payment channels are the most prominent solution to the blockchain scalability problem. We introduce the problem of network design with fees for payment channels from the perspective of a Payment Service Provider (PSP). Given a set of transactions, we examine the optimal graph structure and fee assignment to maximize the PSP's profit. A customer prefers to route transactions through the PSP's network if the cheapest path from sender to receiver is financially interesting, i.e., if the path costs less than the blockchain fee. When the graph structure is a tree, and the PSP facilitates all transactions, the problem can be formulated as a linear program. For a path graph, we present a polynomial time algorithm to assign optimal fees. We also show that the star network, where the center is an additional node acting as an intermediary, is a near-optimal solution to the network design problem.
Submitted 17 October, 2018;
originally announced October 2018.
-
Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning
Authors:
Gino Brunner,
Manuel Fritsche,
Oliver Richter,
Roger Wattenhofer
Abstract:
Learning in sparse reward settings remains a challenge in Reinforcement Learning, which is often addressed by using intrinsic rewards. One promising strategy is inspired by human curiosity, requiring the agent to learn to predict the future. In this paper a curiosity-driven agent is extended to use these predictions directly for training. To achieve this, the agent predicts the value function of the next state at any point in time. Subsequently, the consistency of this prediction with the current value function is measured, which is then used as a regularization term in the loss function of the algorithm. Experiments were made on grid-world environments as well as on a 3D navigation task, both with sparse rewards. In the first case the extended agent is able to learn significantly faster than the baselines.
Submitted 30 September, 2018;
originally announced October 2018.
-
The Urban Last Mile Problem: Autonomous Drone Delivery to Your Balcony
Authors:
Gino Brunner,
Bence Szebedy,
Simon Tanner,
Roger Wattenhofer
Abstract:
Drone delivery has been a hot topic in the industry in the past few years. However, existing approaches either focus on rural areas or rely on centralized drop-off locations from where the last mile delivery is performed. In this paper we tackle the problem of autonomous last mile delivery in urban environments using an off-the-shelf drone. We build a prototype system that is able to fly to the approximate delivery location using GPS and then find the exact drop-off location using visual navigation. The drop-off location could, e.g., be on a balcony or porch, and simply needs to be indicated by a visual marker on the wall or window. We test our system components in simulated environments, including the visual navigation and collision avoidance. Finally, we deploy our drone in a real-world environment and show how it can find the drop-off point on a balcony. To stimulate future research in this topic we open source our code.
Submitted 21 September, 2018;
originally announced September 2018.
-
MIDI-VAE: Modeling Dynamics and Instrumentation of Music with Applications to Style Transfer
Authors:
Gino Brunner,
Andres Konrad,
Yuyi Wang,
Roger Wattenhofer
Abstract:
We introduce MIDI-VAE, a neural network model based on Variational Autoencoders that is capable of handling polyphonic music with multiple instrument tracks, as well as modeling the dynamics of music by incorporating note durations and velocities. We show that MIDI-VAE can perform style transfer on symbolic music by automatically changing pitches, dynamics and instruments of a music piece from, e.g., a Classical to a Jazz style. We evaluate the efficacy of the style transfer by training separate style validation classifiers. Our model can also interpolate between short pieces of music, produce medleys and create mixtures of entire songs. The interpolations smoothly change pitches, dynamics and instrumentation to create a harmonic bridge between two music pieces. To the best of our knowledge, this work represents the first successful attempt at applying neural style transfer to complete musical compositions.
Submitted 20 September, 2018;
originally announced September 2018.
-
Symbolic Music Genre Transfer with CycleGAN
Authors:
Gino Brunner,
Yuyi Wang,
Roger Wattenhofer,
Sumu Zhao
Abstract:
Deep generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have recently been applied to style and domain transfer for images, and in the case of VAEs, music. GAN-based models employing several generators and some form of cycle consistency loss have been among the most successful for image domain transfer. In this paper we apply such a model to symbolic music and show the feasibility of our approach for music genre transfer. Evaluations using separate genre classifiers show that the style transfer works well. In order to improve the fidelity of the transformed music, we add additional discriminators that cause the generators to keep the structure of the original music mostly intact, while still achieving strong genre transfer. Visual and audible results further show the potential of our approach. To the best of our knowledge, this paper represents the first application of GANs to symbolic music domain transfer.
Submitted 20 September, 2018;
originally announced September 2018.
-
Byzantine Preferential Voting
Authors:
Darya Melnyk,
Yuyi Wang,
Roger Wattenhofer
Abstract:
In the Byzantine agreement problem, n nodes with possibly different input values aim to reach agreement on a common value in the presence of t < n/3 Byzantine nodes which represent arbitrary failures in the system. This paper introduces a generalization of Byzantine agreement, where the input values of the nodes are preference rankings of three or more candidates. We show that consensus on preferences, which is an important question in social choice theory, complements already known results from Byzantine agreement. In addition, preferential voting raises new questions about how to approximate consensus vectors. We propose a deterministic algorithm to solve Byzantine agreement on rankings under a generalized validity condition, which we call Pareto-Validity. These results are then extended by considering a special voting rule which chooses the Kemeny median as the consensus vector. For this rule, we derive a lower bound on the approximation ratio of the Kemeny median that can be guaranteed by any deterministic algorithm. We then provide an algorithm matching this lower bound. To our knowledge, this is the first non-trivial multi-dimensional approach which can tolerate a constant fraction of Byzantine nodes.
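For readers unfamiliar with the Kemeny median named in this abstract: it is the ranking that minimizes the total number of pairwise disagreements (Kendall tau distance) with the input rankings. The brute-force sketch below computes it for a small set of candidates. It is not the Byzantine-tolerant algorithm from the paper, it assumes all inputs are honest, and the function names are hypothetical.

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Number of candidate pairs that the two rankings order differently."""
    pos1 = {c: i for i, c in enumerate(r1)}
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] < pos1[b]) != (pos2[a] < pos2[b])
    )


def kemeny_median(rankings):
    """Ranking with the smallest summed Kendall tau distance to all inputs.

    Tries every permutation, so it is only practical for a few candidates.
    Ties are broken in favor of the lexicographically first permutation.
    """
    candidates = sorted(rankings[0])
    return min(
        permutations(candidates),
        key=lambda perm: sum(kendall_tau(perm, r) for r in rankings),
    )
```

For example, with two voters ranking `("a", "b", "c")` and one ranking `("b", "a", "c")`, the majority ordering `("a", "b", "c")` has total distance 1 and is the Kemeny median.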
Submitted 7 March, 2018;
originally announced March 2018.
-
Reducing Compare-and-Swap to Consensus Number One Primitives
Authors:
Pankaj Khanchandani,
Roger Wattenhofer
Abstract:
The consensus number of an object is the maximum number of processes among which binary consensus can be solved using any number of instances of the object and read-write registers. Herlihy [6] showed in his seminal work that if an object has a consensus number of n, then there is a universal construction for a wait-free and linearizable implementation of any non-trivial concurrent object or data structure that is shared among n processes. Thus, a synchronization object such as compare-and-swap with an infinite consensus number and the corresponding instruction can be viewed as "strong". On the other hand, a synchronization object such as fetch-and-add with consensus number two and the corresponding fetch-and-add instruction can be viewed as "weak".
Ellen et al. [2] observed recently that an object supporting two weak instructions can also achieve infinite consensus number like an object that supports one strong instruction. Using Herlihy's universal construction, this implies that ignoring concerns about efficiency, one can design any concurrent data structure or algorithm using only weak instructions. However, is it possible that a combination of weak instructions is really powerful enough to efficiently replace a strong instruction, like compare-and-swap, without incurring a large overhead in time or space? In this paper, we answer this question by giving an O(1) time wait-free and linearizable implementation of a compare-and-swap register shared among n processes using read-write registers and O(1) registers that support two synchronization primitives half-max and max-write, each having consensus number one. Thus, any algorithm that solves some arbitrary synchronization problem using read-write and compare-and-swap registers can be transformed into an algorithm that has the same asymptotic time complexity and only uses consensus number one instructions.
Submitted 20 August, 2018; v1 submitted 11 February, 2018;
originally announced February 2018.
-
Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations
Authors:
Gino Brunner,
Yuyi Wang,
Roger Wattenhofer,
Michael Weigelt
Abstract:
We train multi-task autoencoders on linguistic tasks and analyze the learned hidden sentence representations. The representations change significantly when translation and part-of-speech decoders are added. The more decoders a model employs, the better it clusters sentences according to their syntactic similarity, as the representation space becomes less entangled. We explore the structure of the representation space by interpolating between sentences, which yields interesting pseudo-English sentences, many of which have recognizable syntactic structure. Lastly, we point out an interesting property of our models: The difference-vector between two sentences can be added to change a third sentence with similar features in a meaningful way.
Submitted 18 January, 2018;
originally announced January 2018.
-
JamBot: Music Theory Aware Chord Based Generation of Polyphonic Music with LSTMs
Authors:
Gino Brunner,
Yuyi Wang,
Roger Wattenhofer,
Jonas Wiesendanger
Abstract:
We propose a novel approach for the generation of polyphonic music based on LSTMs. We generate music in two steps. First, a chord LSTM predicts a chord progression based on a chord embedding. A second LSTM then generates polyphonic music from the predicted chord progression. The generated music sounds pleasing and harmonic, with only a few dissonant notes. It has clear long-term structure that is similar to what a musician would play during a jam session. We show that our approach is sensible from a music theory perspective by evaluating the learned chord embeddings. Surprisingly, our simple model managed to extract the circle of fifths, an important tool in music theory, from the dataset.
Submitted 21 November, 2017;
originally announced November 2017.
-
Teaching a Machine to Read Maps with Deep Reinforcement Learning
Authors:
Gino Brunner,
Oliver Richter,
Yuyi Wang,
Roger Wattenhofer
Abstract:
The ability to use a 2D map to navigate a complex 3D environment is quite remarkable, and even difficult for many humans. Localization and navigation is also an important problem in domains such as robotics, and has recently become a focus of the deep reinforcement learning community. In this paper we teach a reinforcement learning agent to read a map in order to find the shortest way out of a random maze it has never seen before. Our system combines several state-of-the-art methods such as A3C and incorporates novel elements such as a recurrent localization cell. Our agent learns to localize itself based on 3D first person images and an approximate orientation angle. The agent generalizes well to bigger mazes, showing that it learned useful localization and navigation capabilities.
Submitted 20 November, 2017;
originally announced November 2017.
-
Tight Bounds for Asynchronous Collaborative Grid Exploration
Authors:
Sebastian Brandt,
Jara Uitto,
Roger Wattenhofer
Abstract:
Consider a small group of mobile agents whose goal is to locate a certain cell in a two-dimensional infinite grid. The agents operate in an asynchronous environment, where in each discrete time step, an arbitrary subset of the agents execute one atomic look-compute-move cycle. The protocol controlling each agent is determined by a (possibly distinct) finite automaton. The only means of communication is to sense the states of the agents sharing the same grid cell. Whenever an agent moves, the destination cell of the movement is chosen by the agent's automaton from the set of neighboring grid cells. We study the minimum number of agents required to locate the target cell within finite time and our main result states a tight lower bound for agents endowed with a global compass. Furthermore, we show that the lack of such a compass makes the problem strictly more difficult and present tight upper and lower bounds for this case.
Submitted 7 May, 2018; v1 submitted 10 May, 2017;
originally announced May 2017.
-
Towards Reduced Instruction Sets for Synchronization
Authors:
Rati Gelashvili,
Idit Keidar,
Alexander Spiegelman,
Roger Wattenhofer
Abstract:
Contrary to common belief, a recent work by Ellen, Gelashvili, Shavit, and Zhu has shown that computability does not require multicore architectures to support "strong" synchronization instructions like compare-and-swap, as opposed to combinations of "weaker" instructions like decrement and multiply. However, this is the status quo, and in turn, most efficient concurrent data-structures heavily rely on compare-and-swap (e.g. for swinging pointers and in general, conflict resolution).
We show that this need not be the case, by designing and implementing a concurrent linearizable Log data-structure (also known as a History object), supporting two operations: append(item), which appends the item to the log, and get-log(), which returns the appended items so far, in order. Readers are wait-free and writers are lock-free, and this data-structure can be used in a lock-free universal construction to implement any concurrent object with a given sequential specification. Our implementation uses atomic read, xor, decrement, and fetch-and-increment instructions supported on x86 architectures, and provides similar performance to a compare-and-swap-based solution on today's hardware. This raises a fundamental question about the minimal set of synchronization instructions that architectures have to support.
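The sequential behavior of the Log object described in this abstract, and the way a log of operations can stand in for an arbitrary object, can be sketched as follows. This is a single-threaded illustration only: it does not reproduce the paper's wait-free and lock-free implementation or its use of xor, decrement, and fetch-and-increment, and the class name, `replay` helper, and counter example are hypothetical.

```python
class SequentialLog:
    """Sequential specification of a log: append(item) and get_log()."""

    def __init__(self):
        self._items = []

    def append(self, item):
        """Append the item to the end of the log."""
        self._items.append(item)

    def get_log(self):
        """Return all items appended so far, in order, as an immutable snapshot."""
        return tuple(self._items)


def replay(log, initial_state, apply_op):
    """Rebuild an object's state by applying every logged operation in order.

    This mirrors the idea behind a log-based universal construction:
    the log fixes one order of operations, and the object's sequential
    specification (apply_op) determines the resulting state.
    """
    state = initial_state
    for op in log.get_log():
        state = apply_op(state, op)
    return state


def apply_counter_op(state, op):
    """Sequential specification of a toy counter (assumed example object)."""
    name, amount = op
    if name == "add":
        return state + amount
    raise ValueError(f"unknown operation: {name}")
```

Appending `("add", 2)` and `("add", 3)` to a `SequentialLog` and replaying from an initial state of 0 yields a counter value of 5.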
Submitted 8 May, 2017;
originally announced May 2017.
-
CLEX: Yet Another Supercomputer Architecture?
Authors:
Christoph Lenzen,
Roger Wattenhofer
Abstract:
We propose the CLEX supercomputer topology and routing scheme. We prove that CLEX can utilize a constant fraction of the total bandwidth for point-to-point communication, at delays proportional to the sum of the number of intermediate hops and the maximum physical distance between any two nodes. Moreover, by applying an asymmetric bandwidth assignment to the links, all-to-all communication can be realized $(1+o(1))$-optimally both with regard to bandwidth and delays. This is achieved at node degrees of $n^{\varepsilon}$, for an arbitrarily small constant $\varepsilon\in (0,1]$. In contrast, these results are impossible in any network featuring constant or polylogarithmic node degrees. Through simulation, we assess the benefits of an implementation of the proposed communication strategy. Our results indicate that, for a million processors, CLEX can increase bandwidth utilization and reduce average routing path length by factors of at least $10$ and $5$, respectively, in comparison to a torus network. Furthermore, the CLEX communication scheme features several other properties, such as deadlock-freedom, inherent fault-tolerance, and canonical partition into smaller subsystems.
Submitted 1 July, 2016;
originally announced July 2016.