-
An algorithm for geo-distributed and redundant storage in Garage
Authors:
Mendes Oulamara,
Alex Auvolat
Abstract:
This paper presents an optimal algorithm to compute the assignment of data to storage nodes in the Garage geo-distributed storage system. We discuss the complexity of the different steps of the algorithm and metrics that can be displayed to the user.
Submitted 27 February, 2023;
originally announced February 2023.
-
$\scriptstyle{BASALT}$: A Rock-Solid Foundation for Epidemic Consensus Algorithms in Very Large, Very Open Networks
Authors:
Alex Auvolat,
Yérom-David Bromberg,
Davide Frey,
François Taïani
Abstract:
Recent works have proposed new Byzantine consensus algorithms for blockchains based on epidemics, a design which enables highly scalable performance at a low cost. These methods, however, critically depend on a secure random peer sampling service: a service that provides a stream of random network nodes where no attacking entity can become over-represented. To ensure this security property, current epidemic platforms use a Proof-of-Stake system to select peer samples. However, such a system limits openness, as only nodes with significant stake can participate in the consensus, leading to an oligopoly. Moreover, this design introduces a complex interdependency between the consensus algorithm and the cryptocurrency built upon it. In this paper, we propose a radically different security design for the peer sampling service, based on the distribution of IP addresses to prevent Sybil attacks. We propose a new algorithm, $\scriptstyle{BASALT}$, that implements our design using a stubborn chaotic search to counter attackers' attempts at becoming over-represented. We show in theory and using Monte Carlo simulations that $\scriptstyle{BASALT}$ provides samples which are extremely close to the optimal distribution even in adversarial scenarios such as attempted Eclipse attacks. Live experiments on a production cryptocurrency platform confirm that the samples obtained using $\scriptstyle{BASALT}$ are equitably distributed amongst nodes, allowing for a system that is open and in which no single entity can gain excessive power.
Submitted 8 February, 2021;
originally announced February 2021.
-
Money Transfer Made Simple: a Specification, a Generic Algorithm, and its Proof
Authors:
Alex Auvolat,
Davide Frey,
Michel Raynal,
François Taïani
Abstract:
It has recently been shown that, contrary to common belief, money transfer in the presence of faulty (Byzantine) processes does not require strong agreement such as consensus. This article goes one step further: it first proposes a non-sequential specification of the money-transfer object, and then presents a generic algorithm based on a simple FIFO order between each pair of processes that implements it. The algorithm is generic in its underlying reliable broadcast abstraction, which must be suited to the appropriate failure model. Interestingly, whatever the failure model, the money transfer algorithm only requires adding a single sequence number to its messages as control information. Moreover, as a side effect of the proposed algorithm, it follows that money transfer is a weaker problem than the construction of a safe/regular/atomic read/write register in the asynchronous message-passing crash-prone model.
Submitted 17 February, 2021; v1 submitted 18 June, 2020;
originally announced June 2020.
-
Diet Networks: Thin Parameters for Fat Genomics
Authors:
Adriana Romero,
Pierre Luc Carrier,
Akram Erraqabi,
Tristan Sylvain,
Alex Auvolat,
Etienne Dejoie,
Marc-André Legault,
Marie-Pierre Dubé,
Julie G. Hussin,
Yoshua Bengio
Abstract:
Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
Submitted 16 March, 2017; v1 submitted 28 November, 2016;
originally announced November 2016.
-
TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games
Authors:
Gabriel Synnaeve,
Nantas Nardelli,
Alex Auvolat,
Soumith Chintala,
Timothée Lacroix,
Zeming Lin,
Florian Richoux,
Nicolas Usunier
Abstract:
We present TorchCraft, a library that enables deep learning research on Real-Time Strategy (RTS) games such as StarCraft: Brood War, by making it easier to control these games from a machine learning framework, here Torch. This white paper argues for using RTS games as a benchmark for AI research, and describes the design and components of TorchCraft.
Submitted 3 November, 2016; v1 submitted 1 November, 2016;
originally announced November 2016.
-
Artificial Neural Networks Applied to Taxi Destination Prediction
Authors:
Alexandre de Brébisson,
Étienne Simon,
Alex Auvolat,
Pascal Vincent,
Yoshua Bengio
Abstract:
We describe our first-place solution to the ECML/PKDD discovery challenge on taxi destination prediction. The task consisted of predicting the destination of a taxi based on the beginning of its trajectory, represented as a variable-length sequence of GPS points, and diverse associated meta-information, such as the departure time, the driver ID and client information. Contrary to most published competitor approaches, we used an almost fully automated approach based on neural networks and we ranked first out of 381 teams. The architectures we tried use multi-layer perceptrons, bidirectional recurrent neural networks and models inspired by recently introduced memory networks. Our approach could easily be adapted to other applications in which the goal is to predict a fixed-length output from a variable-length sequence.
Submitted 21 September, 2015; v1 submitted 31 July, 2015;
originally announced August 2015.
-
Clustering is Efficient for Approximate Maximum Inner Product Search
Authors:
Alex Auvolat,
Sarath Chandar,
Pascal Vincent,
Hugo Larochelle,
Yoshua Bengio
Abstract:
Efficient Maximum Inner Product Search (MIPS) is an important task that has wide applicability in recommendation systems and classification with a large number of classes. Solutions based on locality-sensitive hashing (LSH) as well as tree-based solutions have been investigated in the recent literature to perform approximate MIPS in sublinear time. In this paper, we compare these to another extremely simple approach for solving approximate MIPS, based on variants of the k-means clustering algorithm. Specifically, we propose to train a spherical k-means, after having reduced the MIPS problem to a Maximum Cosine Similarity Search (MCSS). Experiments on two standard recommendation system benchmarks as well as on large vocabulary word embeddings show that this simple approach yields much higher speedups, for the same retrieval precision, than current state-of-the-art hashing-based and tree-based methods. This simple method also yields more robust retrievals when the query is corrupted by noise.
Submitted 29 November, 2015; v1 submitted 21 July, 2015;
originally announced July 2015.