-
Homomorphic WiSARDs: Efficient Weightless Neural Network training over encrypted data
Authors:
Leonardo Neumann,
Antonio Guimarães,
Diego F. Aranha,
Edson Borin
Abstract:
The widespread application of machine learning algorithms is a matter of increasing concern for the data privacy research community, and many have sought to develop privacy-preserving techniques for it. Among existing approaches, the homomorphic evaluation of ML algorithms stands out by performing operations directly over encrypted data, enabling strong guarantees of confidentiality. The homomorphic evaluation of inference algorithms is practical even for relatively deep Convolutional Neural Networks (CNNs). However, training is still a major challenge, with current solutions often resorting to lightweight algorithms that can be unfit for solving more complex problems, such as image recognition. This work introduces the homomorphic evaluation of Wilkie, Stonham, and Aleksander's Recognition Device (WiSARD) and subsequent Weightless Neural Networks (WNNs) for training and inference on encrypted data. Compared to CNNs, WNNs offer better computational performance with a relatively small accuracy drop. We develop a complete framework for this setting, including several building blocks that can be of independent interest. Our framework achieves 91.7% accuracy on the MNIST dataset after only 3.5 minutes of encrypted training (multi-threaded), going up to 93.8% in 3.5 hours. For the HAM10000 dataset, we achieve 67.9% accuracy in just 1.5 minutes, going up to 69.9% after 1 hour. Compared to the state of the art in homomorphic encryption (HE) evaluation of CNN training, Glyph (Lou et al., NeurIPS 2020), these results represent a speedup of up to 1200 times with an accuracy loss of at most 5.4%. For HAM10000, we even achieve a 0.65% accuracy improvement while being 60 times faster than Glyph. We also provide solutions for small-scale encrypted training. In a single thread on a desktop machine using less than 200MB of memory, we train over 1000 MNIST images in 12 minutes or over the entire Wisconsin Breast Cancer dataset in just 11 seconds.
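For readers unfamiliar with WiSARD, the plaintext sketch below shows the classic model that this work evaluates homomorphically. A binary input is split, after a fixed pseudo-random bit mapping, into small tuples; each tuple addresses a RAM node; training writes the observed addresses into the RAM nodes of the sample's class discriminator; and inference predicts the class whose discriminator has the most RAM nodes recognizing the input. This is a minimal sketch operating on unencrypted bits only. The class name `WiSARD`, its parameters (`input_size`, `tuple_size`, `seed`), and the sparse set-based RAM representation are hypothetical choices and do not reflect the paper's encrypted framework.

```python
import random
from typing import Dict, List, Sequence, Set


class WiSARD:
    """Hypothetical plaintext WiSARD: one discriminator (list of RAM nodes) per class."""

    def __init__(self, input_size: int, tuple_size: int, classes: Sequence[str], seed: int = 0):
        if input_size % tuple_size != 0:
            raise ValueError("input_size must be a multiple of tuple_size")
        rng = random.Random(seed)
        # Fixed pseudo-random mapping of input bits to RAM address bits, shared by all classes.
        self.mapping = list(range(input_size))
        rng.shuffle(self.mapping)
        self.tuple_size = tuple_size
        self.num_rams = input_size // tuple_size
        # Each RAM node is stored sparsely as the set of addresses written during training.
        self.rams: Dict[str, List[Set[int]]] = {
            c: [set() for _ in range(self.num_rams)] for c in classes
        }

    def _addresses(self, bits: Sequence[int]) -> List[int]:
        """Build one RAM address per tuple from the mapped input bits."""
        addrs = []
        for r in range(self.num_rams):
            addr = 0
            for i in self.mapping[r * self.tuple_size:(r + 1) * self.tuple_size]:
                addr = (addr << 1) | (1 if bits[i] else 0)
            addrs.append(addr)
        return addrs

    def train(self, bits: Sequence[int], label: str) -> None:
        """Training only writes addresses: no weights or gradients are involved."""
        for ram, addr in zip(self.rams[label], self._addresses(bits)):
            ram.add(addr)

    def scores(self, bits: Sequence[int]) -> Dict[str, int]:
        """Count, per class, how many RAM nodes have seen the current address."""
        addrs = self._addresses(bits)
        return {
            c: sum(addr in ram for ram, addr in zip(rams, addrs))
            for c, rams in self.rams.items()
        }

    def predict(self, bits: Sequence[int]) -> str:
        s = self.scores(bits)
        return max(s, key=s.get)


model = WiSARD(input_size=8, tuple_size=2, classes=["a", "b"])
model.train([1, 1, 1, 1, 0, 0, 0, 0], "a")
model.train([0, 0, 0, 0, 1, 1, 1, 1], "b")
print(model.scores([1, 1, 1, 1, 0, 0, 0, 1]))  # {'a': 3, 'b': 0}
print(model.predict([1, 1, 1, 1, 0, 0, 0, 1]))  # a
```

Because training reduces to writing table entries and inference to lookups and counting, the model avoids the long chains of multiplications that make encrypted CNN training expensive.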
Submitted 29 March, 2024;
originally announced March 2024.
-
Employing Simulation to Facilitate the Design of Dynamic Code Generators
Authors:
Vanderson Martins do Rosario,
Raphael Zinsly,
Sandro Rigo,
Edson Borin
Abstract:
Dynamic Translation (DT) is a sophisticated technique that enables the implementation of high-performance emulators and high-level-language virtual machines. In this technique, the guest code is compiled dynamically at runtime. Consequently, achieving good performance depends on several design decisions, including the shape of the regions of code being translated. Researchers and engineers explore these decisions to achieve the best possible performance. However, a real DT engine is a very sophisticated piece of software, and modifying one is a hard and demanding task. Hence, we propose using simulation to evaluate the impact of design decisions on dynamic translators and present RAIn, an open-source DT simulator that facilitates testing DT design decisions, such as Region Formation Techniques (RFTs). RAIn outputs several statistics that support the analysis of how design decisions may affect the behavior and the performance of a real DT. We validated RAIn by running a set of experiments with six well-known RFTs (NET, MRET2, LEI, NETPlus, NET-R, and NETPlus-e-r) and showed that it can reproduce well-known results from the literature without the effort of implementing them in a real and complex dynamic translator engine.
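To make the notion of a region formation technique concrete, the sketch below implements a simplified version of NET (Next-Executing Tail) over a recorded sequence of executed basic-block addresses, as a trace-driven simulator might consume it. Assumptions: a transfer to an address less than or equal to the previous one is treated as a backward branch; only backward-branch targets are counted as potential region heads (full NET also counts exits from existing regions); and the function name `net_regions` and its `threshold` and `max_len` parameters are hypothetical. This is not RAIn's interface.

```python
from typing import Dict, List, Optional, Sequence


def net_regions(trace: Sequence[int], threshold: int = 2, max_len: int = 16) -> Dict[int, List[int]]:
    """Simplified NET: count backward-branch targets; when one becomes hot,
    record the next executing tail as a new region."""
    counters: Dict[int, int] = {}
    regions: Dict[int, List[int]] = {}
    recording: Optional[List[int]] = None
    prev: Optional[int] = None
    for addr in trace:
        if recording is not None:
            backward = prev is not None and addr <= prev
            # Stop recording on a backward branch, at an existing region head, or at max length.
            if backward or addr in regions or len(recording) >= max_len:
                regions[recording[0]] = recording
                recording = None
            else:
                recording.append(addr)
                prev = addr
                continue
        # Not recording: targets of backward branches are candidate region heads.
        if prev is not None and addr <= prev and addr not in regions:
            counters[addr] = counters.get(addr, 0) + 1
            if counters[addr] >= threshold:
                recording = [addr]  # head is hot: start recording the tail
        prev = addr
    if recording:
        regions[recording[0]] = recording
    return regions


# A loop over blocks 1, 2, 3 executed four times, then an exit to block 4.
trace = [0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4]
print(net_regions(trace, threshold=2))  # {1: [1, 2, 3]}
```

Varying `threshold` or `max_len`, or swapping in a different head-selection rule, changes the shape and number of regions. These are the kinds of design decisions a simulator lets one compare without modifying a real translator.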
Submitted 30 August, 2020;
originally announced August 2020.
-
Accelerating Multi-attribute Unsupervised Seismic Facies Analysis With RAPIDS
Authors:
Otávio O. Napoli,
Vanderson Martins do Rosario,
João Paulo Navarro,
Pedro Mário Cruz e Silva,
Edson Borin
Abstract:
Classification of seismic facies is done by clustering seismic data samples based on their attributes. Year after year, 3D datasets used in exploration geophysics increase in size, complexity, and number of attributes, requiring continuous improvements in classification performance. In this work, we explore the use of Graphics Processing Units (GPUs) to perform the classification of seismic surveys using the well-established Machine Learning (ML) method k-means. We show that the high-performance distributed implementation of the k-means algorithm available in the RAPIDS library can be used to classify facies in large seismic datasets much faster than a classical parallel CPU implementation (up to 258-fold faster on NVIDIA V100 GPUs), especially for large seismic blocks. We tested the algorithm with different real seismic volumes, including Netherlands, Parihaka, and Kahu (ranging from 12 GB to 66 GB).
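The clustering step itself is standard k-means (Lloyd's algorithm): each sample is a vector of seismic attributes, samples are assigned to their nearest centroid, and centroids are recomputed as cluster means until they stop moving. The NumPy sketch below runs this on the CPU. It also shows how a 3D attribute volume can be flattened into a samples-by-attributes matrix and the resulting labels reshaped back into a facies volume. It is an illustration only, not the RAPIDS implementation (RAPIDS provides a scikit-learn-style `KMeans` estimator in its cuML library for GPUs). The helper names `kmeans` and `classify_facies` and the assumed volume layout `(x, y, z, attributes)` are hypothetical, and in practice attributes are usually normalized before clustering.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means on an (n_samples, n_attributes) matrix."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every sample to every centroid: shape (n, k).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as its cluster mean (keep the old one if the cluster is empty).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def classify_facies(volume: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Cluster an (x, y, z, attributes) volume into an (x, y, z) facies map."""
    samples = volume.reshape(-1, volume.shape[-1])
    labels, _ = kmeans(samples, k, seed=seed)
    return labels.reshape(volume.shape[:-1])


volume = np.random.default_rng(1).normal(size=(4, 4, 2, 3))  # toy volume with 3 attributes
facies = classify_facies(volume, k=3)
print(facies.shape)  # (4, 4, 2)
```

The distance computation is a dense, data-parallel operation over all samples and centroids. That is why the algorithm maps well to GPUs once volumes grow to tens of gigabytes.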
Submitted 17 September, 2020; v1 submitted 29 July, 2020;
originally announced July 2020.
-
Fast and Low-cost Search for Efficient Cloud Configurations for HPC Workloads
Authors:
Vanderson Martins Do Rosario,
Thais A. Silva Camacho,
Otávio O. Napoli,
Edson Borin
Abstract:
The use of cloud computational resources has become increasingly important for companies and researchers to access high-performance resources on demand, at any time. However, given the wide variety of virtual machine types, network configurations, and numbers of instances, among other options, finding the best configuration that reduces costs and resource waste while achieving acceptable performance is a hard task even for specialists. Thus, many approaches to find good or optimal configurations for a given program have been proposed. Observing the performance of an application in some configuration takes time and money. Therefore, most approaches aim not only to find good solutions but also to reduce the search cost. One approach is to use Bayesian Optimization to observe as few configurations as possible, reducing the search cost while still finding good solutions. Another approach is to use a technique named Paramount Iteration to estimate the performance of HPC workloads without executing them entirely (early stopping), reducing the cost of each observation and making a grid search over configurations feasible. In this work, we show that both techniques can be used together to make fewer, lower-cost observations. We show that such an approach can recommend Pareto-optimal solutions that are on average 1.68x better than random search, with a search that is 6 times cheaper.
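The recommendation step can be illustrated at a high level with a small sketch. The runtime of each candidate configuration is extrapolated from a few measured iterations instead of a full run, its cost is derived from an hourly price, and only the Pareto-optimal (time, cost) configurations are kept. This sketch assumes an iterative workload whose iterations take roughly the same time and uses the median of the measured iterations as the representative one. That choice, the omission of the Bayesian Optimization step, and all names and numbers below (`estimate_runtime`, `pareto_front`, configurations `A` to `C`, and their prices) are hypothetical illustrations, not the paper's method or data.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def estimate_runtime(iter_times: Sequence[float], total_iters: int) -> float:
    """Early stop: extrapolate a full run (seconds) from a few measured iterations."""
    return median(iter_times) * total_iters


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names whose (time, cost) is not dominated; both objectives are minimized."""
    front = []
    for name, (t, c) in points.items():
        dominated = any(
            t2 <= t and c2 <= c and (t2 < t or c2 < c)
            for other, (t2, c2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical observations: first iteration times (s) and total price per hour.
observations = {
    "A": ([2.0, 2.1, 1.9], 3.6),
    "B": ([1.0, 1.0, 1.0], 9.0),
    "C": ([3.0, 3.0, 3.0], 3.6),
}
TOTAL_ITERS = 1000

points = {}
for name, (times, price_per_hour) in observations.items():
    runtime = estimate_runtime(times, TOTAL_ITERS)
    points[name] = (runtime, runtime / 3600 * price_per_hour)

print(pareto_front(points))  # ['A', 'B']
```

Configuration `C` is dropped because `A` is both faster and cheaper. `A` and `B` remain as a trade-off between speed and cost. Measuring only three iterations per configuration is what keeps each observation cheap.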
Submitted 27 June, 2020;
originally announced June 2020.
-
Fog Computing on Constrained Devices: Paving the Way for the Future IoT
Authors:
Flavia Pisani,
Fabiola M. C. de Oliveira,
Eduardo S. Gama,
Roger Immich,
Luiz F. Bittencourt,
Edson Borin
Abstract:
In the long term, the Internet of Things (IoT) is expected to become an integral part of people's daily lives. In light of this technological advancement, an ever-growing number of objects with limited hardware may become connected to the Internet. In this chapter, we explore the importance of these constrained devices as well as how we can use them in conjunction with fog computing to change the future of the IoT. First, we present an overview of the concepts of constrained devices, IoT, and fog and mist computing, and then we present a classification of applications according to the amount of resources they require (e.g., processing power and memory). After that, we tie in these topics with a discussion of what can be expected in a future where constrained devices and fog computing are used to push the IoT to new limits. Lastly, we discuss some challenges and opportunities that these technologies may bring.
Submitted 4 March, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Efficiency and Scalability of Multi-Lane Capsule Networks (MLCN)
Authors:
Vanderson M. do Rosario,
Mauricio Breternitz Jr.,
Edson Borin
Abstract:
Some Deep Neural Networks (DNNs) have what we call lanes, or they can be reorganized as such. Lanes are paths in the network that are data-independent and typically learn different features or add resilience to the network. Given their data independence, lanes are amenable to parallel processing. The Multi-lane CapsNet (MLCN) is a proposed reorganization of the Capsule Network that has been shown to achieve better accuracy while providing highly parallel lanes. However, the efficiency and scalability of MLCN had not been systematically examined. In this work, we study MLCN on multiple GPUs and find that it is 2x more efficient than the original CapsNet when using model parallelism. Further, we present the load-balancing problem of distributing heterogeneous lanes across homogeneous or heterogeneous accelerators and show that a simple greedy heuristic can be almost 50% faster than a naive random approach.
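The abstract does not detail the greedy heuristic, so the sketch below assumes a common greedy rule for this kind of scheduling problem. Lanes are sorted by decreasing cost, and each one is placed on the accelerator where it would finish earliest, given that accelerator's relative speed. A random placement is included for comparison. Lane costs, device speeds, and the function names `greedy_assign` and `random_assign` are hypothetical. The quantity being minimized is the makespan, that is, the finish time of the busiest accelerator.

```python
import random
from typing import List, Sequence, Tuple


def greedy_assign(lane_costs: Sequence[float], speeds: Sequence[float]) -> Tuple[List[int], float]:
    """Largest lane first, each to the device that would finish it earliest."""
    loads = [0.0] * len(speeds)
    assignment = [-1] * len(lane_costs)
    for lane in sorted(range(len(lane_costs)), key=lambda i: lane_costs[i], reverse=True):
        best = min(range(len(speeds)), key=lambda d: loads[d] + lane_costs[lane] / speeds[d])
        loads[best] += lane_costs[lane] / speeds[best]
        assignment[lane] = best
    return assignment, max(loads)


def random_assign(lane_costs: Sequence[float], speeds: Sequence[float], seed: int = 0) -> float:
    """Naive baseline: each lane goes to a uniformly random device; returns the makespan."""
    rng = random.Random(seed)
    loads = [0.0] * len(speeds)
    for cost in lane_costs:
        d = rng.randrange(len(speeds))
        loads[d] += cost / speeds[d]
    return max(loads)


lanes = [4.0, 4.0, 2.0, 2.0, 1.0, 1.0]  # hypothetical relative lane costs
print(greedy_assign(lanes, [1.0, 1.0]))  # ([0, 1, 0, 1, 0, 1], 7.0): two identical GPUs
print(greedy_assign(lanes, [2.0, 1.0])[1])  # 5.0: first device twice as fast
print(random_assign(lanes, [1.0, 1.0]))
```

On two identical devices the greedy placement reaches the lower bound of total cost divided by total speed (14 / 2 = 7). A random placement can only match or exceed that bound.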
Submitted 11 August, 2019;
originally announced August 2019.
-
The Multi-Lane Capsule Network (MLCN)
Authors:
Vanderson Martins do Rosario,
Edson Borin,
Mauricio Breternitz Jr
Abstract:
We introduce Multi-Lane Capsule Networks (MLCN), a separable and resource-efficient organization of Capsule Networks (CapsNet) that allows parallel processing while achieving high accuracy at reduced cost. An MLCN is composed of a number of (distinct) parallel lanes, each contributing to a dimension of the result, trained using the routing-by-agreement organization of CapsNet. Our results indicate similar accuracy with a much reduced number of parameters for the Fashion-MNIST and Cifar10 datasets. They also indicate that MLCN outperforms the original CapsNet when using a proposed novel configuration for the lanes. MLCN also has faster training and inference times, being more than two-fold faster than the original CapsNet on the same accelerator.
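As a rough illustration of the lane idea, the NumPy sketch below treats each lane as an independent function that maps the input to one value per class. Stacking the lane outputs gives every class capsule one dimension per lane. The standard CapsNet squash nonlinearity is then applied, and capsule lengths act as class scores. Because lanes share no data, they are evaluated concurrently here with a thread pool. The lanes are random linear maps standing in for the convolutional layers and routing-by-agreement of a real MLCN, and `make_lane` and `mlcn_forward` are hypothetical names. This is a shape-level sketch, not the authors' architecture.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

import numpy as np

Lane = Callable[[np.ndarray], np.ndarray]


def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """CapsNet squash: keeps the direction of s and maps its length into [0, 1)."""
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)


def make_lane(in_dim: int, num_classes: int, seed: int) -> Lane:
    """Hypothetical stand-in for one lane: a random linear map followed by tanh."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((in_dim, num_classes)) / np.sqrt(in_dim)
    return lambda x: np.tanh(x @ w)  # (batch, in_dim) -> (batch, num_classes)


def mlcn_forward(x: np.ndarray, lanes: List[Lane]) -> np.ndarray:
    # Lanes are data-independent, so they can be evaluated in parallel.
    with ThreadPoolExecutor() as pool:
        outs = list(pool.map(lambda lane: lane(x), lanes))
    caps = np.stack(outs, axis=-1)  # (batch, num_classes, num_lanes): one dimension per lane
    return np.linalg.norm(squash(caps), axis=-1)  # capsule lengths as class scores


x = np.random.default_rng(0).standard_normal((5, 64))
lanes = [make_lane(64, 10, seed=i) for i in range(4)]
scores = mlcn_forward(x, lanes)
print(scores.shape)  # (5, 10)
```

Adding or removing lanes changes the capsule dimension without touching the other lanes. This independence is what allows lanes to be spread across accelerators.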
Submitted 22 February, 2019;
originally announced February 2019.