-
Balancing Performance and Energy Consumption of Bagging Ensembles for the Classification of Data Streams in Edge Computing
Authors:
Guilherme Cassales,
Heitor Gomes,
Albert Bifet,
Bernhard Pfahringer,
Hermes Senger
Abstract:
In recent years, the Edge Computing (EC) paradigm has emerged as an enabling factor for developing technologies like the Internet of Things (IoT) and 5G networks, bridging the gap between Cloud Computing services and end-users and supporting low latency, mobility, and location awareness for delay-sensitive applications. Most solutions in EC employ machine learning (ML) methods to perform data classification and other information processing tasks on continuous and evolving data streams. Usually, such solutions have to cope with vast amounts of data that arrive as data streams while balancing energy consumption, latency, and the predictive performance of the algorithms. Ensemble methods achieve remarkable predictive performance when applied to evolving data streams due to the combination of several models and the possibility of selective resets. This work investigates strategies for optimizing the performance (i.e., delay, throughput) and energy consumption of bagging ensembles to classify data streams. The experimental evaluation involved six state-of-the-art ensemble algorithms (OzaBag, OzaBag Adaptive Size Hoeffding Tree, Online Bagging ADWIN, Leveraging Bagging, Adaptive Random Forest, and Streaming Random Patches) applied to five widely used machine learning benchmark datasets with varied characteristics on three computer platforms. Such strategies can significantly reduce energy consumption in 96% of the experimental scenarios evaluated. Despite the trade-offs, it is possible to balance them to avoid significant loss in predictive performance.
Submitted 16 January, 2022;
originally announced January 2022.
-
${\tt simwave}$ -- A Finite Difference Simulator for Acoustic Waves Propagation
Authors:
Jaime Freire de Souza,
João Baptista Dias Moreira,
Keith Jared Roberts,
Roussian di Ramos Alves Gaioso,
Edson Satoshi Gomi,
Emílio Carlos Nelli Silva,
Hermes Senger
Abstract:
${\tt simwave}$ is an open-source Python package to perform wave simulations in 2D or 3D domains. It solves the constant and variable density acoustic wave equation with the finite difference method and has support for domain truncation techniques, several boundary conditions, and the modeling of sources and receivers given a user-defined acquisition geometry. The architecture of ${\tt simwave}$ is designed with geophysical exploration applications in mind. Its Python front-end enables straightforward integration with many existing Python scientific libraries for the composition of more complex workflows and applications (e.g., migration and inversion problems). The back-end is implemented in C, enabling performance portability across a range of computing hardware and compilers, including both CPUs and GPUs.
Submitted 13 January, 2022;
originally announced January 2022.
-
Improving the performance of bagging ensembles for data streams through mini-batching
Authors:
Guilherme Cassales,
Heitor Gomes,
Albert Bifet,
Bernhard Pfahringer,
Hermes Senger
Abstract:
Often, machine learning applications have to cope with dynamic environments where data are collected in the form of continuous data streams with potentially infinite length and transient behavior. Compared to traditional (batch) data mining, stream processing algorithms have additional requirements regarding computational resources and adaptability to data evolution. They must process instances incrementally because the data's continuous flow prohibits storing data for multiple passes. Ensemble learning has achieved remarkable predictive performance in this scenario. Implemented as a set of individual classifiers, ensembles are naturally amenable to task parallelism. However, the incremental learning and dynamic data structures used to capture concept drift increase cache misses and hinder the benefit of parallelism. This paper proposes a mini-batching strategy that can improve memory access locality and performance of several ensemble algorithms for stream mining in multi-core environments. With the aid of a formal framework, we demonstrate that mini-batching can significantly decrease the reuse distance (and the number of cache misses). Experiments on six different state-of-the-art ensemble algorithms applied to four benchmark datasets with varied characteristics show speedups of up to 5X on 8-core processors. These benefits come at the expense of a small reduction in predictive performance.
Submitted 17 December, 2021;
originally announced December 2021.
-
Performance of Devito on HPC-Optimised ARM Processors
Authors:
Hermes Senger,
Jaime F. de Souza,
Edson S. Gomi,
Fabio Luporini,
Gerard J. Gorman
Abstract:
We evaluate the performance of Devito, a domain-specific language (DSL) for finite differences, on Arm ThunderX2 processors. Experiments with two common seismic computational kernels demonstrate that Arm processors can deliver performance competitive with Intel Xeon processors.
Submitted 19 August, 2019; v1 submitted 9 August, 2019;
originally announced August 2019.
-
Modelling Energy Consumption based on Resource Utilization
Authors:
Lucas Venezian Povoa,
Cesar Marcondes,
Hermes Senger
Abstract:
Power management is an expensive and important issue for large computational infrastructures such as datacenters, large clusters, and computational grids. However, measuring the energy consumption of scalable systems may be impractical due to both the cost and the complexity of deploying power metering devices on a large number of machines. In this paper, we propose the use of information about resource utilization (e.g., processor, memory, disk operations, and network traffic) as a proxy for estimating power consumption. We employ machine learning techniques to estimate power consumption using such information, which is provided by common operating systems. Experiments with linear regression, regression tree, and multilayer perceptron on data from different hardware resulted in a model with 99.94% accuracy and an error of 6.32 watts in the best case.
Submitted 15 September, 2017;
originally announced September 2017.