
doi 10.1016/j.micpro.2011.08.003

Simulation of high-performance memory allocators

Authors: José L. Risco-Martín, J. Manuel Colmenar, David Atienza, J. Ignacio Hidalgo

Abstract: For the last thirty years, a large variety of memory allocators have been proposed. Since the performance, memory usage and energy consumption of each memory allocator differ, software engineers often face difficult choices in selecting the most suitable approach for their applications. To this end, custom allocators are developed from scratch, which is a difficult and error-prone process. This issue has special impact in the field of portable consumer embedded systems, which must execute a limited number of multimedia applications demanding high performance and extensive memory usage at low energy consumption. This paper presents a flexible and efficient simulator to study Dynamic Memory Managers (DMMs), compositions of one or more memory allocators. This novel approach allows programmers to simulate custom and general DMMs, which can be composed without incurring any additional runtime overhead or additional programming cost. We show that this infrastructure simplifies DMM construction, mainly because the target application does not need to be recompiled every time a new DMM must be evaluated and because we propose a structured method to search and build DMMs in an object-oriented fashion. Within a search procedure, the system designer can choose the "best" allocator by simulation for a particular target application and embedded system. In our evaluation, we show that our scheme delivers better performance, less memory usage and less energy consumption than single memory allocators.

Submitted 22 June, 2024; originally announced June 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2403.04414

Journal ref: Microprocessors and Microsystems, 35(8), pp. 755-765, 2011
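
The core idea, evaluating many DMM compositions by replaying an allocation trace instead of recompiling the application, can be illustrated with a toy trace-driven model. This is a minimal sketch, not the authors' simulator: it assumes a hypothetical trace format of `(op, block_id, size)` tuples, a hypothetical `FreeListAllocator` (exact-size or power-of-two size classes), and a hypothetical `simulate` helper that composes allocators into a DMM by request-size thresholds. "Footprint" here means bytes requested from the system, which this toy model never returns.

```python
class FreeListAllocator:
    """Toy allocator: one free list per granted block size.
    If round_to_pow2 is True, requests are rounded up to a power of two."""

    def __init__(self, round_to_pow2):
        self.round_to_pow2 = round_to_pow2
        self.free = {}       # granted size -> number of free blocks of that size
        self.footprint = 0   # bytes obtained from the system (never returned)

    def granted(self, size):
        if not self.round_to_pow2:
            return size
        g = 1
        while g < size:
            g *= 2
        return g

    def malloc(self, size):
        g = self.granted(size)
        if self.free.get(g, 0) > 0:
            self.free[g] -= 1      # reuse a freed block of the same granted size
        else:
            self.footprint += g    # grow the heap
        return g

    def release(self, g):
        self.free[g] = self.free.get(g, 0) + 1


def simulate(layers, trace):
    """Replay a trace against a DMM given as [(max_size, allocator), ...].
    Each request goes to the first layer whose max_size covers it.
    Returns the total footprint of all layers."""
    live = {}  # block id -> (allocator, granted size)
    for op, block_id, size in trace:
        if op == "malloc":
            alloc = next(a for limit, a in layers if size <= limit)
            live[block_id] = (alloc, alloc.malloc(size))
        else:  # "free"; the size field is ignored
            alloc, g = live.pop(block_id)
            alloc.release(g)
    return sum(a.footprint for _, a in layers)


INF = float("inf")
trace = [("malloc", 1, 24), ("malloc", 2, 30), ("free", 1, 0), ("malloc", 3, 20),
         ("free", 2, 0), ("malloc", 4, 1000), ("free", 3, 0), ("free", 4, 0)]

exact_only = simulate([(INF, FreeListAllocator(False))], trace)
pow2_only = simulate([(INF, FreeListAllocator(True))], trace)
composed = simulate([(64, FreeListAllocator(True)), (INF, FreeListAllocator(False))], trace)
print(exact_only, pow2_only, composed)  # 1074 1088 1064
```

On this toy trace the composed manager (size classes for small blocks, exact fit for large ones) has the smallest footprint; another trace may rank the candidates differently, which is why such a search is run per target application.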

arXiv:2406.14263

Scalable and RISC-V Programmable Near-Memory Computing Architectures for Edge Nodes

Authors: Michele Caon, Clément Choné, Pasquale Davide Schiavone, Alexandre Levisse, Guido Masera, Maurizio Martina, David Atienza

Abstract: The widespread adoption of data-centric algorithms, particularly Artificial Intelligence (AI) and Machine Learning (ML), has exposed the limitations of centralized processing infrastructures, driving a shift towards edge computing. This necessitates stringent constraints on energy efficiency, which traditional von Neumann architectures struggle to meet. The Compute-In-Memory (CIM) paradigm has emerged as a superior candidate due to its efficient exploitation of available memory bandwidth. However, existing CIM solutions require high implementation effort and lack flexibility from a software integration standpoint. This work proposes a novel, software-friendly, general-purpose, and low-integration-effort Near-Memory Computing (NMC) approach, paving the way for the adoption of CIM-based systems in the next generation of edge computing nodes. Two architectural variants, NM-Caesar and NM-Carus, are proposed and characterized to target different trade-offs in area efficiency, performance, and flexibility, covering a wide range of embedded microcontrollers. Post-layout simulations show up to 25.8x and 50.0x lower execution time and 23.2x and 33.1x higher energy efficiency at the system level, respectively, compared to executing the same tasks on a state-of-the-art RISC-V CPU (RV32IMC). NM-Carus achieves a peak energy efficiency of 306.7 GOPS/W in 8-bit matrix multiplications, surpassing recent state-of-the-art in- and near-memory circuits.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 14 pages, 12 figures, submitted to IEEE Transactions on Emerging Topics in Computing

arXiv:2406.03886

BiomedBench: A benchmark suite of TinyML biomedical applications for low-power wearables

Authors: Dimitrios Samakovlis, Stefano Albini, Rubén Rodríguez Álvarez, Denisa-Andreea Constantinescu, Pasquale Davide Schiavone, Miguel Peón Quirós, David Atienza

Abstract: The design of low-power wearables for the biomedical domain has received a lot of attention in recent decades, as technological advances in chip manufacturing have allowed real-time monitoring of patients using low-complexity ML within the mW range. Despite advances in application and hardware design research, the domain lacks a systematic approach to hardware evaluation. In this work, we propose BiomedBench, a new benchmark suite composed of complete end-to-end TinyML biomedical applications for real-time monitoring of patients using wearable devices. Each application presents different requirements during typical signal acquisition and processing phases, including varying computational workloads and relations between active and idle times. Furthermore, our evaluation of five state-of-the-art low-power platforms in terms of energy efficiency shows that modern platforms cannot effectively target all types of biomedical applications. BiomedBench will be released as an open-source suite to enable future improvements in the entire domain of bioengineering systems and TinyML application design.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 7 pages, 5 figures. Submitted to Design & Test Special Issue TinyML

arXiv:2406.01529

How to Count Coughs: An Event-Based Framework for Evaluating Automatic Cough Detection Algorithm Performance

Authors: Lara Orlandic, Jonathan Dan, Jerome Thevenot, Tomas Teijeiro, Alain Sauty, David Atienza

Abstract: Chronic cough disorders are widespread and challenging to assess because current assessments rely on subjective patient questionnaires about cough frequency. Wearable devices running Machine Learning (ML) algorithms are promising for quantifying daily coughs, providing clinicians with objective metrics to track symptoms and evaluate treatments. However, there is a mismatch between state-of-the-art metrics for cough counting algorithms and the information relevant to clinicians. Most works focus on distinguishing cough from non-cough samples, which does not directly provide clinically relevant outcomes such as the number of cough events or their temporal patterns. In addition, typical metrics such as specificity and accuracy can be biased by class imbalance. We propose using event-based evaluation metrics aligned with clinical guidelines on significant cough counting endpoints. We use an ML classifier to illustrate the shortcomings of traditional sample-based accuracy measurements, highlighting their variance due to dataset class imbalance and sample window length. We also present an open-source event-based evaluation framework to test algorithm performance in identifying cough events and rejecting false positives. We provide examples and best practice guidelines in event-based cough counting as a necessary first step to assess algorithm performance with clinical relevance.

Submitted 3 June, 2024; originally announced June 2024.
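
Event-based scoring can be sketched in a few lines. This is a minimal illustration, not the authors' open-source framework: it assumes events are `(start, end)` intervals in seconds and uses a simple "any overlap" matching rule, whereas the paper's framework may apply different matching criteria or tolerances. The function names are hypothetical.

```python
def overlaps(a, b):
    """True if intervals a = (start, end) and b = (start, end), in seconds, overlap."""
    return a[0] < b[1] and b[0] < a[1]


def event_metrics(reference, predicted, duration_s):
    """Event-level scores under an any-overlap matching rule (an assumption)."""
    # A reference cough counts as detected if any predicted event overlaps it.
    detected = sum(any(overlaps(r, p) for p in predicted) for r in reference)
    # A predicted event is a false positive if it overlaps no reference cough.
    false_pos = sum(not any(overlaps(p, r) for r in reference) for p in predicted)
    return {
        "sensitivity": detected / len(reference) if reference else float("nan"),
        "precision": (len(predicted) - false_pos) / len(predicted) if predicted else float("nan"),
        "fp_per_hour": false_pos / (duration_s / 3600.0),
    }


# Hypothetical one-hour recording with three annotated coughs.
reference = [(1.0, 1.4), (5.0, 5.3), (9.0, 9.5)]
predicted = [(1.1, 1.3), (5.2, 5.6), (20.0, 20.2)]
metrics = event_metrics(reference, predicted, duration_s=3600)
print(metrics)  # sensitivity and precision are 2/3; fp_per_hour is 1.0
```

None of these scores uses true negatives, so they are not inflated by the many easy non-cough samples that dominate sample-based accuracy and specificity.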

arXiv:2405.15085

Acoustical Features as Knee Health Biomarkers: A Critical Analysis

Authors: Christodoulos Kechris, Jerome Thevenot, Tomas Teijeiro, Vincent A. Stadelmann, Nicola A. Maffiuletti, David Atienza

Abstract: Acoustical knee health assessment has long promised an alternative to clinically available medical imaging tools, but this modality has yet to be adopted in medical practice. The field is currently led by machine learning models processing acoustical features, which have shown promising diagnostic performance. However, these methods overlook the intricate multi-source nature of audio signals and the underlying mechanisms at play. To address this critical gap, the present paper introduces a novel causal framework for validating knee acoustical features. We argue that current machine learning methodologies for acoustical knee diagnosis lack the required assurances and thus cannot be used to classify acoustic features as biomarkers. Our framework establishes a set of essential theoretical guarantees necessary to validate this claim. We apply our methodology to three real-world experiments investigating the effect of researchers' expectations, the experimental protocol and the sensor employed in the wearable. This investigation reveals latent issues such as underlying shortcut learning and performance inflation. This study is the first independent reproduction study in the field of acoustical knee health evaluation. We conclude with actionable insights from our findings, offering valuable guidance to navigate these crucial limitations in future research.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.09559

KID-PPG: Knowledge Informed Deep Learning for Extracting Heart Rate from a Smartwatch

Authors: Christodoulos Kechris, Jonathan Dan, Jose Miranda, David Atienza

Abstract: Accurate extraction of heart rate from photoplethysmography (PPG) signals remains challenging due to motion artifacts and signal degradation. Although deep learning methods that treat heart rate extraction as a data-driven inference problem offer promising solutions, they often underutilize existing knowledge from the medical and signal processing communities. In this paper, we address three shortcomings of deep learning models: motion artifact removal, degradation assessment, and physiologically plausible analysis of the PPG signal. We propose KID-PPG, a knowledge-informed deep learning model that integrates expert knowledge through adaptive linear filtering, deep probabilistic inference, and data augmentation. We evaluate KID-PPG on the PPGDalia dataset, achieving an average mean absolute error of 2.85 beats per minute, surpassing existing reproducible methods. Our results demonstrate a significant performance improvement in heart rate tracking through the incorporation of prior knowledge into deep learning models. This approach shows promise in enhancing various biomedical applications by incorporating existing expert knowledge in deep learning models.

Submitted 2 May, 2024; originally announced May 2024.
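
Adaptive linear filtering for motion artifacts is commonly done with an accelerometer-referenced noise canceller. The sketch below is a generic normalized LMS (NLMS) canceller, not KID-PPG's actual filter design, order or step size; all signals are synthetic and hypothetical (a 1.5 Hz sinusoid stands in for the pulse, and a white-noise "accelerometer" drives a short FIR artifact).

```python
import math
import random


def nlms_cancel(ppg, acc, taps=4, mu=0.1, eps=1e-8):
    """Normalized LMS noise canceller.
    Estimates the motion artifact in `ppg` from the reference `acc` and
    returns the error signal ppg - estimate, i.e. the cleaned PPG."""
    w = [0.0] * taps
    cleaned = []
    for n in range(len(ppg)):
        # The `taps` most recent reference samples, zero-padded at the start.
        x = [acc[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = ppg[n] - estimate
        # The step size is normalized by the reference energy in the tap window.
        norm = eps + sum(xi * xi for xi in x)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        cleaned.append(e)
    return cleaned


fs, n_samples = 32, 2000  # hypothetical sampling rate (Hz) and length
rng = random.Random(0)
acc = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
pulse = [math.sin(2 * math.pi * 1.5 * n / fs) for n in range(n_samples)]
artifact = [1.5 * acc[n] - 0.8 * (acc[n - 1] if n > 0 else 0.0) for n in range(n_samples)]
ppg = [p + a for p, a in zip(pulse, artifact)]

cleaned = nlms_cancel(ppg, acc)
tail = range(n_samples // 2, n_samples)  # skip the adaptation transient
residual = sum((cleaned[n] - pulse[n]) ** 2 for n in tail) / len(tail)
artifact_power = sum(artifact[n] ** 2 for n in tail) / len(tail)
print(f"residual artifact power ratio: {residual / artifact_power:.3f}")
```

On this synthetic input the residual ratio should come out well below 0.1. Real wrist motion is neither white nor a short linear filter of the accelerometer, which is one reason KID-PPG combines filtering with probabilistic deep inference.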

arXiv:2404.12503

STRELA: STReaming ELAstic CGRA Accelerator for Embedded Systems

Authors: Daniel Vazquez, Jose Miranda, Alfonso Rodriguez, Andres Otero, Pascuale Davide Schiavone, David Atienza

Abstract: Reconfigurable computing offers a good balance between flexibility and energy efficiency. When combined with software-programmable devices such as CPUs, it is possible to obtain higher performance by spatially distributing the parallelizable sections of an application throughout the reconfigurable device while the CPU is in charge of control-intensive sections. This work introduces an elastic Coarse-Grained Reconfigurable Architecture (CGRA) integrated into an energy-efficient RISC-V-based SoC designed for the embedded domain. The microarchitecture of the CGRA supports conditionals and irregular loops, making it adaptable to domain-specific applications. Additionally, we propose specific mapping strategies that enable the efficient utilization of the CGRA for both simple applications, where the fabric is only reconfigured once (one-shot kernels), and more complex ones, where the CGRA must be reconfigured multiple times to complete them (multi-shot kernels). Large kernels also benefit from the independent memory nodes incorporated to streamline data accesses. Integrating the CGRA as an accelerator of the RISC-V processor yields a versatile and efficient framework, providing adaptability, processing capacity, and overall performance across various applications. The design has been implemented in TSMC 65 nm, achieving a maximum frequency of 250 MHz. It achieves a peak performance of 1.22 GOPs computing one-shot kernels and 1.17 GOPs computing multi-shot kernels. The best energy efficiency is 72.68 MOPs/mW for one-shot kernels and 115.96 MOPs/mW for multi-shot kernels. The design integrates power and clock-gating techniques to tailor the architecture to the embedded domain while maintaining performance. The best speed-ups are 17.63x and 18.61x for one-shot and multi-shot kernels, respectively. The best energy savings in the SoC are 9.05x and 11.10x for one-shot and multi-shot kernels, respectively.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 14 pages, 11 figures

arXiv:2403.01236

Performance evaluation of acceleration of convolutional layers on OpenEdgeCGRA

Authors: Nicolò Carpentieri, Juan Sapriza, Davide Schiavone, Daniele Jahier Pagliari, David Atienza, Maurizio Martina, Alessio Burrello

Abstract: Recently, efficiently deploying deep learning solutions on the edge has received increasing attention. New platforms are emerging to support the increasing demand for flexibility and high performance. In this work, we explore the efficient mapping of convolutional layers on an open-hardware, low-power Coarse-Grain Reconfigurable Array (CGRA), namely OpenEdgeCGRA. We explore both direct implementations of convolution and solutions that transform it into a matrix multiplication through an Im2col transformation, and experiment with various tensor parallelism axes. We show that for this hardware target, direct convolution coupled with weight parallelism achieves the best latency and energy efficiency, outperforming a CPU implementation by 3.4x and 9.9x in terms of energy and latency, respectively.

Submitted 2 March, 2024; originally announced March 2024.
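
The two convolution strategies compared in this paper can be contrasted in a few lines. The pure-Python sketch below (single image, stride 1, no padding; all names are illustrative and unrelated to the OpenEdgeCGRA toolchain) computes a convolution directly and via Im2col followed by a matrix multiplication, and checks that the two agree. Note that the Im2col matrix stores C·R·S·OH·OW values, replicating most input elements up to R·S times, whereas direct convolution reads the C·H·W input in place.

```python
def direct_conv(x, w):
    """x: [C][H][W] input, w: [K][C][R][S] filters; stride 1, no padding.
    Returns [K][H-R+1][W-S+1] (cross-correlation, as in DL frameworks)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    OH, OW = H - R + 1, W - S + 1
    return [[[sum(w[k][c][r][s] * x[c][i + r][j + s]
                  for c in range(C) for r in range(R) for s in range(S))
              for j in range(OW)] for i in range(OH)] for k in range(K)]


def im2col(x, R, S):
    """Unfold every R x S patch (across all channels) into one column.
    Returns a [C*R*S][OH*OW] matrix with rows ordered by (c, r, s)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    OH, OW = H - R + 1, W - S + 1
    rows = []
    for c in range(C):
        for r in range(R):
            for s in range(S):
                rows.append([x[c][i + r][j + s] for i in range(OH) for j in range(OW)])
    return rows


def im2col_conv(x, w):
    R, S = len(w[0][0]), len(w[0][0][0])
    OH, OW = len(x[0]) - R + 1, len(x[0][0]) - S + 1
    cols = im2col(x, R, S)
    # Flatten each filter in the same (c, r, s) order as the rows of `cols`.
    wmat = [[v for ch in f for row in ch for v in row] for f in w]
    # GEMM: [K][C*R*S] x [C*R*S][OH*OW] -> [K][OH*OW]
    out = [[sum(a * b for a, b in zip(wrow, col)) for col in zip(*cols)] for wrow in wmat]
    # Reshape each output row back to OH x OW.
    return [[row[i * OW:(i + 1) * OW] for i in range(OH)] for row in out]


x = [[[c * 16 + i * 4 + j for j in range(4)] for i in range(4)] for c in range(2)]  # C=2, 4x4
w = [[[[(k + 1) * (c - r + s) for s in range(3)] for r in range(3)] for c in range(2)]
     for k in range(3)]  # K=3 filters of shape 2x3x3
cols = im2col(x, 3, 3)
print(len(cols), "x", len(cols[0]))            # 18 x 4 (72 values vs. 32 input values)
print(direct_conv(x, w) == im2col_conv(x, w))  # True
```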

arXiv:2402.13005

SzCORE: A Seizure Community Open-source Research Evaluation framework for the validation of EEG-based automated seizure detection algorithms

Authors: Jonathan Dan, Una Pale, Alireza Amirshahi, William Cappelletti, Thorir Mar Ingolfsson, Xiaying Wang, Andrea Cossettini, Adriano Bernini, Luca Benini, Sándor Beniczky, David Atienza, Philippe Ryvlin

Abstract: The need for high-quality automated seizure detection algorithms based on electroencephalography (EEG) becomes ever more pressing with the increasing use of ambulatory and long-term EEG monitoring. Heterogeneity in validation methods of these algorithms influences the reported results and makes comprehensive evaluation and comparison challenging. This heterogeneity concerns in particular the choice of datasets, evaluation methodologies, and performance metrics. In this paper, we propose a unified framework designed to establish standardization in the validation of EEG-based seizure detection algorithms. Based on existing guidelines and recommendations, the framework introduces a set of recommendations and standards related to datasets, file formats, EEG data input content, seizure annotation input and output, cross-validation strategies, and performance metrics. We also propose the 10-20 seizure detection benchmark, a machine-learning benchmark based on public datasets converted to a standardized format. This benchmark defines the machine-learning task as well as reporting metrics. We illustrate the use of the benchmark by evaluating a set of existing seizure detection algorithms. The SzCORE (Seizure Community Open-source Research Evaluation) framework and benchmark are made publicly available along with an open-source software library to facilitate research use, while enabling rigorous evaluation of the clinical significance of the algorithms, fostering a collective effort to detect seizures more reliably and improve the lives of people with epilepsy.

Submitted 8 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12834

SAT-based Exact Modulo Scheduling Mapping for Resource-Constrained CGRAs

Authors: Cristian Tirelli, Juan Sapriza, Rubén Rodríguez Álvarez, Lorenzo Ferretti, Benoît Denkinger, Giovanni Ansaloni, José Miranda Calero, David Atienza, Laura Pozzi

Abstract: Coarse-Grain Reconfigurable Arrays (CGRAs) represent emerging low-power architectures designed to accelerate Compute-Intensive Loops (CILs). The effectiveness of CGRAs in providing acceleration relies on the quality of mapping: how efficiently the CIL is compiled onto the platform. State of the Art (SoA) compilation techniques utilize modulo scheduling to minimize the Iteration Interval (II) and use graph algorithms like Max-Clique Enumeration to address mapping challenges. Our work approaches the mapping problem through a satisfiability (SAT) formulation. We introduce the Kernel Mobility Schedule (KMS), an ad-hoc schedule used with the Data Flow Graph and CGRA architectural information to generate Boolean statements that, when satisfied, yield a valid mapping. Experimental results show that SAT-MapIt outperforms SoA alternatives in almost 50% of the explored benchmarks. Additionally, we evaluated the mapping results in a synthesizable CGRA design and emphasized the run-time metrics trends, i.e., energy efficiency and latency, across different CILs and CGRA sizes. We show that a hardware-agnostic analysis performed on compiler-level metrics can optimally prune the architectural design space, while still retaining Pareto-optimal configurations. Moreover, by exploring how implementation details impact cost and performance on real hardware, we highlight the importance of holistic software-to-hardware mapping flows, such as the one presented herein.

Submitted 29 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.
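
To make the SAT formulation concrete, the sketch below encodes a toy placement-and-scheduling problem as CNF clauses and solves it with a tiny DPLL solver, increasing II until the formula becomes satisfiable. It is a heavily simplified illustration, not SAT-MapIt: the hypothetical per-node time windows stand in for the Kernel Mobility Schedule, routing is reduced to a hypothetical neighbour relation between PEs, and every dependency edge is assumed to have a one-cycle latency.

```python
from itertools import combinations


def encode(nodes, edges, windows, nbrs, ii):
    """CNF for placing each DFG node on exactly one (pe, time) pair.
    Variables are positive ints; a clause is a list of literals (negative = NOT)."""
    var = {}
    for n in nodes:
        for p in nbrs:
            for t in windows[n]:
                var[(n, p, t)] = len(var) + 1
    clauses = []
    for n in nodes:
        lits = [v for (m, _, _), v in var.items() if m == n]
        clauses.append(lits)                                     # at least one slot
        clauses += [[-a, -b] for a, b in combinations(lits, 2)]  # at most one slot
    for (n1, p1, t1), (n2, p2, t2) in combinations(var, 2):
        if n1 != n2 and p1 == p2 and t1 % ii == t2 % ii:         # modulo resource conflict
            clauses.append([-var[(n1, p1, t1)], -var[(n2, p2, t2)]])
    for u, v in edges:                                           # data dependencies
        for (n1, p1, t1), a in var.items():
            for (n2, p2, t2), b in var.items():
                if n1 == u and n2 == v and (t2 < t1 + 1 or p2 not in nbrs[p1]):
                    clauses.append([-a, -b])
    return clauses, var


def solve(clauses, assign=None):
    """Minimal DPLL: returns a satisfying partial assignment {var: bool} or None."""
    assign = assign or {}
    pending = []
    for cl in clauses:
        if any(assign.get(abs(lit)) == (lit > 0) for lit in cl):
            continue                                   # clause already satisfied
        rest = [lit for lit in cl if abs(lit) not in assign]
        if not rest:
            return None                                # clause falsified
        pending.append(rest)
    if not pending:
        return assign
    unit = next((cl[0] for cl in pending if len(cl) == 1), None)
    if unit is not None:                               # unit propagation
        return solve(pending, {**assign, abs(unit): unit > 0})
    lit = pending[0][0]
    for value in (True, False):                        # branch on one variable
        result = solve(pending, {**assign, abs(lit): value})
        if result is not None:
            return result
    return None


def map_loop(nodes, edges, windows, nbrs, max_ii=4):
    """Return (II, {node: (pe, time)}) for the smallest feasible II, or None."""
    for ii in range(1, max_ii + 1):
        clauses, var = encode(nodes, edges, windows, nbrs, ii)
        model = solve(clauses)
        if model is not None:
            return ii, {n: (p, t) for (n, p, t), v in var.items() if model.get(v)}
    return None


# Hypothetical DFG (a, b -> c -> d) on three PEs in a line: 0 - 1 - 2.
nodes = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "c"), ("c", "d")]
windows = {"a": [0], "b": [0, 1], "c": [1, 2], "d": [2, 3]}  # stand-in for mobility
nbrs = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}}                   # same PE or adjacent
print(map_loop(nodes, edges, windows, nbrs))  # II = 2: with II = 1, 4 nodes need 3 slots
```

A production flow would hand the clauses to an off-the-shelf SAT solver; the tiny DPLL here only keeps the example self-contained.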

arXiv:2402.06367

doi 10.1016/j.artmed.2024.102903

TEE4EHR: Transformer Event Encoder for Better Representation Learning in Electronic Health Records

Authors: Hojjat Karami, David Atienza, Anisoara Ionescu

Abstract: Irregular sampling of time series in electronic health records (EHRs) is one of the main challenges for developing machine learning models. Additionally, the pattern of missing data in certain clinical variables is not at random but depends on the decisions of clinicians and the state of the patient. The point process is a mathematical framework for analyzing event sequence data that is consistent with irregular sampling patterns. Our model, TEE4EHR, is a transformer event encoder (TEE) with point process loss that encodes the pattern of laboratory tests in EHRs. The utility of our TEE has been investigated in a variety of benchmark event sequence datasets. Additionally, we conduct experiments on two real-world EHR databases to provide a more comprehensive evaluation of our model. Firstly, in a self-supervised learning approach, the TEE is jointly learned with an existing attention-based deep neural network, which gives superior performance in negative log-likelihood and future event prediction. In addition, we propose an algorithm for aggregating attention weights that can reveal the interaction between the events. Secondly, we transfer and freeze the learned TEE to the downstream task for the outcome prediction, where it outperforms state-of-the-art models for handling irregularly sampled time series. Furthermore, our results demonstrate that our approach can improve representation learning in EHRs and can be useful for clinical prediction tasks.

Submitted 9 February, 2024; originally announced February 2024.
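
A point process loss scores an event sequence by the log-intensity at each event minus the intensity integrated over the observation window. The sketch below uses a classical exponential-kernel Hawkes process as a stand-in; in TEE4EHR the intensity would instead be parameterized by the transformer encoder, and the kernel, parameter values and timestamps here are assumptions chosen for illustration.

```python
import math


def hawkes_nll(times, mu, alpha, beta, horizon):
    """Negative log-likelihood of sorted event times in [0, horizon] under an
    exponential Hawkes process:
        lambda(t) = mu + alpha * sum_{t_j < t} exp(-beta * (t - t_j)).
    NLL = integral_0^horizon lambda(t) dt - sum_i log lambda(t_i)."""
    log_intensity = 0.0
    excitation = 0.0  # sum_{j < i} exp(-beta * (t_i - t_j)), updated recursively
    prev = None
    for t in times:
        if prev is not None:
            excitation = (excitation + 1.0) * math.exp(-beta * (t - prev))
        log_intensity += math.log(mu + alpha * excitation)
        prev = t
    # The compensator (integrated intensity) has a closed form for this kernel.
    compensator = mu * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - t)) for t in times)
    return compensator - log_intensity


# Hypothetical lab-test timestamps (hours) within a 5-hour window.
print(hawkes_nll([0.5, 0.7, 3.0], mu=0.4, alpha=0.8, beta=2.0, horizon=5.0))
```

With `alpha = 0` the loss reduces to that of a homogeneous Poisson process, `mu * horizon - n * log(mu)`, which makes a convenient sanity check.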

arXiv:2402.06318

TimEHR: Image-based Time Series Generation for Electronic Health Records

Authors: Hojjat Karami, Mary-Anne Hartley, David Atienza, Anisoara Ionescu

Abstract: Time series in Electronic Health Records (EHRs) present unique challenges for generative models, such as irregular sampling, missing values, and high dimensionality. In this paper, we propose a novel generative adversarial network (GAN) model, TimEHR, to generate time series data from EHRs. In particular, TimEHR treats time series as images and is based on two conditional GANs. The first GAN generates missingness patterns, and the second GAN generates time series values based on the missingness pattern. Experimental results on three real-world EHR datasets show that TimEHR outperforms state-of-the-art methods in terms of fidelity, utility, and privacy metrics.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2401.09420

LionHeart: A Layer-based Mapping Framework for Heterogeneous Systems with Analog In-Memory Computing Tiles

Authors: Corey Lammie, Flavio Ponzina, Yuxuan Wang, Joshua Klein, Marina Zapater, Irem Boybat, Abu Sebastian, Giovanni Ansaloni, David Atienza

Abstract: When arranged in a crossbar configuration, resistive memory devices can be used to execute matrix-vector multiplication (MVM), the dominant operation of many ML algorithms, in constant time. Nonetheless, when performing computations in the analog domain, novel challenges are introduced in terms of arithmetic precision and stochasticity, due to non-ideal circuit and device behaviour. Moreover, these non-idealities have a temporal dimension, resulting in application accuracy that degrades over time. Facing these challenges, we propose a novel framework, named LionHeart, to obtain hybrid analog-digital mappings to execute DL inference workloads using heterogeneous accelerators. The accuracy-constrained mappings derived by LionHeart showcase, across different DNNs and datasets, high accuracy and potential for speedup. The results of the full system simulations highlight run-time reductions and energy efficiency gains that exceed 6X, with a user-defined accuracy threshold with respect to a fully digital floating point implementation.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.05548

X-HEEP: An Open-Source, Configurable and Extendible RISC-V Microcontroller for the Exploration of Ultra-Low-Power Edge Accelerators

Authors: Simone Machetti, Pasquale Davide Schiavone, Thomas Christoph Müller, Miguel Peón-Quirós, David Atienza

Abstract: The field of edge computing has witnessed remarkable growth owing to the increasing demand for real-time processing of data in applications. However, challenges persist due to limitations in performance and power consumption. To overcome these challenges, heterogeneous architectures have emerged that combine host processors with specialized accelerators tailored to specific applications, leading to improved performance and reduced power consumption. However, most of the existing platforms lack the necessary configurability and extendability options for integrating custom accelerators. To overcome these limitations, we introduce in this paper the eXtendible Heterogeneous Energy-Efficient Platform (X-HEEP). X-HEEP is an open-source platform designed to natively support the integration of ultra-low-power edge accelerators. It provides customization options to match specific application requirements through configurable core types, bus topologies, addressing modes, memory sizes, and peripherals. Moreover, the platform prioritizes energy efficiency by implementing low-power strategies, such as clock-gating and power-gating. We demonstrate the real-world applicability of X-HEEP by providing an integration example tailored for healthcare applications that includes coarse-grained reconfigurable array (CGRA) and in-memory computing (IMC) accelerators. The resulting design, called HEEPocrates, has been implemented both on a field-programmable gate array (FPGA), the Xilinx Zynq-7020, and in silicon with TSMC 65nm low-power CMOS technology. We run a set of healthcare applications and measure their energy consumption to show that our chip is in line with other state-of-the-art microcontrollers commonly adopted in this domain. Moreover, exploiting the integrated CGRA and IMC accelerators yields energy benefits of 4.9x and 4.8x, respectively, compared to running on the host CPU.

Submitted 8 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

arXiv:2312.13000

Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures

Authors: Alireza Amirshahi, Giovanni Ansaloni, David Atienza

Abstract: The increasing complexity of transformer models in artificial intelligence increases their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and accelerators tailored for transformer models, supporting their computation hotspots with high efficiency. However, memory bandwidth can hinder improvements in hardware accelerators. Against this backdrop, in this paper we propose a novel memory arrangement strategy, governed by the hardware accelerator's kernel size, which effectively minimizes off-chip data access. This arrangement is particularly beneficial for end-to-end transformer model inference, where most of the computation is based on general matrix multiplication (GEMM) operations. Additionally, we address the overhead of non-GEMM operations in transformer models within the scope of this memory data arrangement. Our study explores the implementation and effectiveness of the proposed accelerator-driven data arrangement approach in both single- and multi-core systems. Our evaluation demonstrates that our approach can achieve up to a 2.8x speed-up when executing inferences employing state-of-the-art transformers.

Submitted 20 December, 2023; originally announced December 2023.
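
One way to read "memory arrangement governed by the accelerator's kernel size" is a tile-major layout: if the accelerator consumes t x t tiles, storing each tile contiguously lets it be fetched in one off-chip burst instead of t separate row segments. The sketch below illustrates only that general idea under this assumption; the paper's exact layout, the handling of non-GEMM operations and all function names here are not taken from the source.

```python
def to_tiled(a, rows, cols, t):
    """Reorder a row-major rows x cols matrix (flat list) so that every t x t tile
    is contiguous; tiles themselves are stored in row-major order."""
    assert rows % t == 0 and cols % t == 0, "t must divide both dimensions"
    out = []
    for tr in range(0, rows, t):
        for tc in range(0, cols, t):
            for r in range(tr, tr + t):
                out.extend(a[r * cols + tc: r * cols + tc + t])
    return out


def bursts(addresses):
    """Number of contiguous address runs, i.e. separate transfers needed."""
    s = sorted(addresses)
    return 1 + sum(1 for x, y in zip(s, s[1:]) if y != x + 1)


def tile_addresses(bi, bj, cols, t, tiled):
    """Flat addresses of tile (bi, bj) in the tiled or the row-major layout."""
    if tiled:
        base = (bi * (cols // t) + bj) * t * t
        return [base + k for k in range(t * t)]
    return [(bi * t + r) * cols + bj * t + c for r in range(t) for c in range(t)]


rows, cols, t = 4, 6, 2  # hypothetical matrix size and accelerator kernel (tile) size
a = list(range(rows * cols))
tiled = to_tiled(a, rows, cols, t)
print(tiled[:8])                                     # [0, 1, 6, 7, 2, 3, 8, 9]
print(bursts(tile_addresses(1, 2, cols, t, False)))  # 2 transfers in row-major layout
print(bursts(tile_addresses(1, 2, cols, t, True)))   # 1 transfer in tiled layout
```

In general a t x t tile of a matrix wider than t needs t separate runs in row-major order but a single run in the tiled layout; the rearrangement cost is paid once when the data (for example, weights) is laid out.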

arXiv:2309.16333

CloudProphet: A Machine Learning-Based Performance Prediction for Public Clouds

Authors: Darong Huang, Luis Costero, Ali Pahlevan, Marina Zapater, David Atienza

Abstract: Computing servers have played a key role in developing and processing emerging compute-intensive applications in recent years. Consolidating multiple virtual machines (VMs) inside one server to run various applications introduces severe competition for limited resources among VMs. Many techniques, such as VM scheduling and resource provisioning, have been proposed to maximize the cost-efficiency of computing servers while alleviating the performance interference between VMs. However, these management techniques require accurate performance prediction of the application running inside the VM, which is challenging to obtain in the public cloud due to the black-box nature of the VMs. From this perspective, this paper proposes a novel machine learning-based performance prediction approach for applications running in the cloud. To achieve high-accuracy predictions for black-box VMs, the proposed method first identifies the application running inside the virtual machine. It then selects highly correlated runtime metrics as the input of the machine learning approach to accurately predict the performance level of the cloud application. Experimental results with state-of-the-art cloud benchmarks demonstrate that our proposed method outperforms the existing prediction methods by more than 2x in terms of worst prediction error. In addition, we successfully tackle the challenge of performance prediction for applications with variable workloads by introducing the performance degradation index, which other comparison methods fail to consider. The workflow versatility of the proposed approach has been verified with different modern servers and VM configurations.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 15 pages, 11 figures, submitted to IEEE Transactions on Sustainable Computing

arXiv:2303.18178

Robust and IP-Protecting Vertical Federated Learning against Unexpected Quitting of Parties

Authors: Jingwei Sun, Zhixu Du, Anna Dai, Saleh Baghersalimi, Alireza Amirshahi, David Atienza, Yiran Chen

Abstract: Vertical federated learning (VFL) enables a service provider (i.e., the active party) who owns labeled features to collaborate with passive parties who possess auxiliary features to improve model performance. Existing VFL approaches, however, have two major vulnerabilities when passive parties unexpectedly quit in the deployment phase of VFL: severe performance degradation and intellectual property (IP) leakage of the active party's labels. In this paper, we propose Party-wise Dropout to improve the VFL model's robustness against the unexpected exit of passive parties, and a defense method called DIMIP to protect the active party's IP in the deployment phase. We evaluate our proposed methods on multiple datasets against different inference attacks. The results show that Party-wise Dropout effectively maintains model performance after the passive party quits, and DIMIP successfully disguises label information from the passive party's feature extractor, thereby mitigating IP leakage.

Submitted 28 March, 2023; originally announced March 2023.
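The Party-wise Dropout idea described above can be illustrated with a minimal sketch, assuming each passive party sends one embedding vector per sample and that "dropping" a party means replacing its whole embedding with zeros during training. The function name `party_wise_dropout`, the zero-filling, and the absence of rescaling are illustrative assumptions, not the authors' implementation; DIMIP is not shown.

```python
import random
from typing import List, Optional

Embedding = List[float]


def party_wise_dropout(
    passive_embeddings: List[Embedding],
    drop_prob: float,
    rng: Optional[random.Random] = None,
) -> List[Embedding]:
    """Hypothetical helper: drop each passive party's whole embedding with prob. drop_prob.

    Unlike element-wise dropout, the unit being dropped is an entire party,
    which mimics that party quitting. The active party's own features are not
    passed in, so they are never dropped. No rescaling is applied (assumption).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    rng = rng or random.Random()
    dropped = []
    for emb in passive_embeddings:
        if rng.random() < drop_prob:
            dropped.append([0.0] * len(emb))  # party "quits": contribution zeroed
        else:
            dropped.append(list(emb))  # party kept unchanged (copied)
    return dropped


if __name__ == "__main__":
    parties = [[0.2, 0.4], [1.0, -1.0], [0.5, 0.5]]
    print(party_wise_dropout(parties, drop_prob=0.5, rng=random.Random(0)))
```

Dropping whole parties, rather than individual features, is what exposes the downstream model during training to the exact failure mode it must survive at deployment.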

arXiv:2303.14745

doi 10.1016/j.artmed.2023.102754

Combining General and Personalized Models for Epilepsy Detection with Hyperdimensional Computing

Authors: Una Pale, Tomas Teijeiro, David Atienza

Abstract: Epilepsy is a chronic neurological disorder with a significant prevalence. However, there is still no adequate technological support to enable epilepsy detection and continuous outpatient monitoring in everyday life. Hyperdimensional (HD) computing is an interesting alternative for wearable devices, characterized by a much simpler learning process and lower memory requirements. In this work, we demonstrate several additional ways in which HD computing, and the way its models are built and stored, can be used to understand, compare, and create more advanced machine learning models for epilepsy detection. These possibilities are not feasible with other state-of-the-art models, such as random forests or neural networks. We compare the inter-subject similarity of models for each class (seizure and non-seizure), then study how generalized models can be created from personalized ones, and finally, how personalized and generalized models can be combined into hybrid models. This results in improved epilepsy detection performance. We also test knowledge transfer between models created on two different datasets. Finally, these examples could be highly interesting not only from an engineering perspective, to create better models for wearables, but also from a neurological perspective, to better understand individual epilepsy patterns.

Submitted 26 March, 2023; originally announced March 2023.

arXiv:2302.10672

Importance of methodological choices in data manipulation for validating epileptic seizure detection models

Authors: Una Pale, Tomas Teijeiro, David Atienza

Abstract: Epilepsy is a chronic neurological disorder that affects a significant portion of the human population and imposes serious risks on patients' daily lives. Despite advances in machine learning and IoT, small, non-stigmatizing wearable devices for continuous monitoring and detection in outpatient environments are not yet available. Part of the reason is the complexity of epilepsy itself, including highly imbalanced data, its multimodal nature, and very subject-specific signatures. However, another problem is the heterogeneity of methodological approaches in research, which leads to slower progress, difficulty comparing results, and low reproducibility. Therefore, this article identifies a wide range of methodological decisions that must be made and reported when training and evaluating the performance of epilepsy detection systems. We characterize the influence of individual choices using a typical ensemble random-forest model and the publicly available CHB-MIT database, providing a broader picture of each decision and giving good-practice recommendations, based on our experience, where possible.

Submitted 21 February, 2023; originally announced February 2023.

arXiv:2212.09358

A Soft SIMD Based Energy Efficient Computing Microarchitecture

Authors: Pengbo Yu, Alexandre Levisse, Mohit Gupta, Evenblij Timon, Giovanni Ansaloni, Francky Catthoor, David Atienza

Abstract: The ever-increasing size and computational complexity of today's machine-learning algorithms pose an increasing strain on the underlying hardware. In this light, novel and dedicated architectural solutions are required to optimize energy efficiency by leveraging the opportunities exposed by algorithms (such as intrinsic parallelism and robustness to quantization errors). We address this challenge by introducing a flexible two-stage computing pipeline. The pipeline can support fine-grained operand quantization through software-supported Single Instruction Multiple Data (SIMD) operations. Moreover, it can efficiently execute sequential multiplications over SIMD sub-words thanks to zero-skipping and Canonical Signed Digit (CSD) coding. Finally, a lightweight repacking unit allows the bitwidth of sub-words to be changed dynamically at run time. These features are implemented within a tight energy and area budget. Experimental results show that our approach greatly outperforms traditional hardware SIMD designs in terms of both area and energy requirements. In particular, our pipeline occupies up to 53.1% less area than a hardware SIMD pipeline supporting the same sub-word widths, while performing multiplication up to 88.8% more efficiently.

Submitted 19 December, 2022; originally announced December 2022.

Comments: 6 pages, 10 figures
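Canonical Signed Digit (CSD) coding and zero-skipping, mentioned in the abstract above, can be illustrated in software. The sketch below recodes a multiplier into CSD digits in {-1, 0, +1} (no two adjacent digits nonzero) and multiplies by shift-and-add, skipping zero digits so that the number of add/subtract steps equals the number of nonzero CSD digits. It is a plain-integer functional model under these assumptions; the sub-word packing of the soft SIMD pipeline and its repacking unit are not modeled, and the function names are hypothetical.

```python
from typing import List, Tuple


def to_csd(n: int) -> List[int]:
    """Return the CSD (non-adjacent form) digits of n, least significant first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n % 4)  # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d           # n is now divisible by 4, forcing the next digit to 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x: int, w: int) -> Tuple[int, int]:
    """Compute x * w by shift-and-add over the CSD digits of w.

    Returns (product, number_of_add_or_subtract_steps).
    """
    acc = 0
    steps = 0
    for shift, d in enumerate(to_csd(w)):
        if d == 0:
            continue  # zero-skipping: no adder activity for this digit
        term = x << shift
        acc += term if d > 0 else -term
        steps += 1
    return acc, steps


if __name__ == "__main__":
    # 7 = 0b111 needs three additions in plain binary, but CSD rewrites it as 8 - 1.
    print(to_csd(7), csd_multiply(13, 7))  # prints [-1, 0, 0, 1] (91, 2)
```

Because CSD never places two nonzero digits side by side, at most about half of the digits are nonzero, which bounds the number of sequential add/subtract steps that zero-skipping leaves.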

arXiv:2209.06108

Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference

Authors: Marco Rios, Flavio Ponzina, Alexandre Levisse, Giovanni Ansaloni, David Atienza

Abstract: By supporting access to multiple memory words at the same time, Bit-line Computing (BC) architectures allow the parallel execution of bit-wise operations in memory. Arithmetic operations are then derived at the array periphery with little additional overhead. This paradigm opens novel opportunities for Artificial Intelligence (AI) at the edge, thanks to the massive parallelism inherent in memory arrays and the extreme energy efficiency of computing in situ, which avoids data transfers. Previous works have shown that BC brings disruptive efficiency gains when targeting AI workloads, efficiency being a key metric in emerging edge AI scenarios. This manuscript builds on these findings by proposing an end-to-end framework that leverages BC-specific optimizations to enable high parallelism and aggressive compression of AI models. Our approach is supported by a novel hardware module performing real-time decoding, as well as new algorithms to enable BC-friendly model compression. Our hardware/software approach yields 91% energy savings (under a 1% accuracy degradation constraint) compared with state-of-the-art BC approaches.

Submitted 12 September, 2022; originally announced September 2022.
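As a functional illustration of bit-wise operations executed in memory with arithmetic derived at the array periphery, the sketch below assumes a common bit-line computing scheme in which activating two word lines lets the bit line sense the AND of the two stored words and the complementary bit line sense their NOR. XOR and a ripple-carry addition are then built from those two results. The sensing scheme, the word width, and the function names are assumptions for illustration; the paper's decoding module and compression algorithms are not modeled.

```python
from typing import Tuple


def bitline_sense(word_a: int, word_b: int, width: int) -> Tuple[int, int]:
    """Model one dual-word-line access: returns (AND, NOR) of two stored words."""
    mask = (1 << width) - 1
    and_bits = word_a & word_b & mask        # sensed on the bit lines (BL)
    nor_bits = ~(word_a | word_b) & mask     # sensed on the complementary bit lines (BLB)
    return and_bits, nor_bits


def periphery_add(word_a: int, word_b: int, width: int) -> Tuple[int, int]:
    """Derive XOR, then a ripple-carry sum, from the sensed AND/NOR.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    and_bits, nor_bits = bitline_sense(word_a, word_b, width)
    xor_bits = ~(and_bits | nor_bits) & mask  # a XOR b = NOT((a AND b) OR (a NOR b))
    result, carry = 0, 0
    for i in range(width):
        g = (and_bits >> i) & 1  # carry generate
        p = (xor_bits >> i) & 1  # carry propagate
        result |= (p ^ carry) << i
        carry = g | (p & carry)
    return result, carry


if __name__ == "__main__":
    print(periphery_add(0b1011, 0b0110, width=4))  # prints (1, 1): 11 + 6 = 17 = 0b1_0001
```

Only one memory access is needed per pair of words; the per-bit logic after sensing is small, which is the intuition behind deriving arithmetic "with little additional overhead".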

arXiv:2209.04360

doi 10.1016/j.cmpb.2023.107743

A Semi-Supervised Algorithm for Improving the Consistency of Crowdsourced Datasets: The COVID-19 Case Study on Respiratory Disorder Classification

Authors: Lara Orlandic, Tomas Teijeiro, David Atienza

Abstract: Cough audio signal classification is a potentially useful tool for screening respiratory disorders such as COVID-19. Since collecting data from patients with such contagious diseases is dangerous, many research teams have turned to crowdsourcing to quickly gather cough sound data, as was done to generate the COUGHVID dataset. The COUGHVID dataset enlisted expert physicians to diagnose the underlying diseases present in a limited number of uploaded recordings. However, this approach suffers from potential mislabeling of the coughs, as well as notable disagreement between experts. In this work, we use a semi-supervised learning (SSL) approach to improve the labeling consistency of the COUGHVID dataset and the robustness of COVID-19 versus healthy cough sound classification. First, we leverage existing SSL expert knowledge aggregation techniques to overcome the labeling inconsistencies and sparsity in the dataset. Next, our SSL approach is used to identify a subsample of re-labeled COUGHVID audio samples that can be used to train or augment future cough classification models. The consistency of the re-labeled data is demonstrated by its high degree of class separability, 3x higher than that of the user-labeled data, despite the expert label inconsistency present in the original dataset.
Furthermore, the spectral differences in the user-labeled audio segments are amplified in the re-labeled data, resulting in significantly different power spectral densities between healthy and COVID-19 coughs, which demonstrates both the increased consistency of the new dataset and its explainability from an acoustic perspective. Finally, we demonstrate how the re-labeled dataset can be used to train a cough classifier. This SSL approach can be used to combine the medical knowledge of several experts to improve the database consistency for any diagnostic classification task.

Submitted 9 September, 2022; originally announced September 2022.

arXiv:2208.00885

Many-to-One Knowledge Distillation of Real-Time Epileptic Seizure Detection for Low-Power Wearable Internet of Things Systems

Authors: Saleh Baghersalimi, Alireza Amirshahi, Farnaz Forooghifar, Tomas Teijeiro, Amir Aminifar, David Atienza

Abstract: Integrating low-power wearable Internet of Things (IoT) systems into routine health monitoring is an ongoing challenge. Recent advances in the computation capabilities of wearables make it possible to target complex scenarios by exploiting multiple biosignals and using high-performance algorithms, such as Deep Neural Networks (DNNs). There is, however, a trade-off between the performance of the algorithms and the low-power requirements of resource-limited IoT platforms. Moreover, physically larger, multi-biosignal wearables cause significant discomfort to patients. Consequently, reducing power consumption and discomfort is necessary for patients to use IoT devices continuously in everyday life. To overcome these challenges, in the context of epileptic seizure detection, we propose a many-to-one signals knowledge distillation approach targeting single-biosignal processing in IoT wearable systems. The starting point is a highly accurate multi-biosignal DNN; we then apply our approach to develop a single-biosignal DNN solution for IoT systems that achieves accuracy comparable to the original multi-biosignal DNN. To assess the practicality of our approach in real-life scenarios, we perform a comprehensive simulation-based analysis on several state-of-the-art edge computing platforms, such as the Kendryte K210 and Raspberry Pi Zero.

Submitted 20 July, 2022; originally announced August 2022.
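The abstract does not give the distillation objective, so the sketch below uses the widely known temperature-scaled knowledge distillation loss of Hinton et al. as an assumed stand-in: the single-biosignal student is trained to match the softened output distribution of the multi-biosignal teacher while also fitting the hard seizure/non-seizure label. The temperature, the weighting `alpha`, the class ordering, and the function names are hypothetical choices, not values from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # Hard-label cross-entropy uses the unsoftened (T = 1) student distribution.
    hard_ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps soft-target gradients comparable in scale across temperatures.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard_ce


if __name__ == "__main__":
    teacher = [3.0, -1.0]  # hypothetical multi-biosignal teacher logits: [seizure, non-seizure]
    student = [0.5, 0.2]   # hypothetical single-biosignal student logits
    print(distillation_loss(student, teacher, label=0))
```

In a many-to-one setting, the teacher sees all biosignals while the student only receives one, so the soft targets transfer information the student cannot observe directly.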

arXiv:2206.04746

doi 10.1145/3508352.3549475

HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration

Authors: William Andrew Simon, Una Pale, Tomas Teijeiro, David Atienza

Abstract: HyperDimensional Computing (HDC) as a machine learning paradigm is highly interesting for applications involving continuous, semi-supervised learning for long-term monitoring. However, its accuracy is not yet on par with other Machine Learning (ML) approaches. Frameworks enabling fast design space exploration to find practical algorithms are necessary to make HD computing competitive with other ML techniques. To this end, we introduce HDTorch, an open-source, PyTorch-based HDC library with CUDA extensions for hypervector operations. We demonstrate HDTorch's utility by analyzing four HDC benchmark datasets in terms of accuracy, runtime, and memory consumption, using both classical and online HD training methodologies. We demonstrate average training speedups of 111x and 68x for classical and online HD, respectively, and an average inference speedup of 87x. Moreover, we analyze the effects of varying hyperparameters on runtime and accuracy. Finally, we demonstrate how HDTorch enables the exploration of HDC strategies applied to large, real-world datasets. We perform the first-ever HD training and inference analysis of the entire CHB-MIT EEG epilepsy database. Results show that the typical approach of training on a subset of the data does not necessarily generalize to the entire dataset, an important factor when developing future HD models for medical wearable devices.

Submitted 9 June, 2022; originally announced June 2022.

Comments: Submitted to the ICCAD 2022 conference (23 May 2022)
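For readers unfamiliar with the hypervector operations that HDTorch accelerates, the sketch below shows a common binary HDC convention in plain Python: random hypervectors, binding by element-wise XOR, bundling by bit-wise majority, and similarity as one minus the normalized Hamming distance. It does not use or reproduce HDTorch's API; the dimension, the tie-breaking rule, and all names are illustrative assumptions. HDTorch provides CUDA extensions for such operations; the pure-Python loops here are for clarity only.

```python
import random
from typing import List, Sequence

Hypervector = List[int]  # binary hypervector with entries in {0, 1}


def random_hv(dim: int, rng: random.Random) -> Hypervector:
    """Draw a dense random binary hypervector (quasi-orthogonal to others at high dim)."""
    return [rng.randint(0, 1) for _ in range(dim)]


def bind(a: Hypervector, b: Hypervector) -> Hypervector:
    """XOR binding: the result is dissimilar to both inputs and is its own inverse."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(hvs: Sequence[Hypervector]) -> Hypervector:
    """Bit-wise majority: the result stays similar to every input.

    Ties (possible with an even number of inputs) are resolved to 0 here.
    """
    n = len(hvs)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*hvs)]


def similarity(a: Hypervector, b: Hypervector) -> float:
    """1 - normalized Hamming distance: 1.0 if identical, about 0.5 if unrelated."""
    return 1.0 - sum(x != y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    key, value, other = (random_hv(10_000, rng) for _ in range(3))
    pair = bind(key, value)                      # associate a key (e.g. an EEG channel) with a value
    print(similarity(bind(pair, key), value))    # prints 1.0: unbinding recovers the value
    print(similarity(bundle([key, value, other]), key))  # roughly 0.75
```

Classical HD training then amounts to bundling the encoded samples of each class into a prototype and classifying a query by its most similar prototype.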

arXiv:2205.10042

doi 10.1109/TC.2022.3230285

ALPINE: Analog In-Memory Acceleration with Tight Processor Integration for Deep Learning

Authors: Joshua Klein, Irem Boybat, Yasir Qureshi, Martino Dazzi, Alexandre Levisse, Giovanni Ansaloni, Marina Zapater, Abu Sebastian, David Atienza

Abstract: Analog in-memory computing (AIMC) cores offer significant performance and energy benefits for neural network inference with respect to digital logic (e.g., CPUs). AIMCs accelerate matrix-vector multiplications, which dominate the run-time of these applications. However, AIMC-centric platforms lack the flexibility of general-purpose systems, as they often have hard-coded data flows and can only support a limited set of processing functions. With the goal of bridging this flexibility gap, we present a novel system architecture that tightly integrates analog in-memory computing accelerators into multi-core CPUs in general-purpose systems. We developed ALPINE, a gem5-based full-system simulation framework built into the gem5-X simulator, which enables an in-depth characterization of the proposed architecture. ALPINE allows the simulation of the entire computer architecture stack, from major hardware components to their interactions with the Linux OS. Within ALPINE, we have defined a custom ISA extension and a software library to facilitate the deployment of inference models. We showcase and analyze a variety of mappings of different neural network types, and demonstrate up to 20.5x/20.8x performance/energy gains with respect to a SIMD-enabled ARM CPU implementation for convolutional neural networks, multi-layer perceptrons, and recurrent neural networks.

Submitted 13 December, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

Comments: Accepted by IEEE Transactions on Computers, December 2022

ACM Class: C.4; I.6.0

arXiv:2205.07654

Hyperdimensional computing encoding for feature selection on the use case of epileptic seizure detection

Authors: Una Pale, Tomas Teijeiro, David Atienza

Abstract: The healthcare landscape is moving from reactive interventions focused on symptom treatment to more proactive prevention, from one-size-fits-all to personalized medicine, and from centralized to distributed paradigms. Wearable IoT devices and novel algorithms for continuous monitoring are essential components of this transition. Hyperdimensional (HD) computing is an emerging ML paradigm inspired by neuroscience research, with various aspects that are interesting for IoT devices and biomedical applications. Here we explore the not-yet-addressed topic of how to optimally encode spatio-temporal data, such as electroencephalogram (EEG) signals, and all the information they entail into HD vectors. Further, we demonstrate how the HD computing framework can be used to perform feature selection by choosing an adequate encoding. To the best of our knowledge, this is the first approach in the literature to perform feature selection using HD computing. As a result, we believe it can help the ML community further foster research in multiple directions related to feature and channel selection, as well as model interpretability.

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2204.05009

VWR2A: A Very-Wide-Register Reconfigurable-Array Architecture for Low-Power Embedded Devices

Authors: Benoît Walter Denkinger, Miguel Peón-Quirós, Mario Konijnenburg, David Atienza, Francky Catthoor

Abstract: Edge computing requires high-performance, energy-efficient embedded systems. Fixed-function or custom accelerators, such as FFT or FIR filter engines, are very efficient at implementing a particular functionality for a given set of constraints. However, they are inflexible when facing application-wide optimizations or functionality upgrades. Conversely, programmable cores offer higher flexibility, but often with a penalty in area, performance, and, above all, energy consumption. In this paper, we propose VWR2A, an architecture that integrates high computational density with low-power memory structures (i.e., very-wide registers and scratchpad memories). VWR2A narrows the energy gap with respect to an FFT accelerator, with similar or better performance on FFT kernels. Moreover, VWR2A's flexibility allows it to accelerate multiple kernels, resulting in significant energy savings at the application level.

Submitted 2 June, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

arXiv:2201.09759

Exploration of Hyperdimensional Computing Strategies for Enhanced Learning on Epileptic Seizure Detection

Authors: Una Pale, Tomas Teijeiro, David Atienza

Abstract: Wearable and unobtrusive monitoring and prediction of epileptic seizures has the potential to significantly increase patients' quality of life, but it is still an unreached goal due to the challenges of real-time detection and wearable device design. Hyperdimensional (HD) computing has evolved in recent years as a promising new machine learning approach, especially for wearable applications. However, for epilepsy detection, standard HD computing does not perform at the level of other state-of-the-art algorithms. This could be due to the inherent complexity of seizures and their signatures in different biosignals, such as the electroencephalogram (EEG), their highly personalized nature, and the imbalance between seizure and non-seizure instances. In the literature, different strategies for improving HD computing learning have been proposed, such as iterative (multi-pass) learning, multi-centroid learning, and learning with sample weights ("OnlineHD"). Yet, most of them have not been tested on the challenging task of epileptic seizure detection, and it remains unclear whether they can raise HD computing performance to the level of current state-of-the-art algorithms, such as random forests. Thus, in this paper, we implement different learning strategies and assess their performance, individually and in combination, in terms of detection performance and memory and computational requirements.
Results show that the best-performing algorithm, a combination of multi-centroid and multi-pass learning, can indeed reach the performance of the random forest model on a highly unbalanced dataset imitating a real-life epileptic seizure detection application.

Submitted 24 January, 2022; originally announced January 2022.

arXiv:2112.04369

doi 10.1109/TBME.2022.3205304

Adaptive R-Peak Detection on Wearable ECG Sensors for High-Intensity Exercise

Authors: Elisabetta De Giovanni, Tomas Teijeiro, Grégoire P. Millet, David Atienza

Abstract: Objective: Continuous monitoring of biosignals via wearable sensors has quickly expanded in the medical and wellness fields. At rest, automatic detection of vital parameters is generally accurate. However, in conditions such as high-intensity exercise, sudden physiological changes occur in the signals, compromising the robustness of standard algorithms. Methods: Our method, called BayeSlope, is based on unsupervised learning, Bayesian filtering, and non-linear normalization to enhance and correctly detect the R peaks according to their expected positions in the ECG. Furthermore, as BayeSlope is computationally heavy and can drain the device battery quickly, we propose an online design that adapts its robustness to sudden physiological changes, and its complexity to the heterogeneous resources of modern embedded platforms. This method combines BayeSlope with a lightweight algorithm, executed on cores with different capabilities, to reduce energy consumption while preserving accuracy. Results: BayeSlope achieves an F1 score of 99.3% in experiments during intense cycling exercise with 20 subjects. Additionally, the online adaptive process achieves an F1 score of 99% across five different exercise intensities, with a total energy consumption of 1.55 ± 0.54 mJ. Conclusion: We propose a highly accurate and robust method, and a complete energy-efficient implementation on a modern ultra-low-power embedded platform, to improve R-peak detection in challenging conditions, such as during high-intensity exercise.
Significance: The experiments show that BayeSlope outperforms a state-of-the-art algorithm by up to 8.4% in F1 score, while our online adaptive method can reach energy savings of up to 38.7% on modern heterogeneous wearable platforms.

Submitted 8 December, 2021; originally announced December 2021.

Comments: 12 pages, 14 figures, 2 tables

MSC Class: 68U35

arXiv:2111.08463

doi 10.3389/fneur.2022.816294

Multi-Centroid Hyperdimensional Computing Approach for Epileptic Seizure Detection

Authors: Una Pale, Tomas Teijeiro, David Atienza

Abstract: Long-term monitoring of patients with epilepsy presents a challenging problem from the engineering perspective of real-time detection and wearable device design. It requires new solutions that allow continuous, unobstructed monitoring and reliable detection and prediction of seizures. High variability in electroencephalogram (EEG) patterns exists among people, brain states, and time instances during seizures, but also during non-seizure periods. This makes epileptic seizure detection very challenging, especially if data is grouped under only seizure and non-seizure labels. Hyperdimensional (HD) computing, a novel machine learning approach, comes in as a promising tool. However, it has certain limitations when the data shows high intra-class variability. Therefore, in this work, we propose a novel semi-supervised learning approach based on multi-centroid HD computing. The multi-centroid approach allows several prototype vectors to represent the seizure and non-seizure states, which leads to significantly improved performance compared to a simple 2-class HD model. Further, real-life data imbalance poses an additional challenge, and the performance reported on balanced subsets of data is likely to be overestimated. Thus, we test our multi-centroid approach with three different dataset balancing scenarios, showing that the performance improvement is larger for the less balanced dataset. More specifically, up to 14% improvement is achieved on an unbalanced test set with 10 times more non-seizure than seizure data.
At the same time, the total number of sub-classes does not increase significantly compared to the balanced dataset. Thus, the proposed multi-centroid approach can be an important element in achieving high epilepsy detection performance with real-life class balance or during online learning, where seizures are infrequent.

Submitted 16 November, 2021; originally announced November 2021.
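The multi-centroid idea can be sketched as follows, assuming samples have already been encoded into binary hypervectors. Each class keeps a list of centroids (sub-classes); a training sample is added to the most similar centroid of its own class, or opens a new centroid when that similarity falls below a threshold. Prediction returns the label of the most similar centroid overall. The threshold rule, its value, and the class name `MultiCentroidHD` are assumptions for illustration; the paper's exact sub-class creation criterion may differ.

```python
from typing import List, Optional, Sequence

Hypervector = List[int]  # binary hypervector, entries in {0, 1}


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - normalized Hamming distance."""
    return 1.0 - sum(x != y for x, y in zip(a, b)) / len(a)


class MultiCentroidHD:
    """Hypothetical multi-centroid HD classifier over pre-encoded hypervectors."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold
        self._counts: List[List[int]] = []  # per centroid: number of 1s per dimension
        self._sizes: List[int] = []         # per centroid: number of samples bundled
        self._labels: List[int] = []        # per centroid: class label

    def _centroid(self, i: int) -> Hypervector:
        # Majority vote over the samples bundled into centroid i (ties -> 0).
        n = self._sizes[i]
        return [1 if 2 * c > n else 0 for c in self._counts[i]]

    def fit_one(self, hv: Hypervector, label: int) -> None:
        best: Optional[int] = None
        best_sim = -1.0
        for i, lab in enumerate(self._labels):
            if lab == label:
                s = similarity(hv, self._centroid(i))
                if s > best_sim:
                    best, best_sim = i, s
        if best is None or best_sim < self.threshold:
            # Too far from every sub-class of this label: open a new centroid.
            self._counts.append(list(hv))
            self._sizes.append(1)
            self._labels.append(label)
        else:
            self._counts[best] = [c + x for c, x in zip(self._counts[best], hv)]
            self._sizes[best] += 1

    def predict(self, hv: Hypervector) -> int:
        if not self._labels:
            raise ValueError("model has not been trained")
        sims = [similarity(hv, self._centroid(i)) for i in range(len(self._labels))]
        best = max(range(len(sims)), key=sims.__getitem__)
        return self._labels[best]

    def num_centroids(self, label: int) -> int:
        return sum(1 for lab in self._labels if lab == label)
```

Raising `threshold` creates more sub-classes per label; setting it to 0 always merges a sample into the closest existing centroid of its class, which reduces the model to one prototype per class, i.e., the simple 2-class HD model.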

arXiv:2109.03008

Semiparametric Bayesian Networks

Authors: David Atienza, Concha Bielza, Pedro Larrañaga

Abstract: We introduce semiparametric Bayesian networks that combine parametric and nonparametric conditional probability distributions. Their aim is to incorporate the advantages of both components: the bounded complexity of parametric models and the flexibility of nonparametric ones. We demonstrate that semiparametric Bayesian networks generalize two well-known types of Bayesian networks: Gaussian Bayesian networks and kernel density estimation Bayesian networks. For this purpose, we consider two different conditional probability distributions required in a semiparametric Bayesian network. In addition, we present modifications of two well-known algorithms (greedy hill-climbing and PC) to learn the structure of a semiparametric Bayesian network from data. To realize this, we employ a score function based on cross-validation. Moreover, using a validation dataset, we apply an early-stopping criterion to avoid overfitting. To evaluate the applicability of the proposed algorithm, we conduct an exhaustive experiment on synthetic data sampled by mixing linear and nonlinear functions, multivariate normal data sampled from Gaussian Bayesian networks, real data from the UCI repository, and bearing degradation data. As a result of this experiment, we conclude that the proposed algorithm accurately learns the combination of parametric and nonparametric components, while achieving performance comparable to that of state-of-the-art methods.

Submitted 7 September, 2021; originally announced September 2021.

Comments: 44 pages, 13 figures, 4 tables, submitted to Information Sciences

MSC Class: 68T05 68T10

ACM Class: I.5.1; I.2.6
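To make the two families of conditional probability distributions (CPDs) in the semiparametric Bayesian network entry above concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a linear Gaussian CPD for parametric nodes and, for nonparametric nodes, a conditional KDE computed as the ratio of a joint to a marginal Gaussian-kernel density estimate with one shared bandwidth. The function names and the bandwidth handling are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def linear_gaussian_cpd(x, parents, beta0, betas, std):
    """Parametric CPD: x | parents ~ N(beta0 + sum_i beta_i * parent_i, std^2)."""
    mean = beta0 + sum(b * p for b, p in zip(betas, parents))
    return normal_pdf(x, mean, std)


def kde_cpd(x, parents, data, bandwidth):
    """Nonparametric CPD: joint KDE f(x, parents) divided by marginal KDE f(parents).

    `data` is a list of (x_j, parents_j) training pairs. A product Gaussian kernel
    with a single shared bandwidth is assumed; the 1/n factors of both KDEs cancel.
    """
    joint = 0.0
    marginal = 0.0
    for x_j, parents_j in data:
        k_parents = 1.0
        for p, q in zip(parents, parents_j):
            k_parents *= normal_pdf(p, q, bandwidth)
        marginal += k_parents
        joint += normal_pdf(x, x_j, bandwidth) * k_parents
    return joint / marginal if marginal > 0 else 0.0
```

The parametric CPD is summarized by a fixed number of coefficients regardless of sample size, whereas the KDE CPD keeps every training instance; this is the complexity/flexibility trade-off the abstract refers to.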

arXiv:2105.02808

doi 10.1109/CBMS52027.2021.00050

Wearable and Continuous Prediction of Passage of Time Perception for Monitoring Mental Health

Authors: Lara Orlandic, Adriana Arza Valdes, David Atienza

Abstract: A person's passage of time perception (POTP) is strongly linked to their mental state and stress response, and can therefore provide an easily quantifiable means of continuous mental health monitoring. In this work, we develop a custom experiment and Machine Learning (ML) models for predicting POTP from biomarkers acquired from wearable biosensors. We first confirm that individuals experience time passing slower than usual during fear or sadness (p = 0.046) and faster than usual during cognitive tasks (p = 2 × 10^-5). Then, we group together the experimental segments associated with fast, slow, and normal POTP, and train an ML model to classify among these states based on a person's biomarkers. The classifier had a weighted-average F1 score of 79%, with the fast-passing-time class having the highest F1 score of 93%. Next, we classify each individual's POTP regardless of the task at hand, achieving an F1 score of 77.1% when distinguishing time passing faster rather than slower than usual. In both classifiers, biomarkers derived from the respiration, electrocardiogram, skin conductance, and skin temperature signals contributed most to the classifier output, thus enabling real-time POTP monitoring using noninvasive, wearable biosensors.

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2105.00934

doi 10.1109/EMBC46164.2021.9629648

Systematic Assessment of Hyperdimensional Computing for Epileptic Seizure Detection

Authors: Una Pale, Tomas Teijeiro, David Atienza

Abstract: Hyperdimensional (HD) computing is a promising novel paradigm for low-power embedded machine learning. It has been applied to various biomedical applications, particularly epileptic seizure detection. Unfortunately, due to differences in data preparation, segmentation, encoding strategies, and performance metrics, results are hard to compare, which makes building upon that knowledge difficult. Thus, the main goal of this work is to perform a systematic assessment of the HD computing framework for the detection of epileptic seizures, comparing different feature approaches mapped to HD vectors. More precisely, we test two previously implemented features as well as several novel approaches with HD computing on epileptic seizure detection. We evaluate them in a comparable way, i.e., with the same preprocessing setup and with identical performance measures. We use two different datasets in order to assess the generalizability of our conclusions. The systematic assessment involved three primary aspects relevant for potential wearable implementations: 1) detection performance, 2) memory requirements, and 3) computational complexity. Our analysis shows a significant difference in detection performance between approaches, but also that the ones with the highest performance might not be ideal for wearable applications due to their high memory or computational requirements.
Furthermore, we evaluate a post-processing strategy that adjusts the predictions to the dynamics of epileptic seizures, showing that it significantly improves performance for all approaches and that, after post-processing, the differences in performance between approaches are much smaller.

Submitted 3 May, 2021; originally announced May 2021.
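The HD computing pipeline assessed in the seizure-detection entry above (feature values mapped to HD vectors, then compared against class representatives) can be sketched as follows. This is a generic illustration under stated assumptions, not any of the feature approaches evaluated in the paper: bipolar hypervectors with an illustrative dimension `D`, binding by element-wise multiplication, bundling by majority vote, a hypothetical level memory for feature values pre-scaled to [0, 1], and cosine similarity for classification. All function names are hypothetical.

```python
import math
import random

D = 10_000  # hypervector dimensionality (illustrative; not taken from the paper)


def random_hv(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]


def bind(a, b):
    """Binding: element-wise multiplication, which associates two hypervectors."""
    return [x * y for x, y in zip(a, b)]


def bundle(hvs):
    """Bundling: element-wise majority vote; ties (possible with an even count) become 0."""
    return [1 if s > 0 else -1 if s < 0 else 0 for s in map(sum, zip(*hvs))]


def similarity(a, b):
    """Cosine similarity in [-1, 1]; unrelated hypervectors score close to 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def level_hvs(n_levels, rng):
    """Level memory (n_levels >= 2): each level flips a further block of components,
    so neighbouring levels are similar and the two extremes are nearly orthogonal."""
    base = random_hv(rng)
    order = list(range(D))
    rng.shuffle(order)
    step = D // (2 * (n_levels - 1))
    levels = []
    for i in range(n_levels):
        hv = base[:]
        for idx in order[: i * step]:
            hv[idx] = -hv[idx]
        levels.append(hv)
    return levels


def encode(features, channel_hvs, levels):
    """Encode one window: bind each channel ID with its quantized feature level, then bundle."""
    n = len(levels)
    bound = []
    for ch, value in enumerate(features):
        q = min(n - 1, max(0, round(value * (n - 1))))  # assumes value in [0, 1]
        bound.append(bind(channel_hvs[ch], levels[q]))
    return bundle(bound)


def train(samples, labels, channel_hvs, levels):
    """One prototype per class: the bundle of that class's encoded training windows."""
    return {
        label: bundle([encode(s, channel_hvs, levels)
                       for s, l in zip(samples, labels) if l == label])
        for label in set(labels)
    }


def classify(features, prototypes, channel_hvs, levels):
    """Return the label whose prototype is most similar to the encoded query window."""
    query = encode(features, channel_hvs, levels)
    return max(prototypes, key=lambda label: similarity(query, prototypes[label]))
```

For instance, with four channel hypervectors and ten levels, training on a couple of windows per class and calling `classify` on a new feature vector returns the label of the closest prototype.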

arXiv:2104.14278

ReLearn: A Robust Machine Learning Framework in Presence of Missing Data for Multimodal Stress Detection from Physiological Signals

Authors: Arman Iranfar, Adriana Arza, David Atienza

Abstract: Continuous and multimodal stress detection has recently been performed with wearable devices and machine learning algorithms. However, a well-known and important challenge of working with physiological signals recorded by conventional monitoring devices is missing data caused by insufficient sensor contact and interference from other equipment. This challenge becomes more problematic when the user/patient is mentally or physically active or stressed, because of more frequent conscious or subconscious movements. In this paper, we propose ReLearn, a robust machine learning framework for stress detection from biomarkers extracted from multimodal physiological signals. ReLearn effectively copes with missing data and outliers in both the training and inference phases. ReLearn, composed of machine learning models for feature selection, outlier detection, data imputation, and classification, allows us to classify all samples, including those with missing values at inference. In particular, according to our experiments on our stress database, discarding all missing data, a simplistic yet common approach, leaves 34% of the data without a prediction at inference, whereas our approach achieves predictions with accuracy as high as 78% for samples with missing values. Moreover, our experiments show that the proposed framework obtains a cross-validation accuracy of 86.8% even when more than 50% of the samples within the features are missing.

Submitted 29 July, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

Comments: 7 pages

arXiv:2103.03044

doi 10.1016/j.micpro.2020.103185

The RECIPE Approach to Challenges in Deeply Heterogeneous High Performance Systems

Authors: Giovanni Agosta, William Fornaciari, David Atienza, Ramon Canal, Alessandro Cilardo, José Flich Cardo, Carles Hernandez Luz, Michal Kulczewski, Giuseppe Massari, Rafael Tornero Gavilá, Marina Zapater

Abstract: RECIPE (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems) is a recently started project funded within the H2020 FETHPC programme, which is expressly targeted at exploring new High-Performance Computing (HPC) technologies. RECIPE aims to introduce a hierarchical runtime resource management infrastructure to optimize energy efficiency and minimize the occurrence of thermal hotspots, while enforcing the time constraints imposed by the applications and ensuring reliability for both time-critical and throughput-oriented computations that run on deeply heterogeneous accelerator-based systems. This paper presents a detailed overview of RECIPE, identifying the fundamental challenges as well as the key innovations addressed by the project. In particular, the need for predictive reliability approaches to maximize hardware lifetime and guarantee application performance is identified as the key concern for RECIPE, and is addressed via hierarchical resource management of the heterogeneous architectural components of the system, driven by estimates of application latency and hardware reliability obtained, respectively, through timing analysis and through modelling of thermal properties and the mean time to failure of subsystems. We show the impact of prediction accuracy on the overheads imposed by the checkpointing policy, as well as a possible application to a weather forecasting use case.

Submitted 4 March, 2021; originally announced March 2021.

Journal ref: Microprocessors and Microsystems, Volume 77, 2020

arXiv:2012.11933

Interpreting Deep Learning Models for Epileptic Seizure Detection on EEG signals

Authors: Valentin Gabeff, Tomas Teijeiro, Marina Zapater, Leila Cammoun, Sylvain Rheims, Philippe Ryvlin, David Atienza

Abstract: While Deep Learning (DL) is often considered the state of the art for Artificial Intelligence-based medical decision support, it remains sparsely implemented in clinical practice and poorly trusted by clinicians due to the insufficient interpretability of neural network models. We have tackled this issue by developing interpretable DL models in the context of online detection of epileptic seizures based on EEG signals. This has conditioned the preparation of the input signals, the network architecture, and the post-processing of the output in line with domain knowledge. Specifically, we focused the discussion on three main aspects: 1) how to aggregate the classification results on signal segments provided by the DL model into a larger time scale, at the seizure level; 2) what the relevant frequency patterns learned in the first convolutional layer of different models are, and their relation with the delta, theta, alpha, beta and gamma frequency bands on which the visual interpretation of EEG is based; and 3) the identification of the signal waveforms with larger contribution towards the ictal class, according to the activation differences highlighted using the DeepLIFT method. Results show that the kernel size in the first layer determines the interpretability of the extracted features and the sensitivity of the trained models, even though the final performance is very similar after post-processing.
We also found that amplitude is the main feature leading to an ictal prediction, suggesting that a larger patient population would be required to learn more complex frequency patterns. Still, our methodology generalized across inter-patient variability for the majority of the studied population, achieving a classification F1-score of 0.873 and detecting 90% of the seizures.

Submitted 22 December, 2020; originally announced December 2020.

Comments: 28 pages, 11 figures, 12 tables

arXiv:2011.04107

doi 10.1109/MWC.010.2100561

Graphene-based Wireless Agile Interconnects for Massive Heterogeneous Multi-chip Processors

Authors: Sergi Abadal, Robert Guirado, Hamidreza Taghvaee, Akshay Jain, Elana Pereira de Santana, Peter Haring Bolívar, Mohamed Saeed, Renato Negra, Zhenxing Wang, Kun-Ta Wang, Max C. Lemme, Joshua Klein, Marina Zapater, Alexandre Levisse, David Atienza, Davide Rossi, Francesco Conti, Martino Dazzi, Geethan Karunaratne, Irem Boybat, Abu Sebastian

Abstract: The main design principles in computer architecture have recently shifted from a monolithic scaling-driven approach to the development of heterogeneous architectures that tightly co-integrate multiple specialized processor and memory chiplets. In such data-hungry multi-chip architectures, current Networks-in-Package (NiPs) may not be enough to cater to their heterogeneous and fast-changing communication demands. This position paper makes the case for wireless in-package nanonetworking as the enabler of efficient and versatile wired-wireless interconnect fabrics for massive heterogeneous processors. To that end, the use of graphene-based antennas and transceivers with unique frequency-beam reconfigurability in the terahertz band is proposed. The feasibility of such a nanonetworking vision and the main research challenges towards its realization are analyzed from the technological, communications, and computer architecture perspectives.

Submitted 21 September, 2023; v1 submitted 8 November, 2020; originally announced November 2020.

Comments: 8 pages, 4 figures, 1 table

Journal ref: IEEE Wireless Communications Magazine, vol. 30, no. 4, pp. 162-169, 2023

arXiv:2009.11644

doi 10.1038/s41597-021-00937-4

The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms

Authors: Lara Orlandic, Tomas Teijeiro, David Atienza

Abstract: Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. However, there is currently no validated database of cough sounds with which to train such ML models. The COUGHVID dataset provides over 20,000 crowdsourced cough recordings representing a wide range of subject ages, genders, geographic locations, and COVID-19 statuses. First, we filtered the dataset using our open-sourced cough detection algorithm. Second, experienced pulmonologists labeled more than 2,000 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence that can be used for a plethora of cough audio classification tasks. Finally, we ensured that coughs labeled as symptomatic and COVID-19 originate from countries with high infection rates, and that their expert labels are consistent. As a result, the COUGHVID dataset contributes a wealth of cough recordings for training ML models to address the world's most urgent health crises.

Submitted 24 September, 2020; originally announced September 2020.

Comments: 11 pages, 3 figures

arXiv:1907.10518

Synthetic Epileptic Brain Activities Using Generative Adversarial Networks

Authors: Damian Pascual, Amir Aminifar, David Atienza, Philippe Ryvlin, Roger Wattenhofer

Abstract: Epilepsy is a chronic neurological disorder affecting more than 65 million people worldwide and manifested by recurrent unprovoked seizures. The unpredictability of seizures not only degrades patients' quality of life but can also be life-threatening. Systems that monitor electroencephalography (EEG) signals are currently being developed to detect epileptic seizures in order to alert caregivers and reduce the impact of seizures on patients' quality of life. Such seizure detection systems employ state-of-the-art machine learning algorithms that require a considerable amount of labeled personal data for training. However, acquiring EEG signals of epileptic seizures is a costly and time-consuming process for medical experts and patients, currently requiring in-hospital recordings in specialized units. In this work, we generate synthetic seizure-like brain electrical activities, i.e., EEG signals, that can be used to train seizure detection algorithms, alleviating the need for recorded data. First, we train a Generative Adversarial Network (GAN) with data from 30 epilepsy patients. Then, we generate synthetic personalized training sets for new, unseen patients, which overall yield higher detection performance than the real-data training sets. We demonstrate our results using the datasets from the EPILEPSIAE Project, one of the world's largest public databases for seizure detection.

Submitted 12 November, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

arXiv:1112.4084

doi 10.1109/TMM.2012.2231668

Markov Decision Process Based Energy-Efficient On-Line Scheduling for Slice-Parallel Video Decoders on Multicore Systems

Authors: Nicholas Mastronarde, Karim Kanoun, David Atienza, Pascal Frossard, Mihaela van der Schaar

Abstract: We consider the problem of energy-efficient on-line scheduling for slice-parallel video decoders on multicore systems. We assume that each processor is Dynamic Voltage and Frequency Scaling (DVFS) enabled, such that it can independently trade off performance for power while taking the video decoding workload into account. In the past, scheduling and DVFS policies in multicore systems have been formulated heuristically due to the inherent complexity of the on-line multicore scheduling problem. The key contribution of this report is that we rigorously formulate the problem as a Markov decision process (MDP), which simultaneously takes into account the on-line scheduling and per-core DVFS capabilities; the power consumption of the processor cores and caches; and the loss-tolerant and dynamic nature of the video decoder's traffic. In particular, we model the video traffic using a Directed Acyclic Graph (DAG) to capture the precedence constraints among frames in a Group of Pictures (GOP) structure, while also accounting for the fact that frames have different display/decoding deadlines and non-deterministic decoding complexities. The objective of the MDP is to minimize long-term power consumption subject to a minimum Quality of Service (QoS) constraint related to the decoder's throughput. Although MDPs notoriously suffer from the curse of dimensionality, we show that, with appropriate simplifications and approximations, the complexity of the MDP can be mitigated.
We implement a slice-parallel version of H.264 on a multiprocessor ARM (MPARM) virtual platform simulator, which provides cycle-accurate and bus signal-accurate simulation for different processors. We use this platform to generate realistic video decoding traces with which we evaluate the proposed on-line scheduling algorithm in Matlab.

Submitted 24 May, 2012; v1 submitted 17 December, 2011; originally announced December 2011.

Journal ref: IEEE Trans. on Multimedia, vol. 15, no. 2, pp. 268-278, Feb. 2013
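As a rough illustration of the MDP formulation in the entry above, the following Python sketch runs discounted value iteration on a deliberately tiny, hypothetical single-core model; the paper's model (a DAG of GOP frames, per-core DVFS, caches) is far richer, and its evaluation was done in Matlab. Assumptions: the state is the number of decoded frames buffered ahead of display, each action is a DVFS level with an invented power cost and decoding distribution in `ACTIONS`, and the minimum-QoS constraint is relaxed into a penalty `lam` per deadline miss (a Lagrangian relaxation) rather than enforced as a hard constraint.

```python
# Hypothetical toy model (not the paper's): the state is the number of decoded
# frames waiting in the display buffer (0..buffer_size); the action picks a DVFS level.
# Each action maps to (power cost per slot, {frames decoded in the slot: probability}).
ACTIONS = {
    "low": (1.0, {0: 0.3, 1: 0.6, 2: 0.1}),
    "high": (3.0, {0: 0.05, 1: 0.45, 2: 0.5}),
}


def value_iteration(buffer_size, actions, lam, gamma=0.95, tol=1e-9):
    """Discounted value iteration minimizing power + lam * (deadline misses).

    The QoS constraint is folded into the cost with a fixed Lagrange multiplier
    `lam` (an assumption of this sketch). Returns (value per state, action per state).
    """
    values = [0.0] * (buffer_size + 1)
    while True:
        new_values, policy = [], []
        for s in range(buffer_size + 1):
            best_action, best_q = None, float("inf")
            for action, (power, decode_probs) in actions.items():
                q = power
                for decoded, p in decode_probs.items():
                    available = s + decoded  # frames ready for display this slot
                    miss = 1.0 if available == 0 else 0.0  # nothing to display: deadline miss
                    nxt = min(buffer_size, max(0, available - 1))  # one frame is displayed
                    q += p * (lam * miss + gamma * values[nxt])
                if q < best_q:
                    best_action, best_q = action, q
            new_values.append(best_q)
            policy.append(best_action)
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values, policy
        values = new_values
```

With `lam = 0` the resulting policy always selects the low-power level; with a large penalty such as `lam = 1000`, it switches to the high level when the buffer is empty. Sweeping `lam` is one way to trace the power/QoS trade-off.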
