Showing 1–24 of 24 results for author: Bartolini, A

Searching in archive cs.
  1. arXiv:2405.18030  [pdf, other]

    eess.SY cs.PF

    Modeling and Controlling Many-Core HPC Processors: an Alternative to PID and Moving Average Algorithms

    Authors: Giovanni Bambini, Alessandro Ottaviano, Christian Conficoni, Andrea Tilli, Luca Benini, Andrea Bartolini

    Abstract: The race towards performance increase and computing power has led to chips with heterogeneous and complex designs, integrating an ever-growing number of cores on the same monolithic chip or chiplet silicon die. Higher integration density, compounded with the slowdown of technology-driven power reduction, implies that power and thermal management become increasingly relevant. Unfortunately, existin…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Paper in Review

  2. arXiv:2402.10395  [pdf, other]

    cs.CR cs.PF

    Assessing the Performance of OpenTitan as Cryptographic Accelerator in Secure Open-Hardware System-on-Chips

    Authors: Emanuele Parisi, Alberto Musa, Maicol Ciani, Francesco Barchi, Davide Rossi, Andrea Bartolini, Andrea Acquaviva

    Abstract: RISC-V open-source systems are emerging in deployment scenarios where safety and security are critical. OpenTitan is an open-source silicon root-of-trust designed to be deployed in a wide range of systems, from high-end to deeply embedded secure environments. Despite the availability of various cryptographic hardware accelerators that make OpenTitan suitable for offloading cryptographic workloads…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 8 pages, 2 figures, accepted at CF'24 conference, pre camera-ready version

  3. arXiv:2401.02567  [pdf, other]

    cs.CR cs.AR

    TitanCFI: Toward Enforcing Control-Flow Integrity in the Root-of-Trust

    Authors: Emanuele Parisi, Alberto Musa, Simone Manoni, Maicol Ciani, Davide Rossi, Francesco Barchi, Andrea Bartolini, Andrea Acquaviva

    Abstract: Modern RISC-V platforms control and monitor security-critical systems such as industrial controllers and autonomous vehicles. While these platforms feature a Root-of-Trust (RoT) to store authentication secrets and enable secure boot technologies, they often lack Control-Flow Integrity (CFI) enforcement and are vulnerable to cyber-attacks which divert the control flow of an application to trigger m…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: 6 pages, 1 figure, accepted at DATE'24 conference, pre camera-ready version

  4. Design of an energy aware petaflops class high performance cluster based on power architecture

    Authors: W. A. Ahmad, A. Bartolini, F. Beneventi, L. Benini, A. Borghesi, M. Cicala, P. Forestieri, C. Gianfreda, D. Gregori, A. Libri, F. Spiga, S. Tinti

    Abstract: In this paper we present D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe), an innovative and energy efficient High Performance Computing cluster designed by E4 Computer Engineering for PRACE (Partnership for Advanced Computing in Europe). D.A.V.I.D.E. is built using best-in-class components (IBM's POWER8-NVLink CPUs, NVIDIA TESLA P100 GPUs, Mellanox InfiniBand EDR 10…

    Submitted 11 July, 2023; originally announced July 2023.

  5. ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation

    Authors: Alessandro Ottaviano, Robert Balas, Giovanni Bambini, Antonio del Vecchio, Maicol Ciani, Davide Rossi, Luca Benini, Andrea Bartolini

    Abstract: High-Performance Computing (HPC) processors are nowadays integrated Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output (MIMO) optimal power requirements, high-end processors integrate an on-die power controller system (PCS). While traditional PCSs are based on a simple microcontro…

    Submitted 21 February, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 33 pages, 11 figures

  6. arXiv:2305.02697  [pdf, other]

    cs.DC cs.AI

    DECICE: Device-Edge-Cloud Intelligent Collaboration Framework

    Authors: Julian Kunkel, Christian Boehme, Jonathan Decker, Fabrizio Magugliani, Dirk Pleiter, Bastian Koller, Karthee Sivalingam, Sabri Pllana, Alexander Nikolov, Mujdat Soyturk, Christian Racca, Andrea Bartolini, Adrian Tate, Berkay Yaman

    Abstract: DECICE is a Horizon Europe project that is developing an AI-enabled open and portable management framework for automatic and adaptive optimization and deployment of applications in computing continuum encompassing from IoT sensors on the Edge to large-scale Cloud / HPC computing infrastructures. In this paper, we describe the DECICE framework and architecture. Furthermore, we highlight use-cases f…

    Submitted 4 May, 2023; originally announced May 2023.

  7. Experimenting with Emerging RISC-V Systems for Decentralised Machine Learning

    Authors: Gianluca Mittone, Nicolò Tonci, Robert Birke, Iacopo Colonnelli, Doriana Medić, Andrea Bartolini, Roberto Esposito, Emanuele Parisi, Francesco Beneventi, Mirko Polato, Massimo Torquati, Luca Benini, Marco Aldinucci

    Abstract: Decentralised Machine Learning (DML) enables collaborative machine learning without centralised input data. Federated Learning (FL) and Edge Inference are examples of DML. While tools for DML (especially FL) are starting to flourish, many are not flexible and portable enough to experiment with novel processors (e.g., RISC-V), non-fully connected network topologies, and asynchronous collaboration s…

    Submitted 18 October, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: This paper is the accepted version of ACM copyrighted material presented at the CF'23 conference in Bologna, Italy

    Journal ref: In Proceedings of the 20th ACM International Conference on Computing Frontiers 2023 (CF '23), ACM, New York, NY, USA, 73-83

  8. arXiv:2208.13169  [pdf, other]

    cs.LG cs.AI

    RUAD: unsupervised anomaly detection in HPC systems

    Authors: Martin Molan, Andrea Borghesi, Daniele Cesarini, Luca Benini, Andrea Bartolini

    Abstract: The increasing complexity of modern high-performance computing (HPC) systems necessitates the introduction of automated and data-driven methodologies to support system administrators' effort toward increasing the system's availability. Anomaly detection is an integral part of improving the availability as it eases the system administrator's burden and reduces the time between an anomaly and its re…

    Submitted 28 August, 2022; originally announced August 2022.

    MSC Class: 68T07 (Primary); 68U01; 68T01 (Secondary)

    ACM Class: I.2; I.2.6

  9. arXiv:2205.03725  [pdf, other]

    cs.DC cs.AR

    Monte Cimone: Paving the Road for the First Generation of RISC-V High-Performance Computers

    Authors: Andrea Bartolini, Federico Ficarelli, Emanuele Parisi, Francesco Beneventi, Francesco Barchi, Daniele Gregori, Fabrizio Magugliani, Marco Cicala, Cosimo Gianfreda, Daniele Cesarini, Andrea Acquaviva, Luca Benini

    Abstract: The new open and royalty-free RISC-V ISA is attracting interest across the whole computing continuum, from microcontrollers to supercomputers. High-performance RISC-V processors and accelerators have been announced, but RISC-V-based HPC systems will need a holistic co-design effort, spanning memory, storage hierarchy interconnects and full software stack. In this paper, we describe Monte Cimone, a…

    Submitted 7 May, 2022; originally announced May 2022.

  10. arXiv:2012.06836  [pdf, other]

    cs.LG cs.CL

    Source Code Classification for Energy Efficiency in Parallel Ultra Low-Power Microcontrollers

    Authors: Emanuele Parisi, Francesco Barchi, Andrea Bartolini, Giuseppe Tagliavini, Andrea Acquaviva

    Abstract: The analysis of source code through machine learning techniques is an increasingly explored research topic aiming at increasing smartness in the software toolchain to exploit modern architectures in the best possible way. In the case of low-power, parallel embedded architectures, this means finding the configuration, for instance in terms of the number of cores, leading to minimum energy consumpti…

    Submitted 12 December, 2020; originally announced December 2020.

  11. arXiv:2008.06571  [pdf, other]

    cs.PF cs.AR cs.DC

    Toward an End-to-End Auto-tuning Framework in HPC PowerStack

    Authors: Xingfu Wu, Aniruddha Marathe, Siddhartha Jana, Ondrej Vysocky, Jophin John, Andrea Bartolini, Lubomir Riha, Michael Gerndt, Valerie Taylor, Sridutt Bhalachandra

    Abstract: Efficiently utilizing procured power and optimizing performance of scientific applications under power and energy constraints are challenging. The HPC PowerStack defines a software stack to manage power and energy of high-performance computing systems and standardizes the interfaces between different components of the stack. This survey paper presents the findings of a working group focused on the…

    Submitted 14 August, 2020; originally announced August 2020.

    Comments: to be published in Energy Efficient HPC State of Practice 2020

  12. A Machine Learning Approach to Online Fault Classification in HPC Systems

    Authors: Alessio Netti, Zeynep Kiziltan, Ozalp Babaoglu, Alina Sirbu, Andrea Bartolini, Andrea Borghesi

    Abstract: As High-Performance Computing (HPC) systems strive towards the exascale goal, failure rates both at the hardware and software levels will increase significantly. Thus, detecting and classifying faults in HPC systems as they occur and initiating corrective actions before they can transform into failures becomes essential for continued operation. Central to this objective is fault injection, which i…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: arXiv admin note: text overlap with arXiv:1807.10056, arXiv:1810.11208

    Journal ref: Future Generation Computer Systems, Volume 110, September 2020, Pages 1009-1022

  13. arXiv:2004.03670  [pdf, other]

    cs.LG cs.CR eess.SP stat.ML

    pAElla: Edge-AI based Real-Time Malware Detection in Data Centers

    Authors: Antonio Libri, Andrea Bartolini, Luca Benini

    Abstract: The increasing use of Internet-of-Things (IoT) devices for monitoring a wide spectrum of applications, along with the challenges of "big data" streaming support they often require for data analysis, is nowadays pushing for an increased attention to the emerging edge computing paradigm. In particular, smart approaches to manage and analyze data directly on the network edge are more and more invest…

    Submitted 7 April, 2020; originally announced April 2020.

  14. arXiv:1909.12684  [pdf, other]

    cs.DC

    COUNTDOWN Slack: a Run-time Library to Reduce Energy Footprint in Large-scale MPI Applications

    Authors: Daniele Cesarini, Andrea Bartolini, Andrea Borghesi, Carlo Cavazzoni, Mathieu Luisier, Luca Benini

    Abstract: The power consumption of supercomputers is a major challenge for system owners, users, and society. It limits the capacity of system installations, it requires large cooling infrastructures, and it is the cause of a large carbon footprint. Reducing power during application execution without changing the application source code or increasing time-to-completion is highly desirable in real-life high-…

    Submitted 27 September, 2019; originally announced September 2019.

    Comments: 13 pages, 4 figures, 3 tables

  15. arXiv:1902.09400  [pdf]

    cs.NI

    A LoRaWAN Wireless Sensor Network for Data Center Temperature Monitoring

    Authors: Tommaso Polonelli, Davide Brunelli, Andrea Bartolini, Luca Benini

    Abstract: High-performance computing installations, which are at the basis of web and cloud servers as well as supercomputers, are constrained by two main conflicting requirements: IT power consumption generated by the computing nodes and the heat that must be removed to avoid thermal hazards. In the worst cases, up to 60% of the energy consumed in a data center is used for cooling, often related to an over…

    Submitted 18 February, 2019; originally announced February 2019.

    Comments: 9 pages

  16. Online Anomaly Detection in HPC Systems

    Authors: Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini

    Abstract: Reliability is a cumbersome problem in High Performance Computing Systems and Data Centers evolution. During operation, several types of fault conditions or anomalies can arise, ranging from malfunctioning hardware to improper configurations or imperfect software. Currently, system administrators and final users have to discover them manually. Clearly, this approach does not scale to large-scale super…

    Submitted 22 February, 2019; originally announced February 2019.

    Comments: Preprint of paper submitted to and accepted at the AICAS2019 Conference (1st IEEE International Conference on Artificial Intelligence Circuits and Systems)

    Journal ref: 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Hsinchu, Taiwan, 2019, pp. 229-233

  17. arXiv:1901.06175  [pdf, ps, other]

    cs.DC

    The ANTAREX Domain Specific Language for High Performance Computing

    Authors: Cristina Silvano, Giovanni Agosta, Andrea Bartolini, Andrea R. Beccari, Luca Benini, Loïc Besnard, João Bispo, Radim Cmar, João M. P. Cardoso, Carlo Cavazzoni, Daniele Cesarini, Stefano Cherubin, Federico Ficarelli, Davide Gadioli, Martin Golasowski, Antonio Libri, Jan Martinovič, Gianluca Palermo, Pedro Pinto, Erven Rohou, Kateřina Slaninová, Emanuele Vitali

    Abstract: The ANTAREX project relies on a Domain Specific Language (DSL) based on Aspect Oriented Programming (AOP) concepts to allow applications to enforce extra functional properties such as energy-efficiency and performance and to optimize Quality of Service (QoS) in an adaptive way. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies as well as their enfo…

    Submitted 18 January, 2019; originally announced January 2019.

  18. Anomaly Detection using Autoencoders in High Performance Computing Systems

    Authors: Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini

    Abstract: Anomaly detection in supercomputers is a very difficult problem due to the big scale of the systems and the high number of components. The current state of the art for automated anomaly detection employs Machine Learning methods or statistical regression models in a supervised fashion, meaning that the detection tool is trained to distinguish among a fixed set of behaviour classes (healthy and unh…

    Submitted 13 November, 2018; originally announced November 2018.

    Comments: 9 pages, 3 figures

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pages 9428-9433, 2019

  19. arXiv:1810.11208  [pdf, other]

    cs.DC

    Online Fault Classification in HPC Systems through Machine Learning

    Authors: Alessio Netti, Zeynep Kiziltan, Ozalp Babaoglu, Alina Sirbu, Andrea Bartolini, Andrea Borghesi

    Abstract: As High-Performance Computing (HPC) systems strive towards the exascale goal, studies suggest that they will experience excessive failure rates. For this reason, detecting and classifying faults in HPC systems as they occur and initiating corrective actions before they can transform into failures will be essential for continued operation. In this paper, we propose a fault classification method for…

    Submitted 11 July, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: Accepted for publication at the Euro-Par 2019 conference

  20. arXiv:1810.01865  [pdf, other]

    cs.LG eess.SP stat.ML

    Robust identification of thermal models for in-production High-Performance-Computing clusters with machine learning-based data selection

    Authors: Federico Pittino, Roberto Diversi, Luca Benini, Andrea Bartolini

    Abstract: Power and thermal management are critical components of High-Performance-Computing (HPC) systems, due to their high power density and large total power consumption. The assessment of thermal dissipation by means of compact models directly from the thermal response of the final device enables more robust and precise thermal control strategies as well as automated diagnosis. However, when dealing wi…

    Submitted 7 November, 2018; v1 submitted 3 October, 2018; originally announced October 2018.

  21. arXiv:1807.10056  [pdf, other]

    cs.DC

    FINJ: A Fault Injection Tool for HPC Systems

    Authors: Alessio Netti, Zeynep Kiziltan, Ozalp Babaoglu, Alina Sirbu, Andrea Bartolini, Andrea Borghesi

    Abstract: We present FINJ, a high-level fault injection tool for High-Performance Computing (HPC) systems, with a focus on the management of complex experiments. FINJ provides support for custom workloads and allows generation of anomalous conditions through the use of fault-triggering executable programs. FINJ can also be integrated seamlessly with most other lower-level fault injection tools, allowing use…

    Submitted 1 September, 2018; v1 submitted 26 July, 2018; originally announced July 2018.

    Comments: To be presented at the 11th Resilience Workshop in the 2018 Euro-Par conference

  22. arXiv:1806.07258  [pdf, other]

    cs.DC

    COUNTDOWN: a Run-time Library for Performance-Neutral Energy Saving in MPI Applications

    Authors: Daniele Cesarini, Andrea Bartolini, Pietro Bonfà, Carlo Cavazzoni, Luca Benini

    Abstract: Power and energy consumption are becoming key challenges to deploying the first exascale supercomputer successfully. Large-scale HPC applications waste a significant amount of power in communication and synchronization-related idle times. However, due to the time scale at which communication happens, transitioning into low-power states during communication's idle times may introduce unacceptable overhea…

    Submitted 23 May, 2019; v1 submitted 19 June, 2018; originally announced June 2018.

    Comments: 14 pages, 11 figures

  23. Pricing Schemes for Energy-Efficient HPC Systems: Design and Exploration

    Authors: Andrea Borghesi, Andrea Bartolini, Michela Milano, Luca Benini

    Abstract: Energy efficiency is of paramount importance for the sustainability of HPC systems. Energy consumption limits the peak performance of supercomputers and accounts for a large share of total cost of ownership. Consequently, system owners and final users have started exploring mechanisms to trade off performance for power consumption, for example through frequency and voltage scaling. However, only…

    Submitted 13 June, 2018; originally announced June 2018.

    Journal ref: The International Journal of High Performance Computing Applications, Volume 33, Issue 4, pages 716-734, 2019

  24. arXiv:1806.02698  [pdf, other]

    cs.DC

    DiG: Enabling Out-of-Band Scalable High-Resolution Monitoring for Data-Center Analytics, Automation and Control (Extended)

    Authors: Antonio Libri, Andrea Bartolini, Luca Benini

    Abstract: Data centers are increasing in size and complexity, and we need scalable approaches to support their automated analysis and control. Performance counters and power consumption are their key "vital signs". State-of-the-Art (SoA) monitoring systems provide built-in tools to collect performance measurements, and custom solutions to get insight on their power consumption. However, with the increase in…

    Submitted 17 July, 2019; v1 submitted 7 June, 2018; originally announced June 2018.