-
Point containment algorithms for constructive solid geometry with unbounded primitives
Authors:
Paul K. Romano,
Patrick A. Myers,
Seth R. Johnson,
Aljaž Kolšek,
Patrick C. Shriwise
Abstract:
We present several algorithms for evaluating point containment in constructive solid geometry (CSG) trees with unbounded primitives. Three algorithms are presented based on postfix, prefix, and infix notations of the CSG binary expression tree. We show that prefix and infix notations enable short-circuiting logic, which reduces the number of primitives that must be checked during point containment. To evaluate the performance of the algorithms, each algorithm was implemented in the OpenMC Monte Carlo particle transport code, which relies on CSG to represent solid bodies through which subatomic particles travel. Two sets of tests were carried out. First, the execution time to generate a high-resolution rasterized image of a 2D slice of a detailed CSG model of the ITER tokamak was measured. Use of both prefix and infix notations offered significant speedup over the postfix notation that has traditionally been used in particle transport codes, with infix resulting in a 6$\times$ reduction in execution time relative to postfix. We then measured the execution time of neutron transport simulations of the same ITER model using each of the algorithms. The results and performance improvements reveal the same trends as for the rasterization test, with a 4.59$\times$ overall speedup using the infix notation relative to the original postfix notation in OpenMC.
Submitted 18 June, 2024;
originally announced June 2024.
-
Whispy: Adapting STT Whisper Models to Real-Time Environments
Authors:
Antonio Bevilacqua,
Paolo Saviano,
Alessandro Amirante,
Simon Pietro Romano
Abstract:
Large general-purpose transformer models have recently become the mainstay in the realm of speech analysis. In particular, Whisper achieves state-of-the-art results in relevant tasks such as speech recognition, translation, language identification, and voice activity detection. However, Whisper models are not designed to be used in real-time conditions, and this limitation makes them unsuitable for a vast plethora of practical applications. In this paper, we introduce Whispy, a system intended to bring live capabilities to the Whisper pretrained models. As a result of a number of architectural optimisations, Whispy is able to consume live audio streams and generate high level, coherent voice transcriptions, while still maintaining a low computational cost. We evaluate the performance of our system on a large repository of publicly available speech datasets, investigating how the transcription mechanism introduced by Whispy impacts on the Whisper output. Experimental results show how Whispy excels in robustness, promptness, and accuracy.
Submitted 6 May, 2024;
originally announced May 2024.
-
Error-Driven Uncertainty Aware Training
Authors:
Pedro Mendes,
Paolo Romano,
David Garlan
Abstract:
Neural networks are often overconfident about their predictions, which undermines their reliability and trustworthiness. In this work, we present a novel technique, named Error-Driven Uncertainty Aware Training (EUAT), which aims to enhance the ability of neural models to estimate their uncertainty correctly, namely to be highly uncertain when they output inaccurate predictions and low uncertain when their output is accurate. The EUAT approach operates during the model's training phase by selectively employing two loss functions depending on whether the training examples are correctly or incorrectly predicted by the model. This allows for pursuing the twofold goal of i) minimizing model uncertainty for correctly predicted inputs and ii) maximizing uncertainty for mispredicted inputs, while preserving the model's misprediction rate. We evaluate EUAT using diverse neural models and datasets in the image recognition domains considering both non-adversarial and adversarial settings. The results show that EUAT outperforms existing approaches for uncertainty estimation (including other uncertainty-aware training techniques, calibration, ensembles, and DEUP) by providing uncertainty estimates that not only have higher quality when evaluated via statistical metrics (e.g., correlation with residuals) but also when employed to build binary classifiers that decide whether the model's output can be trusted or not and under distributional data shifts.
Submitted 2 May, 2024;
originally announced May 2024.
-
Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs
Authors:
John Tramm,
Paul Romano,
Patrick Shriwise,
Amanda Lund,
Johannes Doerfert,
Patrick Steinbrecher,
Andrew Siegel,
Gavin Ridley
Abstract:
OpenMC is an open source Monte Carlo neutral particle transport application that has recently been ported to GPU using the OpenMP target offloading model. We examine the performance of OpenMC at scale on the Frontier, Polaris, and Aurora supercomputers, demonstrating that performance portability has been achieved by OpenMC across all three major GPU vendors (AMD, NVIDIA, and Intel). OpenMC's GPU performance is compared to both the traditional CPU-based version of OpenMC as well as several other state-of-the-art CPU-based Monte Carlo particle transport applications. We also provide historical context by analyzing OpenMC's performance on several legacy GPU and CPU architectures. This work includes some of the first published results for a scientific simulation application at scale on a supercomputer featuring Intel's Max series "Ponte Vecchio" GPUs. It is also one of the first demonstrations of a large scientific production application using the OpenMP target offloading model to achieve high performance on all three major GPU platforms.
Submitted 18 March, 2024;
originally announced March 2024.
-
PIM-STM: Software Transactional Memory for Processing-In-Memory Systems
Authors:
André Lopes,
Daniel Castro,
Paolo Romano
Abstract:
Processing-In-Memory (PIM) is a novel approach that augments existing DRAM memory chips with lightweight logic. By allowing to offload computations to the PIM system, this architecture allows for circumventing the data-bottleneck problem that affects many modern workloads. This work tackles the problem of how to build efficient software implementations of the Transactional Memory (TM) abstraction by introducing PIM-STM, a library that provides a range of diverse TM implementations for UPMEM, the first commercial PIM system. Via an extensive study we assess the efficiency of alternative choices in the design space of TM algorithms on this emerging architecture. We further quantify the impact of using different memory tiers of the UPMEM system (having different trade-offs for what concerns latency vs capacity) to store the metadata used by different TM implementations. Finally, we assess the gains achievable in terms of performance and memory efficiency when using PIM-STM to accelerate TM applications originally conceived for conventional CPU-based systems.
Submitted 17 January, 2024;
originally announced January 2024.
-
Adversarial training for tabular data with attack propagation
Authors:
Tiago Leon Melo,
João Bravo,
Marco O. P. Sampaio,
Paolo Romano,
Hugo Ferreira,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Adversarial attacks are a major concern in security-centered applications, where malicious actors continuously try to mislead Machine Learning (ML) models into wrongly classifying fraudulent activity as legitimate, whereas system maintainers try to stop them. Adversarially training ML models that are robust against such attacks can prevent business losses and reduce the work load of system maintainers. In such applications data is often tabular and the space available for attackers to manipulate undergoes complex feature engineering transformations, to provide useful signals for model training, to a space attackers cannot access. Thus, we propose a new form of adversarial training where attacks are propagated between the two spaces in the training loop. We then test this method empirically on a real world dataset in the domain of credit card fraud detection. We show that our method can prevent about 30% performance drops under moderate attacks and is essential under very aggressive attacks, with a trade-off loss in performance under no attacks smaller than 7%.
Submitted 28 July, 2023;
originally announced July 2023.
-
Hyper-parameter Tuning for Adversarially Robust Models
Authors:
Pedro Mendes,
Paolo Romano,
David Garlan
Abstract:
This work focuses on the problem of hyper-parameter tuning (HPT) for robust (i.e., adversarially trained) models, shedding light on the new challenges and opportunities arising during the HPT process for robust models. To this end, we conduct an extensive experimental study based on 3 popular deep models, in which we explore exhaustively 9 (discretized) HPs, 2 fidelity dimensions, and 2 attack bounds, for a total of 19208 configurations (corresponding to 50 thousand GPU hours). Through this study, we show that the complexity of the HPT problem is further exacerbated in adversarial settings due to the need to independently tune the HPs used during standard and adversarial training: succeeding in doing so (i.e., adopting different HP settings in both phases) can lead to a reduction of up to 80% and 43% of the error for clean and adversarial inputs, respectively. On the other hand, we also identify new opportunities to reduce the cost of HPT for robust models. Specifically, we propose to leverage cheap adversarial training methods to obtain inexpensive, yet highly correlated, estimations of the quality achievable using state-of-the-art methods. We show that, by exploiting this novel idea in conjunction with a recent multi-fidelity optimizer (taKG), the efficiency of the HPT process can be enhanced by up to 2.1x.
Submitted 13 June, 2024; v1 submitted 5 April, 2023;
originally announced April 2023.
-
Workflows Community Summit 2022: A Roadmap Revolution
Authors:
Rafael Ferreira da Silva,
Rosa M. Badia,
Venkat Bala,
Debbie Bard,
Peer-Timo Bremer,
Ian Buckley,
Silvina Caino-Lores,
Kyle Chard,
Carole Goble,
Shantenu Jha,
Daniel S. Katz,
Daniel Laney,
Manish Parashar,
Frederic Suter,
Nick Tyler,
Thomas Uram,
Ilkay Altintas,
Stefan Andersson,
William Arndt,
Juan Aznar,
Jonathan Bader,
Bartosz Balis,
Chris Blanton,
Kelly Rosa Braghetto,
Aharon Brodutch
, et al. (80 additional authors not shown)
Abstract:
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, need for processing large scale datasets produced by instruments at the edge, intensification of near real-time data processing, support for long-term experiment campaigns, and emergence of quantum computing as an adjunct to HPC, have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge-to-cloud-to-HPC, enable the management of many small-sized files, allow data reduction while ensuring high accuracy, and orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities, among others. Further, to accelerate science, it is also necessary that these systems implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, as well as enabling the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit that took place on November 29 and 30, 2022.
Submitted 31 March, 2023;
originally announced April 2023.
-
GOSPF: An energy efficient implementation of the OSPF routing protocol
Authors:
Maurizio D'Arienzo,
Simon Pietro Romano
Abstract:
Energy saving is currently one of the most challenging issues for the Internet research community. Indeed, the exponential growth of applications and services induces a remarkable increase in power consumption and hence calls for novel solutions that are capable of preserving the energy of the infrastructures while maintaining the required Quality of Service guarantees. In this paper we introduce a new mechanism for saving energy through the intelligent switch-off of network links. The mechanism has been implemented as an extension to the Open Shortest Path First (OSPF) routing protocol. We first show through simulations that our solution is capable of dramatically reducing energy consumption when compared to the standard OSPF implementation. We then illustrate a real-world implementation of the proposed protocol within the Quagga routing software suite.
Submitted 30 June, 2022;
originally announced July 2022.
-
UMPIRE: A Universal Moderator for the Participation in IETF Remote Events
Authors:
Simon Pietro Romano
Abstract:
UMPIRE provides seamless meeting interaction among remote and local participants. It uses the BFCP, an IETF standard for moderation. BFCP introduces automated floor control functions to a centralized conferencing environment. This article discusses the design and implementation of the UMPIRE system and highlights the most notable solutions we devised to handle variegated requirements and constraints. We also discuss the lessons learned while experiencing in the first person how the application of research results that have eventually led to new standards still must confront a number of minor yet concrete issues that might completely undermine the overall process of wide adoption by the community.
Submitted 30 June, 2022;
originally announced June 2022.
-
Dockerized Android: a container-based platform to build mobile Android scenarios for Cyber Ranges
Authors:
Daniele Capone,
Francesco Caturano,
Angelo Delicato,
Gaetano Perrone,
Simon Pietro Romano
Abstract:
The best way to train people about security is through Cyber Ranges, i.e., the virtual platform used by cyber-security experts to learn new skills and attack vectors. In order to realize such virtual scenarios, container-based virtualization is commonly adopted, as it provides several benefits in terms of performance, resource usage, and portability. Unfortunately, the current generation of Cyber Ranges does not consider mobile devices, which nowadays are ubiquitous in our daily lives. Such devices do often represent the very first entry point for hackers into target networks. It is thus important to make available tools allowing to emulate mobile devices in a safe environment without incurring the risk of causing any damage in the real world. This work aims to propose Dockerized Android, i.e., a framework that addresses the problem of realizing vulnerable environments for mobile devices in the next generation of Cyber Ranges. We show the platform's design and implementation and show how it is possible to use the implemented features to realize complex virtual mobile kill-chains scenarios.
Submitted 19 May, 2022;
originally announced May 2022.
-
ExploitWP2Docker: a Platform for Automating the Generation of Vulnerable WordPress Environments for Cyber Ranges
Authors:
Francesco Caturano,
Nicola d'Ambrosio,
Gaetano Perrone,
Luigi Previdente,
Simon Pietro Romano
Abstract:
A cyber range is a realistic simulation of an organization's network infrastructure, commonly used for cyber security training purposes. It provides a safe environment to assess competencies in both offensive and defensive techniques. An important step during the realization of a cyber range is the generation of vulnerable machines. This step is challenging and requires a laborious manual configuration. Several works aim to reduce this overhead, but the current state-of-the-art focuses on generating network services without considering the effort required to build vulnerable environments for web applications. A cyber range should represent a real system, and nowadays, almost all the companies develop their company site by using WordPress, a common Content Management System (CMS), which is also one of the most critical attackers' entry points. The presented work proposes an approach to automatically create and configure vulnerable WordPress applications by using the information presented in public exploits. Our platform automatically extracts information from the most well-known publicly available exploit database in order to generate and configure vulnerable environments. The container-based virtualization is used to generate lightweight and easily deployable infrastructures. A final evaluation highlights promising results regarding the possibility of automating the generation of vulnerable environments through our approach.
Submitted 18 May, 2022;
originally announced May 2022.
-
Chirotonia: A Scalable and Secure e-Voting Framework based on Blockchains and Linkable Ring Signatures
Authors:
Antonio Russo,
Antonio Fernández Anta,
Maria Isabel González Vasco,
Simon Pietro Romano
Abstract:
In this paper we propose a comprehensive and scalable framework to build secure-by-design e-voting systems. Decentralization, transparency, determinism, and untamperability of votes are granted by dedicated smart contracts on a blockchain, while voter authenticity and anonymity are achieved through (provable secure) linkable ring signatures. These, in combination with suitable smart contract constraints, also grant protection from double voting. Our design is presented in detail, focusing on its security guarantees and the design choices that allow it to scale to a large number of voters. Finally, we present a proof-of-concept implementation of the proposed framework, made available as open source.
Submitted 3 November, 2021;
originally announced November 2021.
-
HyperJump: Accelerating HyperBand via Risk Modelling
Authors:
Pedro Mendes,
Maria Casimiro,
Paolo Romano,
David Garlan
Abstract:
In the literature on hyper-parameter tuning, a number of recent solutions rely on low-fidelity observations (e.g., training with sub-sampled datasets) in order to efficiently identify promising configurations to be then tested via high-fidelity observations (e.g., using the full dataset). Among these, HyperBand is arguably one of the most popular solutions, due to its efficiency and theoretically provable robustness. In this work, we introduce HyperJump, a new approach that builds on HyperBand's robust search strategy and complements it with novel model-based risk analysis techniques that accelerate the search by skipping the evaluation of low risk configurations, i.e., configurations that are likely to be eventually discarded by HyperBand. We evaluate HyperJump on a suite of hyper-parameter optimization problems and show that it provides speed-ups of over one order of magnitude, both in sequential and parallel deployments, on a variety of deep-learning, kernel-based learning, and neural architectural search problems when compared to HyperBand and to several state-of-the-art optimizers.
Submitted 2 December, 2022; v1 submitted 5 August, 2021;
originally announced August 2021.
-
Leveraging AI to optimize website structure discovery during Penetration Testing
Authors:
Diego Antonelli,
Roberta Cascella,
Gaetano Perrone,
Simon Pietro Romano,
Antonio Schiano
Abstract:
Dirbusting is a technique used to brute force directories and file names on web servers while monitoring HTTP responses, in order to enumerate server contents. Such a technique uses lists of common words to discover the hidden structure of the target website. Dirbusting typically relies on response codes as discovery conditions to find new pages. It is widely used in web application penetration testing, an activity that allows companies to detect websites vulnerabilities. Dirbusting techniques are both time and resource consuming and innovative approaches have never been explored in this field. We hence propose an advanced technique to optimize the dirbusting process by leveraging Artificial Intelligence. More specifically, we use semantic clustering techniques in order to organize wordlist items in different groups according to their semantic meaning. The created clusters are used in an ad-hoc implemented next-word intelligent strategy. This paper demonstrates that the usage of clustering techniques outperforms the commonly used brute force methods. Performance is evaluated by testing eight different web applications. Results show a performance increase that is up to 50% for each of the conducted experiments.
Submitted 18 January, 2021;
originally announced January 2021.
-
TrimTuner: Efficient Optimization of Machine Learning Jobs in the Cloud via Sub-Sampling
Authors:
Pedro Mendes,
Maria Casimiro,
Paolo Romano,
David Garlan
Abstract:
This work introduces TrimTuner, the first system for optimizing machine learning jobs in the cloud to exploit sub-sampling techniques to reduce the cost of the optimization process while keeping into account user-specified constraints. TrimTuner jointly optimizes the cloud and application-specific parameters and, unlike state of the art works for cloud optimization, eschews the need to train the model with the full training set every time a new configuration is sampled. Indeed, by leveraging sub-sampling techniques and data-sets that are up to 60x smaller than the original one, we show that TrimTuner can reduce the cost of the optimization process by up to 50x. Further, TrimTuner speeds up the recommendation process by 65x with respect to state of the art techniques for hyper-parameter optimization that use sub-sampling techniques. The reasons for this improvement are twofold: i) a novel domain specific heuristic that reduces the number of configurations for which the acquisition function has to be evaluated; ii) the adoption of an ensemble of decision trees that enables boosting the speed of the recommendation process by one additional order of magnitude.
Submitted 9 November, 2020;
originally announced November 2020.
-
LIoTS: League of IoT Sovereignties. A Scalable approach for a Transparent Privacy-safe Federation of Secured IoT Platforms
Authors:
Flavio Cirillo,
Nicola Capuano,
Simon Pietro Romano,
Ernö Kovacs
Abstract:
Internet-of-Things has entered all the fields where data are produced and processed, resulting in a plethora of IoT platforms, typically cloud-based, centralizing data and services management. This has led to many disjoint IoT silos. Significant efforts have been devoted to integration, recurrently resulting in bigger centralized infrastructures. Such an approach often stumbles upon the reluctance of IoT system owners to lose dominion over their data. We introduce a secured and privacy-safe infrastructure where a federation overlay is distributed among parties and the data control is kept locally. This establishes a league of peers each sovereign of their IoT system and data: League of IoT Sovereignties (LIoTS). LIoTS is scalable by design, allowing iterative formation of domains levels due to the transparency of its federation. Tests show that the overhead is minimal when exchanged data is hefty, and that LIoTS performs better in large IoT deployments than centralized approaches.
Submitted 13 May, 2020;
originally announced May 2020.
-
Stretching the capacity of Hardware Transactional Memory in IBM POWER architectures
Authors:
Ricardo Filipe,
Shady Issa,
Paolo Romano,
João Barreto
Abstract:
The hardware transactional memory (HTM) implementations in commercially available processors are significantly hindered by their tight capacity constraints. In practice, this renders current HTMs unsuitable to many real-world workloads of in-memory databases. This paper proposes SI-HTM, which stretches the capacity bounds of the underlying HTM, thus opening HTM to a much broader class of applications. SI-HTM leverages the HTM implementation of the IBM POWER architecture with a software layer to offer a single-version implementation of Snapshot Isolation. When compared to HTM- and software-based concurrency control alternatives, SI-HTM exhibits improved scalability, achieving speedups of up to 300% relatively to HTM on in-memory database benchmarks.
Submitted 6 March, 2020;
originally announced March 2020.
-
Bandwidth-Aware Page Placement in NUMA
Authors:
David Gureya,
João Neto,
Reza Karimi,
João Barreto,
Pramod Bhatotia,
Vivien Quema,
Rodrigo Rodrigues,
Paolo Romano,
Vladimir Vlassov
Abstract:
Page placement is a critical problem for memory-intensive applications running on a shared-memory multiprocessor with a non-uniform memory access (NUMA) architecture. State-of-the-art page placement mechanisms interleave pages evenly across NUMA nodes. However, this approach fails to maximize memory throughput in modern NUMA systems, characterised by asymmetric bandwidths and latencies, and sensitive to memory contention and interconnect congestion phenomena. We propose BWAP, a novel page placement mechanism based on asymmetric weighted page interleaving. BWAP combines an analytical performance model of the target NUMA system with on-line iterative tuning of page distribution for a given memory-intensive application. Our experimental evaluation with representative memory-intensive workloads shows that BWAP performs up to 66% better than state-of-the-art techniques. These gains are particularly relevant when multiple co-located applications run in disjoint partitions of a large NUMA machine or when applications do not scale up to the total number of cores.
Submitted 19 May, 2023; v1 submitted 6 March, 2020;
originally announced March 2020.
-
Cooperative Intersection Crossing over 5G
Authors:
Luca Maria Castiglione,
Paolo Falcone,
Alberto Petrillo,
Simon Pietro Romano,
Stefania Santini
Abstract:
Autonomous driving is a safety-critical application of sensing and decision-making technologies. Communication technologies extend the awareness capabilities of vehicles beyond what is achievable with on-board systems alone. Nonetheless, issues typically related to wireless networking must be taken into account when designing safe and reliable autonomous systems. The aim of this work is to present a control algorithm and a communication paradigm over 5G networks for negotiating traffic junctions in urban areas. The proposed control framework has been shown to converge in a finite time and the supporting communication software has been designed with the objective of minimising communication delays. At the same time, the underlying network guarantees reliability of the communication. The proposed framework has been successfully deployed and tested, in partnership with Ericsson AB, at the AstaZero proving ground in Goteborg, Sweden. In our experiments, three autonomous vehicles successfully drove through an intersection of 235 square meters in an urban scenario.
Submitted 17 July, 2019;
originally announced July 2019.
-
Lynceus: Cost-efficient Tuning and Provisioning of Data Analytic Jobs
Authors:
Maria Casimiro,
Diego Didona,
Paolo Romano,
Luís Rodrigues,
Willy Zwanepoel,
David Garlan
Abstract:
Modern data analytic and machine learning jobs find in the cloud a natural deployment platform to satisfy their notoriously large resource requirements. Yet, to achieve cost efficiency, it is crucial to identify a deployment configuration that satisfies user-defined QoS constraints (e.g., on execution time), while avoiding unnecessary over-provisioning. This paper introduces Lynceus, a new approach for the optimization of cloud-based data analytic jobs that improves over state-of-the-art approaches by enabling significant cost savings both in terms of the final recommended configuration and of the optimization process used to recommend configurations. Unlike existing solutions, Lynceus optimizes in a joint fashion both the cloud-related and the application-level parameters. This allows for a reduction of the cost of recommended configurations by up to 3.7x at the 90th percentile with respect to existing approaches, which treat the optimization of cloud-related and application-level parameters as two independent problems. Further, Lynceus reduces the cost of the optimization process (i.e., the cloud cost incurred for testing configurations) by up to 11x. Such an improvement is achieved thanks to two mechanisms: i) a timeout approach which allows aborting the exploration of configurations that are deemed suboptimal, while still extracting useful information to guide future explorations and to improve its predictive model, unlike recent works, which either incur the full cost for testing suboptimal configurations or are unable to extract any knowledge from aborted runs; ii) a long-sighted and budget-aware technique that determines which configurations to test by predicting the long-term impact of each exploration, unlike state-of-the-art approaches for the optimization of cloud jobs, which adopt greedy optimization methods.
Submitted 20 January, 2020; v1 submitted 6 May, 2019;
originally announced May 2019.
-
HeTM: Transactional Memory for Heterogeneous Systems
Authors:
Daniel Castro,
Paolo Romano,
Aleksandar Ilic,
Amin M. Khan
Abstract:
Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. Unfortunately, though, developing applications that can take full advantage of the potential of heterogeneous systems is a notoriously hard task. This work takes a step towards reducing the complexity of programming heterogeneous systems by introducing the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, we present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM makes use of a novel design that leverages speculative techniques and aims at hiding the inherently large communication latency between CPUs and discrete GPUs and at minimizing inter-device synchronization overhead. SHeTM is based on a modular and extensible design that allows for easily integrating alternative TM implementations on the CPU and GPU sides, giving the flexibility to adopt, on either side, the TM implementation (e.g., in hardware or software) that best fits the application's workload and the architectural characteristics of the processing unit. We demonstrate the efficiency of SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a port of a popular object caching system.
Submitted 2 September, 2019; v1 submitted 2 May, 2019;
originally announced May 2019.
-
A Flexible Framework for Accurate Simulation of Cloud In-Memory Data Stores
Authors:
Pierangelo Di Sanzo,
Francesco Quaglia,
Bruno Ciciani,
Alessandro Pellegrini,
Diego Didona,
Paolo Romano,
Roberto Palmieri,
Sebastiano Peluso
Abstract:
In-memory (transactional) data stores are recognized as a first-class data management technology for cloud platforms, thanks to their ability to match the elasticity requirements imposed by the pay-as-you-go cost model. On the other hand, defining the appropriate number of cache servers to be deployed, and the degree of in-memory replication of slices of data, in order to optimize reliability/availability and performance tradeoffs, is far from being a trivial task. Yet, it is an essential aspect of the provisioning process of cloud platforms, given that it has an impact on how well cloud resources are actually exploited. To cope with the issue of determining optimized configurations of cloud in-memory data stores, in this article we present a flexible simulation framework offering skeleton simulation models that can be easily specialized in order to capture the dynamics of diverse data grid systems, such as those related to the specific protocol used to provide data consistency and/or transactional guarantees. Besides its flexibility, another distinctive aspect of the framework is that it integrates simulation and machine-learning (black-box) techniques, the latter being essentially used to capture the dynamics of the data-exchange layer (e.g. the message passing layer) across the cache servers. This is a relevant aspect when considering that the actual data-transport/networking infrastructure on top of which the data grid is deployed might be unknown, and hence cannot feasibly be modeled via white-box (namely purely simulative) approaches. We also provide an extended experimental study aimed at validating instances of simulation models supported by our framework against execution dynamics of real data grid systems deployed on top of either private or public cloud infrastructures.
Submitted 28 November, 2014;
originally announced November 2014.
-
On Bootstrapping Machine Learning Performance Predictors via Analytical Models
Authors:
Diego Didona,
Paolo Romano
Abstract:
Performance modeling typically relies on two antithetic methodologies: white box models, which exploit knowledge of a system's internals and capture its dynamics using analytical approaches, and black box techniques, which infer relations among the input and output variables of a system based on the evidence gathered during an initial training phase. In this paper we investigate a technique, which we name Bootstrapping, that aims at reconciling these two methodologies and at compensating the cons of the one with the pros of the other. We thoroughly analyze the design space of this gray box modeling technique, and identify a number of algorithmic and parametric trade-offs which we evaluate via two realistic case studies, a Key-Value Store and a Total Order Broadcast service.
Submitted 19 October, 2014;
originally announced October 2014.
-
Exploiting Locality in Lease-Based Replicated Transactional Memory via Task Migration
Authors:
Danny Hendler,
Alex Naiman,
Sebastiano Peluso,
Francesco Quaglia,
Paolo Romano,
Adi Suissa
Abstract:
We present Lilac-TM, the first locality-aware Distributed Software Transactional Memory (DSTM) implementation. Lilac-TM is a fully decentralized lease-based replicated DSTM. It employs a novel self-optimizing lease circulation scheme based on the idea of dynamically determining whether to migrate transactions to the nodes that own the leases required for their validation, or to demand the acquisition of these leases by the node that originated the transaction. Our experimental evaluation establishes that Lilac-TM provides significant performance gains for distributed workloads exhibiting data locality, while typically incurring no overhead for non-data-local workloads.
Submitted 9 August, 2013;
originally announced August 2013.