Search | arXiv e-print repository

Zeno: A Scalable Capability-Based Secure Architecture

Authors: Alan Ehret, Jacob Abraham, Mihailo Isakov, Michel A. Kinsy

Abstract: Despite the numerous efforts of security researchers, memory vulnerabilities remain a top issue for modern computing systems. Capability-based solutions aim to solve whole classes of memory vulnerabilities at the hardware level by encoding access permissions with each memory reference. While some capability systems have seen commercial adoption, little work has been done to apply a capability mode… ▽ More Despite the numerous efforts of security researchers, memory vulnerabilities remain a top issue for modern computing systems. Capability-based solutions aim to solve whole classes of memory vulnerabilities at the hardware level by encoding access permissions with each memory reference. While some capability systems have seen commercial adoption, little work has been done to apply a capability model to datacenter-scale systems. Cloud and high-performance computing often require programs to share memory across many compute nodes. This presents a challenge for existing capability models, as capabilities must be enforceable across multiple nodes. Each node must agree on what access permissions a capability has and overheads of remote memory access must remain manageable. To address these challenges, we introduce Zeno, a new capability-based architecture. Zeno supports a Namespace-based capability model to support globally shareable capabilities in a large-scale, multi-node system. In this work, we describe the Zeno architecture, define Zeno's security properties, evaluate the scalability of Zeno as a large-scale capability architecture, and measure the hardware overhead with an FPGA implementation. △ Less

Submitted 21 August, 2022; originally announced August 2022.

Report number: R-V1-2022

arXiv:2204.08180 [pdf, other]

A Taxonomy of Error Sources in HPC I/O Machine Learning Models

Authors: Mihailo Isakov, Mikaela Currier, Eliakin del Rosario, Sandeep Madireddy, Prasanna Balaprakash, Philip Carns, Robert B. Ross, Glenn K. Lockwood, Michel A. Kinsy

Abstract: I/O efficiency is crucial to productivity in scientific computing, but the increasing complexity of the system and the applications makes it difficult for practitioners to understand and optimize I/O behavior at scale. Data-driven machine learning-based I/O throughput models offer a solution: they can be used to identify bottlenecks, automate I/O tuning, or optimize job scheduling with minimal hum… ▽ More I/O efficiency is crucial to productivity in scientific computing, but the increasing complexity of the system and the applications makes it difficult for practitioners to understand and optimize I/O behavior at scale. Data-driven machine learning-based I/O throughput models offer a solution: they can be used to identify bottlenecks, automate I/O tuning, or optimize job scheduling with minimal human intervention. Unfortunately, current state-of-the-art I/O models are not robust enough for production use and underperform after being deployed. We analyze multiple years of application, scheduler, and storage system logs on two leadership-class HPC platforms to understand why I/O models underperform in practice. We propose a taxonomy consisting of five categories of I/O modeling errors: poor application and system modeling, inadequate dataset coverage, I/O contention, and I/O noise. We develop litmus tests to quantify each category, allowing researchers to narrow down failure modes, enhance I/O throughput models, and improve future generations of HPC logging and analysis tools. △ Less

Submitted 18 April, 2022; originally announced April 2022.

Report number: STAM01

arXiv:2002.08339 [pdf, other]

NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks

Authors: Mihailo Isakov, Michel A. Kinsy

Abstract: Long training times of deep neural networks are a bottleneck in machine learning research. The major impediment to fast training is the quadratic growth of both memory and compute requirements of dense and convolutional layers with respect to their information bandwidth. Recently, training `a priori' sparse networks has been proposed as a method for allowing layers to retain high information bandw… ▽ More Long training times of deep neural networks are a bottleneck in machine learning research. The major impediment to fast training is the quadratic growth of both memory and compute requirements of dense and convolutional layers with respect to their information bandwidth. Recently, training `a priori' sparse networks has been proposed as a method for allowing layers to retain high information bandwidth, while kee** memory and compute low. However, the choice of which sparse topology should be used in these networks is unclear. In this work, we provide a theoretical foundation for the choice of intra-layer topology. First, we derive a new sparse neural network initialization scheme that allows us to explore the space of very deep sparse networks. Next, we evaluate several topologies and show that seemingly similar topologies can often have a large difference in attainable accuracy. To explain these differences, we develop a data-free heuristic that can evaluate a topology independently from the dataset the network will be trained on. We then derive a set of requirements that make a good topology, and arrive at a single topology that satisfies all of them. △ Less

Submitted 19 February, 2020; originally announced February 2020.

Report number: Report-v05

arXiv:1912.01560 [pdf, other]

Drndalo: Lightweight Control Flow Obfuscation Through Minimal Processor/Compiler Co-Design

Authors: Novak Boskov, Mihailo Isakov, Michel A. Kinsy

Abstract: Binary analysis is traditionally used in the realm of malware detection. However, the same technique may be employed by an attacker to analyze the original binaries in order to reverse engineer them and extract exploitable weaknesses. When a binary is distributed to end users, it becomes a common remotely exploitable attack point. Code obfuscation is used to hinder reverse engineering of executabl… ▽ More Binary analysis is traditionally used in the realm of malware detection. However, the same technique may be employed by an attacker to analyze the original binaries in order to reverse engineer them and extract exploitable weaknesses. When a binary is distributed to end users, it becomes a common remotely exploitable attack point. Code obfuscation is used to hinder reverse engineering of executable programs. In this paper, we focus on securing binary distribution, where attackers gain access to binaries distributed to end devices, in order to reverse engineer them and find potential vulnerabilities. Attackers do not however have means to monitor the execution of said devices. In particular, we focus on the control flow obfuscation --- a technique that prevents an attacker from restoring the correct reachability conditions for the basic blocks of a program. By doing so, we thwart attackers in their effort to infer the inputs that cause the program to enter a vulnerable state (e.g., buffer overrun). We propose a compiler extension for obfuscation and a minimal hardware modification for dynamic deobfuscation that takes advantage of a secret key stored in hardware. We evaluate our experiments on the LLVM compiler toolchain and the BRISC-V open source processor. On PARSEC benchmarks, our deobfuscation technique incurs only a 5\% runtime overhead. We evaluate the security of Drndalo by training classifiers on pairs of obfuscated and unobfuscated binaries. Our results shine light on the difficulty of producing obfuscated binaries of arbitrary programs in such a way that they are statistically indistinguishable from plain binaries. △ Less

Submitted 29 November, 2019; originally announced December 2019.

Report number: BU - ECE - ASCS Laboratory 2019 Report v5

arXiv:1911.11934 [pdf, other]

doi 10.1016/j.adhoc.2018.09.007

A Secure and Robust Scheme for Sharing Confidential Information in IoT Systems

Authors: Lake Bu, Mihailo Isakov, Michel A. Kinsy

Abstract: In Internet of Things (IoT) systems with security demands, there is often a need to distribute sensitive information (such as encryption keys, digital signatures, or login credentials, etc.) among the devices, so that it can be retrieved for confidential purposes at a later moment. However, this information cannot be entrusted to any one device, since the failure of that device or an attack on it… ▽ More In Internet of Things (IoT) systems with security demands, there is often a need to distribute sensitive information (such as encryption keys, digital signatures, or login credentials, etc.) among the devices, so that it can be retrieved for confidential purposes at a later moment. However, this information cannot be entrusted to any one device, since the failure of that device or an attack on it will jeopardize the security of the entire network. Even if the information is divided among devices, there is still the danger that an attacker can compromise a group of devices and expose the sensitive information. In this work, we design and implement a secure and robust scheme to enable the distribution of sensitive information in IoT networks. The proposed approach has two important properties: (1) it uses Threshold Secret Sharing (TSS) to split the information into pieces distributed among all devices in the system - and so the information can only be retrieved collaboratively by groups of devices; and (2) it ensures the privacy and integrity of the information, even when attackers hijack a large number of devices and use them in concert - specifically, all the compromised devices can be identified, the confidentiality of information is kept, and authenticity of the secret can be guaranteed. △ Less

Submitted 26 November, 2019; originally announced November 2019.

Journal ref: Ad Hoc Networks, vol. 92, 2019 - Special Issue on Security of IoT-enabled Infrastructures in Smart Cities

arXiv:1911.11932 [pdf, other]

Survey of Attacks and Defenses on Edge-Deployed Neural Networks

Authors: Mihailo Isakov, Vijay Gadepally, Karen M. Gettings, Michel A. Kinsy

Abstract: Deep Neural Network (DNN) workloads are quickly moving from datacenters onto edge devices, for latency, privacy, or energy reasons. While datacenter networks can be protected using conventional cybersecurity measures, edge neural networks bring a host of new security challenges. Unlike classic IoT applications, edge neural networks are typically very compute and memory intensive, their execution i… ▽ More Deep Neural Network (DNN) workloads are quickly moving from datacenters onto edge devices, for latency, privacy, or energy reasons. While datacenter networks can be protected using conventional cybersecurity measures, edge neural networks bring a host of new security challenges. Unlike classic IoT applications, edge neural networks are typically very compute and memory intensive, their execution is data-independent, and they are robust to noise and faults. Neural network models may be very expensive to develop, and can potentially reveal information about the private data they were trained on, requiring special care in distribution. The hidden states and outputs of the network can also be used in reconstructing user inputs, potentially violating users' privacy. Furthermore, neural networks are vulnerable to adversarial attacks, which may cause misclassifications and violate the integrity of the output. These properties add challenges when securing edge-deployed DNNs, requiring new considerations, threat models, priorities, and approaches in securely and privately deploying DNNs to the edge. In this work, we cover the landscape of attacks on, and defenses, of neural networks deployed in edge devices and provide a taxonomy of attacks and defenses targeting edge DNNs. △ Less

Submitted 26 November, 2019; originally announced November 2019.

Journal ref: In the 2019 IEEE High Performance Extreme Computing Conference (HPEC), 2019

arXiv:1903.00841 [pdf, other]

CodeTrolley: Hardware-Assisted Control Flow Obfuscation

Authors: Novak Boskov, Mihailo Isakov, Michel A. Kinsy

Abstract: Many cybersecurity attacks rely on analyzing a binary executable to find exploitable sections of code. Code obfuscation is used to prevent attackers from reverse engineering these executables. In this work, we focus on control flow obfuscation - a technique that prevents attackers from statically determining which code segments are original, and which segments are added in to confuse attackers. We… ▽ More Many cybersecurity attacks rely on analyzing a binary executable to find exploitable sections of code. Code obfuscation is used to prevent attackers from reverse engineering these executables. In this work, we focus on control flow obfuscation - a technique that prevents attackers from statically determining which code segments are original, and which segments are added in to confuse attackers. We propose a RISC-V-based hardware-assisted deobfuscation technique that deobfuscates code at runtime based on a secret safely stored in hardware, along with an LLVM compiler extension for obfuscating binaries. Unlike conventional tools, our work does not rely on compiling hard-to-reverse-engineer code, but on securing a secret key. As such, it can be seen as a lightweight alternative to on-the-fly binary decryption. △ Less

Submitted 26 August, 2019; v1 submitted 3 March, 2019; originally announced March 2019.

Comments: 2019 Boston Area Architecture Workshop (BARC'19)

arXiv:1812.04077 [pdf, ps, other]

BRISC-V Emulator: A Standalone, Installation-Free, Browser-Based Teaching Tool

Authors: Mihailo Isakov, Michel A. Kinsy

Abstract: Many computer organization and computer architecture classes have recently started adopting the RISC-V architecture as an alternative to proprietary RISC ISAs and architectures. Emulators are a common teaching tool used to introduce students to writing assembly. We present the BRISC-V (Boston University RISC-V) Emulator and teaching tool, a RISC-V emulator inspired by existing RISC and CISC emulat… ▽ More Many computer organization and computer architecture classes have recently started adopting the RISC-V architecture as an alternative to proprietary RISC ISAs and architectures. Emulators are a common teaching tool used to introduce students to writing assembly. We present the BRISC-V (Boston University RISC-V) Emulator and teaching tool, a RISC-V emulator inspired by existing RISC and CISC emulators. The emulator is a web-based, pure javascript implementation meant to simplify deployment, as it does not require maintaining support for different operating systems or any installation. Here we present the workings, usage, and extensibility of the BRISC-V emulator. △ Less

Submitted 6 December, 2018; originally announced December 2018.

arXiv:1802.05100 [pdf, other]

SAPA: Self-Aware Polymorphic Architecture

Authors: Michel A. Kinsy, Mihailo Isakov, Alan Ehret, Donato Kava

Abstract: In this work, we introduce a Self-Aware Polymorphic Architecture (SAPA) design approach to support emerging context-aware applications and mitigate the programming challenges caused by the ever-increasing complexity and heterogeneity of high performance computing systems. Through the SAPA design, we examined the salient software-hardware features of adaptive computing systems that allow for (1) th… ▽ More In this work, we introduce a Self-Aware Polymorphic Architecture (SAPA) design approach to support emerging context-aware applications and mitigate the programming challenges caused by the ever-increasing complexity and heterogeneity of high performance computing systems. Through the SAPA design, we examined the salient software-hardware features of adaptive computing systems that allow for (1) the dynamic allocation of computing resources depending on program needs (e.g., the amount of parallelism in the program) and (2) automatic approximation to meet program and system goals (e.g., execution time budget, power constraints and computation resiliency) without the programming complexity of current multicore and many-core systems. The proposed adaptive computer architecture framework applies machine learning algorithms and control theory techniques to the application execution based on information collected about the system runtime performance trade-offs. It has heterogeneous reconfigurable cores with fast hardware-level migration capability, self-organizing memory structures and hierarchies, an adaptive application-aware network-on-chip, and a built-in hardware layer for dynamic, autonomous resource management. Our prototyped architecture performs extremely well on a large pool of applications. △ Less

Submitted 12 February, 2018; originally announced February 2018.

Comments: Boston Area Architecture 2018 Workshop (BARC18)

arXiv:1802.03885 [pdf, other]

ClosNets: a Priori Sparse Topologies for Faster DNN Training

Authors: Mihailo Isakov, Michel A. Kinsy

Abstract: Fully-connected layers in deep neural networks (DNN) are often the throughput and power bottleneck during training. This is due to their large size and low data reuse. Pruning dense layers can significantly reduce the size of these networks, but this approach can only be applied after training. In this work we propose a novel fully-connected layer that reduces the memory requirements of DNNs witho… ▽ More Fully-connected layers in deep neural networks (DNN) are often the throughput and power bottleneck during training. This is due to their large size and low data reuse. Pruning dense layers can significantly reduce the size of these networks, but this approach can only be applied after training. In this work we propose a novel fully-connected layer that reduces the memory requirements of DNNs without sacrificing accuracy. We replace a dense matrix with products of sparse matrices whose topologies we pick in advance. This allows us to: (1) train significantly smaller networks without a loss in accuracy, and (2) store the network weights without having to store connection indices. We therefore achieve significant training speedups due to the smaller network size, and a reduced amount of computation per epoch. We tested several sparse layer topologies and found that Clos networks perform well due to their high path diversity, shallowness, and high model accuracy. With the ClosNets, we are able to reduce dense layer sizes by as much as an order of magnitude without hurting model accuracy. △ Less

Submitted 11 February, 2018; originally announced February 2018.

Comments: Boston Area Architecture 2018 Workshop (BARC18)

arXiv:1710.09039 [pdf, other]

doi 10.1109/MWSCAS.2017.8053051

Janus: An Uncertain Cache Architecture to Cope with Side Channel Attacks

Authors: Hossein Hosseinzadeh, Mihailo Isakov, Mostafa Darabi, Ahmad Patooghy, Michel A. Kinsy

Abstract: Side channel attacks are a major class of attacks to crypto-systems. Attackers collect and analyze timing behavior, I/O data, or power consumption in these systems to undermine their effectiveness in protecting sensitive information. In this work, we propose a new cache architecture, called Janus, to enable crypto-systems to introduce randomization and uncertainty in their runtime timing behavior… ▽ More Side channel attacks are a major class of attacks to crypto-systems. Attackers collect and analyze timing behavior, I/O data, or power consumption in these systems to undermine their effectiveness in protecting sensitive information. In this work, we propose a new cache architecture, called Janus, to enable crypto-systems to introduce randomization and uncertainty in their runtime timing behavior and power utilization profile. In the proposed cache architecture, each data block is equipped with an on-off flag to enable/disable the data block. The Janus architecture has two special instructions in its instruction set to support the on-off flag. Beside the analytical evaluation of the proposed cache architecture, we deploy it in an ARM-7 processor core to study its feasibility and practicality. Results show a significant variation in the timing behavior across all the benchmarks. The new secure processor architecture has minimal hardware overhead and significant improvement in protecting against power analysis and timing behavior attacks. △ Less

Submitted 24 October, 2017; originally announced October 2017.

Comments: 4 pages, 4 figures

Journal ref: 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS)

arXiv:1702.02313 [pdf, other]

FASHION: Fault-Aware Self-Healing Intelligent On-chip Network

Authors: Pengju Ren, Michel A. Kinsy, Mengjiao Zhu, Shreeya Khadka, Mihailo Isakov, Aniruddh Ramrakhyani, Tushar Krishna, Nanning Zheng

Abstract: To avoid packet loss and deadlock scenarios that arise due to faults or power gating in multicore and many-core systems, the network-on-chip needs to possess resilient communication and load-balancing properties. In this work, we introduce the Fashion router, a self-monitoring and self-reconfiguring design that allows for the on-chip network to dynamically adapt to component failures. First, we in… ▽ More To avoid packet loss and deadlock scenarios that arise due to faults or power gating in multicore and many-core systems, the network-on-chip needs to possess resilient communication and load-balancing properties. In this work, we introduce the Fashion router, a self-monitoring and self-reconfiguring design that allows for the on-chip network to dynamically adapt to component failures. First, we introduce a distributed intelligence unit, called Self-Awareness Module (SAM), which allows the router to detect permanent component failures and build a network connectivity map. Using local information, SAM adapts to faults, guarantees connectivity and deadlock-free routing inside the maximal connected subgraph and keeps routing tables up-to-date. Next, to reconfigure network links or virtual channels around faulty/power-gated components, we add bidirectional link and unified virtual channel structure features to the Fashion router. This version of the router, named Ex-Fashion, further mitigates the negative system performance impacts, leads to larger maximal connected subgraph and sustains a relatively high degree of fault-tolerance. To support the router, we develop a fault diagnosis and recovery algorithm executed by the Built-In Self-Test, self-monitoring, and self-reconfiguration units at runtime to provide fault-tolerant system functionalities. The Fashion router places no restriction on topology, position or number of faults. It drops 54.3-55.4% fewer nodes for same number of faults (between 30 and 60 faults) in an 8x8 2D-mesh over other state-of-the-art solutions. It is scalable and efficient. The area overheads are 2.311% and 2.659% when implemented in 8x8 and 16x16 2D-meshes using the TSMC 65nm library at 1.38GHz clock frequency. △ Less

Submitted 8 February, 2017; originally announced February 2017.

Comments: 14 pages, 12 figures

Showing 1–12 of 12 results for author: Isakov, M