-
Neural network accelerator for quantum control
Authors:
David Xu,
A. Barış Özgüler,
Giuseppe Di Guglielmo,
Nhan Tran,
Gabriel N. Perdue,
Luca Carloni,
Farah Fahim
Abstract:
Efficient quantum control is necessary for practical quantum computing implementations with current technologies. Conventional algorithms for determining optimal control parameters are computationally expensive, largely excluding them from use outside of the simulation. Existing hardware solutions structured as lookup tables are imprecise and costly. By designing a machine learning model to approx…
▽ More
Efficient quantum control is necessary for practical quantum computing implementations with current technologies. Conventional algorithms for determining optimal control parameters are computationally expensive, largely excluding them from use outside of the simulation. Existing hardware solutions structured as lookup tables are imprecise and costly. By designing a machine learning model to approximate the results of traditional tools, a more efficient method can be produced. Such a model can then be synthesized into a hardware accelerator for use in quantum systems. In this study, we demonstrate a machine learning algorithm for predicting optimal pulse parameters. This algorithm is lightweight enough to fit on a low-resource FPGA and perform inference with a latency of 175 ns and pipeline interval of 5 ns with $~>~$0.99 gate fidelity. In the long term, such an accelerator could be used near quantum computing hardware where traditional computers cannot operate, enabling quantum control at a reasonable cost at low latencies without incurring large data bandwidths outside of the cryogenic environment.
△ Less
Submitted 18 October, 2022; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Enabling Heterogeneous, Multicore SoC Research with RISC-V and ESP
Authors:
Joseph Zuckerman,
Paolo Mantovani,
Davide Giri,
Luca P. Carloni
Abstract:
Heterogeneous, multicore SoC architectures are a critical component of today's computing landscape. However, supporting both increasing heterogeneity and multicore execution are significant design challenges. Meanwhile, the growing RISC-V and open-source hardware (OSH) movements have resulted in an increased number of open-source RISC-V processor implementations; however, there are fewer open sour…
▽ More
Heterogeneous, multicore SoC architectures are a critical component of today's computing landscape. However, supporting both increasing heterogeneity and multicore execution are significant design challenges. Meanwhile, the growing RISC-V and open-source hardware (OSH) movements have resulted in an increased number of open-source RISC-V processor implementations; however, there are fewer open source SoC design platforms that integrate these processor cores. We present modifications to ESP, an open-source SoC design platform, to enable multicore execution with the RISC-V CVA6 processor. Our implementation is modular and based on standardized interfaces. These properties simplify the integration of new cores. Our modifications enable RISC-V-based SoCs designed with ESP for FPGA to boot Linux SMP and execute multithreaded applications. Coupled with ESP's emphasis on accelerator-centric architectures, our contributions enable the seamless design of a wide range of heterogeneous, multicore SoCs.
△ Less
Submitted 4 June, 2022;
originally announced June 2022.
-
Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs
Authors:
Joseph Zuckerman,
Davide Giri,
Jihye Kwon,
Paolo Mantovani,
Luca P. Carloni
Abstract:
One of the most critical aspects of integrating loosely-coupled accelerators in heterogeneous SoC architectures is orchestrating their interactions with the memory hierarchy, especially in terms of navigating the various cache-coherence options: from accelerators accessing off-chip memory directly, bypassing the cache hierarchy, to accelerators having their own private cache. By running real-size…
▽ More
One of the most critical aspects of integrating loosely-coupled accelerators in heterogeneous SoC architectures is orchestrating their interactions with the memory hierarchy, especially in terms of navigating the various cache-coherence options: from accelerators accessing off-chip memory directly, bypassing the cache hierarchy, to accelerators having their own private cache. By running real-size applications on FPGA-based prototypes of many-accelerator multi-core SoCs, we show that the best cache-coherence mode for a given accelerator varies at runtime, depending on the accelerator's characteristics, the workload size, and the overall SoC status.
Cohmeleon applies reinforcement learning to select the best coherence mode for each accelerator dynamically at runtime, as opposed to statically at design time. It makes these selections adaptively, by continuously observing the system and measuring its performance. Cohmeleon is accelerator-agnostic, architecture-independent, and it requires minimal hardware support. Cohmeleon is also transparent to application programmers and has a negligible software overhead. FPGA-based experiments show that our runtime approach offers, on average, a 38% speedup with a 66% reduction of off-chip memory accesses compared to state-of-the-art design-time approaches. Moreover, it can match runtime solutions that are manually tuned for the target architecture.
△ Less
Submitted 13 September, 2021;
originally announced September 2021.
-
Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition
Authors:
Saranyu Chattopadhyay,
Florian Lonsing,
Luca Piccolboni,
Deepraj Soni,
Peng Wei,
Xiaofan Zhang,
Yuan Zhou,
Luca Carloni,
Deming Chen,
Jason Cong,
Ramesh Karri,
Zhiru Zhang,
Caroline Trippel,
Clark Barrett,
Subhasish Mitra
Abstract:
Hardware accelerators (HAs) are essential building blocks for fast and energy-efficient computing systems. Accelerator Quick Error Detection (A-QED) is a recent formal technique which uses Bounded Model Checking for pre-silicon verification of HAs. A-QED checks an HA for self-consistency, i.e., whether identical inputs within a sequence of operations always produce the same output. Under modest as…
▽ More
Hardware accelerators (HAs) are essential building blocks for fast and energy-efficient computing systems. Accelerator Quick Error Detection (A-QED) is a recent formal technique which uses Bounded Model Checking for pre-silicon verification of HAs. A-QED checks an HA for self-consistency, i.e., whether identical inputs within a sequence of operations always produce the same output. Under modest assumptions, A-QED is both sound and complete. However, as is well-known, large design sizes significantly limit the scalability of formal verification, including A-QED. We overcome this scalability challenge through a new decomposition technique for A-QED, called A-QED with Decomposition (A-QED$^2$). A-QED$^2$ systematically decomposes an HA into smaller, functional sub-modules, called sub-accelerators, which are then verified independently using A-QED. We prove completeness of A-QED$^2$; in particular, if the full HA under verification contains a bug, then A-QED$^2$ ensures detection of that bug during A-QED verification of the corresponding sub-accelerators. Results on over 100 (buggy) versions of a wide variety of HAs with millions of logic gates demonstrate the effectiveness and practicality of A-QED$^2$.
△ Less
Submitted 17 August, 2021; v1 submitted 13 August, 2021;
originally announced August 2021.
-
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
Authors:
Farah Fahim,
Benjamin Hawks,
Christian Herwig,
James Hirschauer,
Sergo **dariani,
Nhan Tran,
Luca P. Carloni,
Giuseppe Di Guglielmo,
Philip Harris,
Jeffrey Krupa,
Dylan Rankin,
Manuel Blanco Valentin,
Josiah Hester,
Yingyi Luo,
John Mamish,
Seda Orgrenci-Memik,
Thea Aarrestad,
Hamza Javed,
Vladimir Loncar,
Maurizio Pierini,
Adrian Alan Pol,
Sioni Summers,
Javier Duarte,
Scott Hauck,
Shih-Chieh Hsu
, et al. (5 additional authors not shown)
Abstract:
Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-h…
▽ More
Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends include an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.
△ Less
Submitted 23 March, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
DB4HLS: A Database of High-Level Synthesis Design Space Explorations
Authors:
Lorenzo Ferretti,
Jihye Kwon,
Giovanni Ansaloni,
Giuseppe Di Guglielmo,
Luca Carloni,
Laura Pozzi
Abstract:
High-Level Synthesis (HLS) frameworks allow to easily specify a large number of variants of the same hardware design by only acting on optimization directives. Nonetheless, the hardware synthesis of implementations for all possible combinations of directive values is impractical even for simple designs. Addressing this shortcoming, many HLS Design Space Exploration (DSE) strategies have been propo…
▽ More
High-Level Synthesis (HLS) frameworks allow to easily specify a large number of variants of the same hardware design by only acting on optimization directives. Nonetheless, the hardware synthesis of implementations for all possible combinations of directive values is impractical even for simple designs. Addressing this shortcoming, many HLS Design Space Exploration (DSE) strategies have been proposed to devise directive settings leading to high-quality implementations while limiting the number of synthesis runs. All these works require considerable efforts to validate the proposed strategies and/or to build the knowledge base employed to tune abstract models, as both tasks mandate the syntheses of large collections of implementations. Currently, such data gathering is performed ad-hoc, a) leading to a lack of standardization, hampering comparisons between DSE alternatives, and b) posing a very high burden to researchers willing to develop novel DSE strategies. Against this backdrop, we here introduce DB4HLS, a database of exhaustive HLS explorations comprising more than 100000 design points collected over 4 years of synthesis time. The open structure of DB4HLS allows the incremental integration of new DSEs, which can be easily defined with a dedicated domain-specific language. We think that of our database, available at https://www.db4hls.inf.usi.ch/, will be a valuable tool for the research community investigating automated strategies for the optimization of HLS-based hardware designs.
△ Less
Submitted 3 January, 2021;
originally announced January 2021.
-
Agile SoC Development with Open ESP
Authors:
Paolo Mantovani,
Davide Giri,
Giuseppe Di Guglielmo,
Luca Piccolboni,
Joseph Zuckerman,
Emilio G. Cota,
Michele Petracca,
Christian Pilato,
Luca P. Carloni
Abstract:
ESP is an open-source research platform for heterogeneous SoC design. The platform combines a modular tile-based architecture with a variety of application-oriented flows for the design and optimization of accelerators. The ESP architecture is highly scalable and strikes a balance between regularity and specialization. The companion methodology raises the level of abstraction to system-level desig…
▽ More
ESP is an open-source research platform for heterogeneous SoC design. The platform combines a modular tile-based architecture with a variety of application-oriented flows for the design and optimization of accelerators. The ESP architecture is highly scalable and strikes a balance between regularity and specialization. The companion methodology raises the level of abstraction to system-level design and enables an automated flow from software and hardware development to full-system prototy** on FPGA. For application developers, ESP offers domain-specific automated solutions to synthesize new accelerators for their software and to map complex workloads onto the SoC architecture. For hardware engineers, ESP offers automated solutions to integrate their accelerator designs into the complete SoC. Conceived as a heterogeneous integration platform and tested through years of teaching at Columbia University, ESP supports the open-source hardware community by providing a flexible platform for agile SoC development.
△ Less
Submitted 2 September, 2020;
originally announced September 2020.
-
CRYLOGGER: Detecting Crypto Misuses Dynamically
Authors:
Luca Piccolboni,
Giuseppe Di Guglielmo,
Luca P. Carloni,
Simha Sethumadhavan
Abstract:
Cryptographic (crypto) algorithms are the essential ingredients of all secure systems: crypto hash functions and encryption algorithms, for example, can guarantee properties such as integrity and confidentiality. Developers, however, can misuse the application programming interfaces (API) of such algorithms by using constant keys and weak passwords. This paper presents CRYLOGGER, the first open-so…
▽ More
Cryptographic (crypto) algorithms are the essential ingredients of all secure systems: crypto hash functions and encryption algorithms, for example, can guarantee properties such as integrity and confidentiality. Developers, however, can misuse the application programming interfaces (API) of such algorithms by using constant keys and weak passwords. This paper presents CRYLOGGER, the first open-source tool to detect crypto misuses dynamically. CRYLOGGER logs the parameters that are passed to the crypto APIs during the execution and checks their legitimacy offline by using a list of crypto rules. We compare CRYLOGGER with CryptoGuard, one of the most effective static tools to detect crypto misuses. We show that our tool complements the results of CryptoGuard, making the case for combining static and dynamic approaches. We analyze 1780 popular Android apps downloaded from the Google Play Store to show that CRYLOGGER can detect crypto misuses on thousands of apps dynamically and automatically. We reverse-engineer 28 Android apps and confirm the issues flagged by CRYLOGGER. We also disclose the most critical vulnerabilities to app developers and collect their feedback.
△ Less
Submitted 2 July, 2020;
originally announced July 2020.
-
The MosaicSim Simulator (Full Technical Report)
Authors:
Opeoluwa Matthews,
Aninda Manocha,
Davide Giri,
Marcelo Orenes-Vera,
Esin Tureci,
Tyler Sorensen,
Tae Jun Ham,
Juan L. Aragón,
Luca P. Carloni,
Margaret Martonosi
Abstract:
As Moore's Law has slowed and Dennard Scaling has ended, architects are increasingly turning to heterogeneous parallelism and domain-specific hardware-software co-designs. These trends present new challenges for simulation-based performance assessments that are central to early-stage architectural exploration. Simulators must be lightweight to support rich heterogeneous combinations of general pur…
▽ More
As Moore's Law has slowed and Dennard Scaling has ended, architects are increasingly turning to heterogeneous parallelism and domain-specific hardware-software co-designs. These trends present new challenges for simulation-based performance assessments that are central to early-stage architectural exploration. Simulators must be lightweight to support rich heterogeneous combinations of general purpose cores and specialized processing units. They must also support agile exploration of hardware-software co-design, i.e. changes in the programming model, compiler, ISA, and specialized hardware.
To meet these challenges, we introduce MosaicSim, a lightweight, modular simulator for heterogeneous systems, offering accuracy and agility designed specifically for hardware-software co-design explorations. By integrating the LLVM toolchain, MosaicSim enables efficient modeling of instruction dependencies and flexible additions across the stack. Its modularity also allows the composition and integration of different hardware components. We first demonstrate that MosaicSim captures architectural bottlenecks in applications, and accurately models both scaling trends in a multicore setting and accelerator behavior. We then present two case-studies where MosaicSim enables straightforward design space explorations for emerging systems, i.e. data science application acceleration and heterogeneous parallel architectures.
△ Less
Submitted 15 April, 2020;
originally announced April 2020.
-
ESP4ML: Platform-Based Design of Systems-on-Chip for Embedded Machine Learning
Authors:
Davide Giri,
Kuan-Lin Chiu,
Giuseppe Di Guglielmo,
Paolo Mantovani,
Luca P. Carloni
Abstract:
We present ESP4ML, an open-source system-level design flow to build and program SoC architectures for embedded applications that require the hardware acceleration of machine learning and signal processing algorithms. We realized ESP4ML by combining two established open-source projects (ESP and HLS4ML) into a new, fully-automated design flow. For the SoC integration of accelerators generated by HLS…
▽ More
We present ESP4ML, an open-source system-level design flow to build and program SoC architectures for embedded applications that require the hardware acceleration of machine learning and signal processing algorithms. We realized ESP4ML by combining two established open-source projects (ESP and HLS4ML) into a new, fully-automated design flow. For the SoC integration of accelerators generated by HLS4ML, we designed a set of new parameterized interface circuits synthesizable with high-level synthesis. For accelerator configuration and management, we developed an embedded software runtime system on top of Linux. With this HW/SW layer, we addressed the challenge of dynamically sha** the data traffic on a network-on-chip to activate and support the reconfigurable pipelines of accelerators that are needed by the application workloads currently running on the SoC. We demonstrate our vertically-integrated contributions with the FPGA-based implementations of complete SoC instances booting Linux and executing computer-vision applications that process images taken from the Google Street View database.
△ Less
Submitted 18 June, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
PAGURUS: Low-Overhead Dynamic Information Flow Tracking on Loosely Coupled Accelerators
Authors:
Luca Piccolboni,
Giuseppe Di Guglielmo,
Luca P. Carloni
Abstract:
Software-based attacks exploit bugs or vulnerabilities to get unauthorized access or leak confidential information. Dynamic information flow tracking (DIFT) is a security technique to track spurious information flows and provide strong security guarantees against such attacks. To secure heterogeneous systems, the spurious information flows must be tracked through all their components, including pr…
▽ More
Software-based attacks exploit bugs or vulnerabilities to get unauthorized access or leak confidential information. Dynamic information flow tracking (DIFT) is a security technique to track spurious information flows and provide strong security guarantees against such attacks. To secure heterogeneous systems, the spurious information flows must be tracked through all their components, including processors, accelerators (i.e., application-specific hardware components) and memories. We present PAGURUS, a flexible methodology to design a low-overhead shell circuit that adds DIFT support to accelerators. The shell uses a coarse-grain DIFT approach, thus not requiring to make modifications to the accelerator's implementation. We analyze the performance and area overhead of the DIFT shell on FPGAs and we propose a metric, called information leakage, to measure its security guarantees. We perform a design-space exploration to show that we can synthesize accelerators with different characteristics in terms of performance, cost and security guarantees. We also present a case study where we use the DIFT shell to secure an accelerator running on a embedded platform with a DIFT-enhanced RISC-V core.
△ Less
Submitted 18 December, 2019;
originally announced December 2019.
-
COSMOS: Coordination of High-Level Synthesis and Memory Optimization for Hardware Accelerators
Authors:
Luca Piccolboni,
Paolo Mantovani,
Giuseppe Di Guglielmo,
Luca P. Carloni
Abstract:
Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC) architectures. With high-level synthesis (HLS), designers can easily obtain several performance-cost trade-off implementations for each component of a complex hardware accelerator. However, navigating this design space in search of the Pareto-optimal implementations at the system level is a hard optimization ta…
▽ More
Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC) architectures. With high-level synthesis (HLS), designers can easily obtain several performance-cost trade-off implementations for each component of a complex hardware accelerator. However, navigating this design space in search of the Pareto-optimal implementations at the system level is a hard optimization task. We present COSMOS, an automatic methodology for the design-space exploration (DSE) of complex accelerators, that coordinates both HLS and memory optimization tools in a compositional way. First, thanks to the co-design of datapath and memory, COSMOS produces a large set of Pareto-optimal implementations for each component of the accelerator. Then, COSMOS leverages compositional design techniques to quickly converge to the desired trade-off point between cost and performance at the system level. When applied to the system-level design (SLD) of an accelerator for wide-area motion imagery (WAMI), COSMOS explores the design space as completely as an exhaustive search, but it reduces the number of invocations to the HLS tool by up to 14.6x.
△ Less
Submitted 18 December, 2019;
originally announced December 2019.
-
Securing Accelerators with Dynamic Information Flow Tracking
Authors:
Luca Piccolboni,
Giuseppe Di Guglielmo,
Luca Carloni
Abstract:
Systems-on-chip (SoCs) are becoming heterogeneous: they combine general-purpose processor cores with application-specific hardware components, also known as accelerators, to improve performance and energy efficiency. The advantages of heterogeneity, however, come at a price of threatening security. The architectural dissimilarities of processors and accelerators require revisiting the current secu…
▽ More
Systems-on-chip (SoCs) are becoming heterogeneous: they combine general-purpose processor cores with application-specific hardware components, also known as accelerators, to improve performance and energy efficiency. The advantages of heterogeneity, however, come at a price of threatening security. The architectural dissimilarities of processors and accelerators require revisiting the current security techniques. With this hardware demo, we show how accelerators can break dynamic information flow tracking (DIFT), a well-known security technique that protects systems against software-based attacks. We also describe how the security guarantees of DIFT can be re-established with a hardware solution that has low performance and area penalties.
△ Less
Submitted 19 February, 2019;
originally announced March 2019.