arXiv:2206.01901

Enabling Heterogeneous, Multicore SoC Research with RISC-V and ESP

Authors: Joseph Zuckerman, Paolo Mantovani, Davide Giri, Luca P. Carloni

Abstract: Heterogeneous, multicore SoC architectures are a critical component of today's computing landscape. However, supporting both increasing heterogeneity and multicore execution are significant design challenges. Meanwhile, the growing RISC-V and open-source hardware (OSH) movements have resulted in an increased number of open-source RISC-V processor implementations; however, there are fewer open source SoC design platforms that integrate these processor cores. We present modifications to ESP, an open-source SoC design platform, to enable multicore execution with the RISC-V CVA6 processor. Our implementation is modular and based on standardized interfaces. These properties simplify the integration of new cores. Our modifications enable RISC-V-based SoCs designed with ESP for FPGA to boot Linux SMP and execute multithreaded applications. Coupled with ESP's emphasis on accelerator-centric architectures, our contributions enable the seamless design of a wide range of heterogeneous, multicore SoCs.

Submitted 4 June, 2022; originally announced June 2022.

Comments: To appear in the Sixth Workshop on Computer Architecture Research with RISC-V (CARRV 2022)

arXiv:2109.06382

doi 10.1145/3466752.3480065

Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs

Authors: Joseph Zuckerman, Davide Giri, Jihye Kwon, Paolo Mantovani, Luca P. Carloni

Abstract: One of the most critical aspects of integrating loosely-coupled accelerators in heterogeneous SoC architectures is orchestrating their interactions with the memory hierarchy, especially in terms of navigating the various cache-coherence options: from accelerators accessing off-chip memory directly, bypassing the cache hierarchy, to accelerators having their own private cache. By running real-size applications on FPGA-based prototypes of many-accelerator multi-core SoCs, we show that the best cache-coherence mode for a given accelerator varies at runtime, depending on the accelerator's characteristics, the workload size, and the overall SoC status. Cohmeleon applies reinforcement learning to select the best coherence mode for each accelerator dynamically at runtime, as opposed to statically at design time. It makes these selections adaptively, by continuously observing the system and measuring its performance. Cohmeleon is accelerator-agnostic, architecture-independent, and it requires minimal hardware support. Cohmeleon is also transparent to application programmers and has a negligible software overhead. FPGA-based experiments show that our runtime approach offers, on average, a 38% speedup with a 66% reduction of off-chip memory accesses compared to state-of-the-art design-time approaches. Moreover, it can match runtime solutions that are manually tuned for the target architecture.

Submitted 13 September, 2021; originally announced September 2021.

Comments: To appear in the 54th IEEE/ACM Symposium on Microarchitecture (MICRO 2021)

arXiv:2009.01178

doi 10.1145/3400302.3415753

Agile SoC Development with Open ESP

Authors: Paolo Mantovani, Davide Giri, Giuseppe Di Guglielmo, Luca Piccolboni, Joseph Zuckerman, Emilio G. Cota, Michele Petracca, Christian Pilato, Luca P. Carloni

Abstract: ESP is an open-source research platform for heterogeneous SoC design. The platform combines a modular tile-based architecture with a variety of application-oriented flows for the design and optimization of accelerators. The ESP architecture is highly scalable and strikes a balance between regularity and specialization. The companion methodology raises the level of abstraction to system-level design and enables an automated flow from software and hardware development to full-system prototyping on FPGA. For application developers, ESP offers domain-specific automated solutions to synthesize new accelerators for their software and to map complex workloads onto the SoC architecture. For hardware engineers, ESP offers automated solutions to integrate their accelerator designs into the complete SoC. Conceived as a heterogeneous integration platform and tested through years of teaching at Columbia University, ESP supports the open-source hardware community by providing a flexible platform for agile SoC development.

Submitted 2 September, 2020; originally announced September 2020.

Comments: Invited Paper at the 2020 International Conference On Computer Aided Design (ICCAD) - Special Session on Opensource Tools and Platforms for Agile Development of Specialized Architectures

arXiv:2004.03640

doi 10.23919/DATE48585.2020.9116317

ESP4ML: Platform-Based Design of Systems-on-Chip for Embedded Machine Learning

Authors: Davide Giri, Kuan-Lin Chiu, Giuseppe Di Guglielmo, Paolo Mantovani, Luca P. Carloni

Abstract: We present ESP4ML, an open-source system-level design flow to build and program SoC architectures for embedded applications that require the hardware acceleration of machine learning and signal processing algorithms. We realized ESP4ML by combining two established open-source projects (ESP and HLS4ML) into a new, fully-automated design flow. For the SoC integration of accelerators generated by HLS4ML, we designed a set of new parameterized interface circuits synthesizable with high-level synthesis. For accelerator configuration and management, we developed an embedded software runtime system on top of Linux. With this HW/SW layer, we addressed the challenge of dynamically shaping the data traffic on a network-on-chip to activate and support the reconfigurable pipelines of accelerators that are needed by the application workloads currently running on the SoC. We demonstrate our vertically-integrated contributions with the FPGA-based implementations of complete SoC instances booting Linux and executing computer-vision applications that process images taken from the Google Street View database.

Submitted 18 June, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: Paper published in the proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Journal ref: Design, Automation and Test in Europe Conference & Exhibition (DATE), Grenoble, France, 2020, pp. 1049-1054

arXiv:1912.10823

doi 10.1145/3126566

COSMOS: Coordination of High-Level Synthesis and Memory Optimization for Hardware Accelerators

Authors: Luca Piccolboni, Paolo Mantovani, Giuseppe Di Guglielmo, Luca P. Carloni

Abstract: Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC) architectures. With high-level synthesis (HLS), designers can easily obtain several performance-cost trade-off implementations for each component of a complex hardware accelerator. However, navigating this design space in search of the Pareto-optimal implementations at the system level is a hard optimization task. We present COSMOS, an automatic methodology for the design-space exploration (DSE) of complex accelerators, that coordinates both HLS and memory optimization tools in a compositional way. First, thanks to the co-design of datapath and memory, COSMOS produces a large set of Pareto-optimal implementations for each component of the accelerator. Then, COSMOS leverages compositional design techniques to quickly converge to the desired trade-off point between cost and performance at the system level. When applied to the system-level design (SLD) of an accelerator for wide-area motion imagery (WAMI), COSMOS explores the design space as completely as an exhaustive search, but it reduces the number of invocations to the HLS tool by up to 14.6x.

Submitted 18 December, 2019; originally announced December 2019.

Comments: Published in ACM Transactions on Embedded Computing Systems (TECS)

Journal ref: ACM Trans. Embed. Comput. Syst. 16, 5s, Article 150 (October 2017)

Showing 1–5 of 5 results for author: Mantovani, P