-
Impact of gate-level clustering on automated system partitioning of 3D-ICs
Authors:
Quentin Delhaye,
Eric Beyne,
Joël Goossens,
Geert Van der Plas,
Dragomir Milojevic
Abstract:
When partitioning gate-level netlists using graphs, it is beneficial to cluster gates to reduce the order of the graph and preserve some characteristics of the circuit that the partitioning might degrade. Gate clustering is even more important for netlist partitioning targeting 3D system integration. In this paper, we argue that the choice of clustering method for 3D-IC partitioning is not trivial and deserves careful consideration. To support our claim, we implemented three clustering methods that were used prior to partitioning two synthetic designs representing two extremes of the circuit medium/long interconnect diversity spectrum. Automatically partitioned netlists are then placed and routed in 3D to compare the impact of clustering methods on several metrics. From our experiments, we see that the impact of the clustering method indeed depends on the design considered and that a circuit-blind, universal partitioning method is not the way to go: a suitable choice yields wire-length savings of up to 31%, total power savings of up to 22%, and effective frequency gains of up to 15% compared to other methods. Furthermore, we highlight that 3D-ICs open new opportunities to design systems with a denser interconnect, drastically reducing the design utilization of circuits that would not be considered viable in 2D.
Submitted 18 July, 2023;
originally announced July 2023.
-
MemPool-3D: Boosting Performance and Efficiency of Shared-L1 Memory Many-Core Clusters with 3D Integration
Authors:
Matheus Cavalcante,
Anthony Agnesina,
Samuel Riedel,
Moritz Brunion,
Alberto Garcia-Ortiz,
Dragomir Milojevic,
Francky Catthoor,
Sung Kyu Lim,
Luca Benini
Abstract:
Three-dimensional integrated circuits promise power, performance, and footprint gains compared to their 2D counterparts, thanks to drastic reductions in the interconnects' length through their smaller form factor. We can leverage the potential of 3D integration by enhancing MemPool, an open-source many-core design with 256 cores and a shared pool of L1 scratchpad memory connected with a low-latency interconnect. MemPool's baseline 2D design is severely limited by routing congestion and wire propagation delay, making the design ideal for 3D integration. In architectural terms, we increase MemPool's scratchpad memory capacity beyond the sweet spot for 2D designs, improving performance in a common digital signal processing kernel. We propose a 3D MemPool design that leverages a smart partitioning of the memory resources across two layers to balance the size and utilization of the stacked dies. In this paper, we explore the architectural and the technology parameter spaces by analyzing the power, performance, area, and energy efficiency of MemPool instances in 2D and 3D with 1 MiB, 2 MiB, 4 MiB, and 8 MiB of scratchpad memory in a commercial 28 nm technology node. We observe a performance gain of 9.1% when running a matrix multiplication on the MemPool-3D design with 4 MiB of scratchpad memory compared to the MemPool 2D counterpart. In terms of energy efficiency, we can implement the MemPool-3D instance with 4 MiB of L1 memory on an energy budget 15% smaller than its 2D counterpart, and even 3.7% smaller than the MemPool-2D instance with one-fourth of the L1 scratchpad memory capacity.
Submitted 2 December, 2021;
originally announced December 2021.
-
Co-Design of Embodied Intelligence: A Structured Approach
Authors:
Gioele Zardini,
Dejan Milojevic,
Andrea Censi,
Emilio Frazzoli
Abstract:
We consider the problem of co-designing embodied intelligence as a whole in a structured way, from hardware components such as propulsion systems and sensors to software modules such as control and perception pipelines. We propose a principled approach to formulate and solve complex embodied intelligence co-design problems, leveraging a monotone co-design theory. The methods we propose are intuitive and integrate heterogeneous engineering disciplines, allowing analytical and simulation-based modeling techniques and enabling interdisciplinarity. We illustrate through a case study how, given a set of desired behaviors, our framework is able to compute Pareto efficient solutions for the entire hardware and software stack of a self-driving vehicle.
Submitted 30 July, 2021; v1 submitted 21 November, 2020;
originally announced November 2020.
-
On the Design of an Optimal Multiprocessor Real-Time Scheduling Algorithm under Practical Considerations (Extended Version)
Authors:
Shelby Funk,
Vincent Nelis,
Joel Goossens,
Dragomir Milojevic,
Geoffrey Nelissen
Abstract:
This research addresses the multiprocessor scheduling problem of hard real-time systems, and it especially focuses on optimal and global schedulers when practical constraints are taken into account. First, we propose an improvement of the optimal algorithm BF. We formally prove that our adaptation is (i) optimal, i.e., it always generates a feasible schedule as long as such a schedule exists, and (ii) valid, i.e., it complies with all the requirements. We also show that it outperforms BF by providing a computational complexity of O(n), where n is the number of tasks to be scheduled. Next, we propose a schedulability analysis which indicates a priori whether the real-time application can be scheduled by our improvement of BF without missing any deadline. This analysis is, to the best of our knowledge, the first such test for multiprocessors that takes into account all the main overheads generated by the Operating System.
Submitted 24 January, 2011; v1 submitted 25 January, 2010;
originally announced January 2010.
-
Power-Aware Real-Time Scheduling upon Identical Multiprocessor Platforms
Authors:
Vincent Nélis,
Joël Goossens,
Nicolas Navet,
Raymond Devillers,
Dragomir Milojevic
Abstract:
In this paper, we address the power-aware scheduling of sporadic constrained-deadline hard real-time tasks using dynamic voltage scaling upon multiprocessor platforms. We propose two distinct algorithms. Our first algorithm is an off-line speed determination mechanism which provides an identical speed for each processor. That speed guarantees that all deadlines are met if the jobs are scheduled using EDF. The second algorithm is an on-line and adaptive speed adjustment mechanism which reduces the energy consumption while the system is running.
Submitted 10 March, 2008; v1 submitted 18 December, 2007;
originally announced December 2007.