
Lagarto I - Una plataforma hardware/software de arquitectura de computadoras para la academia e investigación (Lagarto I: A hardware/software computer architecture platform for academia and research)

Authors: Cristobal Ramirez Lazo, Cesar Alejandro Hernandez, Carlos Rojas Morales, Gustavo Mondragon Garcia, Luis Alfonso Villa Vargas, Marco Antonio Ramirez Salinas

Abstract: Microprocessor and computer architecture design remains a fundamental course in Computer Science and Computer Engineering. The technology and organization of microprocessors have changed rapidly over the last twenty years. This change has increased the amount of material covered in class, making the teaching/learning process more difficult for students. Although tools, mainly simulators, are available to illustrate abstract concepts during the course, these tools have not kept pace with the technology. The computer architecture group of the Centro de Investigación en Computación at the IPN, Mexico, is working on a project called Lagarto to create an open computing platform for research and education that simplifies the understanding of fundamental concepts in computer architecture and operating systems. This paper introduces Lagarto, our soft-core processor microarchitecture. It has a scalar pipeline, executes the full MIPS32 R6 ISA [9] [10], and includes an MMU to support modern operating systems. The complete design is described in Verilog HDL and is fully synthesizable on an FPGA. Additionally, this work shows different ways to use and test the microprocessor with programs written in either assembly language or C. We show that the Lagarto project allows students not only to apply theoretical knowledge in practical exercises through simulators, as in the traditional model, but also to bring RTL design of the microprocessor architecture into the learning process.
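The abstract mentions a scalar pipeline that executes the MIPS32 R6 ISA. As a minimal, hypothetical sketch (not taken from the Lagarto RTL), the following Python shows the field extraction a decode stage performs on the standard MIPS32 R-type format; the names `RType`, `decode_rtype`, and `encode_rtype` are illustrative only.

```python
# Hypothetical illustration (not Lagarto's RTL): split a 32-bit MIPS32 R-type
# instruction word into its standard fields, as a decode stage would.
from typing import NamedTuple


class RType(NamedTuple):
    opcode: int  # bits 31..26 (0 = SPECIAL for most R-type ALU ops)
    rs: int      # bits 25..21, first source register
    rt: int      # bits 20..16, second source register
    rd: int      # bits 15..11, destination register
    shamt: int   # bits 10..6, shift amount
    funct: int   # bits 5..0, selects the operation when opcode == 0


def decode_rtype(word: int) -> RType:
    """Extract the R-type fields from a 32-bit instruction word."""
    if not 0 <= word < 2**32:
        raise ValueError("instruction word must fit in 32 bits")
    return RType(
        opcode=(word >> 26) & 0x3F,
        rs=(word >> 21) & 0x1F,
        rt=(word >> 16) & 0x1F,
        rd=(word >> 11) & 0x1F,
        shamt=(word >> 6) & 0x1F,
        funct=word & 0x3F,
    )


def encode_rtype(opcode: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Inverse of decode_rtype: pack the fields back into a 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# addu $t0, $t1, $t2: rd = 8 ($t0), rs = 9 ($t1), rt = 10 ($t2), funct = 0x21
word = encode_rtype(0, 9, 10, 8, 0, 0x21)
print(hex(word))  # 0x12a4021
print(decode_rtype(word))
```

Because every field sits at a fixed bit position, a hardware decoder can extract all of them in parallel with plain wiring before it knows which instruction it is handling.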

Submitted 26 February, 2022; originally announced February 2022.

Comments: in Spanish. Research in Computing Science. https://www.rcs.cic.ipn.mx/2017_137/Lagarto%20I%20-%20Una%20plataforma%20hardware_software%20de%20arquitectura%20de%20computadoras%20para%20la%20academia.pdf

arXiv:2111.05301 [pdf, other]

doi 10.1109/HPCA53966.2022.00063

Adaptable Register File Organization for Vector Processors

Authors: Cristóbal Ramírez Lazo, Enrico Reggiani, Carlos Rojas Morales, Roger Figueras Bagué, Luis Alfonso Villa Vargas, Marco Antonio Ramírez Salinas, Mateo Valero Cortés, Osman Sabri Unsal, Adrián Cristal

Abstract: Modern scientific applications are increasingly diverse, and their vector lengths vary widely. Contemporary Vector Processors (VPs) are designed either for short vector lengths, e.g., the Fujitsu A64FX with 512-bit ARM SVE vector support, or for long vectors, e.g., the NEC SX-Aurora TSUBASA with a 16-Kbit Maximum Vector Length (MVL). Unfortunately, both approaches have drawbacks. On the one hand, short-vector VP designs struggle to provide high efficiency for applications featuring long vectors with high Data Level Parallelism (DLP). On the other hand, long-vector VP designs waste resources and underutilize the Vector Register File (VRF) when executing low-DLP applications with short vector lengths. As a result, long-vector VP implementations are limited to a specialized subset of applications in which relatively high DLP must be present to achieve excellent performance with high efficiency. To overcome these limitations, we propose an Adaptable Vector Architecture (AVA) that offers the best of both worlds. AVA is designed for short vectors (MVL = 16 elements) and is thus area- and energy-efficient. However, AVA can reconfigure the MVL, allowing it to exploit the benefits of a longer-vector microarchitecture (up to 128 elements) when abundant DLP is present. We model AVA on the gem5 simulator and evaluate its performance with six applications from the RiVEC Benchmark Suite. To obtain area and power consumption metrics, we model AVA in McPAT for 22 nm technology.
Our results show that by reconfiguring our small VRF (8 KB) together with our novel issue queue scheme, AVA yields a 2X speedup over the default short-vector configuration. Additionally, AVA shows performance competitive with a long-vector VP while saving 50% of the area.
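AVA keeps the VRF small (8 KB) but lets the MVL grow from 16 up to 128 elements. The abstract does not detail the mechanism, so the following Python is only a back-of-the-envelope sketch under two assumptions of ours: 64-bit elements, and a VRF whose capacity is simply divided into equal registers of MVL elements each. The function name `physical_registers` is hypothetical.

```python
# Hypothetical model (assumption, not AVA's documented mechanism): a fixed-capacity
# vector register file repartitioned into equal registers of MVL elements.

VRF_BYTES = 8 * 1024  # 8 KB VRF, as stated in the abstract
ELEM_BYTES = 8        # assumption: 64-bit (double-precision) elements


def physical_registers(mvl: int, vrf_bytes: int = VRF_BYTES, elem_bytes: int = ELEM_BYTES) -> int:
    """Number of MVL-element vector registers that fit in the VRF."""
    reg_bytes = mvl * elem_bytes
    # Short-circuit keeps the modulo from running when mvl <= 0.
    if mvl <= 0 or vrf_bytes % reg_bytes != 0:
        raise ValueError("MVL must evenly partition the VRF")
    return vrf_bytes // reg_bytes


for mvl in (16, 32, 64, 128):
    print(f"MVL={mvl:>3} elements -> {physical_registers(mvl)} registers")
# MVL= 16 elements -> 64 registers
# MVL= 32 elements -> 32 registers
# MVL= 64 elements -> 16 registers
# MVL=128 elements -> 8 registers
```

Under these assumptions, a longer MVL trades register count for register length. At MVL = 128 only eight registers fit, fewer than the 32 architectural vector registers of the RISC-V V extension, so a design exposing that register set would need some way to cope with the shortfall; this sketch does not model how AVA handles it.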

Submitted 29 May, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: 28th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2022)

arXiv:2111.01949 [pdf]

doi 10.1145/3422667

A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures

Authors: Cristóbal Ramírez Lazo, César Alejandro Hernández, Oscar Palomar, Osman Sabri Unsal, Marco Antonio Ramírez, Adrián Cristal

Abstract: Vector architectures lack research tools. Consider the gem5 simulator, which is possibly the leading platform for computer-system architecture research. Unfortunately, no available gem5 distribution includes a flexible and customizable vector architecture model. Consequently, researchers must develop their own simulation platforms to test their ideas, which consumes considerable research time. Moreover, once the base simulation platform is developed, another question arises: which applications should be used in the experiments? The lack of vectorized benchmark suites is another limitation. To address these problems, this work presents a set of tools for designing and evaluating vector architectures. First, the gem5 simulator was extended to support the execution of RISC-V Vector instructions by adding a parameterizable vector architecture model that lets designers evaluate different approaches according to their targets. Second, a novel Vectorized Benchmark Suite is presented: a collection of seven data-parallel applications from different domains, which can be classified according to the vector architecture modules they stress. Finally, a study of the Vectorized Benchmark Suite executing on the gem5-based vector architecture model is presented.
This suite is the first of its kind to cover the different usage scenarios that may arise across vector architecture designs, such as embedded systems, which mainly focus on short vectors, and High-Performance Computing (HPC), usually designed for long vectors.
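The distinction between short and long vectors comes down to how many elements one vector instruction can process. Below is a minimal Python sketch of the strip-mining idiom used with RISC-V Vector instructions: a loop repeatedly requests a vector length `vl` no larger than the hardware maximum (VLMAX) and processes that many elements per iteration. The helpers `set_vl` and `strip_mined_add` are hypothetical, `set_vl` simplifies the real `vsetvl` rule to `min(remaining, VLMAX)`, and this is not code from the benchmark suite.

```python
# Strip-mining sketch in the style of RISC-V vsetvl loops (illustrative only).
# Simplification: vl = min(remaining, vlmax); the RVV spec permits other
# choices when vlmax < remaining < 2 * vlmax.
from typing import List, Tuple


def set_vl(remaining: int, vlmax: int) -> int:
    """Simplified vsetvl: grant as many elements as the hardware allows."""
    return min(remaining, vlmax)


def strip_mined_add(a: List[float], b: List[float], vlmax: int) -> Tuple[List[float], int]:
    """Element-wise a + b processed in strips of at most vlmax elements.

    Returns the result and the number of strips (loop iterations) executed.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    out: List[float] = []
    i, strips = 0, 0
    while i < len(a):
        vl = set_vl(len(a) - i, vlmax)
        # One "vector instruction" worth of work: vl elements at once.
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
        strips += 1
    return out, strips


n = 1000
a = [float(k) for k in range(n)]
b = [1.0] * n
for vlmax in (16, 128):
    _, strips = strip_mined_add(a, b, vlmax)
    print(f"VLMAX={vlmax:>3}: {strips} strips")
# VLMAX= 16: 63 strips
# VLMAX=128: 8 strips
```

For a 1,000-element vector, a long-vector design needs far fewer loop iterations, whereas an application with, say, 10-element vectors would fill only 10 of the 128 slots in each register.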

Submitted 29 October, 2021; originally announced November 2021.

Comments: ACM Transactions on Architecture and Code Optimization, Volume 17, Issue 4, December 2020, Article No.38

arXiv:2111.01948 [pdf]

Design and implementation of an out-of-order execution engine of floating-point arithmetic operations

Authors: Cristóbal Ramírez Lazo

Abstract: This thesis undertakes the design, in hardware description languages, and the FPGA implementation of an out-of-order execution engine for floating-point arithmetic operations for the Lagarto II core. A first proposal covers the design of a low-power issue queue for out-of-order processors, a register bank, a bypass network, and functional units for addition/subtraction, multiplication, division/reciprocal, and Fused Multiply-Accumulate (FMAC), conforming to the IEEE-754 standard. The design supports the double-precision format and denormalized numbers. A second proposal uses a pair of FMAC units as functional units that can perform almost all floating-point operations; this design improves area, performance, and energy efficiency compared with the first version.
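The second proposal relies on FMAC units to perform most floating-point operations. Below is a minimal Python sketch under stated assumptions: the names `fma`, `fadd`, and `fmul` are hypothetical, and exact rational arithmetic stands in for the hardware datapath. It shows the defining property of a fused operation (a single rounding) and how addition and multiplication can be expressed as an FMA with a constant operand. It is not the thesis's design.

```python
# Software model of a fused multiply-add (illustrative, not the thesis's hardware):
# compute a*b + c exactly with rationals, then round once to double precision.
# Assumptions: finite inputs only (Fraction rejects inf/NaN), results outside the
# double range raise OverflowError, and the sign of a zero result is not modeled.
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Return a*b + c with a single rounding to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step


def fadd(a: float, b: float) -> float:
    """Addition mapped onto the FMA unit: a*1 + b."""
    return fma(a, 1.0, b)


def fmul(a: float, b: float) -> float:
    """Multiplication mapped onto the FMA unit: a*b + 0."""
    return fma(a, b, 0.0)


# A separate multiply and add round twice; the fused version rounds once.
print(0.1 * 10.0 - 1.0)      # 0.0
print(fma(0.1, 10.0, -1.0))  # 5.551115123125783e-17, i.e. 2**-54
```

Because multiplying by 1.0 or adding 0.0 introduces no error before the single final rounding, `fadd` and `fmul` return the same values as ordinary IEEE-754 addition and multiplication for nonzero finite results, which is why one FMA datapath can stand in for separate adder and multiplier units.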

Submitted 29 October, 2021; originally announced November 2021.

Comments: Master's thesis. Link: https://upcommons.upc.edu/handle/2117/82655

Showing 1–4 of 4 results for author: Lazo, C R