Showing 1–13 of 13 results for author: Nunez-Yanez, J

  1. EnergyAnalyzer: Using Static WCET Analysis Techniques to Estimate the Energy Consumption of Embedded Applications

    Authors: Simon Wegener, Kris K. Nikov, Jose Nunez-Yanez, Kerstin Eder

    Abstract: This paper presents EnergyAnalyzer, a code-level static analysis tool for estimating the energy consumption of embedded software based on statically predictable hardware events. The tool utilises techniques usually used for worst-case execution time (WCET) analysis together with bespoke energy models developed for two predictable architectures - the ARM Cortex-M0 and the Gaisler LEON3 - to perform…

    Submitted 25 May, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Journal ref: 21st International Workshop on Worst-Case Execution Time Analysis (WCET 2023) (pp. 9:1-9:14), Schloss Dagstuhl - Leibniz-Zentrum für Informatik

  2. Big-Little Adaptive Neural Networks on Low-Power Near-Subthreshold Processors

    Authors: Zichao Shen, Neil Howard, Jose Nunez-Yanez

    Abstract: This paper investigates the energy savings that near-subthreshold processors can obtain in edge AI applications and proposes strategies to improve them while maintaining the accuracy of the application. The selected processors deploy adaptive voltage scaling techniques in which the frequency and voltage levels of the processor core are determined at the run-time. In these systems, embedded RAM and…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: 27 pages, 17 figures, https://github.com/DarkSZChao/Big-Little_NN_Strategies

    Journal ref: Journal of Low Power Electronics and Applications 12.2 (2022): 28

  3. Dynamically Reconfigurable Variable-precision Sparse-Dense Matrix Acceleration in Tensorflow Lite

    Authors: Jose Nunez-Yanez, Andres Otero, Eduardo de la Torre

    Abstract: In this paper, we present a dynamically reconfigurable hardware accelerator called FADES (Fused Architecture for DEnse and Sparse matrices). The FADES design offers multiple configuration options that trade off parallelism and complexity using a dataflow model to create four stages that read, compute, scale and write results. FADES is mapped to the programmable logic (PL) and integrated with the T…

    Submitted 17 April, 2023; originally announced April 2023.

    Journal ref: Microprocessors and Microsystems, Volume 98, 2023, 104801, ISSN 0141-9331

  4. Accurate Energy Modelling on the Cortex-M0 Processor for Profiling and Static Analysis

    Authors: Kris Nikov, Kyriakos Georgiou, Zbigniew Chamski, Kerstin Eder, Jose Nunez-Yanez

    Abstract: Energy modelling can enable energy-aware software development and assist the developer in meeting an application's energy budget. Although many energy models for embedded processors exist, most do not account for processor-specific configurations, neither are they suitable for static energy consumption estimation. This paper introduces a set of comprehensive energy models for Arm's Cortex-M0 proce…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2104.01055

    Journal ref: 2022 29th IEEE International Conference on Electronics, Circuits and Systems (ICECS) (pp. 1-4). IEEE

  5. Robust and accurate fine-grain power models for embedded systems with no on-chip PMU

    Authors: Kris Nikov, Marcos Martinez, Simon Wegener, Jose Nunez-Yanez, Zbigniew Chamski, Kyriakos Georgiou, Kerstin Eder

    Abstract: This paper presents a novel approach to event-based power modelling for embedded platforms that do not have a Performance Monitoring Unit (PMU). The method involves complementing the target hardware platform, where the physical power data is measured, with another platform on which the CPU performance data, that is needed for model generation, can be collected. The methodology is used to generate…

    Submitted 9 November, 2021; v1 submitted 26 May, 2021; originally announced June 2021.

  6. Evaluation of hybrid run-time power models for the ARM big.LITTLE architecture

    Authors: Kris Nikov, Jose L. Nunez-Yanez, Matthew Horsnell

    Abstract: Heterogeneous processors, formed by binary compatible CPU cores with different microarchitectures, enable energy reductions by better matching processing capabilities and software application requirements. This new hardware platform requires novel techniques to manage power and energy to fully utilize its capabilities, particularly regarding the mapping of workloads to appropriate cores. In this p…

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: 6 pages, 8 figures. 2015 IEEE 13th International Conference on Embedded and Ubiquitous Computing

    Journal ref: 2015 IEEE 13th International Conference on Embedded and Ubiquitous Computing, Porto, 2015, pp. 205-210

  7. arXiv:2008.08883

    cs.DC cs.AR cs.PF

    High-Performance Simultaneous Multiprocessing for Heterogeneous System-on-Chip

    Authors: Kris Nikov, Mohammad Hosseinabady, Rafael Asenjo, Andrés Rodríguez, Angeles Navarro, Jose Nunez-Yanez

    Abstract: This paper presents a methodology for simultaneous heterogeneous computing, named ENEAC, where a quad core ARM Cortex-A53 CPU works in tandem with a preprogrammed on-board FPGA accelerator. A heterogeneous scheduler distributes the tasks optimally among all the resources and all compute units run asynchronously, which allows for improved performance for irregular workloads. ENEAC achieves up to 17…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: 7 pages, 5 figures, 1 table. Presented at the 13th International Workshop on Programmability and Architectures for Heterogeneous Multicores, 2020 (arXiv:2005.07619)

    Report number: MULTIPROG/2020/4

  8. arXiv:2006.12176

    cs.OH

    Run-Time Power Modelling in Embedded GPUs with Dynamic Voltage and Frequency Scaling

    Authors: Jose Nunez-Yanez, Kris Nikov, Kerstin Eder, Mohammad Hosseinabady

    Abstract: This paper investigates the application of a robust CPU-based power modelling methodology that performs an automatic search of explanatory events derived from performance counters to embedded GPUs. A 64-bit Tegra TX1 SoC is configured with DVFS enabled and multiple CUDA benchmarks are used to train and test models optimized for each frequency and voltage point. These optimized models are then comp…

    Submitted 18 June, 2020; originally announced June 2020.

  9. arXiv:1905.12465

    cs.DC

    Relationship Estimation Metrics for Binary SoC Data

    Authors: Dave McEwan, Jose Nunez-Yanez

    Abstract: System-on-Chip (SoC) designs are used in every aspect of computing and their optimization is a difficult but essential task in today's competitive market. Data taken from SoCs to achieve this is often characterised by very long concurrent bit vectors which have unknown relationships to each other. This paper explains and empirically compares the accuracy of several methods used to detect the exist…

    Submitted 24 September, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

    Comments: 11 pages, 9 figures, LOD2019

  10. arXiv:1901.07317

    cs.DC

    High-Performance Ultrasonic Levitation with FPGA-based Phased Arrays

    Authors: William Beasley, Brenda Gatusch, Daniel Connolly-Taylor, Chenyuan Teng, Asier Marzo, Jose Nunez-Yanez

    Abstract: We present a flexible and self-contained platform for acoustic levitation research based on the Xilinx Zynq SoC using an array of ultrasonic emitters. The platform employs an inexpensive ZedBoard and provides fast movement of the levitated objects as well as object detection based on the produced echo. Several features available in the Zynq device are of benefit for this platform: hardware acceler…

    Submitted 24 January, 2019; v1 submitted 18 January, 2019; originally announced January 2019.

    Comments: Presented at HIP3ES, 2019

    Report number: HIP3ES/2019/2

  11. arXiv:1901.06331

    cs.DC

    Heterogeneous FPGA+GPU Embedded Systems: Challenges and Opportunities

    Authors: Mohammad Hosseinabady, Mohd Amiruddin Bin Zainol, Jose Nunez-Yanez

    Abstract: The edge computing paradigm has emerged to handle cloud computing issues such as scalability, security and low response time among others. This new computing trend heavily relies on ubiquitous embedded systems on the edge. Performance and energy consumption are two main factors that should be considered during the design of such systems. Focusing on performance and energy consumption, this paper s…

    Submitted 25 January, 2019; v1 submitted 18 January, 2019; originally announced January 2019.

    Comments: Presented at HIP3ES, 2019

    Report number: HIP3ES/2019/4

  12. arXiv:1802.03316

    cs.DC

    Parallelizing Workload Execution in Embedded and High-Performance Heterogeneous Systems

    Authors: Jose Nunez-Yanez, Mohammad Hosseinabady, Moslem Amiri, Andrés Rodríguez, Rafael Asenjo, Angeles Navarro, Rubén Gran-Tejero, Darío Suárez-Gracia

    Abstract: In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and programmable CPUs. Two platforms with different architectures are considered, and a single C/C++ source code is used in both of them for the CPU and FPGA resources.…

    Submitted 9 February, 2018; originally announced February 2018.

    Comments: Presented at HIP3ES, 2018

    Report number: HIP3ES/2018/2

  13. arXiv:1602.02517

    cs.AR cs.PF

    Energy Efficient Video Fusion with Heterogeneous CPU-FPGA Devices

    Authors: Jose Nunez-Yanez, Tom Sun

    Abstract: This paper presents a complete video fusion system with hardware acceleration and investigates the energy trade-offs between computing in the CPU or the FPGA device. The video fusion application is based on the Dual-Tree Complex Wavelet Transforms (DT-CWT). In this work the transforms are mapped to a hardware accelerator using high-level synthesis tools for the FPGA and also vectorized code for th…

    Submitted 8 February, 2016; originally announced February 2016.

    Comments: Presented at HIP3ES, 2016

    Report number: HIP3ES/2016/3