Showing 1–25 of 25 results for author: Arnau, J

Searching in archive cs.
  1. arXiv:2407.00052  [pdf]

    cs.DC

    Vitamin-V: Expanding Open-Source RISC-V Cloud Environments

    Authors: Ramon Canal, Stefano Di Carlo, Dimitris Gizopoulos, Alberto Scionti, Francesco Lubrano, Josep-Lluís Berral, Aaron Call, Diego Marron, Konstantinos Nikas, Dionisios Pnevmatikatos, Daniel Raho, Alvise Rigo, Yannis Papaefstathiou, José María Arnau, Angelos Arelakis

    Abstract: Among the key contributions of Vitamin-V (2023-2025 Horizon Europe project), we develop a complete RISC-V open-source software stack for cloud services with comparable performance to the cloud-dominant x86 counterpart. In this paper, we detail the software suites and applications ported plus the three cloud setups under evaluation.

    Submitted 12 June, 2024; originally announced July 2024.

    Comments: RISC-V Summit Europe 2024, 24-28 June 2024

  2. arXiv:2310.17501  [pdf, other]

    cs.AR

    A Lightweight, Compiler-Assisted Register File Cache for GPGPU

    Authors: Mojtaba Abaie Shoushtary, Jose Maria Arnau, Jordi Tubella Murgadas, Antonio Gonzalez

    Abstract: Modern GPUs require an enormous register file (RF) to store the context of thousands of active threads. It consumes considerable energy and contains multiple large banks to provide enough throughput. Thus, a RF caching mechanism can significantly improve the performance and energy consumption of the GPUs by avoiding reads from the large banks that consume significant energy and may cause port conf…

    Submitted 26 October, 2023; originally announced October 2023.

  3. arXiv:2305.10982  [pdf, other]

    cs.DC

    Vitamin-V: Virtual Environment and Tool-boxing for Trustworthy Development of RISC-V based Cloud Services

    Authors: A. Arelakis, J. M. Arnau, J. L. Berral, A. Call, R. Canal, S. Di Carlo, J. Costa, D. Gizopoulos, V. Karakostas, F. Lubrano, K. Nikas, Y. Nikolakopoulos, B. Otero, G. Papadimitriou, I. Papaefstathiou, D. Pnevmatikatos, D. Raho, A. Rigo, E. Rodríguez, A. Savino, A. Scionti, N. Tampouratzis, A. Torregrosa

    Abstract: Vitamin-V is a 2023-2025 Horizon Europe project that aims to develop a complete RISC-V open-source software stack for cloud services with comparable performance to the cloud-dominant x86 counterpart and a powerful virtual execution environment for software development, validation, verification, and test that considers the relevant RISC-V ISA extensions for cloud deployment.

    Submitted 27 June, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Paper accepted and presented at the RISC-V Summit Europe, Barcelona, 5-9th June 2023. arXiv admin note: substantial text overlap with arXiv:2305.01983

  4. K-D Bonsai: ISA-Extensions to Compress K-D Trees for Autonomous Driving Tasks

    Authors: Pedro H. E. Becker, José María Arnau, Antonio González

    Abstract: Autonomous Driving (AD) systems extensively manipulate 3D point clouds for object detection and vehicle localization. Thereby, efficient processing of 3D point clouds is crucial in these systems. In this work we propose K-D Bonsai, a technique to cut down memory usage during radius search, a critical building block of point cloud processing. K-D Bonsai exploits value similarity in the data structu…

    Submitted 30 August, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    MSC Class: Article No. 18; 2018 Related DOI: https://doi.org/10.1145/3243176.3243184

    Journal ref: ISCA'23 Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No. 20, 2023

  5. arXiv:2212.00608  [pdf, other]

    cs.AR cs.CV cs.LG

    Exploiting Kernel Compression on BNNs

    Authors: Franyell Silfa, Jose Maria Arnau, Antonio González

    Abstract: Binary Neural Networks (BNNs) are showing tremendous success on realistic image classification tasks. Notably, their accuracy is similar to the state-of-the-art accuracy obtained by full-precision models tailored to edge devices. In this regard, BNNs are very amenable to edge devices since they employ 1-bit to store the inputs and weights, and thus, their storage requirements are low. Also, BNNs c…

    Submitted 1 December, 2022; originally announced December 2022.

  6. arXiv:2202.06563  [pdf, other]

    cs.NE cs.AR cs.LG

    Saving RNN Computations with a Neuron-Level Fuzzy Memoization Scheme

    Authors: Franyell Silfa, Jose-Maria Arnau, Antonio González

    Abstract: Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation. Unlike conventional feed-forward DNNs, RNNs remember past information to improve the accuracy of future predictions and, therefore, they are very effective for sequence processing problems. For each application run, recurrent layers are executed many times for process…

    Submitted 14 February, 2022; originally announced February 2022.

  7. arXiv:2202.04990  [pdf, other]

    cs.AR cs.LG cs.NE

    Mixture-of-Rookies: Saving DNN Computations by Predicting ReLU Outputs

    Authors: Dennis Pinto, Jose-María Arnau, Antonio González

    Abstract: Deep Neural Networks (DNNs) are widely used in many application domains. However, they require a vast amount of computations and memory accesses to deliver outstanding accuracy. In this paper, we propose a scheme to predict whether the output of each ReLU-activated neuron will be a zero or a positive number in order to skip the computation of those neurons that will likely output a zero. Our pred…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: 13 pages, 14 figures

  8. arXiv:2202.04971  [pdf, other]

    cs.AR cs.SD eess.AS

    ASRPU: A Programmable Accelerator for Low-Power Automatic Speech Recognition

    Authors: Dennis Pinto, Jose-María Arnau, Antonio González

    Abstract: The outstanding accuracy achieved by modern Automatic Speech Recognition (ASR) systems is enabling them to quickly become a mainstream technology. ASR is essential for many applications, such as speech-based assistants, dictation systems and real-time language translation. However, highly accurate ASR systems are computationally expensive, requiring on the order of billions of arithmetic operation…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: 11 pages, 11 figures

  9. arXiv:2107.09408  [pdf, other]

    cs.AR cs.LG

    CREW: Computation Reuse and Efficient Weight Storage for Hardware-accelerated MLPs and RNNs

    Authors: Marc Riera, Jose-Maria Arnau, Antonio Gonzalez

    Abstract: Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications. The core operation in a DNN is the dot product between quantized inputs and weights. Prior works exploit the weight/input repetition that arises due to quantization to avoid redundant computations in Convolutional Neural Networks (CNNs). However, in this paper we show that their effectiveness is severely limit…

    Submitted 11 March, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

  10. arXiv:2101.09083  [pdf, other]

    cs.SD cs.CL eess.AS

    Exploiting Beam Search Confidence for Energy-Efficient Speech Recognition

    Authors: Dennis Pinto, Jose-María Arnau, Antonio González

    Abstract: With computers getting more and more powerful and integrated in our daily lives, the focus is increasingly shifting towards more human-friendly interfaces, making Automatic Speech Recognition (ASR) a central player as the ideal means of interaction with machines. Consequently, interest in speech technology has grown in the last few years, with more systems being proposed and higher accuracy levels…

    Submitted 22 January, 2021; originally announced January 2021.

  11. arXiv:2009.10656  [pdf, other]

    cs.DC cs.AR cs.LG

    E-BATCH: Energy-Efficient and High-Throughput RNN Batching

    Authors: Franyell Silfa, Jose Maria Arnau, Antonio Gonzalez

    Abstract: Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding since the batched input sequences may largely differ in length. Schemes that dynamically update the batch every few time-steps avoid padding. However, they requi…

    Submitted 22 September, 2020; originally announced September 2020.

  12. arXiv:2007.07131  [pdf, other]

    cs.AR

    Irregular Accesses Reorder Unit: Improving GPGPU Memory Coalescing for Graph-Based Workloads

    Authors: Albert Segura, Jose-Maria Arnau, Antonio Gonzalez

    Abstract: GPGPU architectures have become established as the dominant parallelization and performance platform achieving exceptional popularization and empowering domains such as regular algebra, machine learning, image detection and self-driving cars. However, irregular applications struggle to fully realize GPGPU performance as a result of control flow divergence and memory divergence due to irregular mem…

    Submitted 15 March, 2022; v1 submitted 14 July, 2020; originally announced July 2020.

    Report number: UPC-DAC-RR-ARCO-2020-1

  13. arXiv:1911.04244  [pdf, other]

    eess.SP cs.LG

    Boosting LSTM Performance Through Dynamic Precision Selection

    Authors: Franyell Silfa, Jose-Maria Arnau, Antonio Gonzàlez

    Abstract: The use of low numerical precision is a fundamental optimization included in modern accelerators for Deep Neural Networks (DNNs). The number of bits of the numerical representation is set to the minimum precision that is able to retain accuracy based on an offline profiling, and it is kept constant for DNN inference. In this work, we explore the use of dynamic precision selection during DNN infe…

    Submitted 7 November, 2019; originally announced November 2019.

  14. arXiv:1911.01258  [pdf, other]

    cs.LG cs.AR cs.NE cs.PF

    SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network

    Authors: Reza Yazdani, Olatunji Ruwase, Minjia Zhang, Yuxiong He, Jose-Maria Arnau, Antonio Gonzalez

    Abstract: The effectiveness of Recurrent Neural Networks (RNNs) for tasks such as Automatic Speech Recognition has fostered interest in RNN inference acceleration. Due to the recurrent nature and data dependencies of RNN computations, prior work has designed customized architectures specifically tailored to the computation pattern of RNN, getting high computation efficiency for certain chosen model sizes. H…

    Submitted 21 May, 2023; v1 submitted 4 November, 2019; originally announced November 2019.

  15. arXiv:1906.02535  [pdf, other]

    cs.LG stat.ML

    (Pen-) Ultimate DNN Pruning

    Authors: Marc Riera, Jose-Maria Arnau, Antonio Gonzalez

    Abstract: DNN pruning reduces memory footprint and computational work of DNN-based solutions to improve performance and energy-efficiency. An effective pruning scheme should be able to systematically remove connections and/or neurons that are unnecessary or redundant, reducing the DNN size without any loss in accuracy. In this paper we show that prior pruning schemes require an extremely time-consuming iter…

    Submitted 6 June, 2019; originally announced June 2019.

  16. E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks

    Authors: Franyell Silfa, Gem Dot, Jose-Maria Arnau, Antonio Gonzalez

    Abstract: Recurrent Neural Networks (RNNs) are a key technology for emerging applications such as automatic speech recognition, machine translation or image description. Long Short Term Memory (LSTM) networks are the most successful RNN implementation, as they can learn long term dependencies to achieve high accuracy. Unfortunately, the recurrent nature of LSTM networks significantly constrains the amount o…

    Submitted 20 November, 2017; originally announced November 2017.

    Report number: UPC-DAC-RR-2017-8

    Journal ref: PACT '18 Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, Article No. 18, 2018

  17. arXiv:1707.08089  [pdf, other]

    cs.IT

    Delay Performance of MISO Wireless Communications

    Authors: Jesus Arnau, Marios Kountouris

    Abstract: Ultra-reliable, low latency communications (URLLC) are currently attracting significant attention due to the emergence of mission-critical applications and device-centric communication. URLLC will entail a fundamental paradigm shift from throughput-oriented system design towards holistic designs for guaranteed and reliable end-to-end latency. A deep understanding of the delay performance of wirele…

    Submitted 25 July, 2017; originally announced July 2017.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  18. arXiv:1703.06069  [pdf, other]

    cs.IT

    Performance Analysis of Ultra-Dense Networks with Elevated Base Stations

    Authors: Italo Atzeni, Jesús Arnau, Marios Kountouris

    Abstract: This paper analyzes the downlink performance of ultra-dense networks with elevated base stations (BSs). We consider a general dual-slope pathloss model with distance-dependent probability of line-of-sight (LOS) transmission between BSs and receivers. Specifically, we consider the scenario where each link may be obstructed by randomly placed buildings. Using tools from stochastic geometry, we show…

    Submitted 17 March, 2017; originally announced March 2017.

    Comments: 6 pages, 4 figures. To be presented at SpaSWiN'17 (WiOpt workshops), May 2017

  19. arXiv:1703.01279  [pdf, other]

    cs.IT

    Downlink Cellular Network Analysis with LOS/NLOS Propagation and Elevated Base Stations

    Authors: Italo Atzeni, Jesús Arnau, Marios Kountouris

    Abstract: In this paper, we investigate the downlink performance of dense cellular networks with elevated base stations (BSs) using a channel model that incorporates line-of-sight (LOS)/non-line-of-sight (NLOS) propagation in both small-scale and large-scale fading. Modeling LOS fading with Nakagami-$m$ fading, we provide a unified framework based on stochastic geometry that encompasses both closest and str…

    Submitted 3 March, 2017; originally announced March 2017.

    Comments: Submitted to the IEEE for possible publication

  20. arXiv:1702.06493  [pdf, ps, other]

    cs.IT

    Timely CSI Acquisition Exploiting Full Duplex

    Authors: Jesus Arnau, Marios Kountouris

    Abstract: In this paper, we propose a method for acquiring accurate and timely channel state information (CSI) by leveraging full-duplex transmission. Specifically, we propose a mobile communication system in which base stations continuously transmit a pilot sequence in the uplink frequency band, while terminals use self-interference cancellation capabilities to obtain CSI at any time. Our proposal outperfo…

    Submitted 21 February, 2017; originally announced February 2017.

    Comments: 6 pages, 4 figures, accepted at IEEE WCNC 2017

  21. arXiv:1602.03644  [pdf, other]

    cs.IT

    Impact of LOS/NLOS Propagation and Path Loss in Ultra-Dense Cellular Networks

    Authors: Jesús Arnau, Italo Atzeni, Marios Kountouris

    Abstract: Most prior work on performance analysis of ultra-dense cellular networks (UDNs) has considered standard power-law path loss models and non-line-of-sight (NLOS) propagation modeled by Rayleigh fading. The effect of line-of-sight (LOS) on coverage and throughput and its implication on network densification are still not fully understood. In this paper, we investigate the performance of UDNs when the…

    Submitted 28 September, 2016; v1 submitted 11 February, 2016; originally announced February 2016.

    Comments: Paper presented at IEEE ICC 2016 - Wireless Communications Symposium

  22. arXiv:1512.05526  [pdf, other]

    cs.IT

    Single-Pole IIR Channel Power Prediction with Variable Delays

    Authors: Jesús Arnau

    Abstract: Exploiting outdated channel quality indicators is crucial in most adaptive wireless communication systems. This is often done through channel prediction based on previous received indicators. In this paper, we analyze the case where the feedback delay experienced by the quality indicators is not constant, but random. Focusing on a single-pole IIR predictor, we obtain analytical expressions for the…

    Submitted 17 December, 2015; originally announced December 2015.

    Comments: Paper presented at IEEE GLOBECOM 2015, San Diego, California

  23. Fractional Pilot Reuse in Massive MIMO Systems

    Authors: Italo Atzeni, Jesús Arnau, Mérouane Debbah

    Abstract: Pilot contamination is known to be one of the main impairments for massive MIMO multi-cell communications. Inspired by the concept of fractional frequency reuse and by recent contributions on pilot reutilization among non-adjacent cells, we propose a new pilot allocation scheme to mitigate this effect. The key idea is to allow users in neighboring cells that are closest to their base stations to r…

    Submitted 16 June, 2015; v1 submitted 25 March, 2015; originally announced March 2015.

    Comments: Paper presented at the IEEE ICC 2015 Workshop on 5G & Beyond - Enabling Technologies and Applications

  24. arXiv:1303.3110  [pdf, ps, other]

    cs.OH

    Adaptive Transmission Techniques for Mobile Satellite Links

    Authors: Jesus Arnau, Alberto Rico-Alvariño, Carlos Mosquera

    Abstract: Adapting the transmission rate in an LMS channel is a challenging task because of the relatively fast time variations, of the long delays involved, and of the difficulty in mapping the parameters of a time-varying channel into communication performance. In this paper, we propose two strategies for dealing with these impairments, namely, multi-layer coding (MLC) in the forward link, and open-loop a…

    Submitted 13 March, 2013; originally announced March 2013.

    Comments: Presented at the 30th AIAA International Communications Satellite Systems Conference (ICSSC), Ottawa, Canada, 2012. Best Professional Paper Award

  25. arXiv:1211.5903  [pdf, ps, other]

    cs.IT

    MMSE Performance Analysis of Generalized Multibeam Satellite Channels

    Authors: Dimitrios Christopoulos, Jesus Arnau, Symeon Chatzinotas, Carlos Mosquera, Bjorn Ottersten

    Abstract: Aggressive frequency reuse in the return link (RL) of multibeam satellite communications (SatComs) is crucial towards the implementation of next generation, interactive satellite services. In this direction, multiuser detection has shown great potential in mitigating the increased intrasystem interferences, induced by a tight spectrum reuse. Herein we present an analytic framework to describe the…

    Submitted 26 November, 2012; originally announced November 2012.

    Comments: 4 pages, 2 figures, submitted to the IEEE