Showing 1–23 of 23 results for author: Soudris, D

  1. arXiv:2407.03711  [pdf, other]

    cs.AR

    Decoupled Access-Execute enabled DVFS for tinyML deployments on STM32 microcontrollers

    Authors: Elisavet Lydia Alvanaki, Manolis Katsaragakis, Dimosthenis Masouros, Sotirios Xydis, Dimitrios Soudris

    Abstract: Over the last years, the number of Machine Learning (ML) inference applications deployed on the Edge has been rapidly increasing. Recent Internet of Things (IoT) devices and microcontrollers (MCUs) become more and more mainstream in everyday activities. In this work we focus on the family of STM32 MCUs. We propose a novel methodology for CNN deployment on the STM32 family, focusing on power optimizat…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 6 pages, 6 figures, 1 listing, presented at IEEE DATE 2024

    Journal ref: 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE) (pp. 1-6). IEEE

  2. arXiv:2405.16953  [pdf, other]

    cs.CV cs.DC cs.PF

    Evaluation of Resource-Efficient Crater Detectors on Embedded Systems

    Authors: Simon Vellas, Bill Psomas, Kalliopi Karadima, Dimitrios Danopoulos, Alexandros Paterakis, George Lentaris, Dimitrios Soudris, Konstantinos Karantzalos

    Abstract: Real-time analysis of Martian craters is crucial for mission-critical operations, including safe landings and geological exploration. This work leverages the latest breakthroughs for on-the-edge crater detection aboard spacecraft. We rigorously benchmark several YOLO networks using a Mars craters dataset, analyzing their performance on embedded systems with a focus on optimization for low-power de…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted at 2024 IEEE International Geoscience and Remote Sensing Symposium

  3. arXiv:2404.13715  [pdf, other]

    cs.LG

    TF2AIF: Facilitating development and deployment of accelerated AI models on the cloud-edge continuum

    Authors: Aimilios Leftheriotis, Achilleas Tzenetopoulos, George Lentaris, Dimitrios Soudris, Georgios Theodoridis

    Abstract: The B5G/6G evolution relies on connect-compute technologies and highly heterogeneous clusters with HW accelerators, which require specialized coding to be efficiently utilized. The current paper proposes a custom tool for generating multiple SW versions of a certain AI function input in high-level language, e.g., Python TensorFlow, while targeting multiple diverse HW+SW platforms. TF2AIF builds up…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: to be published in EUCNC & 6G Summit 2024

  4. arXiv:2402.07545  [pdf, other]

    cs.LG cs.AR

    TransAxx: Efficient Transformers with Approximate Computing

    Authors: Dimitrios Danopoulos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel

    Abstract: Vision Transformer (ViT) models, which were recently introduced by the transformer architecture, have been shown to be very competitive and often become a popular alternative to Convolutional Neural Networks (CNNs). However, the high computational requirements of these models limit their practical applicability, especially on low-power devices. Current state-of-the-art employs approximate multipliers to a…

    Submitted 12 February, 2024; originally announced February 2024.

  5. arXiv:2312.01172  [pdf, other]

    cs.LG

    On-sensor Printed Machine Learning Classification via Bespoke ADC and Decision Tree Co-Design

    Authors: Giorgos Armeniakos, Paula L. Duarte, Priyanjana Pal, Georgios Zervakis, Mehdi B. Tahoori, Dimitrios Soudris

    Abstract: Printed electronics (PE) technology provides cost-effective hardware with unmet customization, due to their low non-recurring engineering and fabrication costs. PE exhibit features such as flexibility, stretchability, porosity, and conformality, which make them a prominent candidate for enabling ubiquitous computing. Still, the large feature sizes in PE limit the realization of complex printed cir…

    Submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted for publication at the 27th Design, Automation and Test in Europe Conference (DATE'24), Mar 25-27 2024, Valencia, Spain

  6. arXiv:2307.11128  [pdf, other]

    cs.AR cs.AI cs.ET cs.PL

    Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

    Authors: Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, Dimitrios Soudris

    Abstract: The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performan…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Under Review at ACM Computing Surveys

  7. arXiv:2307.11124  [pdf, other]

    cs.AR cs.ET cs.PL

    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    Authors: Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, Dimitrios Soudris

    Abstract: The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Under Review at ACM Computing Surveys

  8. The Unexpected Efficiency of Bin Packing Algorithms for Dynamic Storage Allocation in the Wild: An Intellectual Abstract

    Authors: Christos P. Lamprakos, Sotirios Xydis, Francky Catthoor, Dimitrios Soudris

    Abstract: Recent work has shown that viewing allocators as black-box 2DBP solvers bears meaning. For instance, there exists a 2DBP-based fragmentation metric which often correlates monotonically with maximum resident set size (RSS). Given the field's indeterminacy with respect to fragmentation definitions, as well as the immense value of physical memory savings, we are motivated to set allocator-generated p…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: 13 pages, 10 figures, 3 tables. To appear in ISMM '23

  9. arXiv:2304.10862  [pdf, other]

    cs.PL

    Viewing Allocators as Bin Packing Solvers Demystifies Fragmentation

    Authors: Christos P. Lamprakos, Sotirios Xydis, Francky Catthoor, Dimitrios Soudris

    Abstract: This paper presents a trace-based simulation methodology for constructing representations of workload-allocator interaction. We use two-dimensional rectangular bin packing (2DBP) as our foundation. Classical 2DBP algorithms minimize their products' makespan, but virtual memory systems employing demand paging deem such a criterion inappropriate. We view an allocator's placement decisions as a solut…

    Submitted 24 April, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: 13 pages, 10 figures, 5 tables. Edit: removed "regular submission" subtitle, cleaned page headers

  10. arXiv:2304.00953  [pdf, other]

    cs.DB

    Energy Consumption Evaluation of Optane DC Persistent Memory for Indexing Data Structures

    Authors: Manolis Katsaragakis, Christos Baloukas, Lazaros Papadopoulos, Verena Kantere, Francky Catthoor, Dimitrios Soudris

    Abstract: The Intel Optane DC Persistent Memory (DCPM) is an attractive novel technology for building storage systems for data intensive HPC applications, as it provides lower cost per byte, low standby power and larger capacities than DRAM, with comparable latency. This work provides an in-depth evaluation of the energy consumption of the Optane DCPM, using well-established indexes specifically designed to…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: 10 pages. Accepted and presented at the IEEE International Conference on High Performance Computing 2022 (HiPC), Bengaluru, India

  11. Model-to-Circuit Cross-Approximation For Printed Machine Learning Classifiers

    Authors: Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Mehdi B. Tahoori, Jörg Henkel

    Abstract: Printed electronics (PE) promises on-demand fabrication, low non-recurring engineering costs, and sub-cent fabrication costs. It also allows for high customization that would be infeasible in silicon, and bespoke architectures prevail to improve the efficiency of emerging PE machine learning (ML) applications. Nevertheless, large feature sizes in PE prohibit the realization of complex ML models in…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted for publication by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, March 2023. arXiv admin note: text overlap with arXiv:2203.05915

  12. Co-Design of Approximate Multilayer Perceptron for Ultra-Resource Constrained Printed Circuits

    Authors: Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Mehdi B. Tahoori, Jörg Henkel

    Abstract: Printed Electronics (PE) exhibits on-demand, extremely low-cost hardware due to its additive manufacturing process, enabling machine learning (ML) applications for domains that feature ultra-low cost, conformity, and non-toxicity requirements that silicon-based systems cannot deliver. Nevertheless, large feature sizes in PE prohibit the realization of complex printed ML circuits. In this work, we…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: Accepted for publication by IEEE Transactions on Computers, February 2023

  13. arXiv:2212.00873  [pdf, other]

    cs.AR

    CONVOLVE: Smart and seamless design of smart edge processors

    Authors: M. Gomony, F. Putter, A. Gebregiorgis, G. Paulin, L. Mei, V. Jain, S. Hamdioui, V. Sanchez, T. Grosser, M. Geilen, M. Verhelst, F. Zenke, F. Gurkaynak, B. Bruin, S. Stuijk, S. Davidson, S. De, M. Ghogho, A. Jimborean, S. Eissa, L. Benini, D. Soudris, R. Bishnoi, S. Ainsworth, F. Corradi, et al. (3 additional authors not shown)

    Abstract: With the rise of Deep Learning (DL), our world braces for AI in every edge device, creating an urgent need for edge-AI SoCs. This SoC hardware needs to support high throughput, reliable and secure AI processing at Ultra Low Power (ULP), with a very short time to market. With its strong legacy in edge solutions and open processing platforms, the EU is well-positioned to become a leader in this SoC…

    Submitted 2 May, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

  14. Towards making the most of NLP-based device mapping optimization for OpenCL kernels

    Authors: Petros Vavaroutsos, Ioannis Oroutzoglou, Dimosthenis Masouros, Dimitrios Soudris

    Abstract: Nowadays, we are living in an era of extreme device heterogeneity. Despite the high variety of conventional CPU architectures, accelerator devices, such as GPUs and FPGAs, also appear in the foreground exploding the pool of available solutions to execute applications. However, choosing the appropriate device per application needs is an extremely challenging task due to the abstract relationship be…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: Accepted at IEEE COINS 2022

    Journal ref: 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS), 2022, pp. 1-6

  15. arXiv:2203.08737  [pdf, other]

    cs.AR cs.LG

    Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey

    Authors: Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel

    Abstract: Deep Neural Networks (DNNs) are very popular because of their high performance in various cognitive tasks in Machine Learning (ML). Recent advancements in DNNs have brought beyond human accuracy in many tasks, but at the cost of high computational complexity. To enable efficient execution of DNN inference, more and more research works, therefore, exploit the inherent error resilience of DNNs and e…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: This paper has been accepted by ACM Computing Surveys (CSUR), 2022

    Journal ref: ACM Computing Surveys 2022

  16. Cross-Layer Approximation For Printed Machine Learning Circuits

    Authors: Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Mehdi B. Tahoori, Jörg Henkel

    Abstract: Printed electronics (PE) feature low non-recurring engineering costs and low per unit-area fabrication costs, enabling thus extremely low-cost and on-demand hardware. Such low-cost fabrication allows for high customization that would be infeasible in silicon, and bespoke architectures prevail to improve the efficiency of emerging PE machine learning (ML) applications. However, even with bespoke ar…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: Accepted for publication at the 25th Design, Automation and Test in Europe Conference (DATE'22), Mar 14-23 2022, Antwerp, Belgium

  17. AdaPT: Fast Emulation of Approximate DNN Accelerators in PyTorch

    Authors: Dimitrios Danopoulos, Georgios Zervakis, Kostas Siozios, Dimitrios Soudris, Jörg Henkel

    Abstract: Current state-of-the-art employs approximate multipliers to address the highly increased power demands of DNN accelerators. However, evaluating the accuracy of approximate DNNs is cumbersome due to the lack of adequate support for approximate arithmetic in DNN frameworks. We address this inefficiency by presenting AdaPT, a fast emulation framework that extends PyTorch to support approximate infere…

    Submitted 11 October, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted for publication in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

  18. EDEN: A high-performance, general-purpose, NeuroML-based neural simulator

    Authors: Sotirios Panagiotou, Harry Sidiropoulos, Mario Negrello, Dimitrios Soudris, Christos Strydis

    Abstract: Modern neuroscience employs in silico experimentation on ever-increasing and more detailed neural networks. The high modelling detail goes hand in hand with the need for high model reproducibility, reusability and transparency. Besides, the size of the models and the long timescales under study mandate the use of a simulation system with high computational performance, so as to provide an acceptab…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: 29 pages, 9 figures

    Journal ref: Front. Neuroinform. 16 (2022)

  19. arXiv:2004.13873  [pdf, other]

    eess.SY

    Automated Physics-Derived Code Generation for Sensor Fusion and State Estimation

    Authors: Orestis Kaparounakis, Vasileios Tsoutsouras, Dimitrios Soudris, Phillip Stanley-Marbell

    Abstract: We present a new method for automatically generating the implementation of state-estimation algorithms from a machine-readable specification of the physics of a sensing system and physics of its signals and signal constraints. We implement the new state-estimator code generation method as a backend for a physics specification language and we apply the backend to generate complete C code implementa…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: 11 pages, 7 figures

  20. arXiv:1612.01501  [pdf, other]

    cs.NE cs.DC

    BrainFrame: A node-level heterogeneous accelerator platform for neuron simulations

    Authors: Georgios Smaragdos, Georgios Chatzikonstantis, Rahul Kukreja, Harry Sidiropoulos, Dimitrios Rodopoulos, Ioannis Sourdis, Zaid Al-Ars, Christoforos Kachris, Dimitrios Soudris, Chris I. De Zeeuw, Christos Strydis

    Abstract: Objective: The advent of High-Performance Computing (HPC) in recent years has led to its increasing use in brain study through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field does not permit for a…

    Submitted 15 August, 2017; v1 submitted 5 December, 2016; originally announced December 2016.

    Comments: 16 pages, 18 figures, 5 tables

  21. arXiv:1406.0309  [pdf]

    cs.NI cs.AR

    Network Function Virtualization based on FPGAs: A Framework for all-Programmable network devices

    Authors: Christoforos Kachris, Georgios Sirakoulis, Dimitrios Soudris

    Abstract: Network Function Virtualization (NFV) refers to the use of commodity hardware resources as the basic platform to perform specialized network functions as opposed to specialized hardware devices. Currently, NFV is mainly implemented based on general purpose processors, or general purpose network processors. In this paper we propose the use of FPGAs as an ideal platform for NFV that can be used to p…

    Submitted 2 June, 2014; originally announced June 2014.

    Comments: Network function virtualizations, FPGA, dynamic reconfiguration

  22. arXiv:0710.4844  [pdf]

    cs.AR

    A Partitioning Methodology for Accelerating Applications in Hybrid Reconfigurable Platforms

    Authors: M. D. Galanis, A. Milidonis, G. Theodoridis, D. Soudris, C. E. Goutis

    Abstract: In this paper, we propose a methodology for partitioning and mapping computationally intensive applications in reconfigurable hardware blocks of different granularity. A generic hybrid reconfigurable architecture is considered so that the methodology can be applicable to a large number of heterogeneous reconfigurable platforms. The methodology mainly consists of two stages, the analysis and the mapp…

    Submitted 25 October, 2007; originally announced October 2007.

    Comments: Submitted on behalf of EDAA (http://www.edaa.com/)

    Journal ref: In Design, Automation and Test in Europe | Designers' Forum - DATE'05, Munich, Germany (2005)

  23. arXiv:0710.4656  [pdf]

    cs.AR

    A Memory Hierarchical Layer Assigning and Prefetching Technique to Overcome the Memory Performance/Energy Bottleneck

    Authors: Minas Dasygenis, Erik Brockmeyer, Bart Durinck, Francky Catthoor, Dimitrios Soudris, Antonios Thanailakis

    Abstract: The memory subsystem has always been a bottleneck in performance as well as a significant power contributor in memory intensive applications. Many researchers have presented multi-layered memory hierarchies as a means to design energy and performance efficient systems. However, most of the previous work does not explore trade-offs systematically. We fill this gap by proposing a formalized technique…

    Submitted 25 October, 2007; originally announced October 2007.

    Comments: Submitted on behalf of EDAA (http://www.edaa.com/)

    Journal ref: In Design, Automation and Test in Europe - DATE'05, Munich, Germany (2005)