Showing 1–32 of 32 results for author: Abdelfattah, M

Searching in archive cs.

  1. arXiv:2406.16635

    cs.LG cs.AI cs.CL

    ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

    Authors: Yash Akhauri, Ahmed F AbouElhamayed, Jordan Dotzel, Zhiru Zhang, Alexander M Rush, Safeen Huda, Mohamed S Abdelfattah

    Abstract: The high power consumption and latency-sensitive deployments of large language models (LLMs) have motivated techniques like quantization and sparsity. Contextual sparsity, where the sparsity pattern is input-dependent, is crucial in LLMs because the permanent removal of attention heads or neurons from LLMs can significantly degrade accuracy. Prior work has attempted to model contextual sparsity us…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2405.03103

    cs.LG cs.CV

    Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs

    Authors: Jordan Dotzel, Yuzong Chen, Bahaa Kotb, Sushma Prasad, Gang Wu, Sheng Li, Mohamed S. Abdelfattah, Zhiru Zhang

    Abstract: The increasing size of large language models (LLMs) traditionally requires low-precision integer formats to meet strict latency and power demands. Yet recently, alternative formats such as Normal Float (NF4) have increased model accuracy at the cost of increased chip area. In this work, we first conduct a large-scale analysis of LLM weights and activations across 30 networks and conclude that most…

    Submitted 10 June, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  3. arXiv:2404.04900

    cs.CL

    Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models

    Authors: Jordan Dotzel, Yash Akhauri, Ahmed S. AbouElhamayed, Carly Jiang, Mohamed Abdelfattah, Zhiru Zhang

    Abstract: Large language models (LLMs) often struggle with strict memory, latency, and power demands. To meet these demands, various forms of dynamic sparsity have been proposed that reduce compute on an input-by-input basis. These methods improve over static methods by exploiting the variance across individual inputs, which has steadily grown with the exponential increase in training data. Yet, the increas…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: First two authors have equal contribution

  4. arXiv:2403.12981

    cs.DC cs.AI cs.CV cs.LG

    Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision

    Authors: Ahmed F. AbouElhamayed, Susanne Balle, Deshanand Singh, Mohamed S. Abdelfattah

    Abstract: Deep neural network (DNN) inference has become an important part of many data-center workloads. This has prompted focused efforts to design ever-faster deep learning accelerators such as GPUs and TPUs. However, an end-to-end DNN-based vision application contains more than just DNN inference, including input decompression, resizing, sampling, normalization, and data transfer. In this paper, we perf…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 6 pages, 11 figures, DAC 2024: 61st IEEE/ACM Design Automation Conference. (DAC'24)

  5. arXiv:2403.02484

    cs.LG cs.AI cs.CV cs.NE

    Encodings for Prediction-based Neural Architecture Search

    Authors: Yash Akhauri, Mohamed S. Abdelfattah

    Abstract: Predictor-based methods have substantially enhanced Neural Architecture Search (NAS) optimization. The efficacy of these predictors is largely influenced by the method of encoding neural network architectures. While traditional encodings used an adjacency matrix describing the graph structure of a neural network, novel encodings embrace a variety of approaches from unsupervised pretraining of late…

    Submitted 4 March, 2024; originally announced March 2024.

  6. arXiv:2403.02446

    cs.LG cs.AR cs.CV cs.PF

    On Latency Predictors for Neural Architecture Search

    Authors: Yash Akhauri, Mohamed S. Abdelfattah

    Abstract: Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint on a specific hardware device. Central to these search algorithms is a prediction model that is designed to provide a hardware latency estimate for a candidate…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted at MLSys'24

  7. arXiv:2402.13536

    cs.CV cs.AI

    Exploring the Limits of Semantic Image Compression at Micro-bits per Pixel

    Authors: Jordan Dotzel, Bahaa Kotb, James Dotzel, Mohamed Abdelfattah, Zhiru Zhang

    Abstract: Traditional methods, such as JPEG, perform image compression by operating on structural information, such as pixel values or frequency content. These methods are effective to bitrates around one bit per pixel (bpp) and higher at standard image sizes. In contrast, text-based semantic compression directly stores concepts and their relationships using natural language, which has evolved with humans t…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted to ICLR Tiny Papers 2024

  8. arXiv:2401.01008

    cs.CV cs.AI

    Fast Sampling Through The Reuse Of Attention Maps In Diffusion Models

    Authors: Rosco Hunter, Łukasz Dudziak, Mohamed S. Abdelfattah, Abhinav Mehrotra, Sourav Bhattacharya, Hongkai Wen

    Abstract: Text-to-image diffusion models have demonstrated unprecedented capabilities for flexible and realistic image synthesis. Nevertheless, these models rely on a time-consuming sampling procedure, which has motivated attempts to reduce their latency. When improving efficiency, researchers often use the original diffusion model to train an additional network designed specifically for fast image generati…

    Submitted 24 May, 2024; v1 submitted 13 December, 2023; originally announced January 2024.

  9. arXiv:2312.10854

    cs.CV cs.LG

    The Right Losses for the Right Gains: Improving the Semantic Consistency of Deep Text-to-Image Generation with Distribution-Sensitive Losses

    Authors: Mahmoud Ahmed, Omer Moussa, Ismail Shaheen, Mohamed Abdelfattah, Amr Abdalla, Marwan Eid, Hesham Eraqi, Mohamed Moustafa

    Abstract: One of the major challenges in training deep neural networks for text-to-image generation is the significant linguistic discrepancy between ground-truth captions of each image in most popular datasets. The large difference in the choice of words in such captions results in synthesizing images that are semantically dissimilar to each other and to their ground-truth counterparts. Moreover, existing…

    Submitted 17 December, 2023; originally announced December 2023.

  10. arXiv:2311.02758

    cs.AR

    M4BRAM: Mixed-Precision Matrix-Matrix Multiplication in FPGA Block RAMs

    Authors: Yuzong Chen, Jordan Dotzel, Mohamed S. Abdelfattah

    Abstract: Mixed-precision quantization is a popular approach for compressing deep neural networks (DNNs). However, it is challenging to scale the performance efficiently with mixed-precision DNNs given the current FPGA architecture and conventional accelerator dataflows. In this work, we enhance the FPGA's capability for accelerating mixed-precision DNNs by proposing M4BRAM, a novel compute-in-block RAM (BR…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: 10 pages, 12 figures, 3 tables, IEEE ICFPT 2023

  11. arXiv:2308.03290

    cs.CV cs.LG

    FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search

    Authors: Jordan Dotzel, Gang Wu, Andrew Li, Muhammad Umar, Yun Ni, Mohamed S. Abdelfattah, Zhiru Zhang, Liqun Cheng, Martin G. Dixon, Norman P. Jouppi, Quoc V. Le, Sheng Li

    Abstract: Quantization has become a mainstream compression technique for reducing model size, computational requirements, and energy consumption for modern deep neural networks (DNNs). With improved numerical support in recent hardware, including multiple variants of integer and floating point, mixed-precision quantization has become necessary to achieve high-quality results with low model cost. Prior mixed…

    Submitted 1 May, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Accepted to AutoML 2024

  12. arXiv:2308.00127

    cs.LG cs.DC cs.SE

    DiviML: A Module-based Heuristic for Mapping Neural Networks onto Heterogeneous Platforms

    Authors: Yassine Ghannane, Mohamed S. Abdelfattah

    Abstract: Datacenters are increasingly becoming heterogeneous, and are starting to include specialized hardware for networking, video processing, and especially deep learning. To leverage the heterogeneous compute capability of modern datacenters, we develop an approach for compiler-level partitioning of deep neural networks (DNNs) onto multiple interconnected hardware devices. We present a general framewor…

    Submitted 1 August, 2023; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: accepted at ICCAD'23

  13. arXiv:2306.02459

    cs.LG cs.AR cs.CV cs.PF

    Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search

    Authors: Yash Akhauri, Mohamed S. Abdelfattah

    Abstract: Many hardware-aware neural architecture search (NAS) methods have been developed to optimize the topology of neural networks (NN) with the joint objectives of higher accuracy and lower latency. Recently, both accuracy and latency predictors have been used in NAS with great success, achieving high sample efficiency and accurate modeling of hardware (HW) device latency respectively. However, a new a…

    Submitted 4 June, 2023; originally announced June 2023.

  14. arXiv:2305.18334

    cs.AR cs.LG

    PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration

    Authors: Ahmed F. AbouElhamayed, Angela Cui, Javier Fernandez-Marques, Nicholas D. Lane, Mohamed S. Abdelfattah

    Abstract: Conventional multiply-accumulate (MAC) operations have long dominated computation time for deep neural networks (DNNs), especially convolutional neural networks (CNNs). Recently, product quantization (PQ) has been applied to these workloads, replacing MACs with memory lookups to pre-computed dot products. To better understand the efficiency tradeoffs of product-quantized DNNs (PQ-DNNs), we create a…

    Submitted 28 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: ACM Transactions on Reconfigurable Technology and Systems (TRETS) - FCCM 2024 Journal Track

  15. arXiv:2304.03974

    cs.AR

    BRAMAC: Compute-in-BRAM Architectures for Multiply-Accumulate on FPGAs

    Authors: Yuzong Chen, Mohamed S. Abdelfattah

    Abstract: Deep neural network (DNN) inference using reduced integer precision has been shown to achieve significant improvements in memory utilization and compute throughput with little or no accuracy loss compared to full-precision floating-point. Modern FPGA-based DNN inference relies heavily on the on-chip block RAM (BRAM) for model storage and the digital signal processing (DSP) unit for implementing th…

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: 11 pages, 13 figures, 3 tables, FCCM conference 2023

  16. arXiv:2211.10780

    cs.CL cs.AI cs.CY cs.LG

    ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture

    Authors: Youssef Mohamed, Mohamed Abdelfattah, Shyma Alhuwaider, Feifan Li, Xiangliang Zhang, Kenneth Ward Church, Mohamed Elhoseiny

    Abstract: This paper introduces ArtELingo, a new benchmark and dataset, designed to encourage work on diversity across languages and cultures. Following ArtEmis, a collection of 80k artworks from WikiArt with 0.45M emotion labels and English-only captions, ArtELingo adds another 0.79M annotations in Arabic and Chinese, plus 4.8K in Spanish to evaluate "cultural-transfer" performance. More than 51K artworks…

    Submitted 19 November, 2022; originally announced November 2022.

    Comments: 9 pages, Accepted at EMNLP 22, for more details see https://www.artelingo.org/

  17. arXiv:2210.07271

    cs.LG

    BLOX: Macro Neural Architecture Search Benchmark and Algorithms

    Authors: Thomas Chun Pong Chau, Łukasz Dudziak, Hongkai Wen, Nicholas Donald Lane, Mohamed S Abdelfattah

    Abstract: Neural architecture search (NAS) has been successfully used to design numerous high-performance neural networks. However, NAS is typically compute-intensive, so most existing approaches restrict the search to decide the operations and topological structure of a single block only, then the same block is stacked repeatedly to form an end-to-end model. Although such an approach reduces the size of se…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Published in the Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks

  18. arXiv:2209.09570

    cs.AR cs.LG

    Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design

    Authors: Hongxiang Fan, Thomas Chau, Stylianos I. Venieris, Royson Lee, Alexandros Kouris, Wayne Luk, Nicholas D. Lane, Mohamed S. Abdelfattah

    Abstract: Attention-based neural networks have become pervasive in many AI tasks. Despite their excellent algorithmic performance, the use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources, which often compromises their hardware performance. Although various sparse variants have been introduced, most approaches only focus on mitigating the quadrat…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: Paper accepted by MICRO'22

  19. arXiv:2209.04966

    cs.CV cs.RO

    Multi-modal Streaming 3D Object Detection

    Authors: Mazen Abdelfattah, Kaiwen Yuan, Z. Jane Wang, Rabab Ward

    Abstract: Modern autonomous vehicles rely heavily on mechanical LiDARs for perception. Current perception methods generally require 360° point clouds, collected sequentially as the LiDAR scans the azimuth and acquires consecutive wedge-shaped slices. The acquisition latency of a full scan (~ 100ms) may lead to outdated perception which is detrimental to safe operation. Recent streaming perception works prop…

    Submitted 11 September, 2022; originally announced September 2022.

  20. Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference

    Authors: Erwei Wang, James J. Davis, Georgios-Ilias Stavrou, Peter Y. K. Cheung, George A. Constantinides, Mohamed S. Abdelfattah

    Abstract: FPGA-specific DNN architectures using the native LUTs as independently trainable inference operators have been shown to achieve favorable area-accuracy and energy-accuracy tradeoffs. The first work in this area, LUTNet, exhibited state-of-the-art performance for standard DNN benchmarks. In this paper, we propose the learned optimization of such LUT-based topologies, resulting in higher-efficiency…

    Submitted 2 January, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

    Comments: Accepted manuscript uploaded 04/12/21. DOA 22/11/21

  21. arXiv:2108.08305

    eess.IV cs.AI cs.CV cs.LG

    Temporal Kernel Consistency for Blind Video Super-Resolution

    Authors: Lichuan Xiang, Royson Lee, Mohamed S. Abdelfattah, Nicholas D. Lane, Hongkai Wen

    Abstract: Deep learning-based blind super-resolution (SR) methods have recently achieved unprecedented performance in upscaling frames with unknown degradation. These models are able to accurately estimate the unknown downscaling kernel from a given low-resolution (LR) image in order to leverage the kernel during restoration. Although these approaches have largely been successful, they are predominantly ima…

    Submitted 18 August, 2021; originally announced August 2021.

  22. arXiv:2106.06799

    cs.LG cs.AI

    Zero-Cost Operation Scoring in Differentiable Architecture Search

    Authors: Lichuan Xiang, Łukasz Dudziak, Mohamed S. Abdelfattah, Thomas Chau, Nicholas D. Lane, Hongkai Wen

    Abstract: We formalize and analyze a fundamental component of differentiable neural architecture search (NAS): local "operation scoring" at each operation choice. We view existing operation scoring functions as inexact proxies for accuracy, and we find that they perform poorly when analyzed empirically on NAS benchmarks. From this perspective, we introduce a novel perturbation-based zero-cost operat…

    Submitted 8 February, 2023; v1 submitted 12 June, 2021; originally announced June 2021.

    Comments: Accepted at AAAI 2023

  23. arXiv:2106.02740

    cs.CV

    ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes

    Authors: Dina Bashkirova, Mohamed Abdelfattah, Ziliang Zhu, James Akl, Fadi Alladkani, ** Hu, Vitaly Ablavsky, Berk Calli, Sarah Adel Bargal, Kate Saenko

    Abstract: Less than 35% of recyclable waste is actually recycled in the US, which leads to increased soil and sea pollution and is one of the major concerns of environmental researchers as well as the common public. At the heart of the problem are the inefficiencies of the waste sorting process (separating paper, plastic, metal, glass, etc.) due to the extremely complex and cluttered nature of the was…

    Submitted 16 May, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

  24. arXiv:2103.09448

    cs.CV cs.CR cs.GR cs.LG

    Adversarial Attacks on Camera-LiDAR Models for 3D Car Detection

    Authors: Mazen Abdelfattah, Kaiwen Yuan, Z. Jane Wang, Rabab Ward

    Abstract: Most autonomous vehicles (AVs) rely on LiDAR and RGB camera sensors for perception. Using these point cloud and image data, perception models based on deep neural nets (DNNs) have achieved state-of-the-art performance in 3D detection. The vulnerability of DNNs to adversarial attacks has been heavily investigated in the RGB image domain and more recently in the point cloud domain, but rarely in bot…

    Submitted 21 September, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

    Comments: arXiv admin note: text overlap with arXiv:2101.10747 Updates in v2: Expanded conclusion and future work, reduced Figure 5's size, and a small correction in Table 3

  25. Towards Universal Physical Attacks On Cascaded Camera-Lidar 3D Object Detection Models

    Authors: Mazen Abdelfattah, Kaiwen Yuan, Z. Jane Wang, Rabab Ward

    Abstract: We propose a universal and physically realizable adversarial attack on a cascaded multi-modal deep learning network (DNN), in the context of self-driving cars. DNNs have achieved high performance in 3D object detection, but they are known to be vulnerable to adversarial attacks. These attacks have been heavily investigated in the RGB image domain and more recently in the point cloud domain, but ra…

    Submitted 31 January, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

    Journal ref: 2021 IEEE International Conference on Image Processing (ICIP)

  26. arXiv:2101.08134

    cs.LG cs.AI cs.NE

    Zero-Cost Proxies for Lightweight NAS

    Authors: Mohamed S. Abdelfattah, Abhinav Mehrotra, Łukasz Dudziak, Nicholas D. Lane

    Abstract: Neural Architecture Search (NAS) is quickly becoming the standard methodology to design neural network models. However, NAS is typically compute-intensive because multiple models need to be evaluated before choosing the best one. To reduce the computational power and time needed, a proxy task is often used for evaluating each model instead of full training. In this paper, we evaluate conventional…

    Submitted 19 March, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

    Comments: ICLR 2021

  27. arXiv:2008.02897

    cs.LG stat.ML

    Iterative Compression of End-to-End ASR Model using AutoML

    Authors: Abhinav Mehrotra, Łukasz Dudziak, **su Yeo, Young-yoon Lee, Ravichander Vipperla, Mohamed S. Abdelfattah, Sourav Bhattacharya, Samin Ishtiaq, Alberto Gil C. P. Ramos, SangJeong Lee, Daehyun Kim, Nicholas D. Lane

    Abstract: Increasing demand for on-device Automatic Speech Recognition (ASR) systems has resulted in renewed interest in developing automatic model compression techniques. Past research has shown that AutoML-based Low Rank Factorization (LRF) technique, when applied to an end-to-end Encoder-Attention-Decoder style ASR model, can achieve a speedup of up to 3.7x, outperforming laborious manual rank-selectio…

    Submitted 6 August, 2020; originally announced August 2020.

    Journal ref: INTERSPEECH 2020

  28. arXiv:2007.08668

    cs.LG eess.SP stat.ML

    BRP-NAS: Prediction-based NAS using GCNs

    Authors: Łukasz Dudziak, Thomas Chau, Mohamed S. Abdelfattah, Royson Lee, Hyeji Kim, Nicholas D. Lane

    Abstract: Neural architecture search (NAS) enables researchers to automatically explore broad design spaces in order to improve efficiency of neural networks. This efficiency is especially important in the case of on-device deployment, where improvements in accuracy should be balanced out with computational demands of a model. In practice, performance metrics of a model are computationally expensive to obtain…

    Submitted 19 January, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Published at NeurIPS 2020

  29. arXiv:2007.04356

    eess.IV cs.CV

    Journey Towards Tiny Perceptual Super-Resolution

    Authors: Royson Lee, Łukasz Dudziak, Mohamed Abdelfattah, Stylianos I. Venieris, Hyeji Kim, Hongkai Wen, Nicholas D. Lane

    Abstract: Recent works in single-image perceptual super-resolution (SR) have demonstrated unprecedented performance in generating realistic textures by means of deep convolutional networks. However, these convolutional models are excessively large and expensive, hindering their effective deployment to end devices. In this work, we propose a neural architecture search (NAS) approach that integrates NAS and g…

    Submitted 8 July, 2020; originally announced July 2020.

    Comments: Accepted at the 16th European Conference on Computer Vision (ECCV), 2020

  30. arXiv:2002.05022

    eess.SP cs.LG

    Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator

    Authors: Mohamed S. Abdelfattah, Łukasz Dudziak, Thomas Chau, Royson Lee, Hyeji Kim, Nicholas D. Lane

    Abstract: Neural architecture search (NAS) has been very successful at outperforming human-designed convolutional neural networks (CNN) in accuracy, and when hardware information is present, latency as well. However, NAS-designed CNNs typically have a complicated topology; therefore, it may be difficult to design a custom hardware (HW) accelerator for such CNNs. We automate HW-CNN codesign using NAS by incl…

    Submitted 6 March, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: accepted at DAC 2020

  31. arXiv:1907.03540

    cs.LG cs.AI eess.AS stat.ML

    ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning

    Authors: Łukasz Dudziak, Mohamed S. Abdelfattah, Ravichander Vipperla, Stefanos Laskaridis, Nicholas D. Lane

    Abstract: End-to-end automatic speech recognition (ASR) models are increasingly large and complex to achieve the best possible accuracy. In this paper, we build an AutoML system that uses reinforcement learning (RL) to optimize the per-layer compression ratios when applied to a state-of-the-art attention-based end-to-end ASR model composed of several LSTM layers. We use singular value decomposition (SVD) lo…

    Submitted 24 September, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

    Comments: INTERSPEECH 2019

  32. arXiv:1807.06434

    cs.DC cs.AR eess.SP

    DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration

    Authors: Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane OConnell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C. Ling, Gordon R. Chiu

    Abstract: Overlays have shown significant promise for field-programmable gate arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a significant performance burden resulting in very little adoption of overlays for practical applications. In this paper, we tailor an overlay to a specific applica…

    Submitted 13 July, 2018; originally announced July 2018.

    Comments: Accepted in the International Conference on Field-Programmable Logic and Applications (FPL 2018)