Showing 1–17 of 17 results for author: Licht, J d F

  1. Wavefront Threading Enables Effective High-Level Synthesis

    Authors: Blake Pelton, Adam Sapek, Ken Eguro, Daniel Lo, Alessandro Forin, Matt Humphrey, **wen Xi, David Cox, Rajas Karandikar, Johannes de Fine Licht, Evgeny Babin, Adrian Caulfield, Doug Burger

    Abstract: Digital systems are growing in importance and computing hardware is growing more heterogeneous. Hardware design, however, remains laborious and expensive, in part due to the limitations of conventional hardware description languages (HDLs) like VHDL and Verilog. A longstanding research goal has been programming hardware like software, with high-level languages that can generate efficient hardware…

    Submitted 10 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted to PLDI'24

  2. arXiv:2306.11182

    cs.LG cs.DB cs.IR

    Co-design Hardware and Algorithm for Vector Search

    Authors: Wenqi Jiang, Shigang Li, Yu Zhu, Johannes de Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso

    Abstract: Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware of…

    Submitted 6 July, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: 11 pages

  3. arXiv:2306.02730

    cs.DC

    Streaming Task Graph Scheduling for Dataflow Architectures

    Authors: Tiziano De Matteis, Lukas Gianinazzi, Johannes de Fine Licht, Torsten Hoefler

    Abstract: Dataflow devices represent an avenue towards saving the control and data movement overhead of Load-Store Architectures. Various dataflow accelerators have been proposed, but how to efficiently schedule applications on such devices remains an open problem. The programmer can explicitly implement both temporal and spatial parallelism, and pipelining across multiple processing elements can be crucial…

    Submitted 5 June, 2023; originally announced June 2023.

  4. arXiv:2212.13768

    cs.DC cs.PL

    Python FPGA Programming with Data-Centric Multi-Level Design

    Authors: Johannes de Fine Licht, Tiziano De Matteis, Tal Ben-Nun, Andreas Kuster, Oliver Rausch, Manuel Burger, Carl-Johannes Johnsen, Torsten Hoefler

    Abstract: Although high-level synthesis (HLS) tools have significantly improved programmer productivity over hardware description languages, developing for FPGAs remains tedious and error prone. Programmers must learn and implement a large set of vendor-specific syntax, patterns, and tricks to optimize (or even successfully compile) their applications, while dealing with ever-changing toolflows from the FPG…

    Submitted 28 December, 2022; originally announced December 2022.

  5. arXiv:2210.04598

    cs.DC cs.PF

    Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping

    Authors: Carl-Johannes Johnsen, Tiziano De Matteis, Tal Ben-Nun, Johannes de Fine Licht, Torsten Hoefler

    Abstract: The multi-pumping resource sharing technique can overcome the limitations commonly found in single-clocked FPGA designs by allowing hardware components to operate at a higher clock frequency than the surrounding system. However, this optimization cannot be expressed in high levels of abstraction, such as HLS, requiring the use of hand-optimized RTL. In this paper we show how to leverage multiple c…

    Submitted 19 September, 2022; originally announced October 2022.

  6. arXiv:2204.06256

    cs.DC

    Fast Arbitrary Precision Floating Point on FPGA

    Authors: Johannes de Fine Licht, Christopher A. Pattison, Alexandros Nikolaos Ziogas, David Simmons-Duffin, Torsten Hoefler

    Abstract: Numerical codes that require arbitrary precision floating point (APFP) numbers for their core computation are dominated by elementary arithmetic operations due to the super-linear complexity of multiplication in the number of mantissa bits. APFP computations on conventional software-based architectures are made exceedingly expensive by the lack of native hardware support, requiring elementary oper…

    Submitted 13 April, 2022; originally announced April 2022.

  7. arXiv:2112.11879

    cs.PL cs.DC cs.PF

    Lifting C Semantics for Dataflow Optimization

    Authors: Alexandru Calotoiu, Tal Ben-Nun, Grzegorz Kwasniewski, Johannes de Fine Licht, Timo Schneider, Philipp Schaad, Torsten Hoefler

    Abstract: C is the lingua franca of programming and almost any device can be programmed using C. However, programming modern heterogeneous architectures such as multi-core CPUs and GPUs requires explicitly expressing parallelism as well as device-specific properties such as memory hierarchies. The resulting code is often hard to understand, debug, and modify for different architectures. We propose to lift…

    Submitted 24 May, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

  8. arXiv:2107.00555

    cs.PL cs.DC cs.PF

    Productivity, Portability, Performance: Data-Centric Python

    Authors: Alexandros Nikolaos Ziogas, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, Luca Lavarini, Torsten Hoefler

    Abstract: Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High Performance Computing (HPC) has skyrocketed. However, the Python language itself does not necessarily offer high performance. In this work, we presen…

    Submitted 23 August, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

  9. arXiv:2010.15218

    cs.DC

    StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems

    Authors: Johannes de Fine Licht, Andreas Kuster, Tiziano De Matteis, Tal Ben-Nun, Dominic Hofer, Torsten Hoefler

    Abstract: Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the general case of mapping directed acyclic graphs of heterogeneous stencil computations to spatial computing systems, assuming large input programs without an iterat…

    Submitted 11 January, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

  10. arXiv:2010.14684

    cs.DC cs.AR cs.DS

    Substream-Centric Maximum Matchings on FPGA

    Authors: Maciej Besta, Marc Fischer, Tal Ben-Nun, Dimitri Stanojevic, Johannes De Fine Licht, Torsten Hoefler

    Abstract: Developing high-performance and energy-efficient algorithms for maximum matchings is becoming increasingly important in social network analysis, computational sciences, scheduling, and others. In this work, we propose the first maximum matching algorithm designed for FPGAs; it is energy-efficient and has provable guarantees on accuracy, performance, and storage utilization. To achieve this, we for…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Best Paper finalist at ACM FPGA'19, invited to special issue of ACM TRETS'20

    Journal ref: Proceedings of the ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2020. Proceedings of the 27th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2019

  11. Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis

    Authors: Johannes de Fine Licht, Grzegorz Kwasniewski, Torsten Hoefler

    Abstract: Data movement is the dominating factor affecting performance and energy in modern computing systems. Consequently, many algorithms have been developed to minimize the number of I/O operations for common computing patterns. Matrix multiplication is no exception, and lower bounds have been proven and implemented both for shared and distributed memory systems. Reconfigurable hardware platforms are a…

    Submitted 25 January, 2021; v1 submitted 13 December, 2019; originally announced December 2019.

    Journal ref: In Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'20), February 23-25, 2020, Seaside, CA, USA

  12. arXiv:1910.04436

    cs.AR cs.DC cs.SE

    hlslib: Software Engineering for Hardware Design

    Authors: Johannes de Fine Licht, Torsten Hoefler

    Abstract: High-level synthesis (HLS) tools have brought FPGA development into the mainstream, by allowing programmers to design architectures using familiar languages such as C, C++, and OpenCL. While the move to these languages has brought significant benefits, many aspects of traditional software engineering are still unsupported, or not exploited by developers in practice. Furthermore, designing reconfig…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: 4 pages extended abstract accepted to H2RC'19

  13. Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware

    Authors: Tiziano De Matteis, Johannes de Fine Licht, Jakub Beránek, Torsten Hoefler

    Abstract: Distributed memory programming is the established paradigm used in high-performance computing (HPC) systems, requiring explicit communication between nodes and devices. When FPGAs are deployed in distributed settings, communication is typically handled either by going through the host machine, sacrificing performance, or by streaming across fixed device-to-device connections, sacrificing flexibili…

    Submitted 7 September, 2019; originally announced September 2019.

  14. FBLAS: Streaming Linear Algebra on FPGA

    Authors: Tiziano De Matteis, Johannes de Fine Licht, Torsten Hoefler

    Abstract: Spatial computing architectures pose an attractive alternative to mitigate control and data movement overheads typical of load-store architectures. In practice, these devices are rarely considered in the HPC community due to the steep learning curve, low productivity and lack of available libraries for fundamental operations. High-level synthesis (HLS) tools are facilitating hardware programming,…

    Submitted 1 September, 2020; v1 submitted 18 July, 2019; originally announced July 2019.

  15. arXiv:1903.06697

    cs.DC cs.AR

    Graph Processing on FPGAs: Taxonomy, Survey, Challenges

    Authors: Maciej Besta, Dimitri Stanojevic, Johannes De Fine Licht, Tal Ben-Nun, Torsten Hoefler

    Abstract: Graph processing has become an important part of various areas, such as machine learning, computational sciences, medical applications, social network analysis, and many others. Various graphs, for example web or social networks, may contain up to trillions of edges. The sheer size of such datasets, combined with the irregular nature of graph processing, poses unique challenges for the runtime and…

    Submitted 27 April, 2019; v1 submitted 24 February, 2019; originally announced March 2019.

  16. arXiv:1902.10345

    cs.PL cs.DC cs.PF

    Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures

    Authors: Tal Ben-Nun, Johannes de Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler

    Abstract: The ubiquity of accelerators in high-performance computing has driven programming complexity beyond the skill-set of the average domain scientist. To maintain performance portability in the future, it is imperative to decouple architecture-specific programming paradigms from the underlying scientific computations. We present the Stateful DataFlow multiGraph (SDFG), a data-centric intermediate repr…

    Submitted 2 January, 2020; v1 submitted 27 February, 2019; originally announced February 2019.

  17. arXiv:1805.08288

    cs.DC cs.PL

    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Authors: Johannes de Fine Licht, Maciej Besta, Simon Meierhans, Torsten Hoefler

    Abstract: Spatial computing architectures promise a major stride in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target spa…

    Submitted 23 November, 2020; v1 submitted 21 May, 2018; originally announced May 2018.

    ACM Class: I.1.3; C.1.4; D.1.3