Search | arXiv e-print repository

VaPr: Variable-Precision Tensors to Accelerate Robot Motion Planning

Authors: Yu-Shun Hsiao, Siva Kumar Sastry Hari, Balakumar Sundaralingam, Jason Yik, Thierry Tambe, Charbel Sakr, Stephen W. Keckler, Vijay Janapa Reddi

Abstract: High-dimensional motion generation requires numerical precision for smooth, collision-free solutions. Typically, double-precision or single-precision floating-point (FP) formats are utilized. Using these for big tensors imposes a strain on the memory bandwidth provided by the devices and alters the memory footprint, hence limiting their applicability to low-power edge devices needed for mobile rob… ▽ More High-dimensional motion generation requires numerical precision for smooth, collision-free solutions. Typically, double-precision or single-precision floating-point (FP) formats are utilized. Using these for big tensors imposes a strain on the memory bandwidth provided by the devices and alters the memory footprint, hence limiting their applicability to low-power edge devices needed for mobile robots. The uniform application of reduced precision can be advantageous but severely degrades solutions. Using decreased precision data types for important tensors, we propose to accelerate motion generation by removing memory bottlenecks. We propose variable-precision (VaPr) search optimization to determine the appropriate precision for large tensors from a vast search space of approximately 4 million unique combinations for FP data types across the tensors. To obtain the efficiency gains, we exploit existing platform support for an out-of-the-box GPU speedup and evaluate prospective precision converter units for GPU types that are not currently supported. Our experimental results on 800 planning problems for the Franka Panda robot on the MotionBenchmaker dataset across 8 environments show that a 4-bit FP format is sufficient for the largest set of tensors in the motion generation stack. With the software-only solution, VaPr achieves 6.3% and 6.3% speedups on average for a significant portion of motion generation over the SOTA solution (CuRobo) on Jetson Orin and RTX2080 Ti GPU, respectively, and 9.9%, 17.7% speedups with the FP converter. △ Less

Submitted 11 October, 2023; originally announced October 2023.

Comments: 7 pages, 5 figures, 8 tables, to be published in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2310.06471 [pdf]

doi 10.1021/acs.nanolett.1c01570

Silicon Nanoantenna Mix Arrays for a Trifecta of Quantum Emitter Enhancements

Authors: Zhaogang Dong, Sergey Gorelik, Ramón Paniagua-Dominguez, Johnathan Yik, **fa Ho, Febiana Tjiptoharsono, Emmanuel Lassalle, Soroosh Daqiqeh Rezaei, Darren C. J. Neo, ** Bai, Arseniy I. Kuznetsov, Joel K. W. Yang

Abstract: Dielectric nanostructures have demonstrated optical antenna effects due to Mie resonances. Preliminary investigations on dielectric nanoantennas have been carried out for a trifecta of enhancements, i.e., simultaneous enhancements in absorption, emission directionality and radiative decay rates of quantum emitters. However, these investigations are limited by fragile substrates or low Purcell fact… ▽ More Dielectric nanostructures have demonstrated optical antenna effects due to Mie resonances. Preliminary investigations on dielectric nanoantennas have been carried out for a trifecta of enhancements, i.e., simultaneous enhancements in absorption, emission directionality and radiative decay rates of quantum emitters. However, these investigations are limited by fragile substrates or low Purcell factor, which is extremely important for exciting quantum emitters electrically. In this paper, we present a Si mix antenna array to achieve the trifecta enhancement of ~1200 fold with a Purcell factor of ~47. The antenna design incorporates ~10 nm gaps within which fluorescent molecules strongly absorb the pump laser energy through a resonant mode. In the emission process, the antenna array increases the radiative decay rates of the fluorescence molecules via Purcell effect and provides directional emission through a separate mode. This work could lead to novel CMOS compatible platforms for enhancing fluorescence for biological and chemical applications. △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: 20 pages, 4 figures

Journal ref: Nano Letters 21, 4853-4860 (2021)

arXiv:2304.04640 [pdf, other]

NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems

Authors: Jason Yik, Korneel Van den Berghe, Douwe den Blanken, Younes Bouhadjar, Maxime Fabre, Paul Hueber, Denis Kleyko, Noah Pacik-Nelson, Pao-Sheng Vincent Sun, Guangzhi Tang, Shenqi Wang, Biyan Zhou, Soikat Hasan Ahmed, George Vathakkattil Joseph, Benedetto Leto, Aurora Micheli, Anurag Kumar Mishra, Gregor Lenz, Tao Sun, Zergham Ahmed, Mahmoud Akl, Brian Anderson, Andreas G. Andreou, Chiara Bartolozzi, Arindam Basu , et al. (73 additional authors not shown)

Abstract: Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neu… ▽ More Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of nearly 100 co-authors across over 50 institutions in industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we present initial performance baselines across various model architectures on the algorithm track and outline the system track benchmark tasks and guidelines. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community. △ Less

Submitted 17 January, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: Updated from whitepaper to full perspective article preprint

arXiv:2211.08675 [pdf, other]

XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse

Authors: Hyoukjun Kwon, Krishnakumar Nair, Jamin Seo, Jason Yik, Debabrata Mohapatra, Dongyuan Zhan, **ook Song, Peter Capak, Peizhao Zhang, Peter Vajda, Colby Banbury, Mark Mazumder, Liangzhen Lai, Ashish Sirasao, Tushar Krishna, Harshit Khaitan, Vikas Chandra, Vijay Janapa Reddi

Abstract: Real-time multi-task multi-model (MTMM) workloads, a new form of deep learning inference workloads, are emerging for applications areas like extended reality (XR) to support metaverse use cases. These workloads combine user interactivity with computationally complex machine learning (ML) activities. Compared to standard ML applications, these ML workloads present unique difficulties and constraint… ▽ More Real-time multi-task multi-model (MTMM) workloads, a new form of deep learning inference workloads, are emerging for applications areas like extended reality (XR) to support metaverse use cases. These workloads combine user interactivity with computationally complex machine learning (ML) activities. Compared to standard ML applications, these ML workloads present unique difficulties and constraints. Real-time MTMM workloads impose heterogeneity and concurrency requirements on future ML systems and devices, necessitating the development of new capabilities. This paper begins with a discussion of the various characteristics of these real-time MTMM ML workloads and presents an ontology for evaluating the performance of future ML hardware for XR systems. Next, we present XRBENCH, a collection of MTMM ML tasks, models, and usage scenarios that execute these models in three representative ways: cascaded, concurrent, and cascaded-concurrent for XR use cases. Finally, we emphasize the need for new metrics that capture the requirements properly. We hope that our work will stimulate research and lead to the development of a new generation of ML systems for XR use cases. XRBench is available as an open-source project: https://github.com/XRBench △ Less

Submitted 19 May, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

arXiv:2012.10848 [pdf, other]

IntersectX: An Efficient Accelerator for Graph Mining

Authors: Gengyu Rao, **gji Chen, Jason Yik, Xuehai Qian

Abstract: Graph pattern mining applications try to find all embeddings that match specific patterns. Compared to the traditional graph computation, graph mining applications are computation-intensive. The state-of-the-art method, pattern enumeration, constructs the embeddings that match the pattern. The key operation -- intersection -- of two edge lists, poses challenges to conventional architectures and re… ▽ More Graph pattern mining applications try to find all embeddings that match specific patterns. Compared to the traditional graph computation, graph mining applications are computation-intensive. The state-of-the-art method, pattern enumeration, constructs the embeddings that match the pattern. The key operation -- intersection -- of two edge lists, poses challenges to conventional architectures and requires substantial execution time. In this paper, we propose IntersectX, a vertically designed accelerator for pattern enumeration with stream instruction set extension and architectural supports based on conventional processor. The stream based ISA can be considered as a natural extension to the traditional instructions that operate on scalar values. We develop the IntersectX architecture composed of specialized mechanisms that efficiently implement the stream ISA extensions, including: (1) Stream Map** Table (SMT) that records the map** between stream ID and stream register; (2) the read-only Stream Cache (S-Cache) that enables efficient stream data movements; (3) tracking the dependency between streams with a property of intersection; (4) Stream Value Processing Unit (SVPU) that implements sparse value computations; and (5) the nested intersection translator that generates micro-op sequences for implementing nested intersections. We implement IntersectX ISA and architecture on zSim, and test it with seven popular graph mining applications (triangle/three-chain/tailed-traingle counting, 3-motif mining, 4/5-clique counting, and FSM) on ten real graphs. We develop our own implementation of AutoMine (InHouseAutomine). The results show that IntersectX significantly outperforms InHouseAutomine on CPU, on average 10.7 times and up to 83.9 times ; and GRAMER, a state-of-the-art graph pattern mining accelerator, based on exhaustive check, on average 40.1 times and up to 181.8 times. △ Less

Submitted 19 April, 2021; v1 submitted 20 December, 2020; originally announced December 2020.

Showing 1–5 of 5 results for author: Yik, J