
Showing 1–29 of 29 results for author: Sze, V

Searching in archive cs.
  1. arXiv:2405.07266

    cs.ET cs.AR

    Architecture-Level Modeling of Photonic Deep Neural Network Accelerators

    Authors: Tanner Andrulis, Gohar Irfan Chaudhry, Vinith M. Suriyakumar, Joel S. Emer, Vivienne Sze

    Abstract: Photonics is a promising technology to accelerate Deep Neural Networks as it can use optical interconnects to reduce data movement energy and it enables low-energy, high-throughput optical-analog computations. To realize these benefits in a full system (accelerator + DRAM), designers must ensure that the benefits of using the electrical, optical, analog, and digital domains exceed the costs of c…

    Submitted 14 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: Published at ISPASS 2024

  2. arXiv:2405.07259

    cs.AR

    CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool

    Authors: Tanner Andrulis, Joel S. Emer, Vivienne Sze

    Abstract: Compute-In-Memory (CiM) is a promising solution to accelerate Deep Neural Networks (DNNs) as it can avoid energy-intensive DNN weight movement and use memory arrays to perform low-energy, high-density computations. These benefits have inspired research across the CiM stack, but CiM research often focuses on only one level of the stack (i.e., devices, circuits, architecture, workload, or mapping) o…

    Submitted 29 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: Available at https://github.com/mit-emze/cimloop. Published in ISPASS 2024

  3. arXiv:2404.06553

    cs.AR

    Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator Design

    Authors: Tanner Andrulis, Ruicong Chen, Hae-Seung Lee, Joel S. Emer, Vivienne Sze

    Abstract: Analog Compute-in-Memory (CiM) accelerators use analog-digital converters (ADCs) to read the analog values that they compute. ADCs can consume significant energy and area, so architecture-level ADC decisions such as ADC resolution or number of ADCs can significantly impact overall CiM accelerator energy and area. Therefore, modeling how architecture-level decisions affect ADC energy and area is cr…

    Submitted 14 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  4. Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity

    Authors: Zi Yu Xue, Yannan Nellie Wu, Joel S. Emer, Vivienne Sze

    Abstract: Sparse tensor algebra is a challenging class of workloads to accelerate due to low arithmetic intensity and varying sparsity patterns. Prior sparse tensor algebra accelerators have explored tiling sparse data to increase exploitable data reuse and improve throughput, but typically allocate tile size in a given buffer for the worst-case data occupancy. This severely limits the utilization of availa…

    Submitted 26 June, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: 17 pages, 13 figures, in MICRO 2023

    Journal ref: 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '23), 2023

  5. GMMap: Memory-Efficient Continuous Occupancy Map Using Gaussian Mixture Model

    Authors: Peter Zhi Xuan Li, Sertac Karaman, Vivienne Sze

    Abstract: Energy consumption of memory accesses dominates the compute energy in energy-constrained robots which require a compact 3D map of the environment to achieve autonomy. Recent mapping frameworks only focused on reducing the map size while incurring significant memory usage during map construction due to multi-pass processing of each depth image. In this work, we present a memory-efficient continuous…

    Submitted 19 January, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 17 pages, 12 figures, 3 tables

    Journal ref: IEEE Transactions on Robotics 40 (2024) 1339-1355

  6. HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

    Authors: Yannan Nellie Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel S. Emer

    Abstract: Due to complex interactions among various deep neural network (DNN) optimization techniques, modern DNNs can have weights and activations that are dense or sparse with diverse sparsity degrees. To offer a good trade-off between accuracy and hardware performance, an ideal DNN accelerator should have high flexibility to efficiently translate DNN sparsity into reductions in energy and/or latency with…

    Submitted 1 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to MICRO23

  7. RAELLA: Reforming the Arithmetic for Efficient, Low-Resolution, and Low-Loss Analog PIM: No Retraining Required!

    Authors: Tanner Andrulis, Joel S. Emer, Vivienne Sze

    Abstract: Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural Network (DNN) inference by reducing costly data movement and by using resistive RAM (ReRAM) for efficient analog compute. Unfortunately, overall PIM accelerator efficiency is limited by energy-intensive analog-to-digital converters (ADCs). Furthermore, existing accelerators that reduce ADC cost do so by chang…

    Submitted 16 April, 2023; originally announced April 2023.

    Comments: 16 pages; 15 figures; Accepted at ISCA 2023 (the International Symposium on Computer Architecture)

    ACM Class: C.1.3

  8. Efficient Computation of Map-scale Continuous Mutual Information on Chip in Real Time

    Authors: Keshav Gupta, Peter Zhi Xuan Li, Sertac Karaman, Vivienne Sze

    Abstract: Exploration tasks are essential to many emerging robotics applications, ranging from search and rescue to space exploration. The planning problem for exploration requires determining the best locations for future measurements that will enhance the fidelity of the map, for example, by reducing its total entropy. A widely-studied technique involves computing the Mutual Information (MI) between the c…

    Submitted 7 October, 2022; originally announced October 2022.

  9. arXiv:2209.10507

    cs.NI cs.CV

    Gemino: Practical and Robust Neural Compression for Video Conferencing

    Authors: Vibhaalakshmi Sivaraman, Pantea Karimi, Vedantha Venkatapathy, Mehrdad Khani, Sadjad Fouladi, Mohammad Alizadeh, Frédo Durand, Vivienne Sze

    Abstract: Video conferencing systems suffer from poor user experience when network conditions deteriorate because current video codecs simply cannot operate at extremely low bitrates. Recently, several neural alternatives have been proposed that reconstruct talking head videos at very low bitrates using sparse representations of each frame such as facial landmark information. However, these approaches produ…

    Submitted 19 October, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: 13 pages, 5 appendix

    Journal ref: USENIX NSDI 2024

  10. arXiv:2207.07033

    cs.AI cs.CY

    Developing a Series of AI Challenges for the United States Department of the Air Force

    Authors: Vijay Gadepally, Gregory Angelides, Andrei Barbu, Andrew Bowne, Laura J. Brattain, Tamara Broderick, Armando Cabrera, Glenn Carl, Ronisha Carter, Miriam Cha, Emilie Cowen, Jesse Cummings, Bill Freeman, James Glass, Sam Goldberg, Mark Hamilton, Thomas Heldt, Kuan Wei Huang, Phillip Isola, Boris Katz, Jamie Koerner, Yen-Chen Lin, David Mayo, Kyle McAlpin, Taylor Perron, et al. (17 additional authors not shown)

    Abstract: Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requireme…

    Submitted 14 July, 2022; originally announced July 2022.

  11. arXiv:2205.05826

    cs.AR cs.CV cs.DC

    Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

    Authors: Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S. Emer

    Abstract: In recent years, many accelerators have been proposed to efficiently process sparse tensor algebra applications (e.g., sparse neural networks). However, these proposals are single points in a large and diverse design space. The lack of systematic description and modeling support for these sparse tensor accelerators impedes hardware designers from efficient and effective design space exploration. T…

    Submitted 9 January, 2023; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: Update website link, update UOP format description

  12. arXiv:2109.00642

    cs.CV

    Searching for Efficient Multi-Stage Vision Transformers

    Authors: Yi-Lun Liao, Sertac Karaman, Vivienne Sze

    Abstract: Vision Transformer (ViT) demonstrates that Transformer for natural language processing can be applied to computer vision tasks and result in comparable performance to convolutional neural networks (CNN), which have been studied and adopted in computer vision for years. This naturally raises the question of how the performance of ViT can be advanced with design techniques of CNN. To this end, we pr…

    Submitted 1 September, 2021; originally announced September 2021.

  13. arXiv:2104.00031

    cs.CV cs.LG

    NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization

    Authors: Tien-Ju Yang, Yi-Lun Liao, Vivienne Sze

    Abstract: Neural architecture search (NAS) typically consists of three main steps: training a super-network, training and evaluating sampled deep neural networks (DNNs), and training the discovered DNN. Most of the existing efforts speed up some steps at the cost of a significant slowdown of other steps or sacrificing the support of non-differentiable search metrics. The unbalanced reduction in the time spe…

    Submitted 31 March, 2021; originally announced April 2021.

    Comments: Accepted by CVPR 2021

  14. arXiv:2006.13926

    cs.ET

    Freely scalable and reconfigurable optical hardware for deep learning

    Authors: Liane Bernstein, Alexander Sludds, Ryan Hamerly, Vivienne Sze, Joel Emer, Dirk Englund

    Abstract: As deep neural network (DNN) models grow ever-larger, they can achieve higher accuracy and solve more complex problems. This trend has been enabled by an increase in available compute power; however, efforts to continue to scale electronic processors are impeded by the costs of communication, thermal management, power delivery and clocking. To improve scalability, we propose a digital optical neur…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: 19 pages (15 main and 4 supplementary), 11 figures (6 main and 5 supplementary)

  15. arXiv:2002.00297

    eess.IV cs.CV

    Depth Map Estimation of Dynamic Scenes Using Prior Depth Information

    Authors: James Noraky, Vivienne Sze

    Abstract: Depth information is useful for many applications. Active depth sensors are appealing because they obtain dense and accurate depth maps. However, due to issues that range from power constraints to multi-sensor interference, these sensors cannot always be continuously used. To overcome this limitation, we propose an algorithm that estimates depth maps using concurrently collected images and a previ…

    Submitted 1 February, 2020; originally announced February 2020.

  16. arXiv:1912.12167

    cs.CV cs.ET

    Design Considerations for Efficient Deep Neural Networks on Processing-in-Memory Accelerators

    Authors: Tien-Ju Yang, Vivienne Sze

    Abstract: This paper describes various design considerations for deep neural networks that enable them to operate efficiently and accurately on processing-in-memory accelerators. We highlight important properties of these accelerators and the resulting design considerations using experiments conducted on various state-of-the-art deep neural networks with the large-scale ImageNet dataset.

    Submitted 18 December, 2019; originally announced December 2019.

    Comments: Accepted by IEDM 2019

  17. arXiv:1905.02238

    cs.RO cs.IT

    FSMI: Fast computation of Shannon Mutual Information for information-theoretic mapping

    Authors: Zhengdong Zhang, Trevor Henderson, Sertac Karaman, Vivienne Sze

    Abstract: Exploration tasks are embedded in many robotics applications, such as search and rescue and space exploration. Information-based exploration algorithms aim to find the most informative trajectories by maximizing an information-theoretic metric, such as the mutual information between the map and potential future measurements. Unfortunately, most existing information-based exploration algorithms are…

    Submitted 6 May, 2019; originally announced May 2019.

  18. arXiv:1904.03257

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  19. arXiv:1903.03273

    cs.CV cs.RO

    FastDepth: Fast Monocular Depth Estimation on Embedded Systems

    Authors: Diana Wofk, Fangchang Ma, Tien-Ju Yang, Sertac Karaman, Vivienne Sze

    Abstract: Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. There has been a significant and growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow f…

    Submitted 7 March, 2019; originally announced March 2019.

    Comments: Accepted for presentation at ICRA 2019. 8 pages, 6 figures, 7 tables

  20. arXiv:1902.05093

    cs.CV

    DeeperLab: Single-Shot Image Parser

    Authors: Tien-Ju Yang, Maxwell D. Collins, Yukun Zhu, Jyh-Jing Hwang, Ting Liu, Xiao Zhang, Vivienne Sze, George Papandreou, Liang-Chieh Chen

    Abstract: We present a single-shot, bottom-up approach for whole image parsing. Whole image parsing, also known as Panoptic Segmentation, generalizes the tasks of semantic segmentation for 'stuff' classes and instance segmentation for 'thing' classes, assigning both semantic and instance labels to every pixel in an image. Recent approaches to whole image parsing typically employ separate standalone modules…

    Submitted 12 March, 2019; v1 submitted 13 February, 2019; originally announced February 2019.

    Comments: 20 pages. The code of the proposed Parsing Covering metric is available at http://deeperlab.mit.edu

  21. arXiv:1809.05780

    cs.RO

    Navion: A 2mW Fully Integrated Real-Time Visual-Inertial Odometry Accelerator for Autonomous Navigation of Nano Drones

    Authors: Amr Suleiman, Zhengdong Zhang, Luca Carlone, Sertac Karaman, Vivienne Sze

    Abstract: This paper presents Navion, an energy-efficient accelerator for visual-inertial odometry (VIO) that enables autonomous navigation of miniaturized robots (e.g., nano drones), and virtual/augmented reality on portable devices. The chip uses inertial measurements and mono/stereo images to estimate the drone's trajectory and a 3D map of the environment. This estimate is obtained by running a state-of-…

    Submitted 15 September, 2018; originally announced September 2018.

  22. arXiv:1807.07928

    cs.DC cs.CV

    Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

    Authors: Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, Vivienne Sze

    Abstract: A recent trend in DNN development is to extend the reach of deep learning applications to platforms that are more resource and energy constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse model…

    Submitted 20 May, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: accepted for publication in IEEE Journal on Emerging and Selected Topics in Circuits and Systems. This extended version on arXiv also includes Eyexam in the appendix

  23. arXiv:1804.03230

    cs.CV

    NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications

    Authors: Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam

    Abstract: This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt in…

    Submitted 28 September, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

    Comments: Accepted by ECCV 2018

  24. arXiv:1703.09039

    cs.CV

    Efficient Processing of Deep Neural Networks: A Tutorial and Survey

    Authors: Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel Emer

    Abstract: Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without…

    Submitted 13 August, 2017; v1 submitted 27 March, 2017; originally announced March 2017.

    Comments: Based on tutorial on DNN Hardware at eyeriss.mit.edu/tutorial.html

  25. arXiv:1703.05853

    cs.CV

    Towards Closing the Energy Gap Between HOG and CNN Features for Embedded Vision

    Authors: Amr Suleiman, Yu-Hsin Chen, Joel Emer, Vivienne Sze

    Abstract: Computer vision enables a wide range of applications in robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. For many of these applications, local embedded processing is preferred due to privacy and/or latency concerns. Accordingly, energy-efficient embedded vision hardware delivering real-time and robust performance is crucial. While deep learning is ga…

    Submitted 16 March, 2017; originally announced March 2017.

  26. Hardware for Machine Learning: Challenges and Opportunities

    Authors: Vivienne Sze, Yu-Hsin Chen, Joel Emer, Amr Suleiman, Zhengdong Zhang

    Abstract: Machine learning plays a critical role in extracting meaningful information out of the zettabytes of sensor data collected every day. For some applications, the goal is to analyze and understand the data to identify trends (e.g., surveillance, portable/wearable electronics); in other applications, the goal is to take immediate action based on the data (e.g., robotics/drones, self-driving cars, smart I…

    Submitted 16 October, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

    Comments: Published as an invited conference paper at CICC 2017

  27. arXiv:1611.05128

    cs.CV

    Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning

    Authors: Tien-Ju Yang, Yu-Hsin Chen, Vivienne Sze

    Abstract: Deep convolutional neural networks (CNNs) are indispensable to state-of-the-art computer vision algorithms. However, they are still rarely deployed on battery-powered mobile devices, such as smartphones and wearable gadgets, where vision algorithms can enable many revolutionary real-world applications. The key limiting factor is the high energy consumption of CNN processing due to its high computa…

    Submitted 18 April, 2017; v1 submitted 15 November, 2016; originally announced November 2016.

    Comments: Published as a conference paper at CVPR 2017

  28. arXiv:1607.08635

    cs.CV cs.AR

    A 58.6mW Real-Time Programmable Object Detector with Multi-Scale Multi-Object Support Using Deformable Parts Model on 1920x1080 Video at 30fps

    Authors: Amr Suleiman, Zhengdong Zhang, Vivienne Sze

    Abstract: This paper presents a programmable, energy-efficient and real-time object detection accelerator using deformable parts models (DPM), with 2x higher accuracy than traditional rigid body models. With 8 deformable parts detection, three methods are used to address the high computational complexity: classification pruning for 33x fewer parts classification, vector quantization for 15x memory size redu…

    Submitted 27 July, 2016; originally announced July 2016.

  29. arXiv:1603.08968

    cs.CV

    FAST: A Framework to Accelerate Super-Resolution Processing on Compressed Videos

    Authors: Zhengdong Zhang, Vivienne Sze

    Abstract: State-of-the-art super-resolution (SR) algorithms require significant computational resources to achieve real-time throughput (e.g., 60Mpixels/s for HD video). This paper introduces FAST (Free Adaptive Super-resolution via Transfer), a framework to accelerate any SR algorithm applied to compressed videos. FAST exploits the temporal correlation between adjacent frames such that SR is only applied t…

    Submitted 4 August, 2017; v1 submitted 29 March, 2016; originally announced March 2016.