Showing 1–15 of 15 results for author: Erez, M

  1. arXiv:2312.00647

    cs.OS

    MaxMem: Colocation and Performance for Big Data Applications on Tiered Main Memory Servers

    Authors: Amanda Raybuck, Wei Zhang, Kayvan Mansoorshahi, Aditya K. Kamath, Mattan Erez, Simon Peter

    Abstract: We present MaxMem, a tiered main memory management system that aims to maximize Big Data application colocation and performance. MaxMem uses an application-agnostic and lightweight memory occupancy control mechanism based on fast memory miss ratios to provide application QoS under increasing colocation. By relying on memory access sampling and binning to quickly identify per-process memory heat gr…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 12 pages, 10 figures

  2. arXiv:2310.01664

    cs.LG cs.AI cs.CR

    Artemis: HE-Aware Training for Efficient Privacy-Preserving Machine Learning

    Authors: Yeonsoo Jeon, Mattan Erez, Michael Orshansky

    Abstract: Privacy-Preserving ML (PPML) based on Homomorphic Encryption (HE) is a promising foundational privacy technology. Making it more practical requires lowering its computational cost, especially, in handling modern large deep neural networks. Model compression via pruning is highly effective in conventional plaintext ML but cannot be effectively applied to HE-PPML as is. We propose Artemis, a highl…

    Submitted 2 October, 2023; originally announced October 2023.

  3. arXiv:2309.15881

    cs.LG cs.AI

    Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training

    Authors: Zihao Deng, Benjamin Ghaemmaghami, Ashish Kumar Singh, Benjamin Cho, Leo Orshansky, Mattan Erez, Michael Orshansky

    Abstract: Modern DNN-based recommendation systems rely on training-derived embeddings of sparse features. Input sparsity makes obtaining high-quality embeddings for rarely-occurring categories harder as their representations are updated infrequently. We demonstrate a training-time technique to produce superior embeddings via effective cross-category learning and theoretically explain its surprising effectiv…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: This is the preprint of our paper accepted at ACML 2023

  4. arXiv:2301.04228

    cs.AR

    Harvesting L2 Caches in Server Processors

    Authors: Majid Jalili, Mattan Erez

    Abstract: We make three observations in modern processors: (1) LLC capacity is getting larger (up to 1GB); (2) core counts are increasing (up to 128 cores), accumulating a more significant amount of private L2 cache capacity on the chip; and (3) overall processor utilization in the cloud remains very low despite many efforts, leaving many large private caches unused. To enable better use of these beefy proc…

    Submitted 27 February, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

  5. SecDDR: Enabling Low-Cost Secure Memories by Protecting the DDR Interface

    Authors: Ali Fakhrzadehgan, Prakash Ramrakhyani, Moinuddin K. Qureshi, Mattan Erez

    Abstract: The security goals of cloud providers and users include memory confidentiality and integrity, which requires implementing Replay-Attack protection (RAP). RAP can be achieved using integrity trees or mutually authenticated channels. Integrity trees incur significant performance overheads and are impractical for protecting large memories. Mutually authenticated channels have been proposed only for p…

    Submitted 27 October, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

    Journal ref: 2023 53rd IEEE/IFIP DSN, Porto, Portugal, 2023, pp. 14-27

  6. arXiv:2103.14808

    cs.AR

    Reducing Load Latency with Cache Level Prediction

    Authors: Majid Jalili, Mattan Erez

    Abstract: High load latency that results from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetch helps reduce this latency by fetching data up the hierarchy before it is requested by load instructions. However, data prefetching has shown to be imperfect in many situations. We propose cache-level prediction to complement prefetchers. Our…

    Submitted 27 March, 2021; originally announced March 2021.

  7. arXiv:2012.00158

    cs.AR

    Accelerating Bandwidth-Bound Deep Learning Inference with Main-Memory Accelerators

    Authors: Benjamin Y. Cho, Jeageun Jung, Mattan Erez

    Abstract: DL inference queries play an important role in diverse internet services and a large fraction of datacenter cycles are spent on processing DL inference queries. Specifically, the matrix-matrix multiplication (GEMM) operations of fully-connected MLP layers dominate many inference tasks. We find that the GEMM operations for datacenter DL inference tasks are memory bandwidth bound, contrary to common…

    Submitted 30 November, 2020; originally announced December 2020.

  8. arXiv:2010.02825

    cs.AR cs.ET

    WoLFRaM: Enhancing Wear-Leveling and Fault Tolerance in Resistive Memories using Programmable Address Decoders

    Authors: Leonid Yavits, Lois Orosa, Suyash Mahar, João Dinis Ferreira, Mattan Erez, Ran Ginosar, Onur Mutlu

    Abstract: Resistive memories have limited lifetime caused by limited write endurance and highly non-uniform write access patterns. Two main techniques to mitigate endurance-related memory failures are 1) wear-leveling, to evenly distribute the writes across the entire memory, and 2) fault tolerance, to correct memory cell failures. However, one of the main open challenges in extending the lifetime of existi…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: To appear in ICCD 2020

  9. arXiv:2006.05623

    cs.LG stat.ML

    Training with Multi-Layer Embeddings for Model Reduction

    Authors: Benjamin Ghaemmaghami, Zihao Deng, Benjamin Cho, Leo Orshansky, Ashish Kumar Singh, Mattan Erez, Michael Orshansky

    Abstract: Modern recommendation systems rely on real-valued embeddings of categorical features. Increasing the dimension of embedding vectors improves model accuracy but comes at a high cost to model size. We introduce a multi-layer embedding training (MLET) architecture that trains embeddings via a sequence of linear layers to derive superior embedding accuracy vs. model size trade-off. Our approach is f…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: 10 pages, 3 figures

  10. arXiv:2004.13027

    cs.LG cs.AR

    FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training

    Authors: Sangkug Lym, Mattan Erez

    Abstract: Modern deep learning models have high memory and computation cost. To make them fast and memory-cost efficient, structured model pruning is commonly used. We find that pruning a model using a common training accelerator with large systolic arrays is extremely performance-inefficient. To make a systolic array efficient for pruning and training, we propose FlexSA, a flexible systolic array architect…

    Submitted 27 April, 2020; originally announced April 2020.

  11. arXiv:1908.06362

    cs.AR

    Near Data Acceleration with Concurrent Host Access

    Authors: Benjamin Y. Cho, Yongkee Kwon, Sangkug Lym, Mattan Erez

    Abstract: Near-data accelerators (NDAs) that are integrated with main memory have the potential for significant power and performance benefits. Fully realizing these benefits requires the large available memory capacity to be shared between the host and the NDAs in a way that permits both regular memory access by some applications and accelerating others with an NDA, avoids copying data, enables collaborati…

    Submitted 30 November, 2020; v1 submitted 17 August, 2019; originally announced August 2019.

  12. DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis

    Authors: Sangkug Lym, Donghyuk Lee, Mike O'Connor, Niladrish Chatterjee, Mattan Erez

    Abstract: Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth. Especially, convolution layers account for the majority of the execution time of CNN training, and GPUs are commonly used to accelerate these layer workloads. GPU design optimization for efficient CNN training acceleration requires the accurate modeling of how their performance improves whe…

    Submitted 2 April, 2019; originally announced April 2019.

  13. arXiv:1903.02596

    cs.AR

    Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs

    Authors: Esha Choukse, Michael Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David Nellans, Steve Keckler

    Abstract: GPUs offer orders-of-magnitude higher memory bandwidth than traditional CPU-only systems. However, GPU device memory tends to be relatively small and the memory capacity can not be increased by the user. This paper describes Buddy Compression, a scheme to increase both the effective GPU memory capacity and bandwidth while avoiding the downsides of conventional memory-expanding strategies. Buddy Co…

    Submitted 15 April, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

  14. PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration

    Authors: Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez

    Abstract: State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights. Training these models is very compute- and memory-resource intensive. Much research has been done on pruning or compressing these models to reduce the cost of inference, but little work has addressed the costs of training. We focus precisely on accelerating training. We propos…

    Submitted 9 December, 2019; v1 submitted 26 January, 2019; originally announced January 2019.

  15. arXiv:1810.00307

    cs.LG cs.AR

    Mini-batch Serialization: CNN Training with Inter-layer Data Reuse

    Authors: Sangkug Lym, Armand Behroozi, Wei Wen, Ge Li, Yongkee Kwon, Mattan Erez

    Abstract: Training convolutional neural networks (CNNs) requires intense computations and high memory bandwidth. We find that bandwidth today is over-provisioned because most memory accesses in CNN training can be eliminated by rearranging computation to better utilize on-chip buffers and avoid traffic resulting from large per-layer memory footprints. We introduce the MBS CNN training approach that signific…

    Submitted 4 May, 2019; v1 submitted 29 September, 2018; originally announced October 2018.