
Showing 1–8 of 8 results for author: Bhushanam, B

Searching in archive cs.
  1. arXiv:2305.01868 [pdf, other]

    cs.LG cs.DC cs.IR cs.PF

    Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models

    Authors: Daochen Zha, Louis Feng, Liang Luo, Bhargav Bhushanam, Zirui Liu, Yusuo Hu, Jade Nie, Yuzhen Huang, Yuandong Tian, Arun Kejariwal, Xia Hu

    Abstract: Sharding a large machine learning model across multiple devices to balance the costs is important in distributed training. This is challenging because partitioning is NP-hard, and estimating the costs accurately and efficiently is difficult. In this work, we explore a "pre-train, and search" paradigm for efficient sharding. The idea is to pre-train a universal and once-for-all neural network to pr…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: Accepted by MLSys 2023. Code available at https://github.com/daochenzha/neuroshard

  2. arXiv:2210.02023 [pdf, other]

    cs.LG

    DreamShard: Generalizable Embedding Table Placement for Recommender Systems

    Authors: Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, Arun Kejariwal, Xia Hu

    Abstract: We study embedding table placement for distributed recommender systems, which aims to partition and place the tables on multiple hardware devices (e.g., GPUs) to balance the computation and communication costs. Although prior work has explored learning-based approaches for the device placement of computational graphs, embedding table placement remains to be a challenging problem because of 1) the…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted by NeurIPS 2022

  3. arXiv:2209.01143 [pdf, other]

    cs.LG cs.AI

    Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems

    Authors: Mao Ye, Ruichen Jiang, Haoxiang Wang, Dhruv Choudhary, Xiaocong Du, Bhargav Bhushanam, Aryan Mokhtari, Arun Kejariwal, Qiang Liu

    Abstract: One of the key challenges of learning an online recommendation model is the temporal domain shift, which causes the mismatch between the training and testing data distribution and hence domain generalization error. To overcome, we propose to learn a meta future gradient generator that forecasts the gradient information of the future data distribution for training so that the recommendation model c…

    Submitted 2 September, 2022; originally announced September 2022.

  4. arXiv:2208.08489 [pdf, other]

    cs.IR cs.LG

    Understanding Scaling Laws for Recommendation Models

    Authors: Newsha Ardalani, Carole-Jean Wu, Zeliang Chen, Bhargav Bhushanam, Adnan Aziz

    Abstract: Scale has been a major driving force in improving machine learning performance, and understanding scaling laws is essential for strategic planning for a sustainable model quality performance growth, long-term resource planning and developing efficient system infrastructures to support large-scale models. In this paper, we study empirical scaling laws for DLRM style recommendation models, in partic…

    Submitted 17 August, 2022; originally announced August 2022.

  5. arXiv:2208.06399 [pdf, other]

    cs.LG cs.AI cs.IR

    AutoShard: Automated Embedding Table Sharding for Recommender Systems

    Authors: Daochen Zha, Louis Feng, Bhargav Bhushanam, Dhruv Choudhary, Jade Nie, Yuandong Tian, Jay Chae, Yinbin Ma, Arun Kejariwal, Xia Hu

    Abstract: Embedding learning is an important technique in deep recommendation models to map categorical features to dense vectors. However, the embedding tables often demand an extremely large number of parameters, which become the storage and efficiency bottlenecks. Distributed training solutions have been adopted to partition the embedding tables into multiple devices. However, the embedding tables can ea…

    Submitted 12 August, 2022; originally announced August 2022.

    Comments: Accepted by KDD 2022. Code available at https://github.com/daochenzha/autoshard

  6. arXiv:2206.01206 [pdf, other]

    cs.LG cs.AI

    Positive Unlabeled Contrastive Learning

    Authors: Anish Acharya, Sujay Sanghavi, Li Jing, Bhargav Bhushanam, Dhruv Choudhary, Michael Rabbat, Inderjit Dhillon

    Abstract: Self-supervised pretraining on unlabeled data followed by supervised fine-tuning on labeled data is a popular paradigm for learning from limited labeled examples. We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples, and (often) a large amount of unlabeled samples (which could be positive…

    Submitted 28 March, 2024; v1 submitted 1 June, 2022; originally announced June 2022.

  7. arXiv:2110.04844 [pdf, other]

    cs.LG math.OC

    Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits

    Authors: Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan

    Abstract: Embedding learning has found widespread applications in recommendation systems and natural language modeling, among other domains. To learn quality embeddings efficiently, adaptive learning rate algorithms have demonstrated superior empirical performance over SGD, largely accredited to their token-dependent learning rate. However, the underlying mechanism for the efficiency of token-dependent lear…

    Submitted 23 November, 2021; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Additional experiments on Word2Vec embedding learning included

  8. arXiv:2105.01064 [pdf, other]

    cs.IR cs.LG

    Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems

    Authors: Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, Arun Kejariwal

    Abstract: Deep learning recommendation systems at scale have provided remarkable gains through increasing model capacity (i.e. wider and deeper neural networks), but it comes at significant training cost and infrastructure cost. Model pruning is an effective technique to reduce computation overhead for deep neural networks by removing redundant parameters. However, modern recommendation systems are still th…

    Submitted 3 May, 2021; originally announced May 2021.