Skip to main content

Showing 1–2 of 2 results for author: Wen, E D

Searching in archive cs. Search in all archives.
.
  1. arXiv:2403.02545  [pdf, other

    cs.LG cs.AI

    Wukong: Towards a Scaling Law for Large-Scale Recommendation

    Authors: Buyun Zhang, Liang Luo, Yuxin Chen, Jade Nie, Xi Liu, Daifeng Guo, Yanli Zhao, Shen Li, Yuchen Hao, Yantao Yao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Maxim Naumov, Wenlin Chen

    Abstract: Scaling laws play an instrumental role in the sustainable improvement in model quality. Unfortunately, recommendation models to date do not exhibit such laws similar to those observed in the domain of large language models, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly more complex real-world datasets.… ▽ More

    Submitted 4 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 12 pages

  2. arXiv:2403.00877  [pdf, other

    cs.LG cs.DC cs.IR

    Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

    Authors: Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov

    Abstract: We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global… ▽ More

    Submitted 2 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.