-
Time-based Sequence Model for Personalization and Recommendation Systems
Authors:
Tigran Ishkhanov,
Maxim Naumov,
Xianjie Chen,
Yan Zhu,
Yuan Zhong,
Alisson Gusatti Azzolini,
Chonglin Sun,
Frank Jiang,
Andrey Malevich,
Liang Xiong
Abstract:
In this paper we develop a novel recommendation model that explicitly incorporates time information. The model relies on an embedding layer and a TSL attention-like mechanism with inner products in different vector spaces, which can be thought of as a modification of multi-headed attention. This mechanism allows the model to efficiently handle sequences of user behavior of different lengths. We study the properties of our state-of-the-art model on a statistically designed data set. We also show that it outperforms more complex models with longer sequence lengths on the Taobao User Behavior dataset.
Submitted 27 August, 2020;
originally announced August 2020.
-
Post-Training 4-bit Quantization on Embedding Tables
Authors:
Hui Guan,
Andrey Malevich,
Jiyan Yang,
Jongsoo Park,
Hector Yuen
Abstract:
Continuous representations have been widely adopted in recommender systems, where a large number of entities are represented using embedding vectors. As the cardinality of the entities increases, the embedding components can easily contain millions of parameters and become the bottleneck in both storage and inference due to large memory consumption. This work focuses on post-training 4-bit quantization of the continuous embeddings. We propose row-wise uniform quantization with greedy search and codebook-based quantization, which consistently outperform state-of-the-art quantization approaches in reducing accuracy degradation. We deploy our uniform quantization technique on a production model at Facebook and demonstrate that it can reduce the model size to only 13.89% of the single-precision version while the model quality stays neutral.
Submitted 5 November, 2019;
originally announced November 2019.
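The row-wise uniform quantization mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain min/max range per row, omits the greedy range search and the codebook variant, and stores one 4-bit code per byte for readability rather than packing two per byte. The function names `quantize_rows_4bit` and `dequantize_rows_4bit` are hypothetical.

```python
import numpy as np

def quantize_rows_4bit(table):
    """Quantize each row of an embedding table to 4-bit codes (0..15).

    Assumption: the range of each row is its plain [min, max]; the paper's
    greedy search for a better range is not reproduced here.
    """
    table = np.asarray(table, dtype=np.float32)
    row_min = table.min(axis=1, keepdims=True)
    row_max = table.max(axis=1, keepdims=True)
    # 4 bits give 16 levels, so 15 equal steps span the row's range.
    scale = (row_max - row_min) / 15.0
    # Constant rows have zero range; use a dummy scale to avoid dividing by zero.
    scale = np.where(scale == 0, 1.0, scale).astype(np.float32)
    codes = np.clip(np.rint((table - row_min) / scale), 0, 15).astype(np.uint8)
    return codes, scale, row_min

def dequantize_rows_4bit(codes, scale, row_min):
    """Reconstruct approximate float32 rows from codes and per-row scale/offset."""
    return codes.astype(np.float32) * scale + row_min

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(4, 16)).astype(np.float32)
    codes, scale, row_min = quantize_rows_4bit(embeddings)
    restored = dequantize_rows_4bit(codes, scale, row_min)
    print("max abs error per row:", np.abs(restored - embeddings).max(axis=1))
```

Because each row carries its own scale and offset, the reconstruction error of any element is bounded by half of that row's step size, which is why row-wise ranges tend to be tighter than a single range for the whole table.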
-
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
Authors:
Jongsoo Park,
Maxim Naumov,
Protonu Basu,
Summer Deng,
Aravind Kalaiah,
Daya Khudia,
James Law,
Parth Malani,
Andrey Malevich,
Satish Nadathur,
Juan Pino,
Martin Schatz,
Alexander Sidorov,
Viswanath Sivakumar,
Andrew Tulloch,
Xiaodong Wang,
Yiming Wu,
Hector Yuen,
Utku Diril,
Dmytro Dzhulgakov,
Kim Hazelwood,
Bill Jia,
Yangqing Jia,
Lin Qiao,
Vijay Rao,
et al. (3 additional authors not shown)
Abstract:
The application of deep learning techniques has resulted in remarkable improvements in machine learning models. This paper provides detailed characterizations of deep learning models used in many Facebook social network services. We present the computational characteristics of our models, describe high-performance optimizations targeting existing systems, point out their limitations, and make suggestions for future general-purpose and accelerated inference hardware. We also highlight the need for better co-design of algorithms, numerics, and computing platforms to address the challenges of workloads often run in data centers.
Submitted 29 November, 2018; v1 submitted 24 November, 2018;
originally announced November 2018.