Aligning Large Language Models with Recommendation Knowledge

Authors: Yuwei Cao, Nikhil Mehta, Xinyang Yi, Raghunandan Keshavan, Lukasz Heldt, Lichan Hong, Ed H. Chi, Maheswaran Sathiamoorthy

Abstract: Large language models (LLMs) have recently been used as backbones for recommender systems. However, their performance often lags behind conventional methods in standard tasks like retrieval. We attribute this to a mismatch between LLMs' knowledge and the knowledge crucial for effective recommendations. While LLMs excel at natural language reasoning, they cannot model complex user-item interactions inherent in recommendation tasks. We propose bridging the knowledge gap and equipping LLMs with recommendation-specific knowledge to address this. Operations such as Masked Item Modeling (MIM) and Bayesian Personalized Ranking (BPR) have found success in conventional recommender systems. Inspired by this, we simulate these operations through natural language to generate auxiliary-task data samples that encode item correlations and user preferences. Fine-tuning LLMs on such auxiliary-task data samples and incorporating more informative recommendation-task data samples facilitates the injection of recommendation-specific knowledge into LLMs. Extensive experiments across retrieval, ranking, and rating prediction tasks on LLMs such as FLAN-T5-Base and FLAN-T5-XL show the effectiveness of our technique in domains such as Amazon Toys & Games, Beauty, and Sports & Outdoors. Notably, our method outperforms conventional and LLM-based baselines, including the current SOTA, by significant margins in retrieval, showcasing its potential for enhancing recommendation quality.

Submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted to the NAACL 2024 Findings
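
For reference, the conventional form of Bayesian Personalized Ranking mentioned in the abstract above trains a model to score an interacted item above a non-interacted one using a pairwise logistic loss. The sketch below shows only that standard loss on plain floats. The function name `bpr_loss` and the example scores are illustrative assumptions, and the sketch does not reproduce the paper's natural-language simulation of BPR.

```python
import math


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """Standard pairwise BPR loss: -log(sigmoid(pos_score - neg_score))."""
    diff = pos_score - neg_score
    # -log(sigmoid(d)) equals softplus(-d); this form avoids overflow for large |d|.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Hypothetical model scores for an interacted item and a sampled negative item.
well_ranked = bpr_loss(pos_score=2.0, neg_score=-1.0)
mis_ranked = bpr_loss(pos_score=-1.0, neg_score=2.0)
```

The loss is small when the interacted item already outscores the negative one and grows roughly linearly with the margin when the order is wrong, which is what pushes a model toward the user's observed preferences.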

arXiv:2306.08121

Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations

Authors: Anima Singh, Trung Vu, Nikhil Mehta, Raghunandan Keshavan, Maheswaran Sathiamoorthy, Yilin Zheng, Lichan Hong, Lukasz Heldt, Li Wei, Devansh Tandon, Ed H. Chi, Xinyang Yi

Abstract: Randomly-hashed item ids are used ubiquitously in recommendation models. However, the learned representations from random hashing prevent generalization across similar items, causing problems of learning unseen and long-tail items, especially when the item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for random ids. We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability. To strike a good balance of memorization and generalization, we propose to use Semantic IDs -- a compact discrete item representation learned from frozen content embeddings using RQ-VAE that captures the hierarchy of concepts in items -- as a replacement for random item ids. Similar to content embeddings, the compactness of Semantic IDs poses a problem of easy adaptation in recommendation models. We propose novel methods for adapting Semantic IDs in industry-scale ranking models, through hashing sub-pieces of the Semantic-ID sequences. In particular, we find that the SentencePiece model that is commonly used in LLM tokenization outperforms manually crafted pieces such as N-grams. To this end, we evaluate our approaches in a real-world ranking model for YouTube recommendations. Our experiments demonstrate that Semantic IDs can replace the direct use of video IDs by improving the generalization ability on new and long-tail item slices without sacrificing overall model quality.

Submitted 30 May, 2024; v1 submitted 13 June, 2023; originally announced June 2023.
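
To make the idea of a Semantic ID in the abstract above concrete, the sketch below applies greedy residual quantization to a content embedding: each level picks the nearest codeword and passes the unexplained residual to the next level, yielding a coarse-to-fine tuple of indices. This is a minimal illustration under stated assumptions, not the paper's RQ-VAE: the codebooks here are fixed, hand-written, hypothetical values rather than learned ones, and the name `semantic_id` is invented for the example.

```python
def semantic_id(embedding, codebooks):
    """Greedy residual quantization: return one codeword index per level."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:  # each codebook is a list of codeword vectors
        # Pick the codeword closest (squared Euclidean distance) to the residual.
        index = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        codes.append(index)
        # The next level quantizes whatever this level failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tuple(codes), residual


# Hypothetical two-level codebooks: coarse clusters first, then fine offsets.
codebooks = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
codes, leftover = semantic_id([10.9, 10.1], codebooks)
```

Items whose embeddings are close tend to share the leading indices of the tuple, which is the hierarchical property the abstract relies on for generalization across similar items.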

arXiv:2305.05065

Recommender Systems with Generative Retrieval

Authors: Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, Maheswaran Sathiamoorthy

Abstract: Modern recommender systems perform large-scale retrieval by first embedding queries and item candidates in the same unified space, followed by approximate nearest neighbor search to select top candidates given a query embedding. In this paper, we propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates. To that end, we create a semantically meaningful tuple of codewords to serve as a Semantic ID for each item. Given Semantic IDs for items in a user session, a Transformer-based sequence-to-sequence model is trained to predict the Semantic ID of the next item that the user will interact with. To the best of our knowledge, this is the first Semantic ID-based generative model for recommendation tasks. We show that recommender systems trained with the proposed paradigm significantly outperform the current SOTA models on various datasets. In addition, we show that incorporating Semantic IDs into the sequence-to-sequence model enhances its ability to generalize, as evidenced by the improved retrieval performance observed for items with no prior interaction history.

Submitted 3 November, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: To appear in The 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:0910.5260

doi 10.1016/j.trc.2012.12.007

A Gradient Descent Algorithm on the Grassman Manifold for Matrix Completion

Authors: Raghunandan H. Keshavan, Sewoong Oh

Abstract: We consider the problem of reconstructing a low-rank matrix from a small subset of its entries. In this paper, we describe the implementation of an efficient algorithm called OptSpace, based on singular value decomposition followed by local manifold optimization, for solving the low-rank matrix completion problem. It has been shown that if the number of revealed entries is large enough, the output of singular value decomposition gives a good estimate for the original matrix, so that local optimization reconstructs the correct matrix with high probability. We present numerical results which show that this algorithm can reconstruct the low rank matrix exactly from a very small subset of its entries. We further study the robustness of the algorithm with respect to noise, and its performance on actual collaborative filtering datasets.

Submitted 3 November, 2009; v1 submitted 27 October, 2009; originally announced October 2009.

Comments: 26 pages, 15 figures
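
The abstract above notes that a singular value decomposition of the revealed entries already gives a good estimate of the original matrix. The sketch below illustrates only that spectral step, using NumPy: it zero-fills the unobserved entries, rescales by the observed fraction, and keeps the top singular directions. It is a simplified illustration under stated assumptions, not the OptSpace implementation: it omits the local manifold optimization entirely, the rescaling by the sampling rate and the synthetic test matrix are choices made for this example, and the name `spectral_estimate` is hypothetical.

```python
import numpy as np


def spectral_estimate(observed, mask, rank):
    """Rank-`rank` SVD of the zero-filled matrix, rescaled by the sampling rate."""
    sampling_rate = mask.mean()  # fraction of entries revealed
    filled = observed * mask / sampling_rate  # unobserved entries become zero
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


rng = np.random.default_rng(0)
n, rank = 200, 2
# Synthetic low-rank ground truth, assumed here purely for illustration.
truth = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))
mask = rng.random((n, n)) < 0.3  # hypothetical uniform sampling of about 30%
estimate = spectral_estimate(truth, mask, rank)
relative_error = np.linalg.norm(estimate - truth) / np.linalg.norm(truth)
```

The estimate is approximate rather than exact; in the approach the abstract describes, it serves as the starting point that local optimization then refines.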

arXiv:0910.0921

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison

Authors: Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh

Abstract: We consider a problem of significant practical importance, namely, the reconstruction of a low-rank data matrix from a small subset of its entries. This problem appears in many areas such as collaborative filtering, computer vision and wireless sensor networks. In this paper, we focus on the matrix completion problem in the case when the observed samples are corrupted by noise. We compare the performance of three state-of-the-art matrix completion algorithms (OptSpace, ADMiRA and FPCA) on a single simulation platform and present numerical results. We show that in practice these efficient algorithms can be used to reconstruct real data matrices, as well as randomly generated matrices, accurately.

Submitted 3 November, 2009; v1 submitted 6 October, 2009; originally announced October 2009.

Comments: 7 pages, 7 figures, 47th Allerton Conference on Communication Control and Computing, 2009, invited paper

arXiv:0906.2027

Matrix Completion from Noisy Entries

Authors: Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh

Abstract: Given a matrix M of low rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries. The problem arises in a variety of applications, from collaborative filtering (the 'Netflix problem') to structure-from-motion and positioning. We study a low complexity algorithm introduced by Keshavan et al. (2009), based on a combination of spectral techniques and manifold optimization, that we call here OptSpace. We prove performance guarantees that are order-optimal in a number of circumstances.

Submitted 9 April, 2012; v1 submitted 10 June, 2009; originally announced June 2009.

Comments: 22 pages, 3 figures

arXiv:0901.3150

Matrix Completion from a Few Entries

Authors: Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh

Abstract: Let M be a random (alpha n) x n matrix of rank r<<n, and assume that a uniformly random subset E of its entries is observed. We describe an efficient algorithm that reconstructs M from |E| = O(rn) observed entries with relative root mean square error RMSE <= C(rn/|E|)^0.5. Further, if r=O(1), M can be reconstructed exactly from |E| = O(n log(n)) entries. These results apply beyond random matrices to general low-rank incoherent matrices. This settles (in the case of bounded rank) a question left open by Candes and Recht and improves over the guarantees for their reconstruction algorithm. The complexity of our algorithm is O(|E|r log(n)), which opens the way to its use for massive data sets. In the process of proving these statements, we obtain a generalization of a celebrated result by Friedman-Kahn-Szemeredi and Feige-Ofek on the spectrum of sparse random matrices.

Submitted 17 September, 2009; v1 submitted 20 January, 2009; originally announced January 2009.

Comments: 30 pages, 1 figure, journal version (v1, v2: Conference version ISIT 2009)

arXiv:0812.2599

Learning Low Rank Matrices from O(n) Entries

Authors: Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh

Abstract: How many random entries of an n by m, rank r matrix are necessary to reconstruct the matrix within an accuracy d? We address this question in the case of a random matrix with bounded rank, whereby the observed entries are chosen uniformly at random. We prove that, for any d>0, C(r,d)n observations are sufficient. Finally we discuss the question of reconstructing the matrix efficiently, and demonstrate through extensive simulations that this task can be accomplished in nPoly(log n) operations, for small rank.

Submitted 14 December, 2008; originally announced December 2008.

Comments: 8 pages, 11 figures, Forty-sixth Allerton Conference on Communication, Control and Computing, invited paper

Showing 1–8 of 8 results for author: Keshavan, R