-
Bayesian regularization of empirical MDPs
Authors:
Samarth Gupta,
Daniel N. Hill,
Lexing Ying,
Inderjit Dhillon
Abstract:
In most applications of model-based Markov decision processes, the parameters of the unknown underlying model are estimated from empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large-scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.
Submitted 20 September, 2022; v1 submitted 3 August, 2022;
originally announced August 2022.
-
Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion
Authors:
Adam Block,
Rahul Kidambi,
Daniel N. Hill,
Thorsten Joachims,
Inderjit S. Dhillon
Abstract:
Conventional methods for query autocompletion aim to predict which completed query a user will select from a list. A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions. To overcome this limitation, we propose a new approach that explicitly optimizes the query suggestions for downstream retrieval performance. We formulate this as a problem of ranking a set of rankings, where each query suggestion is represented by the downstream item ranking it produces. We then present a learning method that ranks query suggestions by the quality of their item rankings. The algorithm is based on a counterfactual learning approach that is able to leverage feedback on the items (e.g., clicks, purchases) to evaluate query suggestions through an unbiased estimator, thus avoiding the assumption that users write or select optimal queries. We establish theoretical support for the proposed approach and provide learning-theoretic guarantees. We also present empirical results on publicly available datasets, and demonstrate real-world applicability using data from an online shopping store.
Submitted 22 April, 2022;
originally announced April 2022.
-
Session-Aware Query Auto-completion using Extreme Multi-label Ranking
Authors:
Nishant Yadav,
Rajat Sen,
Daniel N. Hill,
Arya Mazumdar,
Inderjit S. Dhillon
Abstract:
Query auto-completion (QAC) is a fundamental feature in search engines where the task is to suggest plausible completions of a prefix typed in the search bar. Previous queries in the user session can provide useful context for the user's intent and can be leveraged to suggest auto-completions that are more relevant while adhering to the user's prefix. Such session-aware QACs can be generated by recent sequence-to-sequence deep learning models; however, these generative approaches often do not meet the stringent latency requirements of responding to each user keystroke. Moreover, these generative approaches pose the risk of showing nonsensical queries.
In this paper, we provide a solution to this problem: we take the novel approach of modeling session-aware QAC as an eXtreme Multi-Label Ranking (XMR) problem where the input is the previous query in the session and the user's current prefix, while the output space is the set of tens of millions of queries entered by users in the recent past. We adapt a popular XMR algorithm for this purpose by proposing several modifications to the key steps in the algorithm. The proposed modifications yield a 3.9x improvement in terms of Mean Reciprocal Rank (MRR) over the baseline XMR approach on a public search logs dataset. We are able to maintain an inference latency of less than 10 ms while still using session context. When compared against baseline models of acceptable latency, we observed a 33% improvement in MRR for short prefixes of up to 3 characters. Moreover, our model yielded a statistically significant improvement of 2.81% over a production QAC system in terms of suggestion acceptance rate, when deployed on the search bar of an online shopping store as part of an A/B test.
Submitted 21 August, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
A Zero Attention Model for Personalized Product Search
Authors:
Qingyao Ai,
Daniel N. Hill,
S. V. N. Vishwanathan,
W. Bruce Croft
Abstract:
Product search is one of the most popular methods for people to discover and purchase products on e-commerce websites. Because personal preferences often have an important influence on the purchase decision of each customer, it is intuitive that personalization should be beneficial for product search engines. While synthetic experiments from previous studies show that purchase histories are useful for identifying the individual intent of each product search session, the effect of personalization on product search in practice remains mostly unknown. In this paper, we formulate the problem of personalized product search and conduct large-scale experiments with search logs sampled from a commercial e-commerce search engine. Results from our preliminary analysis show that the potential of personalization depends on query characteristics, interactions between queries, and user purchase histories. Based on these observations, we propose a Zero Attention Model for product search that automatically determines when and how to personalize a user-query pair via a novel attention mechanism. Empirical results on commercial product search logs show that the proposed model not only significantly outperforms state-of-the-art personalized product retrieval models, but also provides important information on the potential of personalization in each product search session.
Submitted 29 August, 2019;
originally announced August 2019.
-
An Efficient Bandit Algorithm for Realtime Multivariate Optimization
Authors:
Daniel N Hill,
Houssam Nassif,
Yi Liu,
Anand Iyer,
S V N Vishwanathan
Abstract:
Optimization is commonly employed to determine the content of web pages, such as to maximize conversions on landing pages or click-through rates on search engine result pages. Often the layout of these pages can be decoupled into several separate decisions. For example, the composition of a landing page may involve deciding which image to show, which wording to use, what color background to display, etc. Such optimization is a combinatorial problem over an exponentially large decision space. Randomized experiments do not scale well to this setting, and therefore, in practice, one is typically limited to optimizing a single aspect of a web page at a time. This represents a missed opportunity in both the speed of experimentation and the exploitation of possible interactions between layout decisions.
Here we focus on multivariate optimization of interactive web pages. We formulate an approach where the possible interactions between different components of the page are modeled explicitly. We apply bandit methodology to explore the layout space efficiently and use hill-climbing to select optimal content in realtime. Our algorithm also extends to contextualization and personalization of layout selection. Simulation results show the suitability of our approach to large decision spaces with strong interactions between content. We further apply our algorithm to optimize a message that promotes adoption of an Amazon service. After only a single week of online optimization, we saw a 21% conversion increase compared to the median layout. Our technique is currently being deployed to optimize content across several locations at Amazon.com.
Submitted 22 October, 2018;
originally announced October 2018.