-
Improving Sequential Query Recommendation with Immediate User Feedback
Authors:
Shameem A Puthiya Parambath,
Christos Anagnostopoulos,
Roderick Murray-Smith
Abstract:
We propose an algorithm for next query recommendation in interactive data exploration settings, like knowledge discovery for information gathering. The state-of-the-art query recommendation algorithms are based on sequence-to-sequence learning approaches that exploit historical interaction data. Due to the supervision involved in the learning process, such approaches fail to adapt to immediate use…
▽ More
We propose an algorithm for next query recommendation in interactive data exploration settings, like knowledge discovery for information gathering. The state-of-the-art query recommendation algorithms are based on sequence-to-sequence learning approaches that exploit historical interaction data. Due to the supervision involved in the learning process, such approaches fail to adapt to immediate user feedback. We propose to augment the transformer-based causal language models for query recommendations to adapt to the immediate user feedback using multi-armed bandit (MAB) framework. We conduct a large-scale experimental study using log files from a popular online literature discovery service and demonstrate that our algorithm improves the per-round regret substantially, with respect to the state-of-the-art transformer-based query recommendation models, which do not make use of immediate user feedback. Our data model and source code are available at https://github.com/shampp/exp3_ss
△ Less
Submitted 3 July, 2024; v1 submitted 12 May, 2022;
originally announced May 2022.
-
Max-Utility Based Arm Selection Strategy For Sequential Query Recommendations
Authors:
Shameem A. Puthiya Parambath,
Christos Anagnostopoulos,
Roderick Murray-Smith,
Sean MacAvaney,
Evangelos Zervas
Abstract:
We consider the query recommendation problem in closed loop interactive learning settings like online information gathering and exploratory analytics. The problem can be naturally modelled using the Multi-Armed Bandits (MAB) framework with countably many arms. The standard MAB algorithms for countably many arms begin with selecting a random set of candidate arms and then applying standard MAB algo…
▽ More
We consider the query recommendation problem in closed loop interactive learning settings like online information gathering and exploratory analytics. The problem can be naturally modelled using the Multi-Armed Bandits (MAB) framework with countably many arms. The standard MAB algorithms for countably many arms begin with selecting a random set of candidate arms and then applying standard MAB algorithms, e.g., UCB, on this candidate set downstream. We show that such a selection strategy often results in higher cumulative regret and to this end, we propose a selection strategy based on the maximum utility of the arms. We show that in tasks like online information gathering, where sequential query recommendations are employed, the sequences of queries are correlated and the number of potentially optimal queries can be reduced to a manageable size by selecting queries with maximum utility with respect to the currently executing query. Our experimental results using a recent real online literature discovery service log file demonstrate that the proposed arm selection strategy improves the cumulative regret substantially with respect to the state-of-the-art baseline algorithms. % and commonly used random selection strategy for a variety of contextual multi-armed bandit algorithms. Our data model and source code are available at ~\url{https://anonymous.4open.science/r/0e5ad6b7-ac02-4577-9212-c9d505d3dbdb/}.
△ Less
Submitted 31 August, 2021;
originally announced August 2021.
-
Permutation-Invariant Subgraph Discovery
Authors:
Raghvendra Mall,
Shameem A. Parambath,
Han Yufei,
Ting Yu,
Sanjay Chawla
Abstract:
We introduce Permutation and Structured Perturbation Inference (PSPI), a new problem formulation that abstracts many graph matching tasks that arise in systems biology. PSPI can be viewed as a robust formulation of the permutation inference or graph matching, where the objective is to find a permutation between two graphs under the assumption that a set of edges may have undergone a perturbation d…
▽ More
We introduce Permutation and Structured Perturbation Inference (PSPI), a new problem formulation that abstracts many graph matching tasks that arise in systems biology. PSPI can be viewed as a robust formulation of the permutation inference or graph matching, where the objective is to find a permutation between two graphs under the assumption that a set of edges may have undergone a perturbation due to an underlying cause. For example, suppose there are two gene regulatory networks X and Y from a diseased and normal tissue respectively. Then, the PSPI problem can be used to detect if there has been a structural change between the two networks which can serve as a signature of the disease. Besides the new problem formulation, we propose an ADMM algorithm (STEPD) to solve a relaxed version of the PSPI problem. An extensive case study on comparative gene regulatory networks (GRNs) is used to demonstrate that STEPD is able to accurately infer structured perturbations and thus provides a tool for computational biologists to identify novel prognostic signatures. A spectral analysis confirms that STEPD can recover small clique-like perturbations making it a useful tool for detecting permutation-invariant changes in graphs.
△ Less
Submitted 2 April, 2021;
originally announced April 2021.
-
Re-ranking Based Diversification: A Unifying View
Authors:
Shameem A Puthiya Parambath
Abstract:
We analyze different re-ranking algorithms for diversification and show that majority of them are based on maximizing submodular/modular functions from the class of parameterized concave/linear over modular functions. We study the optimality of such algorithms in terms of the `total curvature'. We also show that by adjusting the hyperparameter of the concave/linear composition to trade-off relevan…
▽ More
We analyze different re-ranking algorithms for diversification and show that majority of them are based on maximizing submodular/modular functions from the class of parameterized concave/linear over modular functions. We study the optimality of such algorithms in terms of the `total curvature'. We also show that by adjusting the hyperparameter of the concave/linear composition to trade-off relevance and diversity, if any, one is in fact tuning the `total curvature' of the function for relevance-diversity trade-off.
△ Less
Submitted 26 June, 2019;
originally announced June 2019.
-
Risk Aware Ranking for Top-$k$ Recommendations
Authors:
Shameem A Puthiya Parambath,
Nishant Vijayakumar,
Sanjay Chawla
Abstract:
Given an incomplete ratings data over a set of users and items, the preference completion problem aims to estimate a personalized total preference order over a subset of the items. In practical settings, a ranked list of top-$k$ items from the estimated preference order is recommended to the end user in the decreasing order of preference for final consumption. We analyze this model and observe tha…
▽ More
Given an incomplete ratings data over a set of users and items, the preference completion problem aims to estimate a personalized total preference order over a subset of the items. In practical settings, a ranked list of top-$k$ items from the estimated preference order is recommended to the end user in the decreasing order of preference for final consumption. We analyze this model and observe that such a ranking model results in suboptimal performance when the payoff associated with the recommended items is different. We propose a novel and very efficient algorithm for the preference ranking considering the uncertainty regarding the payoffs of the items. Once the preference scores for the users are obtained using any preference learning algorithm, we show that ranking the items using a risk seeking utility function results in the best ranking performance.
△ Less
Submitted 12 April, 2019; v1 submitted 17 March, 2019;
originally announced April 2019.
-
Reuse and Adaptation for Entity Resolution through Transfer Learning
Authors:
Saravanan Thirumuruganathan,
Shameem A Puthiya Parambath,
Mourad Ouzzani,
Nan Tang,
Shafiq Joty
Abstract:
Entity resolution (ER) is one of the fundamental problems in data integration, where machine learning (ML) based classifiers often provide the state-of-the-art results. Considerable human effort goes into feature engineering and training data creation. In this paper, we investigate a new problem: Given a dataset D_T for ER with limited or no training data, is it possible to train a good ML classif…
▽ More
Entity resolution (ER) is one of the fundamental problems in data integration, where machine learning (ML) based classifiers often provide the state-of-the-art results. Considerable human effort goes into feature engineering and training data creation. In this paper, we investigate a new problem: Given a dataset D_T for ER with limited or no training data, is it possible to train a good ML classifier on D_T by reusing and adapting the training data of dataset D_S from same or related domain? Our major contributions include (1) a distributed representation based approach to encode each tuple from diverse datasets into a standard feature space; (2) identification of common scenarios where the reuse of training data can be beneficial; and (3) five algorithms for handling each of the aforementioned scenarios. We have performed comprehensive experiments on 12 datasets from 5 different domains (publications, movies, songs, restaurants, and books). Our experiments show that our algorithms provide significant benefits such as providing superior performance for a fixed training data size.
△ Less
Submitted 28 September, 2018;
originally announced September 2018.
-
SAGA: A Submodular Greedy Algorithm For Group Recommendation
Authors:
Shameem A Puthiya Parambath,
Nishant Vijayakumar,
Sanjay Chawla
Abstract:
In this paper, we propose a unified framework and an algorithm for the problem of group recommendation where a fixed number of items or alternatives can be recommended to a group of users. The problem of group recommendation arises naturally in many real world contexts, and is closely related to the budgeted social choice problem studied in economics. We frame the group recommendation problem as c…
▽ More
In this paper, we propose a unified framework and an algorithm for the problem of group recommendation where a fixed number of items or alternatives can be recommended to a group of users. The problem of group recommendation arises naturally in many real world contexts, and is closely related to the budgeted social choice problem studied in economics. We frame the group recommendation problem as choosing a subgraph with the largest group consensus score in a completely connected graph defined over the item affinity matrix. We propose a fast greedy algorithm with strong theoretical guarantees, and show that the proposed algorithm compares favorably to the state-of-the-art group recommendation algorithms according to commonly used relevance and coverage performance measures on benchmark dataset.
△ Less
Submitted 25 December, 2017;
originally announced December 2017.
-
Theory of Optimizing Pseudolinear Performance Measures: Application to F-measure
Authors:
Shameem A Puthiya Parambath,
Nicolas Usunier,
Yves Grandvalet
Abstract:
Non-linear performance measures are widely used for the evaluation of learning algorithms. For example, $F$-measure is a commonly used performance measure for classification problems in machine learning and information retrieval community. We study the theoretical properties of a subset of non-linear performance measures called pseudo-linear performance measures which includes $F$-measure, \emph{J…
▽ More
Non-linear performance measures are widely used for the evaluation of learning algorithms. For example, $F$-measure is a commonly used performance measure for classification problems in machine learning and information retrieval community. We study the theoretical properties of a subset of non-linear performance measures called pseudo-linear performance measures which includes $F$-measure, \emph{Jaccard Index}, among many others. We establish that many notions of $F$-measures and \emph{Jaccard Index} are pseudo-linear functions of the per-class false negatives and false positives for binary, multiclass and multilabel classification. Based on this observation, we present a general reduction of such performance measure optimization problem to cost-sensitive classification problem with unknown costs. We then propose an algorithm with provable guarantees to obtain an approximately optimal classifier for the $F$-measure by solving a series of cost-sensitive classification problems. The strength of our analysis is to be valid on any dataset and any class of classifiers, extending the existing theoretical results on pseudo-linear measures, which are asymptotic in nature. We also establish the multi-objective nature of the $F$-score maximization problem by linking the algorithm with the weighted-sum approach used in multi-objective optimization. We present numerical experiments to illustrate the relative importance of cost asymmetry and thresholding when learning linear classifiers on various $F$-measure optimization tasks.
△ Less
Submitted 1 January, 2018; v1 submitted 1 May, 2015;
originally announced May 2015.
-
Topic Extraction and Bundling of Related Scientific Articles
Authors:
Shameem A Puthiya Parambath
Abstract:
Automatic classification of scientific articles based on common characteristics is an interesting problem with many applications in digital library and information retrieval systems. Properly organized articles can be useful for automatic generation of taxonomies in scientific writings, textual summarization, efficient information retrieval etc. Generating article bundles from a large number of in…
▽ More
Automatic classification of scientific articles based on common characteristics is an interesting problem with many applications in digital library and information retrieval systems. Properly organized articles can be useful for automatic generation of taxonomies in scientific writings, textual summarization, efficient information retrieval etc. Generating article bundles from a large number of input articles, based on the associated features of the articles is tedious and computationally expensive task. In this report we propose an automatic two-step approach for topic extraction and bundling of related articles from a set of scientific articles in real-time. For topic extraction, we make use of Latent Dirichlet Allocation (LDA) topic modeling techniques and for bundling, we make use of hierarchical agglomerative clustering techniques.
We run experiments to validate our bundling semantics and compare it with existing models in use. We make use of an online crowdsourcing marketplace provided by Amazon called Amazon Mechanical Turk to carry out experiments. We explain our experimental setup and empirical results in detail and show that our method is advantageous over existing ones.
△ Less
Submitted 1 May, 2015; v1 submitted 21 December, 2012;
originally announced December 2012.