-
HypeRS: Building a Hypergraph-driven ensemble Recommender System
Authors:
Alireza Gharahighehi,
Celine Vens,
Konstantinos Pliakos
Abstract:
Recommender systems are designed to predict user preferences over collections of items. These systems process users' previous interactions to decide which items should be ranked higher to satisfy their desires. An ensemble recommender system can achieve great recommendation performance by effectively combining the decisions generated by individual models. In this paper, we propose a novel ensemble recommender system that combines predictions made by different models into a unified hypergraph ranking framework. This is the first time that hypergraph ranking has been employed to model an ensemble of recommender systems. Hypergraphs are generalizations of graphs where multiple vertices can be connected via hyperedges, efficiently modeling high-order relations. We differentiate real and predicted connections between users and items by assigning different hyperedge weights to individual recommender systems. We perform experiments using four datasets from the fields of movie, music and news media recommendation. The obtained results show that the ensemble hypergraph ranking method generates more accurate recommendations compared to the individual models and a weighted hybrid approach. The assignment of different hyperedge weights to the ensemble hypergraph further improves the performance compared to a setting with identical hyperedge weights.
Submitted 22 June, 2023;
originally announced June 2023.
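The idea of mixing real and predicted user-item connections with different hyperedge weights can be illustrated with a small sketch. The abstract does not state the ranking formulation, so this assumes the commonly used hypergraph ranking iteration f ← αΘf + (1 − α)y, with Θ = Dv^(-1/2) H W De^(-1) Hᵀ Dv^(-1/2). The function name `hypergraph_rank`, the toy users and items, and the weights (1.0 for real interactions, 0.5 and 0.2 for two hypothetical base recommenders) are all illustrative assumptions, not the authors' implementation.

```python
import math


def hypergraph_rank(hyperedges, weights, query, alpha=0.9, iters=200):
    """Iterate f <- alpha * Theta f + (1 - alpha) * y on a weighted hypergraph.

    hyperedges: list of vertex-name lists, one per hyperedge
    weights:    one positive weight per hyperedge
    query:      dict mapping vertex name -> initial score (the vector y)
    """
    vertices = sorted({v for edge in hyperedges for v in edge})

    # Weighted vertex degree: sum of the weights of the hyperedges containing v.
    degree = {v: 0.0 for v in vertices}
    for edge, w in zip(hyperedges, weights):
        for v in set(edge):
            degree[v] += w

    y = {v: float(query.get(v, 0.0)) for v in vertices}
    f = dict(y)
    for _ in range(iters):
        propagated = {v: 0.0 for v in vertices}
        for edge, w in zip(hyperedges, weights):
            members = set(edge)
            # Each hyperedge averages the degree-normalized scores of its
            # members, scales the result by its weight, and sends it back.
            pooled = sum(f[u] / math.sqrt(degree[u]) for u in members)
            pooled *= w / len(members)
            for v in members:
                propagated[v] += pooled / math.sqrt(degree[v])
        f = {v: alpha * propagated[v] + (1 - alpha) * y[v] for v in vertices}
    return f


# Toy data (hypothetical): observed interactions plus predictions
# from two base recommenders, "A" and "B".
real = [["u1", "i1"], ["u2", "i1"], ["u2", "i2"]]
predicted_a = [["u1", "i2"]]
predicted_b = [["u1", "i3"]]

hyperedges = real + predicted_a + predicted_b
weights = [1.0, 1.0, 1.0, 0.5, 0.2]  # real edges count more than predicted ones

scores = hypergraph_rank(hyperedges, weights, query={"u1": 1.0})
ranking = sorted(["i2", "i3"], key=lambda item: scores[item], reverse=True)
print(ranking)  # ['i2', 'i3']
```

In this toy setting, item i2 ranks above i3 for user u1 because it is supported both by the higher-weighted prediction and by the real interaction of u2, who shares item i1 with u1.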
-
Predicting Survival Outcomes in the Presence of Unlabeled Data
Authors:
Fateme Nateghi Haredasht,
Celine Vens
Abstract:
Many clinical studies require the follow-up of patients over time. This is challenging: apart from frequently observed drop-out, there are often also organizational and financial challenges, which can lead to reduced data collection and, in turn, can complicate subsequent analyses. In contrast, there is often plenty of baseline data available for patients with similar characteristics and background information, e.g., from patients who fall outside the study time window. In this article, we investigate whether we can benefit from the inclusion of such unlabeled data instances to predict accurate survival times. In other words, we introduce a third level of supervision in the context of survival analysis: apart from fully observed and censored instances, we also include unlabeled instances. We propose three approaches to deal with this novel setting and provide an empirical comparison over fifteen real-life clinical and gene expression survival datasets. Our results demonstrate that all approaches are able to increase the predictive performance over independent test data. We also show that integrating the partial supervision provided by censored data in a semi-supervised wrapper approach generally provides the best results, often achieving large improvements compared to not using unlabeled data.
Submitted 25 October, 2022;
originally announced October 2022.
-
Hierarchy exploitation to detect missing annotations on hierarchical multi-label classification
Authors:
Miguel Romero,
Felipe Kenji Nakano,
Jorge Finke,
Camilo Rocha,
Celine Vens
Abstract:
The availability of genomic data has grown exponentially in the last decade, mainly due to the development of new sequencing technologies. Based on the interactions between genes (and gene products) extracted from the increasing genomic data, numerous studies have focused on the identification of associations between genes and functions. While these studies have shown great promise, the problem of annotating genes with functions remains an open challenge. In this work, we present a method to detect missing annotations in hierarchical multi-label classification datasets. The method exploits the class hierarchy by computing, for each instance, aggregated probabilities for the paths of classes from the leaves to the root. The proposed method is presented in the context of predicting missing gene function annotations, where these aggregated probabilities are further used to select a set of annotations to be verified through in vivo experiments. The experiments on Oryza sativa Japonica, a variety of rice, show that incorporating the hierarchy of classes into the method often improves the predictive performance and that our proposed method yields superior results when compared to competitor methods from the literature.
Submitted 13 July, 2022;
originally announced July 2022.
-
BELLATREX: Building Explanations through a LocaLly AccuraTe Rule EXtractor
Authors:
Klest Dedja,
Felipe Kenji Nakano,
Konstantinos Pliakos,
Celine Vens
Abstract:
Tree-ensemble algorithms, such as random forest, are effective machine learning methods popular for their flexibility, high performance, and robustness to overfitting. However, since multiple learners are combined, they are not as interpretable as a single decision tree. In this work we propose a novel method that is Building Explanations through a LocaLly AccuraTe Rule EXtractor (Bellatrex), and is able to explain the forest prediction for a given test instance with only a few diverse rules. Starting from the decision trees generated by a random forest, our method 1) pre-selects a subset of the rules used to make the prediction, 2) creates a vector representation of such rules, 3) projects them to a low-dimensional space, and 4) clusters such representations to pick a rule from each cluster to explain the instance prediction. We test the effectiveness of Bellatrex on 89 real-world datasets and we demonstrate the validity of our method for binary classification, regression, multi-label classification and time-to-event tasks. To the best of our knowledge, it is the first time that an interpretability toolbox can handle all these tasks within the same framework. We also show that our extracted surrogate model can approximate the performance of the corresponding ensemble model in all considered tasks, while selecting only a few trees from the whole forest. We also show that our proposed approach substantially outperforms other explainable methods in terms of predictive performance.
Submitted 6 January, 2023; v1 submitted 29 March, 2022;
originally announced March 2022.
-
An Adaptive Hybrid Active Learning Strategy with Free Ratings in Collaborative Filtering
Authors:
Alireza Gharahighehi,
Felipe Kenji Nakano,
Celine Vens
Abstract:
Recommender systems are information retrieval methods that predict user preferences to personalize services. These systems use the feedback and the ratings provided by users to model the behavior of users and to generate recommendations. Typically, the ratings are quite sparse, i.e., only a small fraction of items are rated by each user. To address this issue and enhance the performance, active learning strategies can be used to select the most informative items to be rated. This rating elicitation procedure enriches the interaction matrix with informative ratings and therefore helps the recommender system to better model the preferences of the users. In this paper, we evaluate various non-personalized and personalized rating elicitation strategies. We also propose a hybrid strategy that adaptively combines a non-personalized and a personalized strategy. Furthermore, we propose a new procedure to obtain free ratings based on the side information of the items. We evaluate these ideas on the MovieLens dataset. The experiments reveal that our proposed hybrid strategy outperforms the strategies from the literature. We also examine the extent to which free ratings can be obtained, which further improves both the performance and the user experience.
Submitted 11 March, 2022;
originally announced March 2022.
-
Diversification in Session-based News Recommender Systems
Authors:
Alireza Gharahighehi,
Celine Vens
Abstract:
Recommender systems are widely applied in digital platforms such as news websites to personalize services based on user preferences. In news websites most users are anonymous and the only available data are sequences of items in anonymous sessions. Due to this, typical collaborative filtering methods, which are widely used in many applications, are not effective in news recommendation. In this context, session-based recommenders are able to recommend next items given the sequence of previous items in the active session. Neighborhood-based session-based recommenders have been shown to be highly effective compared to more sophisticated approaches. In this study we propose scenarios to make these session-based recommender systems diversity-aware and to address the filter bubble phenomenon. The filter bubble phenomenon is a common concern in news recommendation systems and it occurs when the system narrows the information and deprives users of diverse information. The results of applying the proposed scenarios to four news datasets show that these diversification scenarios improve the diversity measures of these session-based recommender systems.
Submitted 17 December, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
-
Drug-Target Interaction Prediction via an Ensemble of Weighted Nearest Neighbors with Interaction Recovery
Authors:
Bin Liu,
Konstantinos Pliakos,
Celine Vens,
Grigorios Tsoumakas
Abstract:
Predicting drug-target interactions (DTI) via reliable computational methods is an effective and efficient way to mitigate the enormous costs and time of the drug discovery process. Structure-based drug similarities and sequence-based target protein similarities are the commonly used information for DTI prediction. Among numerous computational methods, neighborhood-based chemogenomic approaches that leverage drug and target similarities to perform predictions directly are simple but promising. However, existing similarity-based methods need to be re-trained to predict interactions for any new drugs or targets and cannot directly perform predictions for new drugs, new targets, and new drug-target pairs. Furthermore, a large number of missing (undetected) interactions in current DTI datasets hinders most DTI prediction methods. To address these issues, we propose a new method denoted as Weighted k-Nearest Neighbor with Interaction Recovery (WkNNIR). Not only can WkNNIR estimate interactions of any new drugs and/or new targets without any need for re-training, but it can also recover missing interactions (false negatives). In addition, WkNNIR exploits local imbalance to promote the influence of more reliable similarities on the interaction recovery and prediction processes. We also propose a series of ensemble methods that employ diverse sampling strategies and could be coupled with WkNNIR as well as any other DTI prediction method to improve performance. Experimental results over five benchmark datasets demonstrate the effectiveness of our approaches in predicting drug-target interactions. Lastly, we confirm the practical prediction ability of the proposed methods to discover reliable interactions that were not reported in the original benchmark datasets.
Submitted 9 July, 2021; v1 submitted 22 December, 2020;
originally announced December 2020.
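As a rough illustration of the neighborhood-based idea mentioned in the abstract, the sketch below scores a new drug against known targets using a generic similarity-weighted k-nearest-neighbor average. This is not WkNNIR: it omits interaction recovery and the local-imbalance weighting, and the rank-decay factor, the function name `wknn_predict`, and the toy numbers are assumptions made purely for illustration.

```python
def wknn_predict(similarities, interactions, k=2, decay=0.5):
    """Score each target for a new drug from its k most similar training drugs.

    similarities: similarity of the new drug to each training drug
    interactions: one 0/1 row per training drug, one column per target
    decay:        weight multiplier applied per neighbor rank (assumed scheme)
    """
    # Indices of the k training drugs most similar to the new drug.
    neighbors = sorted(
        range(len(similarities)),
        key=lambda i: similarities[i],
        reverse=True,
    )[:k]

    n_targets = len(interactions[0])
    scores = [0.0] * n_targets
    total_weight = 0.0
    for rank, i in enumerate(neighbors):
        # Closer neighbors count more: similarity scaled by decay ** rank.
        weight = (decay ** rank) * similarities[i]
        total_weight += weight
        for t in range(n_targets):
            scores[t] += weight * interactions[i][t]

    if total_weight == 0.0:
        return [0.0] * n_targets
    return [s / total_weight for s in scores]


# Toy example (hypothetical values): three training drugs, three targets.
similarities = [0.9, 0.1, 0.6]
interactions = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
]
predicted = wknn_predict(similarities, interactions, k=2, decay=0.5)
print(predicted)
```

Because the scores depend only on similarities to already-known drugs, a new drug can be scored without re-fitting anything, which is the property the abstract highlights for neighborhood-based methods.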
-
Fair Multi-Stakeholder News Recommender System with Hypergraph ranking
Authors:
Alireza Gharahighehi,
Celine Vens,
Konstantinos Pliakos
Abstract:
Recommender systems are typically designed to fulfill end user needs. However, in some domains the users are not the only stakeholders in the system. For instance, in a news aggregator website, users, authors, magazines, as well as the platform itself are potential stakeholders. Most collaborative filtering recommender systems suffer from popularity bias. Therefore, if the recommender system only considers users' preferences, it presumably over-represents popular providers and under-represents less popular providers. To address this issue one should consider other stakeholders in the generated ranked lists. In this paper we demonstrate that hypergraph learning has the natural capability of handling a multi-stakeholder recommendation task. A hypergraph can model high-order relations between different types of objects and is therefore naturally suited to generating recommendation lists that consider multiple stakeholders. We form the recommendations in time-wise rounds and learn to adapt the weights of stakeholders to increase the coverage of low-covered stakeholders over time. The results show that the proposed approach counters popularity bias and produces fairer recommendations with respect to authors in two news datasets, at a low cost in precision.
Submitted 9 February, 2021; v1 submitted 1 December, 2020;
originally announced December 2020.
-
Deep tree-ensembles for multi-output prediction
Authors:
Felipe Kenji Nakano,
Konstantinos Pliakos,
Celine Vens
Abstract:
Recently, deep neural networks have advanced the state of the art in various scientific fields and provided solutions to long-standing problems across multiple application domains. Nevertheless, they also suffer from weaknesses, since their optimal performance depends on massive amounts of training data and the tuning of a large number of parameters. As a countermeasure, some deep-forest methods have recently been proposed as efficient and low-scale solutions. Despite that, these approaches simply employ label classification probabilities as induced features and primarily focus on traditional classification and regression tasks, leaving multi-output prediction under-explored. Moreover, recent work has demonstrated that tree-embeddings are highly representative, especially in structured output prediction. In this direction, we propose a novel deep tree-ensemble (DTE) model, where every layer enriches the original feature set with a representation learning component based on tree-embeddings. In this paper, we specifically focus on two structured output prediction tasks, namely multi-label classification and multi-target regression. We conducted experiments using multiple benchmark datasets and the obtained results confirm that our method provides superior results to state-of-the-art methods in both tasks.
Submitted 10 August, 2021; v1 submitted 3 November, 2020;
originally announced November 2020.