-
Hierarchy exploitation to detect missing annotations on hierarchical multi-label classification
Authors:
Miguel Romero,
Felipe Kenji Nakano,
Jorge Finke,
Camilo Rocha,
Celine Vens
Abstract:
The availability of genomic data has grown exponentially in the last decade, mainly due to the development of new sequencing technologies. Based on the interactions between genes (and gene products) extracted from the increasing genomic data, numerous studies have focused on the identification of associations between genes and functions. While these studies have shown great promise, the problem of…
▽ More
The availability of genomic data has grown exponentially in the last decade, mainly due to the development of new sequencing technologies. Based on the interactions between genes (and gene products) extracted from the increasing genomic data, numerous studies have focused on the identification of associations between genes and functions. While these studies have shown great promise, the problem of annotating genes with functions remains an open challenge. In this work, we present a method to detect missing annotations in hierarchical multi-label classification datasets. We propose a method that exploits the class hierarchy by computing aggregated probabilities to the paths of classes from the leaves to the root for each instance. The proposed method is presented in the context of predicting missing gene function annotations, where these aggregated probabilities are further used to select a set of annotations to be verified through in vivo experiments. The experiments on Oriza sativa Japonica, a variety of rice, showcase that incorporating the hierarchy of classes into the method often improves the predictive performance and our proposed method yields superior results when compared to competitor methods from the literature.
△ Less
Submitted 13 July, 2022;
originally announced July 2022.
-
BELLATREX: Building Explanations through a LocaLly AccuraTe Rule EXtractor
Authors:
Klest Dedja,
Felipe Kenji Nakano,
Konstantinos Pliakos,
Celine Vens
Abstract:
Tree-ensemble algorithms, such as random forest, are effective machine learning methods popular for their flexibility, high performance, and robustness to overfitting. However, since multiple learners are combined, they are not as interpretable as a single decision tree. In this work we propose a novel method that is Building Explanations through a LocalLy AccuraTe Rule EXtractor (Bellatrex), and…
▽ More
Tree-ensemble algorithms, such as random forest, are effective machine learning methods popular for their flexibility, high performance, and robustness to overfitting. However, since multiple learners are combined, they are not as interpretable as a single decision tree. In this work we propose a novel method that is Building Explanations through a LocalLy AccuraTe Rule EXtractor (Bellatrex), and is able to explain the forest prediction for a given test instance with only a few diverse rules. Starting from the decision trees generated by a random forest, our method 1) pre-selects a subset of the rules used to make the prediction, 2) creates a vector representation of such rules, 3) projects them to a low-dimensional space, 4) clusters such representations to pick a rule from each cluster to explain the instance prediction. We test the effectiveness of Bellatrex on 89 real-world datasets and we demonstrate the validity of our method for binary classification, regression, multi-label classification and time-to-event tasks. To the best of our knowledge, it is the first time that an interpretability toolbox can handle all these tasks within the same framework. We also show that our extracted surrogate model can approximate the performance of the corresponding ensemble model in all considered tasks, while selecting only few trees from the whole forest. We also show that our proposed approach substantially outperforms other explainable methods in terms of predictive performance.
△ Less
Submitted 6 January, 2023; v1 submitted 29 March, 2022;
originally announced March 2022.
-
An Adaptive Hybrid Active Learning Strategy with Free Ratings in Collaborative Filtering
Authors:
Alireza Gharahighehi,
Felipe Kenji Nakano,
Celine Vens
Abstract:
Recommender systems are information retrieval methods that predict user preferences to personalize services. These systems use the feedback and the ratings provided by users to model the behavior of users and to generate recommendations. Typically, the ratings are quite sparse, i.e., only a small fraction of items are rated by each user. To address this issue and enhance the performance, active le…
▽ More
Recommender systems are information retrieval methods that predict user preferences to personalize services. These systems use the feedback and the ratings provided by users to model the behavior of users and to generate recommendations. Typically, the ratings are quite sparse, i.e., only a small fraction of items are rated by each user. To address this issue and enhance the performance, active learning strategies can be used to select the most informative items to be rated. This rating elicitation procedure enriches the interaction matrix with informative ratings and therefore assists the recommender system to better model the preferences of the users. In this paper, we evaluate various non-personalized and personalized rating elicitation strategies. We also propose a hybrid strategy that adaptively combines a non-personalized and a personalized strategy. Furthermore, we propose a new procedure to obtain free ratings based on the side information of the items. We evaluate these ideas on the MovieLens dataset. The experiments reveal that our proposed hybrid strategy outperforms the strategies from the literature. We also propose the extent to which free ratings are obtained, improving further the performance and also the user experience.
△ Less
Submitted 11 March, 2022;
originally announced March 2022.
-
Deep tree-ensembles for multi-output prediction
Authors:
Felipe Kenji Nakano,
Konstantinos Pliakos,
Celine Vens
Abstract:
Recently, deep neural networks have expanded the state-of-art in various scientific fields and provided solutions to long standing problems across multiple application domains. Nevertheless, they also suffer from weaknesses since their optimal performance depends on massive amounts of training data and the tuning of an extended number of parameters. As a countermeasure, some deep-forest methods ha…
▽ More
Recently, deep neural networks have expanded the state-of-art in various scientific fields and provided solutions to long standing problems across multiple application domains. Nevertheless, they also suffer from weaknesses since their optimal performance depends on massive amounts of training data and the tuning of an extended number of parameters. As a countermeasure, some deep-forest methods have been recently proposed, as efficient and low-scale solutions. Despite that, these approaches simply employ label classification probabilities as induced features and primarily focus on traditional classification and regression tasks, leaving multi-output prediction under-explored. Moreover, recent work has demonstrated that tree-embeddings are highly representative, especially in structured output prediction. In this direction, we propose a novel deep tree-ensemble (DTE) model, where every layer enriches the original feature set with a representation learning component based on tree-embeddings. In this paper, we specifically focus on two structured output prediction tasks, namely multi-label classification and multi-target regression. We conducted experiments using multiple benchmark datasets and the obtained results confirm that our method provides superior results to state-of-the-art methods in both tasks.
△ Less
Submitted 10 August, 2021; v1 submitted 3 November, 2020;
originally announced November 2020.