-
Positive-Unlabelled Learning for Identifying New Candidate Dietary Restriction-related Genes among Ageing-related Genes
Authors:
Jorge Paz-Ruza,
Alex A. Freitas,
Amparo Alonso-Betanzos,
Bertha Guijarro-Berdiñas
Abstract:
Dietary Restriction (DR) is one of the most popular anti-ageing interventions, prompting exhaustive research into genes associated with its mechanisms. Recently, Machine Learning (ML) has been explored to identify potential DR-related genes among ageing-related genes, aiming to minimize costly wet lab experiments needed to expand our knowledge on DR. However, to train a model from positive (DR-related) and negative (non-DR-related) examples, existing ML methods naively label genes without known DR relation as negative examples, assuming that lack of DR-related annotation for a gene represents evidence of absence of DR-relatedness, rather than absence of evidence; this hinders the reliability of the negative examples (non-DR-related genes) and the method's ability to identify novel DR-related genes. This work introduces a novel gene prioritization method based on the two-step Positive-Unlabelled (PU) Learning paradigm: using a similarity-based, KNN-inspired approach, our method first selects reliable negative examples among the genes without known DR associations. Then, these reliable negatives and all known positives are used to train a classifier that effectively differentiates DR-related and non-DR-related genes, which is finally employed to generate a more reliable ranking of promising genes for novel DR-relatedness. Our method significantly outperforms the existing state-of-the-art non-PU approach for DR-relatedness prediction in three relevant performance metrics. In addition, curation of existing literature finds support for the top-ranked candidate DR-related genes identified by our model.
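Here is a minimal sketch of the first step of the two-step PU scheme (reliable-negative selection). It assumes each gene is a set of binary features, such as annotation terms (the identifiers below are hypothetical). Each unlabelled gene is scored by its mean Jaccard similarity to its k most similar known DR-related genes, and the lowest-scoring genes become reliable negatives. The similarity measure, k and the number of negatives are our assumptions, not the authors' settings.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two feature sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_positive_score(features: Set[str], positives: List[Set[str]], k: int) -> float:
    """Mean similarity of a gene to its k most similar known positives."""
    sims = sorted((jaccard(features, p) for p in positives), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def select_reliable_negatives(
    positives: List[Set[str]], unlabelled: Dict[str, Set[str]], k: int, n_neg: int
) -> List[str]:
    """Step 1 (sketch): unlabelled genes least similar to the known positives
    are taken as reliable negatives."""
    scores = {g: knn_positive_score(f, positives, k) for g, f in unlabelled.items()}
    ranked = sorted(scores, key=lambda g: scores[g])  # least positive-like first
    return ranked[:n_neg]


positives = [{"a", "b"}, {"a", "c"}]                          # hypothetical DR-related genes
unlabelled = {"u1": {"a", "b"}, "u2": {"x", "y"}, "u3": {"a", "x"}}
print(select_reliable_negatives(positives, unlabelled, k=1, n_neg=1))
```

In the second step, any standard classifier would be trained on the known positives versus these reliable negatives. Its scores would then rank the remaining unlabelled genes.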
Submitted 14 June, 2024;
originally announced June 2024.
-
Automated Machine Learning for Positive-Unlabelled Learning
Authors:
Jack D. Saunders,
Alex A. Freitas
Abstract:
Positive-Unlabelled (PU) learning is a growing field of machine learning that aims to learn classifiers from data consisting of labelled positive instances and unlabelled instances, which may in reality be positive or negative but whose label is unknown. So many methods have been proposed to address PU learning over the last two decades that selecting an optimal method for a given PU learning task presents a challenge. Our previous work has addressed this by proposing GA-Auto-PU, the first Automated Machine Learning (Auto-ML) system for PU learning. In this work, we propose two new Auto-ML systems for PU learning: BO-Auto-PU, based on a Bayesian Optimisation approach, and EBO-Auto-PU, based on a novel evolutionary/Bayesian optimisation approach. We also present an extensive evaluation of the three Auto-ML systems, comparing them to each other and to well-established PU learning methods across 60 datasets (20 real-world datasets, each with 3 versions in terms of PU learning characteristics).
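PU learning starts from labelled positives plus unlabelled instances, so a PU version of a fully labelled dataset can be simulated by hiding labels. The sketch assumes the common "selected completely at random" setting: a fixed fraction of the true positives (the label frequency) stays labelled, and everything else becomes unlabelled. The abstract does not say how the three versions per dataset were produced. The function name is hypothetical.

```python
import random
from typing import List


def make_pu_labels(y: List[int], label_frequency: float, seed: int = 0) -> List[int]:
    """Return s, where s[i] = 1 marks a labelled positive and s[i] = 0 an
    unlabelled instance (a hidden positive or any negative)."""
    if not 0.0 <= label_frequency <= 1.0:
        raise ValueError("label_frequency must be in [0, 1]")
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(y) if label == 1]
    n_keep = round(label_frequency * len(positives))
    kept = set(rng.sample(positives, n_keep))  # positives chosen uniformly at random
    return [1 if i in kept else 0 for i in range(len(y))]


y = [1, 1, 1, 1, 0, 0, 1, 0]
print(make_pu_labels(y, label_frequency=0.4))  # two of the five positives stay labelled
```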
Submitted 12 January, 2024;
originally announced January 2024.
-
Hierarchical Dependency Constrained Tree Augmented Naive Bayes Classifiers for Hierarchical Feature Spaces
Authors:
Cen Wan,
Alex A. Freitas
Abstract:
The Tree Augmented Naive Bayes (TAN) classifier is a type of probabilistic graphical model that constructs a single-parent dependency tree to estimate the distribution of the data. In this work, we propose two novel Hierarchical dependency-based Tree Augmented Naive Bayes algorithms, i.e. Hie-TAN and Hie-TAN-Lite. Both methods exploit the pre-defined parent-child (generalisation-specialisation) relationships between features as a type of constraint to learn the tree representation of dependencies among features, whilst the latter further eliminates the hierarchical redundancy during the classifier learning stage. The experimental results showed that Hie-TAN successfully obtained better predictive performance than several other hierarchical dependency constrained classification algorithms, and its predictive performance was further improved by eliminating the hierarchical redundancy, as suggested by the higher accuracy obtained by Hie-TAN-Lite.
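Below is the standard, unconstrained TAN structure-learning step that Hie-TAN builds on. Every pair of features is weighted by its empirical conditional mutual information given the class, and a maximum-weight spanning tree is grown with Prim's algorithm from an arbitrary root. The hierarchy constraint that defines Hie-TAN is not reproduced; this is only a sketch of the baseline tree.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Sequence


def conditional_mutual_information(xi: Sequence[int], xj: Sequence[int], c: Sequence[int]) -> float:
    """Empirical I(Xi; Xj | C) in nats, computed from counts."""
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_c = Counter(c)
    total = 0.0
    for (a, b, k), count in n_ijc.items():
        # P(a,b,k) * log[P(a,b|k) / (P(a|k) P(b|k))], rewritten with raw counts
        total += (count / n) * math.log(count * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
    return total


def tan_structure(features: List[Sequence[int]], c: Sequence[int], root: int = 0) -> Dict[int, Optional[int]]:
    """Maximum-weight spanning tree over features (Prim). Returns each feature's
    feature-parent; the class is an implicit extra parent of every feature."""
    m = len(features)
    w = {}
    for i, j in combinations(range(m), 2):
        w[(i, j)] = w[(j, i)] = conditional_mutual_information(features[i], features[j], c)
    parent: Dict[int, Optional[int]] = {root: None}
    while len(parent) < m:
        # heaviest edge from the current tree to a feature not yet in it
        i, j = max(((i, j) for i in parent for j in range(m) if j not in parent), key=lambda e: w[e])
        parent[j] = i
    return parent
```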
Submitted 8 February, 2022;
originally announced February 2022.
-
An Extensive Experimental Evaluation of Automated Machine Learning Methods for Recommending Classification Algorithms (Extended Version)
Authors:
Márcio P. Basgalupp,
Rodrigo C. Barros,
Alex G. C. de Sá,
Gisele L. Pappa,
Rafael G. Mantovani,
André C. P. L. F. de Carvalho,
Alex A. Freitas
Abstract:
This paper presents an experimental comparison among four Automated Machine Learning (AutoML) methods for recommending the best classification algorithm for a given input dataset. Three of these methods are based on Evolutionary Algorithms (EAs), and the other is Auto-WEKA, a well-known AutoML method based on the Combined Algorithm Selection and Hyper-parameter optimisation (CASH) approach. The EA-based methods build classification algorithms from a single machine learning paradigm: either decision-tree induction, rule induction, or Bayesian network classification. Auto-WEKA combines algorithm selection and hyper-parameter optimisation to recommend classification algorithms from multiple paradigms. We performed controlled experiments where these four AutoML methods were given the same runtime limit for different values of this limit. In general, the difference in predictive accuracy of the three best AutoML methods was not statistically significant. However, the EA that evolves decision-tree induction algorithms has the advantage of producing algorithms that generate interpretable classification models and that are more scalable to large datasets than many algorithms from other learning paradigms that can be recommended by Auto-WEKA. We also observed that Auto-WEKA exhibited meta-overfitting, a form of overfitting at the meta-learning level rather than at the base-learning level.
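For reference, the CASH problem that Auto-WEKA addresses is usually written as a single optimisation over algorithms and their hyper-parameters jointly. The formulation below is the standard one from the Auto-WEKA literature, with our own notation. The runtime limit in these experiments caps how much of this joint space a method can explore.

```latex
% Notation (ours): \mathcal{A} = \{A^{(1)}, \dots, A^{(R)}\} candidate algorithms,
% \Lambda^{(j)} the hyper-parameter space of A^{(j)}, \mathcal{L} a validation loss,
% and k cross-validation splits of the training data into train/valid parts.
A^{*}_{\lambda^{*}} \in \operatorname*{arg\,min}_{A^{(j)} \in \mathcal{A},\ \lambda \in \Lambda^{(j)}}
\frac{1}{k} \sum_{i=1}^{k}
\mathcal{L}\!\left(A^{(j)}_{\lambda},\ \mathcal{D}^{(i)}_{\mathrm{train}},\ \mathcal{D}^{(i)}_{\mathrm{valid}}\right)
```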
Submitted 15 September, 2020;
originally announced September 2020.
-
A Robust Experimental Evaluation of Automated Multi-Label Classification Methods
Authors:
Alex G. C. de Sá,
Cristiano G. Pimenta,
Gisele L. Pappa,
Alex A. Freitas
Abstract:
Automated Machine Learning (AutoML) has emerged to deal with the selection and configuration of algorithms for a given learning task. As AutoML has progressed, several effective methods have been introduced, especially for traditional classification and regression problems. Despite the success of AutoML, several issues remain open. One issue, in particular, is the inability of AutoML methods to deal with different types of data. Based on this scenario, this paper approaches AutoML for multi-label classification (MLC) problems. In MLC, each example can be simultaneously associated with several class labels, unlike the standard classification task, where an example is associated with just one class label. In this work, we provide a general comparison of five automated multi-label classification methods (two evolutionary methods, one Bayesian optimization method, one random search and one greedy search) on 14 datasets and three designed search spaces. Overall, we observe that the most prominent method is the one based on a canonical grammar-based genetic programming (GGP) search method, namely Auto-MEKA$_{GGP}$. Auto-MEKA$_{GGP}$ presented the best average results in our comparison and was statistically better than all the other methods in different search spaces and evaluated measures, except when compared to the greedy search method.
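In MLC each example carries a set of labels, so errors are usually counted per label rather than per example. Hamming loss is one standard MLC measure that does this. The abstract does not list the measures used in this comparison, so whether Hamming loss is among them is an open assumption.

```python
from typing import Iterable, List, Set


def hamming_loss(y_true: List[Set[str]], y_pred: List[Set[str]], labels: Iterable[str]) -> float:
    """Fraction of (example, label) pairs predicted wrongly, i.e. missed or spurious labels."""
    label_set = set(labels)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # symmetric difference = labels present in exactly one of truth / prediction
    errors = sum(len((t ^ p) & label_set) for t, p in zip(y_true, y_pred))
    return errors / (len(y_true) * len(label_set))


print(hamming_loss([{"a"}, {"a", "b"}], [{"a", "b"}, {"a", "b"}], ["a", "b"]))  # 1 error out of 4 pairs
```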
Submitted 31 July, 2020; v1 submitted 16 May, 2020;
originally announced May 2020.
-
Multi-label classification search space in the MEKA software
Authors:
Alex G. C. de Sá,
Cristiano G. Pimenta,
Gisele L. Pappa,
Alex A. Freitas
Abstract:
This supplementary material describes the proposed multi-label classification (MLC) search spaces based on the MEKA and WEKA software. First, we overview 26 MLC algorithms and meta-algorithms in MEKA, presenting their main characteristics, such as hyper-parameters, dependencies and constraints. Second, we review 28 single-label classification (SLC) algorithms, preprocessing algorithms and meta-algorithms in the WEKA software. These SLC algorithms were also studied because they are part of the proposed MLC search spaces. This is due to the problem-transformation nature of several MLC algorithms used in this work: these algorithms first transform an MLC problem into one or several SLC problems and then solve them with SLC model(s). Therefore, understanding their main characteristics is crucial to this work. Finally, we present a formal description of the search spaces by proposing a context-free grammar that encompasses the 54 learning algorithms. This grammar captures the possible combinations of the learning algorithms and the constraints and dependencies among them.
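Problem transformation can be seen in its simplest form, Binary Relevance. Here an MLC dataset becomes one binary SLC dataset per label, a base SLC learner is trained on each, and the per-label predictions are combined back into a label set. The sketch uses a tiny 1-nearest-neighbour learner as a hypothetical stand-in for a WEKA SLC algorithm. It does not claim to reproduce any particular MEKA algorithm.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]
Classifier = Callable[[Vector], int]


def binary_relevance_transform(Y: List[Set[str]], labels: List[str]) -> Dict[str, List[int]]:
    """One binary target vector per label: 1 if the example has that label."""
    return {lab: [1 if lab in y else 0 for y in Y] for lab in labels}


def fit_1nn(X: List[Vector], y: List[int]) -> Classifier:
    """Hypothetical base SLC learner: 1-nearest neighbour (Euclidean)."""
    def predict(x: Vector) -> int:
        dists = [math.dist(x, xi) for xi in X]
        return y[dists.index(min(dists))]
    return predict


def fit_binary_relevance(X: List[Vector], Y: List[Set[str]], labels: List[str],
                         fit: Callable[[List[Vector], List[int]], Classifier] = fit_1nn) -> Dict[str, Classifier]:
    targets = binary_relevance_transform(Y, labels)
    return {lab: fit(X, t) for lab, t in targets.items()}  # one SLC model per label


def predict_binary_relevance(models: Dict[str, Classifier], x: Vector) -> Set[str]:
    return {lab for lab, clf in models.items() if clf(x) == 1}


X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
Y = [{"a"}, {"a", "b"}, {"b"}]
models = fit_binary_relevance(X, Y, ["a", "b"])
print(predict_binary_relevance(models, [1.0, 1.0]))
```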
Submitted 31 July, 2020; v1 submitted 27 November, 2018;
originally announced November 2018.
-
Lower bounds for Laplacian spread and relations with invariant parameters revisited
Authors:
Enide Andrade,
Maria Aguieiras A. de Freitas,
María Robbiano,
Jonnathan Rodríguez
Abstract:
Let $G=\left( V\left( G\right) ,E\left( G\right) \right) $ be an $\left( n,m\right) $-graph and $X$ a nonempty proper subset of $V\left( G\right) $. Let $X^{c}=V\left( G\right) \backslash X$. The edge density of $X$ in $G$ is given by \begin{equation*} \rho_{G}\left( X\right) =\frac{n\left\vert E_{X}\left( G\right) \right\vert }{\left\vert X\right\vert \left\vert X^{c}\right\vert }, \end{equation*} where $E_{X}\left( G\right)$ is the set of edges in $G$ with one end in $X$ and the other in $X^{c}$. The Laplacian spread of a graph is the difference between the greatest Laplacian eigenvalue and the algebraic connectivity. In this paper, we use the edge density of some nonempty proper subsets of vertices in $G$ to establish new lower bounds for the Laplacian spread. Also, using some known numerical inequalities, lower bounds for the Laplacian spread of a graph with a prescribed degree sequence are presented.
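A short Rayleigh-quotient argument shows why edge densities can bound the Laplacian spread. The derivation below is our illustration, not one of the paper's new bounds. It shows that every edge density lies between the algebraic connectivity $a(G)$ and the largest Laplacian eigenvalue $\mu_1(G)$ (our notation), so the difference of any two edge densities already bounds the spread from below.

```latex
% Test vector for a nonempty proper subset X of V(G):
x_v = \begin{cases} |X^{c}| & v \in X,\\ -|X| & v \in X^{c}, \end{cases}
\qquad \sum_{v} x_v = |X||X^{c}| - |X^{c}||X| = 0,
\\[4pt]
x^{\top} L(G)\, x = \sum_{uv \in E(G)} (x_u - x_v)^{2} = n^{2}\,|E_X(G)|,
\qquad
x^{\top} x = |X||X^{c}|^{2} + |X^{c}||X|^{2} = n\,|X||X^{c}|.
\\[4pt]
\text{Since } x \perp \mathbf{1}:\quad
a(G) \le \frac{x^{\top} L(G)\, x}{x^{\top} x} = \rho_G(X) \le \mu_1(G),
\quad\text{hence}\quad
\mu_1(G) - a(G) \ge \rho_G(X) - \rho_G(Y)
\ \text{ for all nonempty proper } X, Y \subset V(G).
\\[4pt]
\text{Example, path } 1-2-3:\quad
\rho(\{2\}) = \tfrac{3 \cdot 2}{1 \cdot 2} = 3,\quad
\rho(\{1\}) = \tfrac{3 \cdot 1}{1 \cdot 2} = \tfrac{3}{2},\quad
\text{so } \mu_1 - a \ge \tfrac{3}{2};\ \text{indeed } \mu_1 - a = 3 - 1 = 2.
```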
Submitted 30 May, 2018;
originally announced May 2018.
-
A New Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes Classifier for Coping with Gene Ontology-based Features
Authors:
Cen Wan,
Alex A. Freitas
Abstract:
The Tree Augmented Naive Bayes classifier is a type of probabilistic graphical model that can represent some feature dependencies. In this work, we propose a Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes (HRE-TAN) algorithm, which removes hierarchical redundancy during the classifier learning process when coping with data containing hierarchically structured features. The experiments showed that HRE-TAN obtains significantly better predictive performance than the conventional Tree Augmented Naive Bayes classifier and enhances robustness against imbalanced class distributions, in aging-related gene datasets with Gene Ontology terms used as features.
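Hierarchical redundancy can be made concrete with the usual Gene Ontology true-path rule. A gene annotated with a term is implicitly annotated with all its ancestors, and a term that is absent implies that all its descendants are absent too. The sketch keeps, for one gene, only the feature values that are not implied by another value. That rule, the toy hierarchy and the function names are our assumptions; the abstract does not spell out HRE-TAN's exact procedure.

```python
from typing import Dict, Iterable, Set


def _closure(term: str, links: Dict[str, Iterable[str]]) -> Set[str]:
    """All terms reachable from `term` by repeatedly following `links`."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for nxt in links.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def non_redundant_features(values: Dict[str, int], parents: Dict[str, Set[str]]) -> Dict[str, int]:
    """Drop feature values implied by others: ancestors of a present (1) term
    and descendants of an absent (0) term."""
    children: Dict[str, Set[str]] = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    redundant: Set[str] = set()
    for term, value in values.items():
        if value == 1:
            redundant |= _closure(term, parents)   # ancestors are implied present
        else:
            redundant |= _closure(term, children)  # descendants are implied absent
    return {t: v for t, v in values.items() if t not in redundant}


parents = {"A": {"root"}, "A1": {"A"}, "B": {"root"}}  # hypothetical toy hierarchy
print(non_redundant_features({"root": 1, "A": 1, "A1": 1, "B": 0}, parents))
```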
Submitted 6 July, 2016;
originally announced July 2016.
-
Maxima of the Q-index: graphs with no K_s,t
Authors:
Maria Aguieiras A. de Freitas,
Vladimir Nikiforov,
Laura Patuzzi
Abstract:
This note presents a new spectral version of the graph Zarankiewicz problem: how large can the maximum eigenvalue of the signless Laplacian of a graph of order $n$ be if the graph does not contain a specified complete bipartite subgraph? A conjecture is stated about general complete bipartite graphs and is proved for infinitely many cases.
More precisely, it is shown that if $G$ is a graph of order $n,$ with no subgraph isomorphic to $K_{2,s+1},$ then the largest eigenvalue $q(G)$ of the signless Laplacian of $G$ satisfies \[ q(G)\leq\frac{n+2s}{2}+\frac{1}{2}\sqrt{(n-2s)^{2}+8s}, \] with equality holding if and only if $G$ is a join of $K_{1}$ and an $s$-regular graph of order $n-1.$
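The bound can be checked numerically at its equality case. Take $s=2$ and the cycle $C_{n-1}$ as the 2-regular graph, so the join of $K_1$ and $C_{n-1}$ is the wheel. For $n \geq 6$ two vertices of the wheel share at most two neighbours, so it has no $K_{2,3}$. The sketch, our own illustration, estimates $q(G)$ by power iteration on $Q = D + A$ and compares it with the right-hand side of the bound. The function names are hypothetical.

```python
import math
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def cycle_graph(n: int) -> Graph:
    """The cycle C_n on vertices 0..n-1 (a 2-regular graph)."""
    adj: Graph = {v: set() for v in range(n)}
    for v in range(n):
        w = (v + 1) % n
        adj[v].add(w)
        adj[w].add(v)
    return adj


def wheel_graph(n: int) -> Graph:
    """Join of K_1 (vertex 0) with the cycle C_{n-1} on vertices 1..n-1."""
    adj: Graph = {v: set() for v in range(n)}
    for v in range(1, n):
        w = v % (n - 1) + 1  # next rim vertex, wrapping n-1 -> 1
        for a, b in ((0, v), (v, w)):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def q_index(adj: Graph, iters: int = 500) -> float:
    """Largest eigenvalue of the signless Laplacian Q = D + A (power iteration + Rayleigh quotient)."""
    x = {v: 1.0 for v in adj}
    q = 0.0
    for _ in range(iters):
        y = {v: len(adj[v]) * x[v] + sum(x[u] for u in adj[v]) for v in adj}  # y = Q x
        q = sum(x[v] * y[v] for v in adj) / sum(x[v] ** 2 for v in adj)
        norm = math.sqrt(sum(val ** 2 for val in y.values()))
        x = {v: y[v] / norm for v in adj}
    return q


def k2s1_bound(n: int, s: int) -> float:
    """Right-hand side of the bound for graphs of order n with no K_{2,s+1}."""
    return (n + 2 * s) / 2 + 0.5 * math.sqrt((n - 2 * s) ** 2 + 8 * s)


print(q_index(wheel_graph(8)), k2s1_bound(8, 2))  # equal up to rounding
```

The equality is no accident. The hub/rim partition of such a join is equitable, and its quotient matrix $\begin{pmatrix} n-1 & n-1 \\ 1 & 2s+1 \end{pmatrix}$ has the right-hand side as its largest eigenvalue.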
Submitted 2 July, 2015;
originally announced July 2015.
-
Maxima of the Q-index: forbidden 4-cycle and 5-cycle
Authors:
Maria Aguieiras A. de Freitas,
Vladimir Nikiforov,
Laura Patuzzi
Abstract:
This paper gives tight upper bounds on the largest eigenvalue q(G) of the signless Laplacian of graphs with no 4-cycle and of graphs with no 5-cycle. If n is odd, let F_{n} be the friendship graph of order n; if n is even, let F_{n} be F_{n-1} with a pendant edge attached to its center. It is shown that if G is a graph of order n with no 4-cycle, then q(G)<q(F_{n}), unless G=F_{n}. Let S_{n,k} be the join of a complete graph of order k and an independent set of order n-k. It is shown that if G is a graph of order n with no 5-cycle, then q(G)<q(S_{n,2}), unless G=S_{n,2}. It is shown that these results are significant in spectral extremal graph problems. Two conjectures are formulated for the maximum q(G) of graphs with forbidden cycles.
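The two extremal families can be built directly from their definitions in the abstract. The sketch constructs $F_n$ and $S_{n,k}$ and uses a simple test: a graph contains a 4-cycle if and only if two distinct vertices have at least two common neighbours. It confirms that $F_n$ has no 4-cycle, while $S_{n,2}$ does contain one. (Its extremality is for forbidden 5-cycles, which it avoids because any cycle has at most two vertices outside the clique.) Function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def _add_edge(adj: Graph, u: int, v: int) -> None:
    adj[u].add(v)
    adj[v].add(u)


def friendship_graph(n: int) -> Graph:
    """F_n with centre 0: for odd n, (n-1)/2 triangles sharing the centre;
    for even n, F_{n-1} plus a pendant edge at the centre."""
    adj: Graph = {v: set() for v in range(n)}
    odd_part = n if n % 2 == 1 else n - 1
    for v in range(1, odd_part, 2):
        _add_edge(adj, 0, v)
        _add_edge(adj, 0, v + 1)
        _add_edge(adj, v, v + 1)
    if n % 2 == 0:
        _add_edge(adj, 0, n - 1)
    return adj


def complete_split_graph(n: int, k: int) -> Graph:
    """S_{n,k}: join of K_k (vertices 0..k-1) and an independent set (k..n-1)."""
    adj: Graph = {v: set() for v in range(n)}
    for u in range(k):
        for v in range(u + 1, n):
            _add_edge(adj, u, v)
    return adj


def has_4_cycle(adj: Graph) -> bool:
    """C_4 exists iff two distinct vertices have at least two common neighbours."""
    return any(len(adj[u] & adj[v]) >= 2 for u, v in combinations(adj, 2))


def edge_count(adj: Graph) -> int:
    return sum(len(nb) for nb in adj.values()) // 2


print(has_4_cycle(friendship_graph(7)), has_4_cycle(complete_split_graph(6, 2)))
```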
Submitted 7 August, 2013;
originally announced August 2013.
-
On the characteristic polynomial of Laplacian Matrices of Caterpillars
Authors:
D. M. Cardoso,
M. A. A. de Freitas,
E. A. Martins,
M. Robbiano,
B. San Martín
Abstract:
The characteristic polynomials of the adjacency matrix of line graphs of caterpillars, and then the characteristic polynomials of their Laplacian or signless Laplacian matrices, are characterized using recursive formulas. Furthermore, the obtained results are applied to the determination of upper and lower bounds on the algebraic connectivity of these graphs.
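When checking recursive formulas like these on small cases, a brute-force reference helps. The sketch below builds the Laplacian of a caterpillar (a spine path whose vertices carry pendant leaves) and computes its characteristic polynomial exactly with the Faddeev-LeVerrier algorithm over rationals. It is a generic computation, not the paper's recursion. The caterpillar encoding by pendant counts is our own convention.

```python
from fractions import Fraction
from typing import List, Sequence


def caterpillar_laplacian(pendants: Sequence[int]) -> List[List[int]]:
    """Laplacian of a caterpillar: spine path 0..s-1, spine vertex i carries
    pendants[i] leaves (leaves are numbered after the spine)."""
    s = len(pendants)
    edges = [(i, i + 1) for i in range(s - 1)]
    nxt = s
    for i, p in enumerate(pendants):
        for _ in range(p):
            edges.append((i, nxt))
            nxt += 1
    L = [[0] * nxt for _ in range(nxt)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def _matmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def char_poly(A: List[List[int]]) -> List[Fraction]:
    """Coefficients c_0..c_n (ascending powers) of det(x I - A), via Faddeev-LeVerrier."""
    n = len(A)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = _matmul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)] for i in range(n)]
        AMk = _matmul(A, M)
        c[n - k] = -sum(AMk[i][i] for i in range(n)) / k
    return c


# Star K_{1,3} as a caterpillar: Laplacian eigenvalues 0, 1, 1, 4
print([int(x) for x in char_poly(caterpillar_laplacian([2, 0]))])  # [0, -4, 9, -6, 1]
```

The constant coefficient is always zero because a Laplacian is singular.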
Submitted 19 June, 2013;
originally announced June 2013.
-
Quorum sensing contributes to activated B cell homeostasis and to prevent autoimmunity
Authors:
Caroline Montaudouin,
Marie Anson,
Yi Hao,
Susanne V. Duncker,
Tahia Fernandez,
Emmanuelle Gaudin,
Michael Ehrenstein,
William G. Kerr,
Jean-Herve Colle,
Pierre Bruhns,
Marc Daeron,
Antonio A. Freitas
Abstract:
Maintenance of plasma IgM levels is critical for immune system function and homeostasis in humans and mice. However, the mechanisms that control homeostasis of the activated IgM-secreting B cells are unknown. After adoptive transfer into immune-deficient hosts, B-lymphocytes expand poorly but fully reconstitute the pool of natural IgM-secreting cells and circulating IgM levels. By using sequential cell transfers and B cell populations from several mutant mice, we were able to identify novel mechanisms regulating the size of the IgM-secreting B cell pool. Contrary to previously described mechanisms regulating homeostasis, which involve competition for the same niche by cells having overlapping survival requirements, homeostasis of the innate IgM-secreting B cell pool is also achieved when B cell populations are able to monitor the number of activated B cells by detecting their secreted products. Notably, B cell populations are able to assess the density of activated B cells by sensing their secreted IgG. This process involves FcγRIIB, a low-affinity IgG receptor that is expressed on B cells and acts as a negative regulator of B cell activation, and its intracellular effector, the inositol phosphatase SHIP. As a result of the engagement of this inhibitory pathway, the number of activated IgM-secreting B cells is kept under control. We hypothesize that malfunction of this quorum-sensing mechanism may lead to uncontrolled B cell activation and autoimmunity.
Submitted 3 September, 2012;
originally announced September 2012.