-
Machine Learning-Assisted Recurrence Prediction for Early-Stage Non-Small-Cell Lung Cancer Patients
Authors:
Adrianna Janik,
Maria Torrente,
Luca Costabello,
Virginia Calvo,
Brian Walsh,
Carlos Camps,
Sameh K. Mohamed,
Ana L. Ortega,
Vít Nováček,
Bartomeu Massutí,
Pasquale Minervini,
M. Rosario Garcia Campelo,
Edel del Barco,
Joaquim Bosch-Barrera,
Ernestina Menasalvas,
Mohan Timilsina,
Mariano Provencio
Abstract:
Background: Stratifying cancer patients according to risk of relapse can personalize their care. In this work, we provide an answer to the following research question: How to utilize machine learning to estimate probability of relapse in early-stage non-small-cell lung cancer patients?
Methods: To predict relapse in 1,387 early-stage (I-II) non-small-cell lung cancer (NSCLC) patients from the Spanish Lung Cancer Group data (average age 65.7, 24.8% females, 75.2% males), we train tabular and graph machine learning models. We generate automatic explanations for the predictions of these models. For models trained on tabular data, we adopt SHAP local explanations to gauge how each patient feature contributes to the predicted outcome. We explain graph machine learning predictions with an example-based method that highlights influential past patients.
Results: Among the machine learning models trained on tabular data, the Random Forest model reaches 76% accuracy at predicting relapse under 10-fold cross-validation (the model was trained 10 times with different independent sets of patients in the test, train and validation sets; the reported metrics are averaged over these 10 test sets). Graph machine learning reaches 68% accuracy over a 200-patient held-out test set, calibrated on a held-out set of 100 patients.
Conclusions: Our results show that machine learning models trained on tabular and graph data can enable objective, personalised and reproducible prediction of relapse, and therefore of disease outcome, in patients with early-stage NSCLC. With further prospective and multisite validation, and additional radiological and molecular data, this prognostic model could potentially serve as a predictive decision support tool for deciding the use of adjuvant treatments in early-stage lung cancer.
Keywords: Non-Small-Cell Lung Cancer, Tumor Recurrence Prediction, Machine Learning
Submitted 17 November, 2022;
originally announced November 2022.
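The 10-fold cross-validation protocol described in the Results of this abstract (train on all folds but one, test on the remaining fold, average the metric over the test folds) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the helper names `k_fold_indices` and `cross_validate` and the toy majority-class predictor are assumptions made here, and the separate validation split mentioned in the abstract is omitted.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k disjoint folds.

    Assumes n_samples >= k so that no fold is empty.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Return the accuracy averaged over the k held-out test folds."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return mean(accuracies)


# Hypothetical stand-in for a real classifier: always predict the
# most frequent label seen during training.
def fit_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)


def predict_majority(model, x):
    return model
```

Any classifier can be plugged in through the `fit` and `predict` callables; the majority predictor is only there to keep the sketch free of third-party dependencies.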
-
A Unifying Model for Locally Constrained Spanning Tree Problems
Authors:
Luiz Alberto do Carmo Viana,
Manoel Campêlo,
Ignasi Sau,
Ana Silva
Abstract:
Given a graph $G$ and a digraph $D$ whose vertices are the edges of $G$, we investigate the problem of finding a spanning tree of $G$ that satisfies the constraints imposed by $D$. The restrictions on adding an edge to the tree depend on its neighborhood in $D$. Here, we generalize previously investigated problems by also considering as input functions $\ell$ and $u$ on $E(G)$ that give a lower and an upper bound, respectively, on the number of constraints that must be satisfied by each edge. The resulting feasibility problem is denoted by \texttt{G-DCST}, while the optimization problem is denoted by \texttt{G-DCMST}. We show that \texttt{G-DCST} is NP-complete even under strong assumptions on the structures of $G$ and $D$, as well as on functions $\ell$ and $u$. On the positive side, we prove two polynomial results, one for \texttt{G-DCST} and another for \texttt{G-DCMST}, and also give a simple exponential-time algorithm along with a proof that it is asymptotically optimal under the Exponential Time Hypothesis (ETH). Finally, we prove that other previously studied constrained spanning tree (\textsc{CST}) problems can be modeled within our framework, namely, the \textsc{Conflict CST}, the \textsc{Forcing CST}, the \textsc{At Least One/All Dependency CST}, the \textsc{Maximum Degree CST}, the \textsc{Minimum Degree CST}, and the \textsc{Fixed-Leaves Minimum Degree CST}.
Submitted 20 May, 2020;
originally announced May 2020.
-
An integer programming approach for solving a generalized version of the Grundy domination number
Authors:
Manoel Campêlo,
Daniel Severín
Abstract:
A sequence of vertices in a graph is called a legal dominating sequence if every vertex in the sequence dominates at least one vertex not dominated by those that precede it, and at the end all vertices of the graph are dominated. The Grundy domination number of a graph is the size of a largest legal dominating sequence. In this work, we introduce a generalized version of the Grundy domination problem. We explicitly calculate the corresponding parameter for paths and web graphs. We propose integer programming formulations for the new problem, find families of valid inequalities and perform extensive computational experiments to compare the formulations as well as to test these inequalities as cuts in a branch-and-cut framework. We also design and evaluate the performance of a heuristic for finding good initial lower and upper bounds and a tabu search that improves the initial lower bound. The test instances include randomly generated graphs, structured graphs, classical benchmark instances and two instances from a real application. Our approach is exact for graphs with 20-50 vertices and provides good solutions for graphs with up to 10000 vertices.
Submitted 2 February, 2021; v1 submitted 29 December, 2019;
originally announced December 2019.
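The definition of a legal dominating sequence given in this abstract can be made concrete with a small brute-force sketch. It covers only the classical Grundy domination number, not the generalized version the paper introduces (which the abstract does not define), and it is not the authors' integer programming approach. The function names and the adjacency-dictionary representation are assumptions made for illustration; the search is exponential and only suitable for very small graphs.

```python
def closed_neighborhood(adj, v):
    """Vertices dominated by v: v itself plus its neighbors."""
    return {v} | set(adj[v])


def is_legal_dominating_sequence(adj, sequence):
    """Check the definition: every vertex in the sequence must dominate
    at least one not-yet-dominated vertex, and in the end every vertex
    of the graph must be dominated."""
    dominated = set()
    for v in sequence:
        newly_dominated = closed_neighborhood(adj, v) - dominated
        if not newly_dominated:
            return False
        dominated |= newly_dominated
    return dominated == set(adj)


def grundy_domination_number(adj):
    """Length of a longest legal dominating sequence, by exhaustive search."""
    best = 0

    def extend(dominated, length):
        nonlocal best
        if len(dominated) == len(adj):
            best = max(best, length)
            return
        for v in adj:
            closed = closed_neighborhood(adj, v)
            # v is a legal next vertex only if it dominates something new.
            if not closed <= dominated:
                extend(dominated | closed, length + 1)

    extend(frozenset(), 0)
    return best


# Path on four vertices: 0 - 1 - 2 - 3
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_legal_dominating_sequence(path4, [0, 1, 2]))  # True
print(grundy_domination_number(path4))                 # 3
```

On this path the first vertex of any sequence dominates at least two vertices, so no legal sequence can have four vertices, and `[0, 1, 2]` attains length three.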
-
Polyhedral study of the Convex Recoloring problem
Authors:
Manoel Campêlo,
Phablo F. S. Moura,
Joel C. Soares
Abstract:
A coloring of the vertices of a connected graph is convex if each color class induces a connected subgraph. We address the convex recoloring (CR) problem defined as follows. Given a graph $G$ and a coloring of its vertices, recolor a minimum number of vertices of $G$ so that the resulting coloring is convex. This problem, known to be NP-hard even on paths, was first motivated by applications to perfect phylogenies. In this work, we study CR on general graphs from a polyhedral point of view. First, we introduce a full-dimensional polytope based on the idea of connected subgraphs, and present a class of valid inequalities with right-hand side one that comprises all facet-defining inequalities with binary coefficients when the input graph is a tree. Moreover, we define a general class of inequalities with right-hand side in $\{1, \ldots, k\}$, where $k$ is the number of colors used in the initial coloring, and show sufficient conditions for validity and facetness of such inequalities. Finally, we report on computational experiments for an application on mobile networks that can be modeled by the polytope of CR on paths. We evaluate the potential of the proposed inequalities to reduce the integrality gaps.
Submitted 25 November, 2019;
originally announced November 2019.
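The notion of a convex coloring used in this abstract (each color class induces a connected subgraph) can be checked with a breadth-first search per color. The sketch below is only an illustration of the definition and of the recoloring cost, not the polyhedral method studied in the paper; the function names and the dictionary-based graph and coloring representations are assumptions.

```python
from collections import deque


def is_convex_coloring(adj, coloring):
    """Return True if every color class induces a connected subgraph."""
    classes = {}
    for vertex, color in coloring.items():
        classes.setdefault(color, set()).add(vertex)

    for members in classes.values():
        # Breadth-first search restricted to vertices of this color.
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in members and w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != members:
            return False
    return True


def recoloring_cost(old_coloring, new_coloring):
    """Number of vertices whose color differs between the two colorings."""
    return sum(1 for v in old_coloring if old_coloring[v] != new_coloring[v])


# Path on four vertices: 0 - 1 - 2 - 3
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = {0: "a", 1: "b", 2: "a", 3: "b"}
recolored = {0: "a", 1: "a", 2: "a", 3: "b"}
print(is_convex_coloring(path4, initial))    # False
print(is_convex_coloring(path4, recolored))  # True
print(recoloring_cost(initial, recolored))   # 1
```

The alternating coloring is not convex because neither color class is connected on the path; changing the color of vertex 1 alone yields a convex coloring.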
-
The polytope of legal sequences
Authors:
Manoel Campêlo,
Daniel Severín
Abstract:
A sequence of vertices in a graph is called a \emph{(total) legal dominating sequence} if every vertex in the sequence (totally) dominates at least one vertex not dominated by those that precede it, and at the end all vertices of the graph are (totally) dominated. The \emph{Grundy (total) domination number} of a graph is the size of the largest (total) legal dominating sequence. In this work, we address the problems of determining these two parameters by introducing a generalized version of them. We explicitly calculate the corresponding (general) parameter for paths and web graphs. We propose integer programming formulations for the new problem and we study the polytope associated with one of them. We find families of valid inequalities and derive conditions under which they are facet-defining. Finally, we perform computational experiments to compare the formulations as well as to test valid inequalities as cuts in a B\&C framework.
Submitted 29 November, 2018;
originally announced November 2018.
-
Optimal k-fold colorings of webs and antiwebs
Authors:
Manoel Campêlo,
Ricardo C. Corrêa,
Phablo F. S. Moura,
Marcio C. Santos
Abstract:
A k-fold x-coloring of a graph is an assignment of (at least) k distinct colors from the set {1, 2, ..., x} to each vertex such that any two adjacent vertices are assigned disjoint sets of colors. The smallest number x such that G admits a k-fold x-coloring is the k-th chromatic number of G, denoted by χ_k(G). We determine the exact value of this parameter when G is a web or an antiweb. Our results generalize the known corresponding results for odd cycles and imply necessary and sufficient conditions under which χ_k(G) attains its lower and upper bounds based on the clique, the fractional chromatic and the chromatic numbers. Additionally, we extend the concept of χ-critical graphs to χ_k-critical graphs. We identify the webs and antiwebs having this property, for every integer k >= 1.
Submitted 29 August, 2011;
originally announced August 2011.
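The definition of a k-fold x-coloring in this abstract translates directly into a validity check. The sketch below is illustrative only and does not compute χ_k(G); the function name `is_k_fold_coloring` and the representation of the graph and of the color assignment as dictionaries are assumptions made here.

```python
def is_k_fold_coloring(adj, assignment, k, x):
    """Check that `assignment` is a k-fold x-coloring of the graph `adj`.

    adj        : dict mapping each vertex to an iterable of its neighbors
    assignment : dict mapping each vertex to a set of colors
    """
    palette = set(range(1, x + 1))

    # Every vertex needs at least k colors, all taken from {1, ..., x}.
    for v in adj:
        colors = assignment.get(v, set())
        if len(colors) < k or not colors <= palette:
            return False

    # Adjacent vertices must receive disjoint color sets.
    for u in adj:
        for w in adj[u]:
            if assignment[u] & assignment[w]:
                return False
    return True


# Odd cycle on five vertices: 0 - 1 - 2 - 3 - 4 - 0
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}

# Give vertex i the two consecutive colors starting at position 2i (mod 5).
two_fold = {i: {(2 * i) % 5 + 1, (2 * i + 1) % 5 + 1} for i in range(5)}

print(is_k_fold_coloring(cycle5, two_fold, k=2, x=5))  # True
print(is_k_fold_coloring(cycle5, two_fold, k=2, x=4))  # False: color 5 is outside the palette
```

The example assigns the sets {1,2}, {3,4}, {5,1}, {2,3}, {4,5} around the cycle, so adjacent vertices never share a color.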
-
A Lagrangian Relaxation for the Maximum Stable Set Problem
Authors:
Manoel Campelo,
Ricardo C. Correa
Abstract:
We propose a new integer programming formulation for the problem of finding a maximum stable set of a graph based on representatives of stable sets. In addition, we investigate exact solutions provided by a Lagrangian decomposition of this formulation in which only one constraint is relaxed. Some computational experiments were carried out with an effective multi-threaded implementation of our algorithm in a multi-core system, and their results are presented.
Submitted 8 March, 2009;
originally announced March 2009.