-
Maximizing Network Phylogenetic Diversity
Authors:
Leo van Iersel,
Mark Jones,
Jannik Schestag,
Celine Scornavacca,
Mathias Weller
Abstract:
Network Phylogenetic Diversity (Network-PD) is a measure for the diversity of a set of species based on a rooted phylogenetic network (with branch lengths and inheritance probabilities on the reticulation edges) describing the evolution of those species. We consider the \textsc{Max-Network-PD} problem: given such a network, find~$k$ species with maximum Network-PD score. We show that this problem is fixed-parameter tractable (FPT) for binary networks, by describing an optimal algorithm running in $\mathcal{O}(2^r \log (k)(n+r))$~time, with~$n$ the total number of species in the network and~$r$ its reticulation number. Furthermore, we show that \textsc{Max-Network-PD} is NP-hard for level-1 networks, proving that, unless P$=$NP, the FPT approach cannot be extended by using the level as parameter instead of the reticulation number.
Submitted 2 May, 2024;
originally announced May 2024.
-
Embedding phylogenetic trees in networks of low treewidth
Authors:
Leo van Iersel,
Mark Jones,
Mathias Weller
Abstract:
Given a rooted, binary phylogenetic network and a rooted, binary phylogenetic tree, can the tree be embedded into the network? This problem, called \textsc{Tree Containment}, arises when validating networks constructed by phylogenetic inference methods. We present the first algorithm for (rooted) \textsc{Tree Containment} using the treewidth $t$ of the input network $N$ as parameter, showing that the problem can be solved in $2^{O(t^2)}\cdot|N|$ time and space.
Submitted 19 September, 2023; v1 submitted 1 July, 2022;
originally announced July 2022.
-
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Authors:
Sarthak Pati,
Ujjwal Baid,
Brandon Edwards,
Micah Sheller,
Shih-Han Wang,
G Anthony Reina,
Patrick Foley,
Alexey Gruzdev,
Deepthi Karkada,
Christos Davatzikos,
Chiharu Sako,
Satyam Ghodasara,
Michel Bilello,
Suyash Mohan,
Philipp Vollmuth,
Gianluca Brugnara,
Chandrakanth J Preetha,
Felix Sahm,
Klaus Maier-Hein,
Maximilian Zenk,
Martin Bendszus,
Wolfgang Wick,
Evan Calabrese,
Jeffrey Rudie,
Javier Villanueva-Meyer
, et al. (254 additional authors not shown)
Abstract:
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
Submitted 25 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Lost and Found: Stopping Bluetooth Finders from Leaking Private Information
Authors:
Mira Weller,
Jiska Classen,
Fabian Ullrich,
Denis Waßmann,
Erik Tews
Abstract:
A Bluetooth finder is a small battery-powered device that can be attached to important items such as bags, keychains, or bikes. The finder maintains a Bluetooth connection with the user's phone, and the user is notified immediately on connection loss. We provide the first comprehensive security and privacy analysis of current commercial Bluetooth finders. Our analysis reveals several significant security vulnerabilities in those products concerning mobile applications and the corresponding backend services in the cloud. We also show that all analyzed cloud-based products leak more private data than required for their respective cloud services.
Overall, there is a big market for Bluetooth finders, but none of the existing products is privacy-friendly. We close this gap by designing and implementing PrivateFind, which ensures locations of the user are never leaked to third parties. It is designed to run on similar hardware as existing finders, allowing vendors to update their systems using PrivateFind.
Submitted 17 May, 2020;
originally announced May 2020.
-
Listing Conflicting Triples in Optimal Time
Authors:
Mathias Weller
Abstract:
Different sources of information might tell different stories about the evolutionary history of a given set of species. This leads to (rooted) phylogenetic trees that "disagree" on triples of species, which we call "conflict triples". An important subtask of computing consensus trees, which is interesting in its own right, is the enumeration of all conflicts exhibited by a pair of phylogenetic trees (on the same set of $n$ taxa). As it is possible that a significant part of the $n^3$ triples are in conflict, the trivial $\Theta(n^3)$-time algorithm that checks for each triple whether it constitutes a conflict was considered optimal. It turns out, however, that we can do much better when there are only few conflicts. In particular, we show that we can enumerate all $d$ conflict triples between a pair of phylogenetic trees in $O(n + d)$ time. Since any deterministic algorithm has to spend $\Theta(n)$ time reading the input and $\Theta(d)$ time writing the output, no deterministic algorithm can solve this task faster than we do (up to constant factors).
Submitted 25 November, 2019;
originally announced November 2019.
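For illustration only, the trivial cubic baseline mentioned in the abstract above (check every triple) might be sketched as follows. This is not the paper's $O(n + d)$ algorithm. The nested-tuple tree encoding, the restriction to binary trees, and all function names are assumptions made for this sketch.

```python
from itertools import combinations


def leaf_paths(tree):
    """Map each leaf label to its root-to-leaf path of node ids.

    Assumed encoding: an inner node is a tuple of children, a leaf is a label.
    """
    paths = {}
    counter = [0]

    def walk(node, prefix):
        node_id = counter[0]
        counter[0] += 1
        path = prefix + (node_id,)
        if isinstance(node, tuple):
            for child in node:
                walk(child, path)
        else:
            paths[node] = path

    walk(tree, ())
    return paths


def lca_depth(paths, a, b):
    """Number of nodes shared by the root-to-leaf paths of a and b."""
    depth = 0
    for x, y in zip(paths[a], paths[b]):
        if x != y:
            break
        depth += 1
    return depth


def cherry(paths, a, b, c):
    """Return the pair grouped together below the other leaf, or None if unresolved."""
    depths = {pair: lca_depth(paths, *pair) for pair in ((a, b), (a, c), (b, c))}
    best = max(depths.values())
    winners = [pair for pair, d in depths.items() if d == best]
    return winners[0] if len(winners) == 1 else None


def conflict_triples(tree1, tree2):
    """Brute force: list every triple on which the two trees group leaves differently."""
    p1, p2 = leaf_paths(tree1), leaf_paths(tree2)
    taxa = sorted(p1)
    return [t for t in combinations(taxa, 3) if cherry(p1, *t) != cherry(p2, *t)]
```

For example, the trees `((("a", "b"), "c"), "d")` and `((("a", "c"), "b"), "d")` differ only on the triple `("a", "b", "c")`. Each LCA lookup here walks whole root-to-leaf paths, so this sketch is even slower than cubic on deep trees; it is meant only to make the definition concrete.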
-
Fast Exact Dynamic Time Warping on Run-Length Encoded Time Series
Authors:
Vincent Froese,
Brijnesh Jain,
Maciej Rymar,
Mathias Weller
Abstract:
Dynamic Time Warping (DTW) is a well-known similarity measure for time series. The standard dynamic programming approach to compute the DTW distance of two length-$n$ time series, however, requires~$O(n^2)$ time, which is often too slow for real-world applications. Therefore, many heuristics have been proposed to speed up the DTW computation. These are often based on lower bounding techniques, approximating the DTW distance, or considering special input data such as binary or piecewise constant time series. In this paper, we present a first exact algorithm to compute the DTW distance of two run-length encoded time series whose running time only depends on the encoding lengths of the inputs. The worst-case running time is cubic in the encoding length. In experiments we show that our algorithm is indeed fast for time series with short encoding lengths.
Submitted 18 April, 2020; v1 submitted 7 March, 2019;
originally announced March 2019.
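As a point of reference, the standard quadratic dynamic program that the abstract above calls too slow could look roughly like this. It is a minimal sketch of the textbook baseline, not the paper's run-length-based algorithm. The absolute difference as local cost and the function name `dtw_distance` are assumptions; the paper may use a different cost.

```python
def dtw_distance(x, y):
    """Textbook DTW between two non-empty numeric sequences, O(len(x) * len(y)) time.

    Assumed local cost: absolute difference of the aligned values.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # prev[j] holds the best cost of aligning the first i-1 items of x
    # with the first j items of y; only two rows are kept in memory.
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

A series such as `[1, 2, 2, 3]` warps onto `[1, 2, 3]` at zero cost because the repeated value can be matched twice. Long constant runs like this are what a run-length encoding compresses.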
-
What is known about Vertex Cover Kernelization?
Authors:
Michael R. Fellows,
Lars Jaffke,
Aliz Izabella Király,
Frances A. Rosamond,
Mathias Weller
Abstract:
We are pleased to dedicate this survey on kernelization of the Vertex Cover problem to Professor Juraj Hromkovič on the occasion of his 60th birthday. The Vertex Cover problem is often referred to as the Drosophila of parameterized complexity. It enjoys a long history. New and worthy perspectives will always be demonstrated first with concrete results here. This survey discusses several research directions in Vertex Cover kernelization. The Barrier Degree of Vertex Cover kernelization is discussed. We have reduction rules that kernelize vertices of small degree, including in this paper new results that reduce graphs almost to minimum degree five. Can this process go on forever? What is the minimum vertex-degree barrier for polynomial-time kernelization? Assuming the Exponential-Time Hypothesis, there is a minimum degree barrier. The idea of automated kernelization is discussed. We here report the first experimental results of an AI-guided branching algorithm for Vertex Cover whose logic seems amenable to application in finding reduction rules to kernelize small-degree vertices. The survey highlights a central open problem in parameterized complexity. Happy Birthday, Juraj!
Submitted 13 May, 2019; v1 submitted 23 November, 2018;
originally announced November 2018.
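To make "reduction rules that kernelize vertices of small degree" concrete, here is a minimal sketch of the two simplest folklore rules: delete isolated vertices, and for a degree-1 vertex take its neighbor into the cover. It does not reproduce the survey's higher-degree rules. The adjacency-dict representation and the name `reduce_low_degree` are assumptions for illustration.

```python
def reduce_low_degree(adj, k):
    """Exhaustively apply the degree-0 and degree-1 rules for Vertex Cover.

    adj: dict mapping each vertex to the set of its neighbors (simple, undirected).
    k:   cover budget.
    Returns (reduced graph, remaining budget, vertices forced into the cover).
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    cover = set()
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v not in adj:
                continue  # already deleted during this pass
            if not adj[v]:
                # Degree 0: an isolated vertex is never needed in a cover.
                del adj[v]
                changed = True
            elif len(adj[v]) == 1:
                # Degree 1: taking the unique neighbor is never worse than taking v.
                (u,) = adj[v]
                cover.add(u)
                for w in adj[u]:
                    adj[w].discard(u)
                del adj[u]
                k -= 1
                changed = True
    return adj, k, cover
```

On the path `a-b-c-d` with budget 2, the rules alone empty the graph and force `b` and `d` into the cover. If the returned budget is negative, the instance has no cover of the requested size.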
-
Constructing a Consensus Phylogeny from a Leaf-Removal Distance
Authors:
Cedric Chauve,
Mark Jones,
Manuel Lafond,
Céline Scornavacca,
Mathias Weller
Abstract:
Understanding the evolution of a set of genes or species is a fundamental problem in evolutionary biology. The problem we study here takes as input a set of trees describing possibly discordant evolutionary scenarios for a given set of genes or species, and aims at finding a single tree that minimizes the leaf-removal distance to the input trees. This problem is a specific instance of the general consensus/supertree problem, widely used to combine or summarize discordant evolutionary trees. The problem we introduce is specifically tailored to address the case of discrepancies between the input trees due to the misplacement of individual taxa. Most supertree or consensus tree problems are computationally intractable, and we show that the problem we introduce is also NP-hard. We provide tractability results in the form of a 2-approximation algorithm. We also introduce a variant that minimizes the maximum number $d$ of leaves that are removed from any input tree, and provide a parameterized algorithm for this problem with parameter $d$.
Submitted 8 July, 2019; v1 submitted 15 May, 2017;
originally announced May 2017.
-
Linear-Time Tree Containment in Phylogenetic Networks
Authors:
Mathias Weller
Abstract:
We consider the NP-hard Tree Containment problem that has important applications in phylogenetics. The problem asks if a given leaf-labeled network contains a subdivision of a given leaf-labeled tree. We develop a fast algorithm for the case that the input network is indeed a tree in which multiple leaves might share a label. By combining this algorithm with a generalization of a previously known decomposition scheme, we improve the running time on reticulation visible networks and nearly stable networks to linear time. While these are special classes of networks, they rank among the most general of the previously considered classes.
Submitted 21 February, 2017;
originally announced February 2017.
-
On the Complexity of Hub Labeling
Authors:
Maxim Babenko,
Andrew V. Goldberg,
Haim Kaplan,
Ruslan Savchenko,
Mathias Weller
Abstract:
Hub Labeling (HL) is a data structure for distance oracles. Hierarchical HL (HHL) is a special type of HL that has received a lot of attention from a practical point of view. However, theoretical questions such as NP-hardness and approximation guarantees for HHL algorithms have been left aside. In this paper we study HL and HHL from the complexity theory point of view. We prove that both HL and HHL are NP-hard, and present upper and lower bounds for the approximation ratios of greedy HHL algorithms used in practice. We also introduce a new variant of the greedy HHL algorithm and a proof that it produces small labels for graphs with small highway dimension.
Submitted 11 January, 2015;
originally announced January 2015.
-
Optimal Hub Labeling is NP-complete
Authors:
Mathias Weller
Abstract:
Distance labeling is a preprocessing technique introduced by Peleg [Journal of Graph Theory, 33(3)] to speed up distance queries in large networks. Herein, each vertex receives a (short) label, and the distance between two vertices can be inferred from their two labels. One such preprocessing problem occurs in the hub labeling algorithm [Abraham et al., SODA'10]: the label of a vertex v is a set of vertices x (the "hubs") with their distance d(x,v) to v, and the distance between any two vertices u and v is the sum of their distances to a common hub. The problem of assigning as few such hubs as possible was conjectured to be NP-hard, but no proof was known to date. We give a reduction from the well-known Vertex Cover problem on graphs to prove that finding an optimal hub labeling is indeed NP-hard.
Submitted 31 July, 2014;
originally announced July 2014.
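The query step described in the abstract above (the distance between u and v is the sum of their distances to a common hub) can be sketched as follows. This shows only how labels are used once they exist, not how to compute a small labeling, which is the hard problem the paper addresses. The dict-of-dicts label layout, the minimum over all shared hubs, and the names `hub_distance` and `labeling_size` are assumptions for illustration.

```python
def hub_distance(labels, u, v):
    """Answer a distance query from two hub labels.

    labels: dict mapping each vertex to {hub: distance from the vertex to that hub}.
    Returns the smallest d(u, h) + d(h, v) over hubs h shared by both labels,
    or infinity if the labels share no hub.
    """
    label_u, label_v = labels[u], labels[v]
    # Scan the smaller label and look hubs up in the larger one.
    if len(label_u) > len(label_v):
        label_u, label_v = label_v, label_u
    best = float("inf")
    for hub, dist in label_u.items():
        if hub in label_v:
            best = min(best, dist + label_v[hub])
    return best


def labeling_size(labels):
    """Total number of (vertex, hub) pairs, the quantity to keep small."""
    return sum(len(label) for label in labels.values())
```

For the unit-weight path `a-b-c`, the labels `{"a": {"a": 0, "b": 1}, "b": {"b": 0}, "c": {"c": 0, "b": 1}}` answer every query correctly with five hub entries in total, using `b` as the common hub for `a` and `c`.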
-
A Polynomial-time Algorithm for Outerplanar Diameter Improvement
Authors:
Nathann Cohen,
Daniel Gonçalves,
Eun Jung Kim,
Christophe Paul,
Ignasi Sau,
Dimitrios M. Thilikos,
Mathias Weller
Abstract:
The Outerplanar Diameter Improvement problem asks, given a graph $G$ and an integer $D$, whether it is possible to add edges to $G$ in a way that the resulting graph is outerplanar and has diameter at most $D$. We provide a dynamic programming algorithm that solves this problem in polynomial time. Outerplanar Diameter Improvement demonstrates several structural analogues to the celebrated and challenging Planar Diameter Improvement problem, where the resulting graph should, instead, be planar. The complexity status of this latter problem is open.
Submitted 23 May, 2014; v1 submitted 22 March, 2014;
originally announced March 2014.
-
Interval scheduling and colorful independent sets
Authors:
René van Bevern,
Matthias Mnich,
Rolf Niedermeier,
Mathias Weller
Abstract:
Numerous applications in scheduling, such as resource allocation or steel manufacturing, can be modeled using the NP-hard Independent Set problem (given an undirected graph and an integer k, find a set of at least k pairwise non-adjacent vertices). Here, one encounters special graph classes like 2-union graphs (edge-wise unions of two interval graphs) and strip graphs (edge-wise unions of an interval graph and a cluster graph), on which Independent Set remains NP-hard but admits constant-ratio approximations in polynomial time. We study the parameterized complexity of Independent Set on 2-union graphs and on subclasses like strip graphs. Our investigations significantly benefit from a new structural "compactness" parameter of interval graphs and novel problem formulations using vertex-colored interval graphs. Our main contributions are:
1. We show a complexity dichotomy: restricted to graph classes closed under induced subgraphs and disjoint unions, Independent Set is polynomial-time solvable if both input interval graphs are cluster graphs, and is NP-hard otherwise.
2. We chart the possibilities and limits of effective polynomial-time preprocessing (also known as kernelization).
3. We extend Halldórsson and Karlsson (2006)'s fixed-parameter algorithm for Independent Set on strip graphs parameterized by the structural parameter "maximum number of live jobs" to show that the problem (also known as Job Interval Selection) is fixed-parameter tractable with respect to the parameter k and generalize their algorithm from strip graphs to 2-union graphs. Preliminary experiments with random data indicate that Job Interval Selection with up to fifteen jobs and 5*10^5 intervals can be solved optimally in less than five minutes.
Submitted 12 July, 2014; v1 submitted 4 February, 2014;
originally announced February 2014.