-
SpanSeq: Similarity-based sequence data splitting method for improved development and assessment of deep learning projects
Authors:
Alfred Ferrer Florensa,
Jose Juan Almagro Armenteros,
Henrik Nielsen,
Frank Møller Aarestrup,
Philip Thomas Lanken Conradsen Clausen
Abstract:
The use of deep learning models in computational biology has increased massively in recent years, and is expected to do so further with the current advances in fields like Natural Language Processing. These models, although able to draw complex relations between input and target, are also largely inclined to learn noisy deviations from the pool of data used during their development. In order to assess their performance on unseen data (their capacity to generalize), it is common to randomly split the available data into development (train/validation) and test sets. This procedure, although standard, has lately been shown to produce dubious assessments of generalization due to the existing similarity between samples in the databases used. In this work, we present SpanSeq, a database partition method for machine learning that can scale to most biological sequences (genes, proteins and genomes) in order to avoid data leakage between sets. We also explore the effect of not restraining similarity between sets by reproducing the development of the state-of-the-art model DeepLoc, not only confirming the consequences of randomly splitting databases on the model assessment, but expanding those repercussions to the model development. SpanSeq is available for download and installation at https://github.com/genomicepidemiology/SpanSeq.
Submitted 5 March, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
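The idea of keeping similar sequences out of different sets can be illustrated with a minimal sketch. This is not SpanSeq's algorithm or API: the k-mer Jaccard similarity, the single-linkage clustering, the greedy assignment, and every function name and parameter below are assumptions chosen only to show the general principle of splitting by cluster rather than by individual sample.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of k-mers in a sequence (hypothetical similarity feature)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(seqs, threshold=0.5, k=3):
    """Single-linkage clustering: link any pair whose similarity >= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [kmers(s, k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def split(seqs, n_parts=2, threshold=0.5, k=3):
    """Assign whole clusters (largest first) to the currently smallest partition."""
    parts = [[] for _ in range(n_parts)]
    for group in sorted(cluster(seqs, threshold, k), key=len, reverse=True):
        min(parts, key=len).extend(group)
    return parts


if __name__ == "__main__":
    toy = ["ACGTACGTAA", "ACGTACGTAC", "TTTTGGGGCC", "TTTTGGGGCA", "GAGAGAGAGA"]
    print(split(toy, n_parts=2))
```

Because whole clusters are assigned together, no pair of sequences above the similarity threshold ends up in different partitions, at the cost of partitions that may be uneven in size. The all-pairs comparison is quadratic and would not scale to real databases.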
-
The Highest-Redshift Balmer Breaks as a Test of $Λ$CDM
Authors:
Charles L. Steinhardt,
Albert Sneppen,
Thorbjørn Clausen,
Harley Katz,
Martin P. Rey,
Jonas Stahlschmidt
Abstract:
Recent studies have reported tension between the presence of luminous, high-redshift galaxies and the halo mass functions predicted by standard cosmology. Here, an improved test is proposed using the presence of high-redshift Balmer breaks to probe the formation of early $10^4 - 10^5 M_\odot$ baryonic minihalos. Unlike previous tests, this does not depend upon the mass-to-light ratio, stellar initial mass function, or star-formation history, which are all weakly constrained at high redshift. We show that the strongest Balmer breaks allowed at $z = 9$ using the simplest $Λ$CDM cosmological model have $D_{4000} \leq 1.26$ under idealized circumstances and $D_{4000} \leq 1.14$ including realistic feedback models. Since current photometric template fitting to JWST sources infers the existence of stronger Balmer breaks out to $z \gtrsim 11$, upcoming spectroscopic followup will either demonstrate those templates are invalid at high redshift or imply new physics beyond `vanilla' $Λ$CDM.
Submitted 24 May, 2023;
originally announced May 2023.
-
Efficient Data-Driven Network Functions
Authors:
Zhiyuan Yao,
Yoann Desmouceaux,
Juan-Antonio Cordero-Fuertes,
Mark Townsley,
Thomas Heide Clausen
Abstract:
Cloud environments require dynamic and adaptive networking policies. In production Virtual Network Functions (VNFs), heuristics are preferred over advanced learning algorithms because of high-performance constraints. This paper proposes Aquarius to passively yet efficiently gather observations and enable the use of machine learning to collect, infer, and supply accurate networking state information, without incurring additional signalling and management overhead. This paper illustrates the use of Aquarius with a traffic classifier, an autoscaling system, and a load balancer, and demonstrates the use of three different machine learning paradigms (unsupervised, supervised, and reinforcement learning) within Aquarius for inferring network state. Testbed evaluations show that Aquarius increases network state visibility and brings notable performance gains with low overhead.
Submitted 24 August, 2022;
originally announced August 2022.
-
Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center
Authors:
Zhiyuan Yao,
Zihan Ding,
Thomas Clausen
Abstract:
This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Traditional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) are less flexible under changing workload distributions and arrival rates, with a poor balance among multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces the MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system at moderate to large scale. Experiments on realistic testbeds show that the independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution achieves superior performance across different realistic settings. Additionally, the potential difficulties of MARL methods for network load balancing are analysed, which helps to draw the attention of the learning and network communities to such challenges.
Submitted 19 August, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Reinforced Workload Distribution Fairness
Authors:
Zhiyuan Yao,
Zihan Ding,
Thomas Heide Clausen
Abstract:
Network load balancers are central components in data centers that distribute workloads across multiple servers and thereby contribute to offering scalable services. However, when load balancers operate in dynamic environments with limited monitoring of application server loads, they rely on heuristic algorithms that require manual configurations for fairness and performance. To alleviate that, this paper proposes a distributed asynchronous reinforcement learning mechanism to improve the fairness of the workload distribution achieved by a load balancer, with no active load balancer state monitoring and limited network observations. The performance of the proposed mechanism is evaluated and compared with state-of-the-art load balancing algorithms in a simulator, under configurations with progressively increasing complexities. Preliminary results show promise in RL-based load balancing algorithms, and identify additional challenges and future research directions, including reward function design and model scalability.
Submitted 29 October, 2021;
originally announced November 2021.
-
Towards Intelligent Load Balancing in Data Centers
Authors:
Zhiyuan Yao,
Yoann Desmouceaux,
Mark Townsley,
Thomas Heide Clausen
Abstract:
Network load balancers are important components in data centers to provide scalable services. Workload distribution algorithms are based on heuristics, e.g., Equal-Cost Multi-Path (ECMP), Weighted-Cost Multi-Path (WCMP), or naive machine learning (ML) algorithms, e.g., ridge regression. Advanced ML-based approaches help achieve performance gain in different networking and system problems. However, it is challenging to apply ML algorithms to networking problems in real-life systems. It requires domain knowledge to collect features from low-latency, high-throughput, and scalable networking systems, which are dynamic and heterogeneous. This paper proposes Aquarius to bridge the gap between ML and networking systems and demonstrates its usage in the context of network load balancers. This paper demonstrates its ability to conduct both offline data analysis and online model deployment in realistic systems. The results show that the ML model trained and deployed using Aquarius improves load balancing performance, yet they also reveal more challenges to be resolved to apply ML for networking systems.
Submitted 27 October, 2021;
originally announced October 2021.
-
Charon: Load-Aware Load-Balancing in P4
Authors:
Carmine Rizzi,
Zhiyuan Yao,
Yoann Desmouceaux,
Mark Townsley,
Thomas Heide Clausen
Abstract:
Load balancers play an important role in data centers as they distribute network flows across application servers and guarantee per-connection consistency. It is hard, however, to make fair load balancing decisions so that all resources are efficiently occupied yet not overloaded. Tracking connection states makes it possible to infer server load states and make informed decisions, but at the cost of additional memory space consumption. This makes it hard to implement on programmable hardware, which has constrained memory but offers line-rate performance. This paper presents Charon, a stateless load-aware load balancer that has line-rate performance implemented in P4-NetFPGA. Charon passively collects load states from application servers and employs the power-of-2-choices scheme to make data-driven load balancing decisions and improve resource utilization. Per-connection consistency is preserved statelessly by encoding server ID in a covert channel. The prototype design and implementation details are described in this paper. Simulation results show performance gains in terms of load distribution fairness, quality of service, throughput and processing latency.
Submitted 2 November, 2021; v1 submitted 27 October, 2021;
originally announced October 2021.
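The power-of-2-choices scheme named in the Charon abstract can be sketched in a few lines. This is a generic illustration of the well-known technique, not Charon's P4 implementation: the function names, the flow-count load metric, and the simulation parameters are all assumptions made for the example.

```python
import random


def pick_server(loads, rng=random):
    """Sample two distinct servers at random and return the less loaded one."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b


def simulate(num_servers=8, num_flows=10_000, seed=0):
    """Assign flows one by one; load is the (hypothetical) count of assigned flows."""
    rng = random.Random(seed)
    loads = [0] * num_servers
    for _ in range(num_flows):
        loads[pick_server(loads, rng)] += 1
    return loads


if __name__ == "__main__":
    loads = simulate()
    print(loads, "spread:", max(loads) - min(loads))
```

Comparing only two candidates per decision keeps the per-flow cost constant while still steering traffic away from the busier of the two servers, which is why the scheme suits settings where full load state is expensive to track.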
-
Mock hyperbolic reflection spaces and Frobenius groups of finite Morley rank
Authors:
Tim Clausen,
Katrin Tent
Abstract:
We define the notion of mock hyperbolic reflection spaces and use it to study Frobenius groups, in particular in the context of groups of finite Morley rank including the so-called bad groups. We show that connected Frobenius groups of finite Morley rank and odd type with nilpotent complement split or interpret a bad field of characteristic zero. Furthermore, we show that mock hyperbolic reflection spaces of finite Morley rank satisfy certain rank inequalities, implying in particular that any connected Frobenius group of odd type and Morley rank at most ten either splits or is a simple non-split sharply 2-transitive group of characteristic different from 2 and of Morley rank 8 or 10.
Submitted 20 April, 2021;
originally announced April 2021.
-
All-Optical Nonlinear Pre-Compensation of Long-Reach Unrepeatered Systems
Authors:
Pawel M. Kaminski,
Tiago Sutili,
José Hélio da Cruz Júnior,
Glauco C. C. P. Simões,
Francesco Da Ros,
Metodi P. Yankov,
Henrik E. Hansen,
Anders T. Clausen,
Søren Forchhammer,
Leif K. Oxenløwe,
Rafael C. Figueiredo,
Michael Galili
Abstract:
We numerically demonstrate an all-optical nonlinearity pre-compensation module for state-of-the-art long-reach Raman-amplified unrepeatered links. The compensator design is optimized in terms of propagation symmetry to maximize the performance gains under WDM transmission, achieving 4.0dB and 2.6dB of SNR improvement for 250-km and 350-km links.
Submitted 6 January, 2021;
originally announced January 2021.
-
Dp-minimal profinite groups and valuations on the integers
Authors:
Tim Clausen
Abstract:
We study dp-minimal infinite profinite groups that are equipped with a uniformly definable fundamental system of open subgroups. We show that these groups have an open subgroup $A$ such that either $A$ is a direct product of countably many copies of $\mathbb{F}_p$ for some prime $p$, or $A$ is of the form $A \cong \prod_p \mathbb{Z}_p^{α_p} \times A_p$ where $α_p < ω$ and $A_p$ is a finite abelian $p$-group for each prime $p$. Moreover, we show that if $A$ is of this form, then there is a fundamental system of open subgroups such that the expansion of $A$ by this family of subgroups is dp-minimal. Our main ingredient is a quantifier elimination result for a class of valued abelian groups. We also apply it to $(\mathbb{Z},+)$ and we show that if we expand $(\mathbb{Z},+)$ by any chain of subgroups $(B_i)_{i<ω}$, we obtain a dp-minimal structure. This structure is distal if and only if the size of the quotients $B_i/B_{i+1}$ is bounded.
Submitted 20 August, 2020;
originally announced August 2020.
-
On the geometry of sharply 2-transitive groups
Authors:
Tim Clausen,
Katrin Tent
Abstract:
We show that the geometry associated to certain non-split sharply 2-transitive groups does not contain a proper projective plane. For a sharply 2-transitive group of finite Morley rank we improve known rank inequalities for this geometry and conclude that a sharply 2-transitive group of Morley rank 6 must be of the form $K\rtimes K^*$ for some algebraically closed field $K$.
Submitted 12 February, 2020;
originally announced February 2020.
-
Experimental Investigation of the Effect of Pilot Tone Modulation on Partial Response Modulation Formats
Authors:
Peter Madsen,
Anders T. Clausen,
Annika Dochhan,
Michael Eiselt
Abstract:
This paper presents an experimental investigation of 8% pilot tone modulation depth in a system transmitting NRZ, PAM4 and Duobinary. The penalty from the pilot tone increases with signal amplitude levels and reaches a received power penalty of 3 dB.
Submitted 9 January, 2019;
originally announced January 2019.
-
Some model theory of profinite groups
Authors:
Tim Clausen,
Katrin Tent
Abstract:
We give some background on uniform pro-p groups and the model theory of profinite NIP groups.
Submitted 20 May, 2017;
originally announced May 2017.
-
Security Issues in the Optimized Link State Routing Protocol Version 2 (OLSRV2)
Authors:
Ulrich Herberg,
Thomas Clausen
Abstract:
Mobile Ad hoc NETworks (MANETs) are leaving the confines of research laboratories, to find a place in real-world deployments. Outside specialized domains (military, vehicular, etc.), city-wide community networks are emerging, connecting regular Internet users with each other, and with the Internet, via MANETs. Growing to encompass more than a handful of "trusted participants", the question of preserving the MANET network connectivity, even when faced with careless or malicious participants, arises, and must be addressed. A first step towards protecting a MANET is to analyze the vulnerabilities of the routing protocol managing the connectivity. By understanding how the algorithms of the routing protocol operate, and how these can be exploited by those with ill intent, countermeasures can be developed, readying MANETs for wider deployment and use. This paper takes an abstract look at the algorithms that constitute the Optimized Link State Routing Protocol version 2 (OLSRv2), and identifies for each protocol element the possible vulnerabilities and attacks; in a certain way, it provides a "cookbook" for how to best attack an operational OLSRv2 network, or for how to proceed with developing protective countermeasures against these attacks.
Submitted 25 May, 2010;
originally announced May 2010.
-
A fresh look at 3D microwave ionization curves of hydrogen Rydberg atoms
Authors:
G. N. Rockwell,
V. F. Hoffman,
Th. Clausen,
R. Blümel
Abstract:
Analytical arguments and numerical simulations suggest that the shapes of 3D microwave ionization curves measured by Koch and collaborators (see P. M. Koch and K. A. H. van Leeuwen, Phys. Rep. 255, 289 (1995)) depend only weakly on the angular momentum of the atoms in the initial microcanonical ensemble, but strongly on the principal quantum number and the magnetic quantum number. Based on this insight, coupled with the computational power of a high-end 60-node Beowulf PC cluster, we present the first 3D quantum calculations of microwave ionization curves in the experimentally relevant parameter regime.
Submitted 26 July, 2001;
originally announced July 2001.