-
Enhancing Path Selections with Interference Graphs in Multihop Relay Wireless Networks
Authors:
Cao Vien Phung,
Andre Drummond,
Admela Jukan
Abstract:
Multihop relay wireless networks have gained traction due to the emergence of Reconfigurable Intelligent Surfaces (RISs), which can be used as relays in high-frequency wireless networks, including THz or mmWave. Transmission performance plays the key role in selecting paths in these networks. In this paper, we enhance and greatly simplify the path selection in multihop relay RIS-enabled wireless networks with what we refer to as interference graphs. Interference graphs are created based on an SNR model, conical and cylindrical beam shapes in the transmission, and the related interference model. Once created, they can be simply and efficiently used to select valid paths, without overestimation of the effect of interference. The results show that decreased ordering of conflict selections in the graphs yields the best results, as compared to a conservative approach that tolerates no interference.
Submitted 12 June, 2024;
originally announced June 2024.
-
Maximizing Throughput with Routing Interference Avoidance in RIS-Assisted Relay Mesh Networks
Authors:
Cao Vien Phung,
Andre Drummond,
Admela Jukan
Abstract:
In the modern landscape of wireless communications, multi-hop, high-bandwidth, indoor Terahertz (THz) wireless communications are gaining significant attention. These systems couple Reconfigurable Intelligent Surface (RIS) and relay devices within the emerging 6G network framework, offering promising solutions for creating cell-less, indoor, and on-demand mesh networks. RIS devices are especially attractive: they are constructed from an array of reflecting elements that can apply phase shifts, such that the reflected signals can be focused and steered, and the power of the signal enhanced towards the destination. This paper presents an in-depth, analytical examination of how path allocation impacts interference within such networks. We develop the first model which analyzes interference based on the geometric parameters of beams (conic, cylindrical) as they interact with RIS, User Equipment (UE), and relay devices. We introduce a transmission scheduling heuristic designed to mitigate interference, alongside an efficient optimization method to maximize throughput. Our performance results elucidate the effect of interference on communication path quality and highlight effective path selection strategies with throughput maximization.
Submitted 13 February, 2024;
originally announced February 2024.
-
Multi-FedLS: a Framework for Cross-Silo Federated Learning Applications on Multi-Cloud Environments
Authors:
Rafaela C. Brum,
Maria Clicia Stelling de Castro,
Luciana Arantes,
Lúcia Maria de A. Drummond,
Pierre Sens
Abstract:
Federated Learning (FL) is a distributed Machine Learning (ML) technique that can benefit from cloud environments while preserving data privacy. We propose Multi-FedLS, a framework that manages multi-cloud resources, reducing execution time and financial costs of Cross-Silo Federated Learning applications by using preemptible VMs, which are cheaper than on-demand ones but can be revoked at any time. Our framework comprises four modules: Pre-Scheduling, Initial Mapping, Fault Tolerance, and Dynamic Scheduler. This paper extends our previous work \cite{brum2022sbac} by formally describing the Multi-FedLS resource manager framework and its modules. Experiments were conducted with three Cross-Silo FL applications on CloudLab, and a proof-of-concept confirms that Multi-FedLS can be executed on a multi-cloud composed of AWS and GCP, two commercial cloud providers. Results show that the problem of executing Cross-Silo FL applications in multi-cloud environments with preemptible VMs can be efficiently resolved using a mathematical formulation, fault tolerance techniques, and a simple heuristic to choose a new VM in case of revocation.
Submitted 17 August, 2023;
originally announced August 2023.
-
Single-molecule fluorescence multiplexing by multi-parameter spectroscopic detection of nanostructured FRET labels
Authors:
Jiachong Chu,
Ayesha Ejaz,
Kyle M. Lin,
Madeline R. Joseph,
Aria E. Coraor,
D. Allan Drummond,
Allison H. Squires
Abstract:
Multiplexed, real-time fluorescence detection at the single-molecule level is highly desirable to reveal the stoichiometry, dynamics, and interactions of individual molecular species within complex systems. However, fluorescence sensing is traditionally limited to 3-4 concurrently detected labels, due to low signal-to-noise, high spectral overlap between labels, and the need to avoid dissimilar dye chemistries. We have engineered a palette of several dozen fluorescent labels, called FRETfluors, for spectroscopic multiplexing at the single-molecule level. Each FRETfluor is a compact nanostructure formed from the same three chemical building blocks (DNA, Cy3, and Cy5). The composition and dye-dye geometries create a characteristic Förster Resonance Energy Transfer (FRET) efficiency for each construct. In addition, we varied the local DNA sequence and attachment chemistry to alter the Cy3 and Cy5 emission properties and thereby shift the emission signatures of an entire series of FRET constructs to new sectors of the multi-parameter detection space. Unique spectroscopic emission of each FRETfluor is therefore conferred by a combination of FRET and this site-specific tuning of individual fluorophore photophysics. We show single-molecule identification of a set of 27 FRETfluors in a sample mixture using a subset of constructs statistically selected to minimize classification errors, measured using an Anti-Brownian ELectrokinetic (ABEL) trap which provides precise multi-parameter spectroscopic measurements. The ABEL trap also enables discrimination between FRETfluors attached to a target (here: mRNA) and unbound FRETfluors, eliminating the need for washes or removal of excess label by purification.
Submitted 25 January, 2024; v1 submitted 4 July, 2023;
originally announced July 2023.
-
Enhancing detection of labor violations in the agricultural sector: A multilevel generalized linear regression model of H-2A violation counts
Authors:
Arezoo Jafari,
Priscila De Azevedo Drummond,
Dominic Nishigaya,
Shawn Bhimani,
Aidong Adam Ding,
Amy Farrell,
Kayse Lee Maass
Abstract:
Agricultural workers are essential to the supply chain for our daily food, and yet many face harmful work conditions, including garnished wages and other labor violations. Workers on H-2A visas are particularly vulnerable due to the precarity of their immigration status being tied to their employer. Although worksite inspections are one mechanism to detect such violations, many labor violations affecting agricultural workers go undetected due to limited inspection resources. In this study, we identify multiple state and industry level factors that correlate with H-2A violations identified by the U.S. Department of Labor Wage and Hour Division using a multilevel zero-inflated negative binomial model. We find that three state-level factors (average farm acreage size, the number of agricultural establishments with fewer than 20 employees, and higher poverty rates) are correlated with H-2A violations. These findings provide guidance for inspection agencies regarding how to prioritize their limited resources to more effectively inspect agricultural workplaces, thereby improving workplace conditions for H-2A workers.
Submitted 6 June, 2023;
originally announced June 2023.
-
Parallel Downlink Data Distribution in Indoor Multi-hop THz Networks
Authors:
Cao Vien Phung,
Andre Drummond,
Admela Jukan
Abstract:
The emerging dynamic Virtual Reality (VR) applications are the best candidate applications in high bandwidth indoor Terahertz (THz) wireless networks, with the Reconfigurable Intelligent Surface (RIS) devices presenting a breakthrough solution in extending the typically short THz communication range and alleviating line-of-sight link blockages. In future smart factories, it is envisioned that factory workers will use VR devices via VR application data with high quality resolution, while transmitting over THz links and RIS devices, enabled by the Mobile Edge Computing (MEC) capabilities. Since indoor RIS placement is static, whereas VR users move and send multiple VR data download requests simultaneously, there is a challenge of proper network load balancing, which if unaddressed can result in poor resource utilization and low throughput. To address this challenge, we propose a parallel downlink data distribution system and develop multi-criteria optimization solutions that can improve throughput, while transmitting each downlink data flow over a set of possible paths between source and destination devices. The results show that the proposed system can enhance the performance in terms of throughput benefit, as compared to the system using one serial download link distribution.
Submitted 14 November, 2022;
originally announced November 2022.
-
TreeFlow: probabilistic programming and automatic differentiation for phylogenetics
Authors:
Christiaan Swanepoel,
Mathieu Fourment,
Xiang Ji,
Hassan Nasif,
Marc A Suchard,
Frederick A Matsen IV,
Alexei Drummond
Abstract:
Probabilistic programming frameworks are powerful tools for statistical modelling and inference. They are not immediately generalisable to phylogenetic problems due to the particular computational properties of the phylogenetic tree object. TreeFlow is a software library for probabilistic programming and automatic differentiation with phylogenetic trees. It implements inference algorithms for phylogenetic tree times and model parameters given a tree topology. We demonstrate how TreeFlow can be used to quickly implement and assess new models. We also show that it provides reasonable performance for gradient-based inference algorithms compared to specialized computational libraries for phylogenetics.
Submitted 9 November, 2022;
originally announced November 2022.
-
On the Efficiency and Quality of Protection of Preprovisioning in Elastic Optical Networks
Authors:
Paulo José S. Júnior,
Lucas R. Costa,
André C. Drummond
Abstract:
The study of protection techniques, such as preprovisioning (off-line) and provisioning (on-line), has been explored in several ways in the optical network literature. In the new Elastic Optical Network (EON) paradigm, preprovisioning techniques have so far been little explored. Preprovisioning implies the prior allocation of resources in the network for the transport and protection of future connection demands, while provisioning implies the allocation of resources when the demand arrives in the network. Applying preprovisioning reduces the downtime experienced by a connection after a failure, which reduces unavailability and potentially avoids penalties for violation of Service Level Agreements (SLAs) established with client networks. This work aims to explore the main protection techniques and evaluate their efficiency in the EON scenario. The performance evaluation shows that the use of preprovisioning techniques is more efficient, significantly reducing network unavailability and bandwidth usage in EONs. Our solution has an unavailability 40 times lower than shared solutions, being only 4% above the optimum.
Submitted 4 February, 2022;
originally announced February 2022.
-
Metropolitan Optical Networks: A Survey on New Architectures and Future Trends
Authors:
Léia Sousa de Sousa,
André Costa Drummond
Abstract:
Metropolitan optical networks are undergoing major transformations to continue being able to provide services that meet the requirements of the applications of the future. The arrival of 5G will expand the possibilities for offering IoT applications, autonomous vehicles, and smart city services while imposing strong pressure on the physical infrastructure currently implemented, as well as on static traffic engineering techniques that do not respond in an agile way to the dynamic and heterogeneous nature of the upcoming traffic patterns. In order to guarantee the strictest quality of service and quality of experience requirements for users, as well as to meet the providers' objectives of maintaining an acceptable trade-off between cost and performance, new architectures for metropolitan optical networks have been proposed in the literature, with a growing interest starting from 2017. However, due to the proliferation of a dozen new architectures in recent years, many questions need to be investigated regarding the planning, implementation, and management of these architectures before they can be considered for practical application. This work presents a comprehensive survey of the newly proposed architectures for metropolitan optical networks. First, the main data transmission systems, the equipment involved, and the structural organization of the new metro ecosystems are discussed. The already established and the novel architectures are then presented, highlighting their characteristics and applications, and a comparative analysis among these architectures is carried out, identifying future technological trends. Finally, outstanding research questions are drawn to help direct future research in the field.
Submitted 6 April, 2022; v1 submitted 25 January, 2022;
originally announced January 2022.
-
Bayesian inference of the climbing grade scale
Authors:
Alexei Drummond,
Alex Po**a
Abstract:
Climbing grades are used to classify a climbing route based on its perceived difficulty, and have come to play a central role in the sport of rock climbing. Recently, the first statistically rigorous method for estimating climbing grades from whole-history ascent data was described, based on the dynamic Bradley-Terry model for games between players of time-varying ability. In this paper, we implement inference under the whole-history rating model using Markov chain Monte Carlo and apply the method to a curated data set made up of climbers who climb regularly. We use these data to get an estimate of the model's fundamental scale parameter m, which defines the proportional increase in difficulty associated with an increment of grade. We show that the data conform to the assumption that the climbing grade scale is a logarithmic scale of difficulty, like decibels or stellar magnitude. We estimate that an increment in the Ewbank, French and UIAA climbing grade systems corresponds to a 2.1-, 2.09- and 2.13-fold increase in difficulty respectively, assuming a logistic model of probability of success as a function of grade. In contrast, we find that the Vermin scale for bouldering (V-grade scale) corresponds to a 3.17-fold increase in difficulty per grade increment. In addition, we highlight potential connections between the logarithmic properties of climbing grade scales and the psychophysical laws of Weber and Fechner.
Submitted 15 November, 2021;
originally announced November 2021.
-
Scheduling Bag-of-Tasks in Clouds using Spot and Burstable Virtual Machines
Authors:
Luan Teylo,
Luciana Arantes,
Pierre Sens,
Lúcia Maria de A. Drummond
Abstract:
Leading Cloud providers offer several types of Virtual Machines (VMs) in diverse contract models, with different guarantees in terms of availability and reliability. Among them, the most popular contract models are the on-demand and the spot models. In the former, on-demand VMs are allocated for a fixed cost per time unit, and their availability is ensured during the whole execution. On the other hand, in the spot market, VMs are offered with a huge discount when compared to the on-demand VMs, but their availability fluctuates according to the cloud's current demand, which can cause a spot VM to be terminated or hibernated at any time. Furthermore, in order to cope with workload variations, cloud providers have also introduced the concept of burstable VMs, which are able to burst above their respective baseline CPU performance during a limited period of time, with an up to 20% discount when compared to an equivalent non-burstable on-demand VM. In the current work, we present the Burst Hibernation-Aware Dynamic Scheduler (Burst-HADS), a framework that schedules and executes tasks of Bag-of-Tasks applications with deadline constraints by exploiting spot and on-demand burstable VMs, aiming at minimizing both the monetary cost and the execution time. Based on ILS metaheuristics, Burst-HADS defines an initial scheduling map of tasks to VMs which can then be dynamically altered by migrating tasks of a hibernated spot VM or by performing work-stealing when VMs become idle. Performance results on Amazon EC2 cloud with different applications show that, when compared to a solution that uses only regular on-demand instances, Burst-HADS reduces the monetary cost of the execution and meets the application deadline even in scenarios with high spot hibernation rates. It also reduces the total execution time when compared to a solution that uses only spot and non-burstable on-demand instances.
Submitted 10 November, 2020;
originally announced November 2020.
-
Jamming-Aware Control Plane in Elastic Optical Networks
Authors:
Ítalo Brasileiro,
Mounir Bensalem,
André Drummond,
Admela Jukan
Abstract:
Physical layer security is essential in optical networks. In this paper, we study a jamming-aware control plane, in which a high power jamming attack exists in the network. The studied control plane considers that the jammed connections can be detected and avoided. We used a physical layer model, in which we embedded the additional jamming power, to evaluate security in different scenarios, such as a jamming-free scenario, jamming with an unaware controller, and jamming with an aware controller. The performance is analyzed in terms of the blocking rate and slots utilization. We analyze the impact of jamming attacks on the least used link and on the most used link in the network. The results demonstrate that jamming avoidance by the control plane can reach performance near that of the jamming-free scenario.
Submitted 2 June, 2020;
originally announced June 2020.
-
Dynamic Multi-Modulation Allocation Scheme for Elastic Optical Networks
Authors:
Lucas R. Costa,
André C. Drummond
Abstract:
In order to deal with the recent rapid increase in Internet traffic, a transmission technology is required to enable the efficient use of the optical fiber spectrum while offering flexibility in network bandwidth. To meet these challenges, the emergence of Elastic Optical Networks (EON) has brought new conceptions in the operation of optical networks, improving flexibility and efficiency for the next generation core networks. In EON, traffic demands are typically supported by allocating bandwidth-variable optical channels with heterogeneous modulation formats in a spectral-efficient manner. Elastic optical path networks require the routing, modulation level, and spectrum allocation (RMLSA) to efficiently allocate optical spectrum resources to optical paths. To address the RMLSA problem, Modulation Scheme approaches have recently been proposed to allow the use of any routing and spectrum assignment (RSA) algorithm to solve the RMLSA problem. In this paper, we propose a new Modulation Scheme that enables the routing of traffic through dynamic multi-modulation allocation in multiple hops to achieve blocking performance improvement. Numerical results demonstrate that the proposed adaptive modulation scheme achieves a reduction in bandwidth blocking of up to two orders of magnitude in an underloaded network scenario, and 86% with higher loads, playing an important role in spectrum savings compared with the literature schemes.
Submitted 10 February, 2020;
originally announced February 2020.
-
Embedding Jamming Attacks into Physical Layer Models in Optical Networks
Authors:
Mounir Bensalem,
Ítalo Brasileiro,
André Drummond,
Admela Jukan
Abstract:
Optical networks are prone to physical layer attacks, in particular the insertion of high jamming power. In this paper, we present a study of jamming attacks in elastic optical networks (EON) by embedding the jamming into the physical layer model, and we analyze its impact on the blocking probability and slots utilization. We evaluate our proposed model using a single link and a network topology, and we show that for in-band jamming, the slots utilization decreases with the increase of jamming power and becomes null when the jamming power is higher than 3 dB, while for out-of-band jamming, the impact is maximal for a specific jamming power, 1.75 dB in our simulation. Considering multiple positions of attackers, we attained the highest blocking probability of 32% for a specific jamming power of 2 dB. We conclude that the impact of jamming depends on attacker positions as well as the jamming power.
Submitted 7 February, 2020;
originally announced February 2020.
-
A survey on Crosstalk and Routing, Modulation Selection, Core and Spectrum Allocation in Elastic Optical Networks
Authors:
Ítalo Brasileiro,
Lucas Costa,
André Drummond
Abstract:
Elastic Optical Networks (EON) emerge as a viable solution to supply the current growing demand for bandwidth. With the application of multi-core fibers (MCF) in EON links, it is possible to increase the availability of spectral resources. An EON network with MCF enables Space-Division Multiplexing (SDM), allowing the use of more resources in the fibers and increasing the capacity for serving circuit requests. However, the use of SDM brings some problems of interference between the circuits of a fiber, with greater emphasis on crosstalk interference. In this paper, some important concepts around EON are presented, along with the characterization of SDM technology. The Routing, Modulation, Spectrum and Core Allocation (RMSCA) problem is also characterized, and some solutions currently found in the literature are cited. Afterwards, the impact of crosstalk interference is discussed, along with the elements responsible for its occurrence. The paper concludes with an evaluation of the state of the art and a discussion of the main points found from the study of papers related to the SDM-EON scenario.
Submitted 19 July, 2019;
originally announced July 2019.
-
A Bag-of-Tasks Scheduler Tolerant to Temporal Failures in Clouds
Authors:
Luan Teylo,
Lúcia Maria de A. Drummond,
Luciana Arantes,
Pierre Sens
Abstract:
Cloud platforms have emerged as a prominent environment to execute high performance computing (HPC) applications, providing on-demand resources as well as scalability. They usually offer different classes of Virtual Machines (VMs) with different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in Amazon EC2 cloud, the user pays per hour for on-demand VMs, while spot VMs are unused instances available at a lower price. Despite the monetary advantages, a spot VM can be terminated, stopped, or hibernated by EC2 at any moment.
Using both hibernation-prone spot VMs (for the sake of cost) and on-demand VMs, we propose in this paper a static scheduling for HPC applications which are composed of independent tasks (bag-of-tasks) with deadline constraints. However, if a spot VM hibernates and does not resume within a time which guarantees the application's deadline, a temporal failure takes place. Our scheduling thus aims at minimizing the monetary costs of bag-of-tasks applications in EC2 cloud, respecting their deadlines and avoiding temporal failures. To this end, our algorithm statically creates two scheduling maps: (i) the first one contains, for each task, its starting time and on which VM (i.e., an available spot or on-demand VM with the current lowest price) the task should execute; (ii) the second one contains, for each task allocated on a spot VM in the first map, its starting time and on which on-demand VM it should be executed to meet the application deadline in order to avoid temporal failures. The latter will be used whenever the hibernation period of a spot VM exceeds a time limit.
Performance results from simulation with task execution traces, configuration of Amazon EC2 VM classes, and VM market history confirm the effectiveness of our scheduling and that it tolerates temporal failures.
Submitted 24 October, 2018;
originally announced October 2018.
-
The fossilized birth-death model for the analysis of stratigraphic range data under different speciation concepts
Authors:
Tanja Stadler,
Alexandra Gavryushkina,
Rachel C. M. Warnock,
Alexei J. Drummond,
Tracy A. Heath
Abstract:
A birth-death-sampling model gives rise to phylogenetic trees with samples from the past and the present. Interpreting "birth" as branching speciation, "death" as extinction, and "sampling" as fossil preservation and recovery, this model -- also referred to as the fossilized birth-death (FBD) model -- gives rise to phylogenetic trees on extant and fossil samples. The model has been mathematically analyzed and successfully applied to a range of datasets on different taxonomic levels, such as penguins, plants, and insects. However, the current mathematical treatment of this model does not allow for a group of temporally distinct fossil specimens to be assigned to the same species. In this paper, we provide a general mathematical FBD modeling framework that explicitly takes "stratigraphic ranges" into account, with a stratigraphic range being defined as the lineage interval associated with a single species, ranging through time from the first to the last fossil appearance of the species. To assign a sequence of fossil samples in the phylogenetic tree to the same species, i.e., to specify a stratigraphic range, we need to define the mode of speciation. We provide expressions to account for three common speciation modes: budding (or asymmetric) speciation, bifurcating (or symmetric) speciation, and anagenetic speciation. Our equations allow for flexible joint Bayesian analysis of paleontological and neontological data. Furthermore, our framework is directly applicable to epidemiology, where a stratigraphic range is the observed duration of infection of a single patient, "birth" via budding is transmission, "death" is recovery, and "sampling" is sequencing the pathogen of a patient. Thus, we present a model that allows for incorporation of multiple observations through time from a single patient.
Submitted 9 March, 2018; v1 submitted 30 June, 2017;
originally announced June 2017.
-
A Quantitative Model for Predicting Cross-application Interference in Virtual Environments
Authors:
Maicon Melo Alves,
Lúcia Maria de Assumpção Drummond
Abstract:
Cross-application interference can drastically affect the performance of HPC applications running in clouds. This problem is caused by concurrent access by co-located applications to shared and non-sliceable resources such as cache and memory. To address this issue, some works have adopted a qualitative approach that does not take into account the amount of access to shared resources. In addition, a few works, even when considering the amount of access, evaluated only SLLC access contention as the root of this problem. However, our experiments revealed that interference is intrinsically related to the amount of simultaneous access to shared resources, and showed that other shared resources, apart from SLLC, can also influence the interference suffered by co-located applications. In this paper, we present a quantitative model for predicting cross-application interference in virtual environments. Our proposed model takes into account the amount of simultaneous access to SLLC, DRAM and virtual network, and the similarity of the applications' access burden, to predict the level of interference suffered by applications when co-located on the same physical machine. Experiments considering a real petroleum reservoir simulator and applications from the HPCC benchmark showed that our model reached average and maximum prediction errors of around 4% and 12%, respectively, and achieved an error of less than 10% in approximately 96% of all tested cases.
Submitted 13 October, 2016;
originally announced October 2016.
-
Bayesian phylogenetic estimation of fossil ages
Authors:
Alexei J. Drummond,
Tanja Stadler
Abstract:
Recent advances have allowed for both morphological fossil evidence and molecular sequences to be integrated into a single combined inference of divergence dates under the rule of Bayesian probability. In particular, the fossilized birth-death tree prior and the Lewis-Mk model of discrete morphological evolution allow for the estimation of both divergence times and phylogenetic relationships between fossil and extant taxa. We exploit this statistical framework to investigate the internal consistency of these models by producing phylogenetic estimates of the age of each fossil in turn, within two rich and well-characterized data sets of fossil and extant species (penguins and canids). We find that the estimation accuracy of fossil ages is generally high, with credible intervals seldom excluding the true age and median relative error in the two data sets of 5.7% and 13.2% respectively. The median relative standard error (RSD) was 9.2% and 7.2% respectively, suggesting good precision, although with some outliers. In fact, in the two data sets we analyze, the phylogenetic estimate of fossil age is on average < 2 My from the midpoint age of the geological strata from which the fossil was excavated. The high level of internal consistency found in our analyses suggests that the Bayesian statistical model employed is an adequate fit for both the geological and morphological data, and provides evidence from real data that the framework used can accurately model the evolution of discrete morphological traits coded from fossil and extant taxa. We anticipate that this approach will have diverse applications beyond divergence time dating, including dating fossils that are temporally unconstrained, testing of the "morphological clock", and uncovering potential model misspecification and/or data errors when controversial phylogenetic hypotheses are obtained based on combined divergence dating analyses.
Submitted 3 May, 2016; v1 submitted 27 January, 2016;
originally announced January 2016.
-
Solving the Quadratic Assignment Problem on heterogeneous environment (CPUs and GPUs) with the application of Level 2 Reformulation and Linearization Technique
Authors:
Alexandre Domingues Gonçalves,
Artur Alves Pessoa,
Lúcia Maria de Assumpção Drummond,
Cristiana Bentes,
Ricardo Farias
Abstract:
The Quadratic Assignment Problem, QAP, is a classic combinatorial optimization problem, classified as NP-hard and widely studied. This problem consists in assigning N facilities to N locations in a one-to-one relation, aiming to minimize the costs of the displacement between the facilities. The application of the Reformulation and Linearization Technique, RLT, to the QAP leads to a tight linear relaxation that is large and difficult to solve. Previous works based on level 3 RLT needed about 700GB of working memory to process one large instance (N = 30 facilities). We present a modified version of the algorithm proposed by Adams et al. which executes on heterogeneous systems (CPUs and GPUs), based on level 2 RLT. For some instances, our algorithm is up to 140 times faster and occupies 97% less memory than the level 3 RLT version. The proposed algorithm was able to solve two instances for the first time: tai35b and tai40b.
Submitted 7 October, 2015;
originally announced October 2015.
-
Computational Performance and Statistical Accuracy of *BEAST and Comparisons with Other Methods
Authors:
Huw A. Ogilvie,
Joseph Heled,
Dong Xie,
Alexei J. Drummond
Abstract:
Under the multispecies coalescent model of molecular evolution, gene trees have independent evolutionary histories within a shared species tree. In comparison, supermatrix concatenation methods assume that gene trees share a single common genealogical history, thereby equating gene coalescence with species divergence. The multispecies coalescent is supported by previous studies which found that its predicted distributions fit empirical data, and that concatenation is not a consistent estimator of the species tree. *BEAST, a fully Bayesian implementation of the multispecies coalescent, is popular but computationally intensive, so the increasing size of phylogenetic data sets is both a computational challenge and an opportunity for better systematics. Using simulation studies, we characterize the scaling behaviour of *BEAST, and enable quantitative prediction of the impact increasing the number of loci has on both computational performance and statistical accuracy. Follow up simulations over a wide range of parameters show that the statistical performance of *BEAST relative to concatenation improves both as branch length is reduced and as the number of loci is increased. Finally, using simulations based on estimated parameters from two phylogenomic data sets, we compare the performance of a range of species tree and concatenation methods to show that using *BEAST with tens of loci can be preferable to using concatenation with thousands of loci. Our results provide insight into the practicalities of Bayesian species tree estimation, the number of loci required to obtain a given level of accuracy and the situations in which supermatrix or summary methods will be outperformed by the fully Bayesian multispecies coalescent.
Submitted 5 October, 2015; v1 submitted 21 June, 2015;
originally announced June 2015.
-
Bayesian total evidence dating reveals the recent crown radiation of penguins
Authors:
Alexandra Gavryushkina,
Tracy A. Heath,
Daniel T. Ksepka,
Tanja Stadler,
David Welch,
Alexei J. Drummond
Abstract:
The total-evidence approach to divergence-time dating uses molecular and morphological data from extant and fossil species to infer phylogenetic relationships, species divergence times, and macroevolutionary parameters in a single coherent framework. Current model-based implementations of this approach lack an appropriate model for the tree describing the diversification and fossilization process and can produce estimates that lead to erroneous conclusions. We address this shortcoming by providing a total-evidence method implemented in a Bayesian framework. This approach uses a mechanistic tree prior to describe the underlying diversification process that generated the tree of extant and fossil taxa. Previous attempts to apply the total-evidence approach have used tree priors that do not account for the possibility that fossil samples may be direct ancestors of other samples. The fossilized birth-death (FBD) process explicitly models the diversification, fossilization, and sampling processes and naturally allows for sampled ancestors. This model was recently applied to estimate divergence times based on molecular data and fossil occurrence dates. We incorporate the FBD model and a model of morphological trait evolution into a Bayesian total-evidence approach to dating species phylogenies. We apply this method to extant and fossil penguins and show that the modern penguins radiated much more recently than has been previously estimated, with the basal divergence in the crown clade occurring at ~12.7 Ma and most splits leading to extant species occurring in the last 2 million years. Our results demonstrate that including stem-fossil diversity can greatly improve the estimates of the divergence times of crown taxa. The method is available in BEAST2 (v. 2.4) www.beast2.org with packages SA (v. at least 1.1.4) and morph-models (v. at least 1.0.4).
Submitted 24 January, 2017; v1 submitted 15 June, 2015;
originally announced June 2015.
-
The space of ultrametric phylogenetic trees
Authors:
Alex Gavryushkin,
Alexei J. Drummond
Abstract:
The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample.
In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space, formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods.
Submitted 8 June, 2016; v1 submitted 13 October, 2014;
originally announced October 2014.
-
Handling Flash-Crowd Events to Improve the Performance of Web Applications
Authors:
Ubiratam de Paula Junior,
Lúcia M. A. Drummond,
Daniel de Oliveira,
Yuri Frota,
Valmir C. Barbosa
Abstract:
Cloud computing can offer a set of computing resources according to users' demand. Its elasticity and on-demand characteristics make it suitable for handling flash-crowd events in Web applications. Thus, when Web applications need more computing or storage capacity, they just instantiate new resources. However, providers have to estimate the amount of resources to instantiate to handle the flash-crowd event. This estimation is far from trivial since each cloud environment provides several kinds of heterogeneous resources, each one with its own characteristics such as bandwidth, CPU, memory and financial cost. In this paper, the Flash Crowd Handling Problem (FCHP) is precisely defined and formulated as an integer programming problem. A new algorithm for handling a flash crowd, named FCHP-ILS, is also proposed. With FCHP-ILS the Web applications can replicate contents in the already instantiated resources and define the types and amount of resources to instantiate in the cloud during a flash crowd. Our approach is evaluated considering real flash crowd traces obtained from the related literature. We also present a case study, based on a synthetic dataset representing flash-crowd events in small scenarios, aimed at comparing the proposed approach against Amazon's Auto-Scale mechanism.
Submitted 10 October, 2014;
originally announced October 2014.
-
Inferring epidemiological dynamics with Bayesian coalescent inference: The merits of deterministic and stochastic models
Authors:
Alex Popinga,
Tim Vaughan,
Tanja Stadler,
Alexei Drummond
Abstract:
Estimation of epidemiological and population parameters from molecular sequence data has become central to the understanding of infectious disease dynamics. Various models have been proposed to infer details of the dynamics that describe epidemic progression. These include inference approaches derived from Kingman's coalescent theory. Here, we use recently described coalescent theory for epidemic dynamics to develop stochastic and deterministic coalescent SIR tree priors. We implement these in a Bayesian phylogenetic inference framework to permit joint estimation of SIR epidemic parameters and the sample genealogy. We assess the performance of the two coalescent models and also juxtapose results obtained with BDSIR, a recently published birth-death-sampling model for epidemic inference. Comparisons are made by analyzing sets of genealogies simulated under precisely known epidemiological parameters. Additionally, we analyze influenza A (H1N1) sequence data sampled in the Canterbury region of New Zealand and HIV-1 sequence data obtained from known UK infection clusters. We show that both coalescent SIR models are effective at estimating epidemiological parameters from data with large fundamental reproductive number $R_0$ and large population size $S_0$. Furthermore, we find that the stochastic variant generally outperforms its deterministic counterpart in terms of error, bias, and highest posterior density coverage, particularly for smaller $R_0$ and $S_0$. However, each of these inference models is shown to have undesirable properties in certain circumstances, especially for epidemic outbreaks with $R_0$ close to one or with small effective susceptible populations.
Submitted 19 December, 2014; v1 submitted 7 July, 2014;
originally announced July 2014.
-
Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration
Authors:
Alexandra Gavryushkina,
David Welch,
Tanja Stadler,
Alexei Drummond
Abstract:
Phylogenetic analyses which include fossils or molecular sequences that are sampled through time require models that allow one sample to be a direct ancestor of another sample. As previously available phylogenetic inference tools assume that all samples are tips, they do not allow for this possibility. We have developed and implemented a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to infer what we call sampled ancestor trees, that is, trees in which sampled individuals can be direct ancestors of other sampled individuals. We use a family of birth-death models where individuals may remain in the tree process after the sampling, in particular we extend the birth-death skyline model [Stadler et al, 2013] to sampled ancestor trees. This method allows the detection of sampled ancestors as well as estimation of the probability that an individual will be removed from the process when it is sampled. We show that sampled ancestor birth-death models where all samples come from different time points are non-identifiable and thus require one parameter to be known in order to infer other parameters. We apply this method to epidemiological data, where the possibility of sampled ancestors enables us to identify individuals that infected other individuals after being sampled and to infer fundamental epidemiological parameters. We also apply the method to infer divergence times and diversification rates when fossils are included among the species samples, so that fossilisation events are modelled as a part of the tree branching process. Such modelling has many advantages as argued in literature. The sampler is available as an open-source BEAST2 package (https://github.com/gavryushkina/sampled-ancestors).
Submitted 24 June, 2014; v1 submitted 17 June, 2014;
originally announced June 2014.
-
Calibrated birth-death phylogenetic time-tree priors for Bayesian inference
Authors:
Joseph Heled,
Alexei J. Drummond
Abstract:
Here we introduce a general class of multiple calibration birth-death tree priors for use in Bayesian phylogenetic inference. All tree priors in this class separate ancestral node heights into a set of "calibrated nodes" and "uncalibrated nodes" such that the marginal distribution of the calibrated nodes is user-specified whereas the density ratio of the birth-death prior is retained for trees with equal values for the calibrated nodes. We describe two formulations, one in which the calibration information informs the prior on ranked tree topologies, through the (conditional) prior, and the other which factorizes the prior on divergence times and ranked topologies, thus allowing uniform, or any arbitrary prior distribution on ranked topologies. While the first of these formulations has some attractive properties the algorithm we present for computing its prior density is computationally intensive. On the other hand, the second formulation is always computationally efficient. We demonstrate the utility of the new class of multiple-calibration tree priors using both small simulations and a real-world analysis and compare the results to existing schemes. The two new calibrated tree priors described in this paper offer greater flexibility and control of prior specification in calibrated time-tree inference and divergence time dating, and will remove the need for indirect approaches to the assessment of the combined effect of calibration densities and tree process priors in Bayesian phylogenetic inference.
Submitted 19 November, 2013;
originally announced November 2013.
-
Simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth-death SIR model
Authors:
Denise Kühnert,
Tanja Stadler,
Timothy G. Vaughan,
Alexei J. Drummond
Abstract:
The evolution of RNA viruses such as HIV, Hepatitis C and Influenza virus occurs so rapidly that the viruses' genomes contain information on past ecological dynamics. Hence, we develop a phylodynamic method that enables the joint estimation of epidemiological parameters and phylogenetic history. Based on a compartmental susceptible-infected-removed (SIR) model, this method provides separate information on incidence and prevalence of infections. Detailed information on the interaction of host population dynamics and evolutionary history can inform decisions on how to contain or entirely avoid disease outbreaks.
We apply our Birth-Death SIR method (BDSIR) to two viral data sets. First, five human immunodeficiency virus type 1 clusters sampled in the United Kingdom between 1999 and 2003 are analyzed. The estimated basic reproduction ratios range from 1.9 to 3.2 among the clusters. All clusters show a decline in the growth rate of the local epidemic in the middle or end of the 90's.
The analysis of a hepatitis C virus (HCV) genotype 2c data set shows that the local epidemic in the Córdoban city Cruz del Eje originated around 1906 (median), coinciding with an immigration wave from Europe to central Argentina that dates from 1880--1920. The estimated time of epidemic peak is around 1970.
Submitted 21 March, 2014; v1 submitted 23 August, 2013;
originally announced August 2013.
-
Memory Aware Load Balance Strategy on a Parallel Branch-and-Bound Application
Authors:
Juliana M. N. Silva,
Cristina Boeres,
Lúcia M. A. Drummond,
Artur A. Pessoa
Abstract:
The latest trends in high-performance computing systems show an increasing demand for the efficient use of large-scale multicore systems, so that highly compute-intensive applications can be executed reasonably well. However, the exploitation of the degree of parallelism available at each multicore component can be limited by poor utilization of the available memory hierarchy. In fact, the multicore architecture introduces some distinct features that are already observed in shared memory and distributed environments. One example is that subsets of cores can share different subsets of memory. In order to achieve high performance, it is imperative that an application be carefully allocated to the available cores, based on a scheduling model that considers the main performance bottlenecks, such as memory contention. In this paper, the Multicore Cluster Model (MCM) is proposed, which captures the most relevant performance characteristics of multicore systems, such as the influence of memory hierarchy and contention. Better performance was achieved when a load balance strategy for a Branch-and-Bound application applied to the Partitioning Sets Problem was based on MCM, showing its efficiency and applicability to modern systems.
Submitted 22 February, 2013;
originally announced February 2013.
-
A Distributed Transportation Simplex Applied to a Content Distribution Network Problem
Authors:
Rafaelli de C. Coutinho,
Lúcia M. A. Drummond,
Yuri Frota
Abstract:
A Content Distribution Network (CDN) can be defined as an overlay system that replicates copies of contents at multiple points of a network, close to the final users, with the objective of improving data access. CDN technology is widely used for the distribution of large-sized contents, as in video streaming. In this paper we address the problem of finding the best server for each customer request in CDNs, in order to minimize the overall cost. We treat the problem as a transportation problem and propose a distributed algorithm to solve it. The algorithm is composed of two independent phases: a distributed heuristic finds an initial solution that may be later improved by a distributed transportation simplex algorithm. It is compared with the sequential version of the transportation simplex and with an auction-based distributed algorithm. Computational experiments carried out on a set of instances adapted from the literature revealed that our distributed approach has a performance similar to its sequential counterpart, in spite of not requiring global information about the content requests. Moreover, the results also showed that the new method outperforms the auction-based distributed algorithm.
Submitted 23 October, 2012;
originally announced October 2012.
-
Calibrated Tree Priors for Relaxed Phylogenetics and Divergence Time Estimation
Authors:
Heled Joseph,
Alexei Drummond
Abstract:
The use of fossil evidence to calibrate divergence time estimation has a long history. More recently Bayesian MCMC has become the dominant method of divergence time estimation and fossil evidence has been re-interpreted as the specification of prior distributions on the divergence times of calibration nodes. These so-called "soft calibrations" have become widely used but the statistical properties of calibrated tree priors in a Bayesian setting have not been carefully investigated. Here we clarify that calibration densities, such as those defined in BEAST 1.5, do not represent the marginal prior distribution of the calibration node. We illustrate this with a number of analytical results on small trees. We also describe an alternative construction for a calibrated Yule prior on trees that allows direct specification of the marginal prior distribution of the calibrated divergence time, with or without the restriction of monophyly. This method requires the computation of the Yule prior conditional on the height of the divergence being calibrated. Unfortunately, a practical solution for multiple calibrations remains elusive. Our results suggest that direct estimation of the prior induced by specifying multiple calibration densities should be a prerequisite of any divergence time dating analysis.
Submitted 29 March, 2011;
originally announced March 2011.
-
Extinction in a self-regulating population with demographic and environmental noise
Authors:
Alexei J. Drummond,
Peter D. Drummond
Abstract:
We present an explicit unified stochastic model of fluctuations in population size due to random birth, death, density-dependent competition and environmental fluctuations. Stochastic dynamics provide insight into small populations, including processes such as extinction, that cannot be correctly treated by deterministic methods. We present exact analytical and simulation-based results for extinction times of our stochastic model and compare the different effects of environmental stochasticity and intrinsic demographic stochasticity. We use both the discrete master equation approach and an exact mapping to a Fokker-Planck equation (the Poisson method) and stochastic equation, showing they are precisely equivalent. We also calculate approximate extinction times using a steepest descent method. This model can readily be extended to accommodate metapopulation structure and genetic variation in the population and thus represents a step towards a microscopically explicit synthesis of population dynamics and population genetics.
Submitted 30 July, 2008; v1 submitted 29 July, 2008;
originally announced July 2008.
-
Population genetics of translational robustness
Authors:
Claus O. Wilke,
D. Allan Drummond
Abstract:
Recent work has shown that expression level is the main predictor of a gene’s evolutionary rate, and that more highly expressed genes evolve slower. A possible explanation for this observation is selection for proteins which fold properly despite mistranslation, in short selection for translational robustness. Translational robustness leads to the somewhat paradoxical prediction that highly expressed genes are extremely tolerant to missense substitutions but nevertheless evolve very slowly. Here, we study a simple theoretical model of translational robustness that allows us to gain analytic insight into how this paradoxical behavior arises.
Submitted 19 February, 2006; v1 submitted 23 September, 2005;
originally announced September 2005.
-
A single determinant for the rate of yeast protein evolution
Authors:
D. Allan Drummond,
Alpan Raval,
Claus O. Wilke
Abstract:
A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. To overcome these difficulties, we employ an alternative method, principal component regression, which is a multivariate regression of evolutionary rate against the principal components of the predictor variables. We carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network). Strikingly, our analysis reveals a single dominant component which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single determinant among the seven predictors. The dominant component explains nearly half the variation in the rate of synonymous and protein evolution. Our results support the hypothesis that selection against the cost of translation-error-induced protein misfolding governs the rate of synonymous and protein sequence evolution in yeast.
Submitted 8 June, 2005;
originally announced June 2005.
-
Why highly expressed proteins evolve slowly
Authors:
D. Allan Drummond,
Jesse D. Bloom,
Christoph Adami,
Claus O. Wilke,
Frances H. Arnold
Abstract:
Much recent work has explored molecular and population-genetic constraints on the rate of protein sequence evolution. The best predictor of evolutionary rate is expression level, for reasons which have remained unexplained. Here, we hypothesize that selection to reduce the burden of protein misfolding will favor protein sequences with increased robustness to translational missense errors. Pressure for translational robustness increases with expression level and constrains sequence evolution. Using several sequenced yeast genomes, global expression and protein abundance data, and sets of paralogs traceable to an ancient whole-genome duplication in yeast, we rule out several confounding effects and show that expression level explains roughly half the variation in Saccharomyces cerevisiae protein evolutionary rates. We examine causes for expression's dominant role and find that genome-wide tests favor the translational robustness explanation over existing hypotheses that invoke constraints on function or translational efficiency. Our results suggest that proteins evolve at rates largely unrelated to their functions, and can explain why highly expressed proteins evolve slowly across the tree of life.
Submitted 12 August, 2005; v1 submitted 2 June, 2005;
originally announced June 2005.
-
Why high-error-rate random mutagenesis libraries are enriched in functional and improved proteins
Authors:
D. Allan Drummond,
Brent L. Iverson,
George Georgiou,
Frances H. Arnold
Abstract:
Recently, several groups have used error-prone polymerase chain reactions to construct mutant libraries containing up to 27 nucleotide mutations per gene on average, and reported a striking observation: although retention of protein function initially declines exponentially with mutations as has previously been observed, orders of magnitude more proteins remain viable at the highest mutation rates than this trend would predict. Mutant proteins having improved or novel activity were isolated disproportionately from these heavily mutated libraries, leading to the suggestion that distant regions of sequence space are enriched in useful cooperative mutations and that optimal mutagenesis should target these regions. If true, these claims have profound implications for laboratory evolution and for evolutionary theory. Here, we demonstrate that properties of the polymerase chain reaction can explain these results and, consequently, that average protein viability indeed decreases exponentially with mutational distance at all error rates. We show that high-error-rate mutagenesis may be useful in certain cases, though for very different reasons than originally proposed, and that optimal mutation rates are inherently protocol-dependent. Our results allow optimal mutation rates to be found given mutagenesis conditions and a protein of known mutational robustness.
Submitted 18 February, 2005; v1 submitted 22 November, 2004;
originally announced November 2004.
-
Thermodynamic Prediction of Protein Neutrality
Authors:
Jesse D. Bloom,
Jonathan J. Silberg,
Claus O. Wilke,
D. Allan Drummond,
Christoph Adami,
Frances H. Arnold
Abstract:
We present a simple theory that uses thermodynamic parameters to predict the probability that a protein retains the wildtype structure after one or more random amino acid substitutions. Our theory predicts that for large numbers of substitutions the probability that a protein retains its structure will decline exponentially with the number of substitutions, with the severity of this decline determined by properties of the structure. Our theory also predicts that a protein can gain extra robustness to the first few substitutions by increasing its thermodynamic stability. We validate our theory with simulations on lattice protein models and by showing that it quantitatively predicts previously published experimental measurements on subtilisin and our own measurements on variants of TEM1 beta-lactamase. Our work unifies observations about the clustering of functional proteins in sequence space, and provides a basis for interpreting the response of proteins to substitutions in protein engineering applications.
Submitted 4 December, 2004; v1 submitted 13 September, 2004;
originally announced September 2004.
-
On reducing the complexity of matrix clocks
Authors:
L. M. A. Drummond,
V. C. Barbosa
Abstract:
Matrix clocks are a generalization of the notion of vector clocks that allows the local representation of causal precedence to reach into an asynchronous distributed computation's past with depth $x$, where $x\ge 1$ is an integer. Maintaining matrix clocks correctly in a system of $n$ nodes requires that every message be accompanied by $O(n^x)$ numbers, which reflects an exponential dependency of the complexity of matrix clocks upon the desired depth $x$. We introduce a novel type of matrix clock, one that requires only $nx$ numbers to be attached to each message while maintaining what for many applications may be the most significant portion of the information that the original matrix clock carries. In order to illustrate the new clock's applicability, we demonstrate its use in the monitoring of certain resource-sharing computations.
Submitted 23 September, 2003;
originally announced September 2003.