-
Graph Neural Networks for Microbial Genome Recovery
Authors:
Andre Lamurias,
Alessandro Tibo,
Katja Hose,
Mads Albertsen,
Thomas Dyhre Nielsen
Abstract:
Microbes have a profound impact on our health and environment, but our understanding of the diversity and function of microbial communities is severely limited. Through DNA sequencing of microbial communities (metagenomics), DNA fragments (reads) of the individual microbes can be obtained, which through assembly graphs can be combined into long contiguous DNA sequences (contigs). Given the complexity of microbial communities, single-contig microbial genomes are rarely obtained. Instead, contigs are eventually clustered into bins, with each bin ideally making up a full genome. This process is referred to as metagenomic binning.
Current state-of-the-art techniques for metagenomic binning rely only on the local features for the individual contigs. These techniques therefore fail to exploit the similarities between contigs as encoded by the assembly graph, in which the contigs are organized. In this paper, we propose to use Graph Neural Networks (GNNs) to leverage the assembly graph when learning contig representations for metagenomic binning. Our method, VaeG-Bin, combines variational autoencoders for learning latent representations of the individual contigs, with GNNs for refining these representations by taking into account the neighborhood structure of the contigs in the assembly graph. We explore several types of GNNs and demonstrate that VaeG-Bin recovers more high-quality genomes than other state-of-the-art binners on both simulated and real-world datasets.
Submitted 26 April, 2022;
originally announced April 2022.
-
Generalizing the first-difference correlated random walk for marine animal movement data
Authors:
Christoffer Moesgaard Albertsen
Abstract:
Animal telemetry data are often analysed with discrete-time movement models assuming rotation in the movement. These models are defined with equidistant time steps. However, telemetry data from marine animals are observed irregularly. To account for irregular data, a time-irregularised first-difference correlated random walk model with drift is introduced. The model generalizes the commonly used first-difference correlated random walk with regular time steps by allowing irregular time steps, including a drift term, and by allowing different autocorrelation in the two coordinates. The model is applied to data from a ringed seal collected through the Argos satellite system, and is compared to related movement models through simulations. Accounting for irregular data in the movement model results in accurate parameter estimates and reconstruction of movement paths. Measured by distance, the introduced model can provide more accurate movement paths than the regular time counterpart. Extracting accurate movement paths from uncertain telemetry data is important for evaluating space use patterns for marine animals, which in turn is crucial for management. Further, handling irregular data directly in the movement model allows efficient simultaneous analysis of several animals.
Submitted 22 June, 2018;
originally announced June 2018.
-
A Hidden Markov Movement Model for rapidly identifying behavioral states from animal tracks
Authors:
Kim Whoriskey,
Marie Auger-Méthé,
Christoffer Moesgaard Albertsen,
Frederick G. Whoriskey,
Thomas R. Binder,
Charles C. Krueger,
Joanna Mills Flemming
Abstract:
1. Electronic telemetry is frequently used to document animal movement through time. Methods that can identify underlying behaviors driving specific movement patterns can help us understand how and why animals use available space, thereby aiding conservation and management efforts. For aquatic animal tracking data with significant measurement error, a Bayesian state-space model called the first-Difference Correlated Random Walk with Switching (DCRWS) has often been used for this purpose. However, for aquatic animals, highly accurate tracking data of animal movement are now becoming more common.
2. We developed a new Hidden Markov Model (HMM) for identifying behavioral states from animal tracks with negligible error, which we called the Hidden Markov Movement Model (HMMM). We used the process equation of the DCRWS as the basis for the HMMM, but fitted the model by maximum likelihood with the R package TMB for rapid model fitting.
3. We compared the HMMM to a modified version of the DCRWS for highly accurate tracks, the DCRWSnome, and to a common HMM for animal tracks fitted with the R package moveHMM. We show that the HMMM is both accurate and suitable for multiple species by fitting it to real tracks from a grey seal, lake trout, and blue shark, as well as to simulated data.
4. The HMMM is a fast and reliable tool for making meaningful inference from animal movement data that is ideally suited for ecologists who want to use the popular DCRWS implementation for highly accurate tracking data. It additionally provides a groundwork for development of more complex modelling of animal movement with TMB. To facilitate its uptake, we make it available through the R package swim.
Submitted 20 December, 2016;
originally announced December 2016.
-
Choosing the observational likelihood in state-space stock assessment models
Authors:
Christoffer Moesgaard Albertsen,
Anders Nielsen,
Uffe Høgsbro Thygesen
Abstract:
Data used in stock assessment models result from combinations of biological, ecological, fishery, and sampling processes. Since different types of errors propagate through these processes, it can be difficult to identify a particular family of distributions for modelling errors on observations a priori. By implementing several observational likelihoods, modelling both numbers- and proportions-at-age, in an age-based state-space stock assessment model, we compare the model fit for each choice of likelihood along with the implications for spawning stock biomass and average fishing mortality. We propose using AIC intervals based on fitting the full observational model for comparing different observational likelihoods. Using data from four stocks, we show that the model fit is improved by modelling the correlation of observations within years. However, the best choice of observational likelihood differs for different stocks, and the choice is important for the short-term conclusions drawn from the assessment model; in particular, the choice can influence total allowable catch advice based on reference points.
Submitted 20 September, 2016;
originally announced September 2016.
-
State-space models' dirty little secrets: even simple linear Gaussian models can have estimation problems
Authors:
Marie Auger-Méthé,
Chris Field,
Christoffer M. Albertsen,
Andrew E. Derocher,
Mark A. Lewis,
Ian D. Jonsen,
Joanna Mills Flemming
Abstract:
State-space models (SSMs) are increasingly used in ecology to model time-series such as animal movement paths and population dynamics. This type of hierarchical model is often structured to account for two levels of variability: biological stochasticity and measurement error. SSMs are flexible. They can model linear and nonlinear processes using a variety of statistical distributions. Recent ecological SSMs are often complex, with a large number of parameters to estimate. Through a simulation study, we show that even simple linear Gaussian SSMs can suffer from parameter- and state-estimation problems. We demonstrate that these problems occur primarily when measurement error is larger than biological stochasticity, the condition that often drives ecologists to use SSMs. Using an animal movement example, we show how these estimation problems can affect ecological inference. Biased parameter estimates of an SSM describing the movement of polar bears (Ursus maritimus) result in overestimating their energy expenditure. We suggest potential solutions, but show that it often remains difficult to estimate parameters. While SSMs are powerful tools, they can give misleading results and we urge ecologists to assess whether the parameters can be estimated accurately before drawing ecological conclusions from their results.
Submitted 29 March, 2016; v1 submitted 18 August, 2015;
originally announced August 2015.
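The setting described in the abstract above, a linear Gaussian SSM in which measurement error exceeds process variability, can be explored with a small simulation. The sketch below is illustrative only and is not taken from the paper: the AR(1) state equation, the parameter names (rho, sigma_proc, sigma_obs), and the function names are assumptions chosen for this example. It simulates the model and evaluates the exact Gaussian log-likelihood with a scalar Kalman filter.

```python
import numpy as np


def simulate_ssm(n, rho, sigma_proc, sigma_obs, rng):
    """Simulate an assumed toy model:
    x_t = rho * x_{t-1} + eps_t,  eps_t ~ N(0, sigma_proc^2)
    y_t = x_t + eta_t,            eta_t ~ N(0, sigma_obs^2)
    """
    x = np.zeros(n)
    x[0] = rng.normal(0.0, sigma_proc)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal(0.0, sigma_proc)
    y = x + rng.normal(0.0, sigma_obs, size=n)
    return x, y


def kalman_loglik(y, rho, sigma_proc, sigma_obs):
    """Log-likelihood of y under the toy model, via a scalar Kalman filter."""
    # Prior on the first state matches simulate_ssm: x_0 ~ N(0, sigma_proc^2).
    mean, var = 0.0, sigma_proc ** 2
    loglik = 0.0
    for obs in y:
        # One-step-ahead predictive distribution of the observation.
        pred_var = var + sigma_obs ** 2
        resid = obs - mean
        loglik += -0.5 * (np.log(2.0 * np.pi * pred_var) + resid ** 2 / pred_var)
        # Update the state estimate with the observation.
        gain = var / pred_var
        mean = mean + gain * resid
        var = (1.0 - gain) * var
        # Propagate the state estimate to the next time step.
        mean = rho * mean
        var = rho ** 2 * var + sigma_proc ** 2
    return loglik


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Measurement error ten times larger than process noise (assumed values).
    _, y = simulate_ssm(200, rho=0.7, sigma_proc=0.1, sigma_obs=1.0, rng=rng)
    # Profile the log-likelihood over candidate process standard deviations.
    for candidate in (0.01, 0.1, 0.5, 1.0):
        print(candidate, kalman_loglik(y, 0.7, candidate, 1.0))
```

Comparing the profiled log-likelihood values across candidates gives a rough sense of how well the data identify the process variance when the observations are noisy; a flat profile suggests the parameter is hard to estimate.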
-
Predicting macrobending loss for large-mode area photonic crystal fibers
Authors:
M. D. Nielsen,
N. A. Mortensen,
M. Albertsen,
J. R. Folkenberg,
A. Bjarklev,
D. Bonacinni
Abstract:
We report on an easy-to-evaluate expression for the prediction of the bend-loss for a large mode area photonic crystal fiber (PCF) with a triangular air-hole lattice. The expression is based on a recently proposed formulation of the V-parameter for a PCF and contains no free parameters. The validity of the expression is verified experimentally for varying fiber parameters as well as bend radius. The typical deviation between the position of the measured and the predicted bend loss edge is within measurement uncertainty.
Submitted 10 April, 2004;
originally announced April 2004.