-
ntLink: a toolkit for de novo genome assembly scaffolding and mapping using long reads
Authors:
Lauren Coombe,
René L. Warren,
Johnathan Wong,
Vladimir Nikolic,
Inanc Birol
Abstract:
With the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step to a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes. ntLink is a flexible and resource-efficient genome scaffolding tool that utilizes long-read sequencing data to improve upon draft genome assemblies built from any sequencing technology, including the same long reads. Instead of using read alignments to identify candidate joins, ntLink utilizes minimizer-based mappings to infer how input sequences should be ordered and oriented into scaffolds. Recent improvements to ntLink have added important features such as overlap detection, gap-filling and in-code scaffolding iterations. Here, we present three basic protocols demonstrating how to use each of these new features to yield highly contiguous genome assemblies, while still maintaining ntLink's proven computational efficiency. Further, as we illustrate in the alternate protocols, the lightweight minimizer-based mappings that enable ntLink scaffolding can also be utilized for other downstream applications, such as misassembly detection. With its modularity and multiple modes of execution, ntLink has broad benefit to the genomics community, from genome scaffolding and beyond. ntLink is an open-source project and is freely available from https://github.com/bcgsc/ntLink.
Submitted 20 January, 2023;
originally announced January 2023.
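The ntLink abstract above refers to minimizer-based mappings. As a rough illustration of the general idea only (this is not ntLink's implementation, and the function names, the lexicographic ordering, and the parameters `k` and `w` are assumptions made for this sketch), a minimizer is the smallest k-mer in each window of `w` consecutive k-mers, and two sequences can be related by the minimizers they share:

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs, one minimizer per window.

    Assumption for this sketch: k-mers are ordered lexicographically.
    Real tools typically order k-mers by a hash value and use
    canonical (strand-independent) k-mers.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Pick the leftmost smallest k-mer in this window.
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return sorted(selected)


def shared_minimizers(a, b, k, w):
    """Return the minimizers of b whose k-mer is also a minimizer of a."""
    in_a = {kmer for _, kmer in minimizers(a, k, w)}
    return [(pos, kmer) for pos, kmer in minimizers(b, k, w) if kmer in in_a]


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only to exercise the functions.
    print(minimizers("ACGTACGT", 3, 2))
    print(shared_minimizers("ACGTACGT", "TTACGTAC", 3, 2))
```

Because only a subset of k-mers is kept per sequence, comparing minimizer sets is much cheaper than aligning reads base by base, which is the kind of efficiency the abstract alludes to.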
-
GapPredict: A Language Model for Resolving Gaps in Draft Genome Assemblies
Authors:
Eric Chen,
Justin Chu,
Jessica Zhang,
Rene L. Warren,
Inanc Birol
Abstract:
Short-read DNA sequencing instruments can yield over 1e+12 bases per run, typically composed of reads 150 bases long. Despite this high throughput, de novo assembly algorithms have difficulty reconstructing contiguous genome sequences using short reads due to both repetitive and difficult-to-sequence regions in these genomes. Some of the short read assembly challenges are mitigated by scaffolding assembled sequences using paired-end reads. However, unresolved sequences in these scaffolds appear as "gaps". Here, we introduce GapPredict, a tool that uses a character-level language model to predict unresolved nucleotides in scaffold gaps. We benchmarked GapPredict against the state-of-the-art gap-filling tool Sealer, and observed that the former can fill 65.6% of the sampled gaps that were left unfilled by the latter, demonstrating the practical utility of deep learning approaches to the gap-filling problem in genome sequence assembly.
Submitted 24 May, 2021; v1 submitted 21 May, 2021;
originally announced May 2021.
-
Interactive SARS-CoV-2 mutation timemaps
Authors:
Rene L. Warren,
Inanc Birol
Abstract:
As the year 2020 draws to an end, several new strains have been reported for the SARS-CoV-2 coronavirus, the agent responsible for the COVID-19 pandemic that has afflicted us all this past year. However, it is difficult to comprehend the scale, in sequence space, geographical location and time, at which SARS-CoV-2 mutates and evolves in its human hosts. To get an appreciation for the rapid evolution of the coronavirus, we built interactive scalable vector graphics maps that show daily nucleotide variations in genomes from the six most populated continents compared to that of the initial, ground-zero SARS-CoV-2 isolate sequenced at the beginning of the year. Availability: Mutation time maps are available from https://bcgsc.github.io/SARS2/
Submitted 31 December, 2020;
originally announced December 2020.
-
HLA predictions from the bronchoalveolar lavage fluid samples of five patients at the early stage of the Wuhan seafood market COVID-19 outbreak
Authors:
Rene L Warren,
Inanc Birol
Abstract:
We are in the midst of a global viral pandemic, one with no cure and a high mortality rate. The Human Leukocyte Antigen (HLA) gene complex plays a critical role in host immunity. We predicted HLA class I and II alleles from the transcriptome sequencing data prepared from the bronchoalveolar lavage fluid samples of five patients at the early stage of the COVID-19 outbreak. We identified the HLA-I allele A*24:02 in four out of five patients, which is higher than the expected frequency (17.2%) in the South Han Chinese population. The difference is statistically significant with a p-value less than $10^{-4}$. Our analysis results may help provide future insights on disease susceptibility.
Submitted 27 April, 2020; v1 submitted 15 April, 2020;
originally announced April 2020.
-
Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
Authors:
Keith R. Bradnam,
Joseph N. Fass,
Anton Alexandrov,
Paul Baranay,
Michael Bechner,
İnanç Birol,
Sébastien Boisvert,
Jarrod A. Chapman,
Guillaume Chapuis,
Rayan Chikhi,
Hamidreza Chitsaz,
Wen-Chi Chou,
Jacques Corbeil,
Cristian Del Fabbro,
T. Roderick Docking,
Richard Durbin,
Dent Earl,
Scott Emrich,
Pavel Fedotov,
Nuno A. Fonseca,
Ganeshkumar Ganapathy,
Richard A. Gibbs,
Sante Gnerre,
Élénie Godzaridis,
Steve Goldstein
, et al. (66 additional authors not shown)
Abstract:
Background - The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results - In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions - Many current genome assemblers produced useful assemblies, containing a significant representation of their genes, regulatory sequences, and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
Submitted 27 June, 2013; v1 submitted 23 January, 2013;
originally announced January 2013.
-
Coherent bremsstrahlung, coherent pair production, birefringence and polarimetry in the 20-170 GeV energy range using aligned crystals
Authors:
NA59 Collaboration,
A. Apyan,
R. O. Avakian,
B. Badelek,
S. Ballestrero,
C. Biino,
I. Birol,
P. Cenci,
S. H. Connell,
S. Eichblatt,
T. Fonseca,
A. Freund,
B. Gorini,
R. Groess,
K. Ispirian,
T. J. Ketel,
Yu. V. Kononets,
A. Lopez,
A. Mangiarotti,
B. van Rens,
J. P. F. Sellschop,
M. Shieh,
P. Sona,
V. Strakhovenko,
E. Uggerhoj
, et al. (5 additional authors not shown)
Abstract:
The processes of coherent bremsstrahlung (CB) and coherent pair production (CPP) based on aligned crystal targets have been studied in the energy range 20-170 GeV. The experimental arrangement allowed for measurements of single photon properties of these phenomena including their polarization dependences. This is significant as the theoretical description of CB and CPP is an area of active theoretical debate and development. With the theoretical approach used in this paper both the measured cross sections and polarization observables are predicted very well. This indicates a proper understanding of CB and CPP up to energies of 170 GeV. Birefringence in CPP on aligned crystals is applied to determine the polarization parameters in our measurements. New technologies for high energy photon beam optics including phase plates and polarimeters for linear and circular polarization are demonstrated in this experiment. Coherent bremsstrahlung for the strings-on-strings (SOS) orientation yields a larger enhancement for hard photons than CB for the channeling orientations of the crystal. Our measurements and our calculations indicate low photon polarizations for the high energy SOS photons.
Submitted 26 February, 2008; v1 submitted 7 December, 2005;
originally announced December 2005.
-
Results on the Coherent Interaction of High Energy Electrons and Photons in Oriented Single Crystals
Authors:
NA59 Collaboration,
A. Apyan,
R. O. Avakian,
B. Badelek,
S. Ballestrero,
C. Biino,
I. Birol,
P. Cenci,
S. H. Connell,
S. Eichblatt,
T. Fonseca,
A. Freund,
B. Gorini,
R. Groess,
K. Ispirian,
T. J. Ketel,
Yu. V. Kononets,
A. Lopez,
A. Mangiarotti,
B. van Rens,
J. P. F. Sellschop,
M. Shieh,
P. Sona,
V. Strakhovenko,
E. Uggerhoj
, et al. (5 additional authors not shown)
Abstract:
The CERN-NA-59 experiment examined a wide range of electromagnetic processes for multi-GeV electrons and photons interacting with oriented single crystals. The various types of crystals and their orientations were used for producing photon beams and for converting and measuring their polarisation.
The radiation emitted by 178 GeV unpolarised electrons incident on a 1.5 cm thick Si crystal oriented in the Coherent Bremsstrahlung (CB) and the String-of-Strings (SOS) modes was used to obtain multi-GeV linearly polarised photon beams.
A new crystal polarimetry technique was established for measuring the linear polarisation of the photon beam. The polarimeter is based on the dependence of the Coherent Pair Production (CPP) cross section in oriented single crystals on the direction of the photon polarisation with respect to the crystal plane. Both a 1 mm thick single crystal of Germanium and a 4 mm thick multi-tile set of synthetic Diamond crystals were used as analyzers of the linear polarisation.
A birefringence phenomenon, the conversion of the linear polarisation of the photon beam into circular polarisation, was observed. This was achieved by letting the linearly polarised photon beam pass through a 10 cm thick Silicon single crystal that acted as a "quarter wave plate" (QWP) as suggested by N. Cabibbo et al.
Submitted 22 June, 2005;
originally announced June 2005.
-
Measurement of Coherent Emission and Linear Polarization of Photons by Electrons in the Strong Fields of Aligned Crystals
Authors:
NA59 Collaboration,
A. Apyan,
R. O. Avakian,
B. Badelek,
S. Ballestrero,
C. Biino,
I. Birol,
P. Cenci,
S. H. Connell,
S. Eichblatt,
T. Fonseca,
A. Freund,
B. Gorini,
R. Groess,
K. Ispirian,
T. J. Ketel,
Yu. V. Kononets,
A. Lopez,
A. Mangiarotti,
B. van Rens,
J. P. F. Sellschop,
M. Shieh,
P. Sona,
V. Strakhovenko,
E. Uggerhoj
, et al. (5 additional authors not shown)
Abstract:
We present new results regarding the features of high energy photon emission by an electron beam of 178 GeV penetrating a 1.5 cm thick single Si crystal aligned at the Strings-Of-Strings (SOS) orientation. This concerns a special case of coherent bremsstrahlung where the electron interacts with the strong fields of successive atomic strings in a plane and for which the largest enhancement of the highest energy photons is expected. The polarization of the resulting photon beam was measured by the asymmetry of electron-positron pair production in an aligned diamond crystal analyzer. By the selection of a single pair the energy and the polarization of individual photons could be measured in the environment of multiple photons produced in the radiator crystal. Photons in the high energy region show less than 20% linear polarization at the 90% confidence level.
Submitted 24 June, 2004; v1 submitted 9 June, 2004;
originally announced June 2004.
-
Linear to Circular Polarisation Conversion using Birefringent Properties of Aligned Crystals for Multi-GeV Photons
Authors:
NA59 Collaboration,
A. Apyan,
R. O. Avakian,
B. Badelek,
S. Ballestrero,
C. Biino,
I. Birol,
P. Cenci,
S. H. Connell,
S. Eichblatt,
T. Fonseca,
A. Freund,
B. Gorini,
R. Groess,
K. Ispirian,
T. J. Ketel,
Yu. V. Kononets,
A. Lopez,
A. Mangiarotti,
B. van Rens,
J. P. F. Sellschop,
M. Shieh,
P. Sona,
V. Strakhovenko,
E. Uggerhoj
, et al. (5 additional authors not shown)
Abstract:
We present the first experimental results on the use of a thick aligned Si crystal acting as a quarter wave plate to induce a degree of circular polarisation in a high energy linearly polarised photon beam. The linearly polarised photon beam is produced from coherent bremsstrahlung radiation by 178 GeV unpolarised electrons incident on an aligned Si crystal, acting as a radiator. The linear polarisation of the photon beam is characterised by measuring the asymmetry in electron-positron pair production in a Ge crystal, for different crystal orientations. The Ge crystal therefore acts as an analyser. The birefringence phenomenon, which converts the linear polarisation to circular polarisation, is observed by letting the linearly polarised photon beam pass through a thick Si quarter wave plate crystal, and then measuring the asymmetry in electron-positron pair production again for a selection of relative angles between the crystallographic planes of the radiator, analyser and quarter wave plate. The systematics of the difference between the measured asymmetries with and without the quarter wave plate are predicted by theory to reveal an evolution in the Stokes parameters from which the appearance of a circularly polarised component in the photon beam can be demonstrated. The measured magnitude of the circularly polarised component was consistent with the theoretical predictions, and therefore is an indication of the existence of the birefringence effect.
Submitted 24 June, 2004; v1 submitted 18 June, 2003;
originally announced June 2003.
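The abstract above characterises linear polarisation through a pair-production asymmetry but does not state a formula. A conventional way to write such a measurement, given here only as a sketch under assumed notation (the symbols $\sigma_{\parallel}$, $\sigma_{\perp}$, $R$ and $P_{\mathrm{lin}}$ are not taken from the abstract, and the sign convention may differ from the paper's), is:

```latex
% Assumed notation: \sigma_{\parallel} and \sigma_{\perp} are the pair-production
% cross sections for photons polarised parallel and perpendicular to the chosen
% crystal plane of the analyser; R is the analysing power of the analyser crystal.
A = \frac{\sigma_{\perp} - \sigma_{\parallel}}{\sigma_{\perp} + \sigma_{\parallel}},
\qquad
P_{\mathrm{lin}} = \frac{A}{R}
```

Under this reading, the analysing power $R$ is the asymmetry that a fully linearly polarised beam would produce, so dividing the measured asymmetry by $R$ gives an estimate of the degree of linear polarisation.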
-
Coherent Pair Production by Photons in the 20-170 GeV Energy Range Incident on Crystals and Birefringence
Authors:
NA59 Collaboration,
A. Apyan,
R. O. Avakian,
B. Badelek,
S. Ballestrero,
C. Biino,
I. Birol,
P. Cenci,
S. H. Connell,
S. Eichblatt,
T. Fonseca,
A. Freund,
B. Gorini,
R. Groess,
K. Ispirian,
T. J. Ketel,
Yu. V. Kononets,
A. Lopez,
A. Mangiarotti,
B. van Rens,
J. P. F. Sellschop,
M. Shieh,
P. Sona,
V. Strakhovenko,
E. Uggerhoj
, et al. (5 additional authors not shown)
Abstract:
The cross section for coherent pair production by linearly polarised photons in the 20-170 GeV energy range was measured for photon aligned incidence on ultra-high quality diamond and germanium crystals. The theoretical description of coherent bremsstrahlung and coherent pair production phenomena is an area of active theoretical debate and development. However, under our experimental conditions, the theory predicted the combined cross section and polarisation experimental observables very well indeed. In macroscopic terms, our experiment measured a birefringence effect in pair production in a crystal. The study of this effect also constituted a measurement of the energy dependent linear polarisation of photons produced by coherent bremsstrahlung in aligned crystals. New technologies for manipulating high energy photon beams can be realised based on an improved understanding of QED phenomena at these energies. In particular, this experiment demonstrates an efficient new polarimetry technique. The pair production measurements were done using two independent methods simultaneously. The more complex method using a magnet spectrometer showed that the simpler method using a multiplicity detector was also viable.
Submitted 24 June, 2004; v1 submitted 11 June, 2003;
originally announced June 2003.