-
Quantum-Enhanced Neural Exchange-Correlation Functionals
Authors:
Igor O. Sokolov,
Gert-Jan Both,
Art D. Bochevarov,
Pavel A. Dub,
Daniel S. Levine,
Christopher T. Brown,
Shaheen Acheche,
Panagiotis Kl. Barkoutsos,
Vincent E. Elfving
Abstract:
Kohn-Sham Density Functional Theory (KS-DFT) provides the exact ground state energy and electron density of a molecule, contingent on the as-yet-unknown universal exchange-correlation (XC) functional. Recent research has demonstrated that neural networks can efficiently learn to represent approximations to that functional, offering accurate generalizations to molecules not present during the training process. With the latest advancements in quantum-enhanced machine learning (ML), evidence is growing that Quantum Neural Network (QNN) models may offer advantages in ML applications. In this work, we explore the use of QNNs for representing XC functionals, enhancing and comparing them to classical ML techniques. We present QNNs based on differentiable quantum circuits (DQCs) as quantum (hybrid) models for XC in KS-DFT, implemented across various architectures. We assess their performance on 1D and 3D systems. To that end, we expand existing differentiable KS-DFT frameworks and propose strategies for efficient training of such functionals, highlighting the importance of fractional orbital occupation for accurate results. Our best QNN-based XC functional yields energy profiles of the H$_2$ and planar H$_4$ molecules that deviate by no more than 1 mHa from the reference DMRG and FCI/6-31G results, respectively. Moreover, they reach chemical precision on a system, H$_2$H$_2$, not present in the training dataset, using only a few variational parameters. This work lays the foundation for the integration of quantum models in KS-DFT, thereby opening new avenues for expressing XC functionals in a differentiable way and facilitating computations of various properties.
Submitted 22 April, 2024;
originally announced April 2024.
-
Sustainable computational science: the ReScience initiative
Authors:
Nicolas P. Rougier,
Konrad Hinsen,
Frédéric Alexandre,
Thomas Arildsen,
Lorena Barba,
Fabien C. Y. Benureau,
C. Titus Brown,
Pierre de Buyl,
Ozan Caglayan,
Andrew P. Davison,
Marc André Delsuc,
Georgios Detorakis,
Alexandra K. Diem,
Damien Drix,
Pierre Enel,
Benoît Girard,
Olivia Guest,
Matt G. Hall,
Rafael Neto Henriques,
Xavier Hinaut,
Kamil S Jaron,
Mehdi Khamassi,
Almar Klein,
Tiina Manninen,
Pietro Marchesi
, et al. (20 additional authors not shown)
Abstract:
Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results; however, computational science lags behind. In the best case, authors may provide their source code as a compressed archive and they may feel confident their research is reproducible. But this is not exactly true. James Buckheit and David Donoho proposed more than two decades ago that an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer-reviews. Existing journals have been slow to adapt: source codes are rarely requested and hardly ever actually executed to check that they produce the results advertised in the article. ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from other traditional scientific journals. ReScience resides on GitHub, where each new implementation of a computational study is made available together with comments, explanations, and software tests.
Submitted 11 November, 2017; v1 submitted 14 July, 2017;
originally announced July 2017.
-
High-Bandwidth and Large Coupling Tolerance Graded-Index Multimode Polymer Waveguides for On-board High-Speed Optical Interconnects
Authors:
Jian Chen,
Nikolaos Bamiedakis,
Peter P. Vasil'ev,
Tom J. Edwards,
Christian T. A. Brown,
Richard V. Penty,
Ian H. White
Abstract:
Optical interconnects have attracted significant research interest for use in short-reach board-level optical communication links in supercomputers and data centres. Multimode polymer waveguides in particular constitute an attractive technology for on-board optical interconnects as they provide high bandwidth, offer relaxed alignment tolerances, and can be cost-effectively integrated onto standard printed circuit boards (PCBs). However, the continuing improvements in bandwidth performance of optical sources make it important to investigate approaches to develop high bandwidth polymer waveguides. In this paper, we present dispersion studies on a graded-index (GI) waveguide in siloxane materials designed to deliver high bandwidth over a range of launch conditions. Bandwidth-length products of >70 GHz×m and ~65 GHz×m are observed using a 50/125 μm multimode fibre (MMF) launch for input offsets of ±10 μm without and with the use of a mode mixer, respectively; and enhanced values of >100 GHz×m are found under a 10× microscope objective launch for input offsets of ~18 × 20 μm². The large range of offsets is within the -1 dB alignment tolerances. A theoretical model is developed using the measured refractive index profile of the waveguide, and general agreement is found with experimental bandwidth measurements. The reported results clearly demonstrate the potential of this technology for use in high-speed board-level optical links, and indicate that data transmission of 100 Gb/s over a multimode polymer waveguide is feasible with appropriate refractive index engineering.
Submitted 25 November, 2016;
originally announced December 2016.
-
Dispersion Studies on Multimode Polymer Spiral Waveguides for Board-Level Optical Interconnects
Authors:
Jian Chen,
Nikos Bamiedakis,
Tom J. Edwards,
Christian T. A. Brown,
Richard V. Penty,
Ian H. White
Abstract:
Dispersion studies are conducted on 1 m long multimode polymer spiral waveguides with different refractive index profiles. Bandwidth-length products >40 GHz×m are obtained from such waveguides under a 50/125 μm MMF, indicating the potential of this technology.
Submitted 1 February, 2017; v1 submitted 6 November, 2016;
originally announced November 2016.
-
These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure
Authors:
Qingpeng Zhang,
Jason Pell,
Rosangela Canino-Koning,
Adina Chuang Howe,
C. Titus Brown
Abstract:
K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix arrays, and trie structures, khmer relies entirely on a simple probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits online updating and retrieval of k-mer counts in memory which is necessary to support online k-mer analysis algorithms. On sparse data sets this data structure is considerably more memory efficient than any exact data structure. In exchange, the use of a Count-Min Sketch introduces a systematic overcount for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we analyze the speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers. We also compare the performance of khmer to several other k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC, Turtle and KAnalyze. Finally, we examine the effectiveness of profiling sequencing error, k-mer abundance trimming, and digital normalization of reads in the context of high khmer false positive rates. khmer is implemented in C++ wrapped in a Python interface, offers a tested and robust API, and is freely available under the BSD license at github.com/ged-lab/khmer.
Submitted 14 July, 2014; v1 submitted 11 September, 2013;
originally announced September 2013.
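The abstract above describes counting k-mers with a Count-Min Sketch, which can overcount but never undercount. A minimal sketch of that idea follows; it is an illustration only, not the khmer API. The class name, the table sizes, and the use of salted BLAKE2b hashes are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch (hypothetical; not the khmer implementation)."""

    def __init__(self, width=1000, depth=4):
        self.width = width          # counters per row (assumed size)
        self.depth = depth          # number of independent hash rows (assumed)
        self.tables = [[0] * width for _ in range(depth)]

    def _indexes(self, kmer):
        # One salted hash per row; the salt makes the rows behave independently.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, kmer):
        for row, col in self._indexes(kmer):
            self.tables[row][col] += 1

    def get(self, kmer):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest estimate and is never below the true count.
        return min(self.tables[row][col] for row, col in self._indexes(kmer))


def count_kmers(sequence, k, sketch):
    """Add every length-k substring of `sequence` to the sketch (online update)."""
    for i in range(len(sequence) - k + 1):
        sketch.add(sequence[i:i + k])


sketch = CountMinSketch()
count_kmers("ACGACG", 3, sketch)
print(sketch.get("ACG"))  # at least 2; only counts are stored, not the k-mers
```

Because only the counter tables are kept, memory use is fixed by `width` and `depth` rather than by the number of distinct k-mers, which is the trade-off the abstract refers to.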
-
Suppression of amplitude-to-phase noise conversion in balanced optical-microwave phase detectors
Authors:
Maurice Lessing,
Helen S. Margolis,
C. Tom A. Brown,
Patrick Gill,
Giuseppe Marra
Abstract:
We demonstrate an amplitude-to-phase (AM-PM) conversion coefficient for a balanced optical-microwave phase detector (BOM-PD) of 0.001 rad, corresponding to AM-PM induced phase noise 60 dB below the single-sideband relative intensity noise of the laser. This enables us to generate 8 GHz microwave signals from a commercial Er-fibre comb with a single-sideband residual phase noise of -131 dBc/Hz at 1 Hz offset frequency and -148 dBc/Hz at 1 kHz offset frequency.
Submitted 4 September, 2013;
originally announced September 2013.
-
RNA-Seq Mapping Errors When Using Incomplete Reference Transcriptomes of Vertebrates
Authors:
Alexis Black Pyrkosz,
Hans Cheng,
C. Titus Brown
Abstract:
Whole transcriptome sequencing is increasingly being used as a functional genomics tool to study non-model organisms. However, when the reference transcriptome used to calculate differential expression is incomplete, significant error in the inferred expression levels can result. In this study, we use simulated reads generated from real transcriptomes to determine the accuracy of read mapping, and measure the error resulting from using an incomplete transcriptome. We show that the two primary sources of counting error are 1) alternative splice variants that share reads and 2) missing transcripts from the reference. Alternative splice variants increase the false positive rate of mapping while incomplete reference transcriptomes decrease the true positive rate, leading to inaccurate transcript expression levels. Grouping transcripts by gene or read sharing (similar to mapping to a reference genome) significantly decreases false positives, but only by improving the reference transcriptome itself can the missing transcript problem be addressed. We also demonstrate that employing different mapping software does not yield substantial increases in accuracy on simulated data. Finally, we show that read lengths or insert sizes must increase past 1kb to resolve mapping ambiguity.
Submitted 10 March, 2013;
originally announced March 2013.
-
khmer: Working with Big Data in Bioinformatics
Authors:
Eric McDonald,
C. Titus Brown
Abstract:
We introduce design and optimization considerations for the 'khmer' package.
Submitted 9 March, 2013;
originally announced March 2013.
-
Assembling large, complex environmental metagenomes
Authors:
Adina Chuang Howe,
Janet Jansson,
Stephanie A. Malfatti,
Susannah G. Tringe,
James M. Tiedje,
C. Titus Brown
Abstract:
The large volumes of sequencing data required to sample complex environments deeply pose new challenges to sequence analysis approaches. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires significant computational resources. We apply two pre-assembly filtering approaches, digital normalization and partitioning, to make large metagenome assemblies more computationally tractable. Using a human gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes from matched Iowa corn and native prairie soils. The predicted functional content and phylogenetic origin of the assembled contigs indicate significant taxonomic differences despite similar function. The assembly strategies presented are generic and can be extended to any metagenome; full source code is freely available under a BSD license.
Submitted 28 December, 2012; v1 submitted 12 December, 2012;
originally announced December 2012.
-
Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets
Authors:
Adina Chuang Howe,
Jason Pell,
Rosangela Canino-Koning,
Rachel Mackelprang,
Susannah Tringe,
Janet Jansson,
James M. Tiedje,
C. Titus Brown
Abstract:
Sequencing errors and biases in metagenomic datasets affect coverage-based assemblies and are often ignored during analysis. Here, we analyze read connectivity in metagenomes and identify the presence of problematic and likely a-biological connectivity within metagenome assembly graphs. Specifically, we identify highly connected sequences which join a large proportion of reads within each real metagenome. These sequences show position-specific bias in shotgun reads, suggestive of sequencing artifacts, and are only minimally incorporated into contigs by assembly. The removal of these sequences prior to assembly results in similar assembly content for most metagenomes and enables the use of graph partitioning to decrease assembly memory and time requirements.
Submitted 1 December, 2012;
originally announced December 2012.
-
2μm Solid-State Laser Mode-locked By Single-Layer Graphene
Authors:
A. A. Lagatsky,
Z. Sun,
T. S. Kulmala,
R. S. Sundaram,
S. Milana,
F. Torrisi,
O. L. Antipov,
Y. Lee,
J. H. Ahn,
C. T. A. Brown,
W. Sibbett,
A. C. Ferrari
Abstract:
We report a 2 μm ultrafast solid-state Tm:Lu2O3 laser, mode-locked by single-layer graphene, generating transform-limited ~410 fs pulses, with a spectral width of ~11.1 nm at 2067 nm. The maximum average output power is 270 mW, at a pulse repetition frequency of 110 MHz. This is a convenient high-power transform-limited laser at 2 μm for various applications, such as laser surgery and material processing.
Submitted 25 October, 2012;
originally announced October 2012.
-
Best Practices for Scientific Computing
Authors:
Greg Wilson,
D. A. Aruliah,
C. Titus Brown,
Neil P. Chue Hong,
Matt Davis,
Richard T. Guy,
Steven H. D. Haddock,
Katy Huff,
Ian M. Mitchell,
Mark Plumbley,
Ben Waugh,
Ethan P. White,
Paul Wilson
Abstract:
Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software.
Submitted 26 September, 2013; v1 submitted 30 September, 2012;
originally announced October 2012.
-
A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data
Authors:
C. Titus Brown,
Adina Howe,
Qingpeng Zhang,
Alexis B. Pyrkosz,
Timothy H. Brom
Abstract:
Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell genomes, and metagenomes has enabled investigation of a wide range of organisms and ecosystems. However, sampling variation in short-read data sets and high sequencing error rates of modern sequencers present many new computational challenges in data interpretation. These challenges have led to the development of new classes of mapping tools and de novo assemblers. These algorithms are challenged by the continued improvement in sequencing throughput. We here describe digital normalization, a single-pass computational algorithm that systematizes coverage in shotgun sequencing data sets, thereby decreasing sampling variation, discarding redundant data, and removing the majority of errors. Digital normalization substantially reduces the size of shotgun data sets and decreases the memory and time requirements for de novo sequence assembly, all without significantly impacting content of the generated contigs. We apply digital normalization to the assembly of microbial genomic data, amplified single-cell genomic data, and transcriptomic data. Our implementation is freely available for use and modification.
Submitted 21 May, 2012; v1 submitted 21 March, 2012;
originally announced March 2012.
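The abstract above describes digital normalization as a single-pass algorithm that evens out coverage and discards redundant reads. One way such a filter can work is sketched below: each read's coverage is estimated as the median abundance of its k-mers among the reads kept so far, and the read is dropped once that estimate reaches a cutoff. The abstract does not give these details, so the median estimator, the cutoff value, the exact `Counter` (in place of a probabilistic counter), and all function names are assumptions for illustration.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all length-k substrings of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize(reads, k=4, cutoff=2):
    """Single pass: keep a read only while its estimated coverage is below `cutoff`."""
    counts = Counter()   # k-mer abundances over the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue     # read shorter than k, nothing to estimate
        # Median k-mer abundance serves as the coverage estimate for this read.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
kept = normalize(reads, k=4, cutoff=2)
print(kept)  # redundant copies of the first read are dropped, the rare read is kept
```

Because k-mers from discarded reads are never counted, the filter needs no reference sequence and sees each read only once.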
-
Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
Authors:
Jason Pell,
Arend Hintze,
Rosangela Canino-Koning,
Adina Howe,
James M. Tiedje,
C. Titus Brown
Abstract:
Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly.
Submitted 29 June, 2012; v1 submitted 18 December, 2011;
originally announced December 2011.
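The abstract above stores an assembly graph in a Bloom filter. A minimal sketch of the idea: only k-mer membership is stored, and edges are implicit, recovered by asking whether each of the four possible one-base extensions is present. This is an illustration, not the authors' code; the class and function names, the bit-array size, and the salted BLAKE2b hashes are assumptions.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits        # assumed size for this example
        self.num_hashes = num_hashes    # assumed number of hash functions
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_graph(sequences, k):
    """Insert every k-mer of every sequence; graph nodes are the stored k-mers."""
    graph = BloomFilter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            graph.add(seq[i:i + k])
    return graph


def successors(graph, kmer):
    """Implicit edges: test the four possible one-base extensions for membership."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in graph]


graph = build_graph(["ACGTAC"], k=3)
print(successors(graph, "ACG"))  # contains "CGT"; spurious neighbours are possible but rare
```

The "albeit inexactly" in the abstract corresponds to the false positives here: a query can report a k-mer or edge that was never inserted, but a k-mer that was inserted is always found.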
-
Abundance Distributions in Artificial Life and Stochastic Models: "Age and Area" revisited
Authors:
C. Adami,
C. T. Brown,
M. Haggerty
Abstract:
Using an artificial system of self-replicating strings, we show a correlation between the age of a genotype and its abundance that reflects a punctuated rather than gradual picture of evolution, as suggested long ago by Willis. In support of this correlation, we measure genotype abundance distributions and find universal coefficients. Finally, we propose a simple stochastic model which describes the dynamics of equilibrium periods and which correctly predicts most of the observed distributions.
Submitted 29 March, 1995;
originally announced March 1995.
-
Evolutionary Learning in the 2D Artificial Life System "Avida"
Authors:
Chris Adami,
C. Titus Brown
Abstract:
We present a new tierra-inspired artificial life system with local interactions and two-dimensional geometry, based on an update mechanism akin to that of 2D cellular automata. We find that the spatial geometry is conducive to the development of diversity and thus improves adaptive capabilities. We also demonstrate the adaptive strength of the system by breeding cells with simple computational abilities, and study the dependence of this adaptability on mutation rate and population size.
Submitted 16 May, 1994;
originally announced May 1994.