Quantum-Enhanced Neural Exchange-Correlation Functionals

Authors: Igor O. Sokolov, Gert-Jan Both, Art D. Bochevarov, Pavel A. Dub, Daniel S. Levine, Christopher T. Brown, Shaheen Acheche, Panagiotis Kl. Barkoutsos, Vincent E. Elfving

Abstract: Kohn-Sham Density Functional Theory (KS-DFT) provides the exact ground state energy and electron density of a molecule, contingent on the as-yet-unknown universal exchange-correlation (XC) functional. Recent research has demonstrated that neural networks can efficiently learn to represent approximations to that functional, offering accurate generalizations to molecules not present during the training process. With the latest advancements in quantum-enhanced machine learning (ML), evidence is growing that Quantum Neural Network (QNN) models may offer advantages in ML applications. In this work, we explore the use of QNNs for representing XC functionals, enhancing and comparing them to classical ML techniques. We present QNNs based on differentiable quantum circuits (DQCs) as quantum (hybrid) models for XC in KS-DFT, implemented across various architectures. We assess their performance on 1D and 3D systems. To that end, we expand existing differentiable KS-DFT frameworks and propose strategies for efficient training of such functionals, highlighting the importance of fractional orbital occupation for accurate results. Our best QNN-based XC functional yields energy profiles of the H$_2$ and planar H$_4$ molecules that deviate by no more than 1 mHa from the reference DMRG and FCI/6-31G results, respectively. Moreover, they reach chemical precision on a system, H$_2$H$_2$, not present in the training dataset, using only a few variational parameters. This work lays the foundation for the integration of quantum models in KS-DFT, thereby opening new avenues for expressing XC functionals in a differentiable way and facilitating computations of various properties.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:1707.04393 [pdf, other]

doi 10.7717/peerj-cs.142

Sustainable computational science: the ReScience initiative

Authors: Nicolas P. Rougier, Konrad Hinsen, Frédéric Alexandre, Thomas Arildsen, Lorena Barba, Fabien C. Y. Benureau, C. Titus Brown, Pierre de Buyl, Ozan Caglayan, Andrew P. Davison, Marc André Delsuc, Georgios Detorakis, Alexandra K. Diem, Damien Drix, Pierre Enel, Benoît Girard, Olivia Guest, Matt G. Hall, Rafael Neto Henriques, Xavier Hinaut, Kamil S Jaron, Mehdi Khamassi, Almar Klein, Tiina Manninen, Pietro Marchesi , et al. (20 additional authors not shown)

Abstract: Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results, however computational science lags behind. In the best case, authors may provide their source code as a compressed archive and they may feel confident their research is reproducible. But this is not exactly true. James Buckheit and David Donoho proposed more than two decades ago that an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer-reviews. Existing journals have been slow to adapt: source codes are rarely requested, hardly ever actually executed to check that they produce the results advertised in the article. ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from other traditional scientific journals. ReScience resides on GitHub where each new implementation of a computational study is made available together with comments, explanations, and software tests.

Submitted 11 November, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

Comments: 8 pages, 1 figure

Journal ref: PeerJ Computer Science 3:e142 (2017)

arXiv:1612.01574 [pdf]

doi 10.1109/JLT.2015.2500611

High-Bandwidth and Large Coupling Tolerance Graded-Index Multimode Polymer Waveguides for On-board High-Speed Optical Interconnects

Authors: Jian Chen, Nikolaos Bamiedakis, Peter P. Vasil'ev, Tom J. Edwards, Christian T. A. Brown, Richard V. Penty, Ian H. White

Abstract: Optical interconnects have attracted significant research interest for use in short-reach board-level optical communication links in supercomputers and data centres. Multimode polymer waveguides in particular constitute an attractive technology for on-board optical interconnects as they provide high bandwidth, offer relaxed alignment tolerances, and can be cost-effectively integrated onto standard printed circuit boards (PCBs). However, the continuing improvements in bandwidth performance of optical sources make it important to investigate approaches to develop high bandwidth polymer waveguides. In this paper, we present dispersion studies on a graded-index (GI) waveguide in siloxane materials designed to deliver high bandwidth over a range of launch conditions. Bandwidth-length products of >70 GHzxm and ~65 GHzxm are observed using a 50/125 um multimode fibre (MMF) launch for input offsets of +/- 10 um without and with the use of a mode mixer respectively; and enhanced values of >100 GHzxm are found under a 10x microscope objective launch for input offsets of ~18 x 20 um^2. The large range of offsets is within the -1 dB alignment tolerances. A theoretical model is developed using the measured refractive index profile of the waveguide, and general agreement is found with experimental bandwidth measurements. The reported results clearly demonstrate the potential of this technology for use in high-speed board-level optical links, and indicate that data transmission of 100 Gb/s over a multimode polymer waveguide is feasible with appropriate refractive index engineering.

Submitted 25 November, 2016; originally announced December 2016.

Comments: 8 pages, 10 figures

Journal ref: Journal of Lightwave Technology, Vol. 34, Issue. 12 (2015)

arXiv:1611.03107 [pdf]

doi 10.1109/OIC.2015.7115670

Dispersion Studies on Multimode Polymer Spiral Waveguides for Board-Level Optical Interconnects

Authors: Jian Chen, Nikos Bamiedakis, Tom J. Edwards, Christian T. A. Brown, Richard V. Penty, Ian H. White

Abstract: Dispersion studies are conducted on 1m long multimode polymer spiral waveguides with different refractive index profiles. Bandwidth-length products >40GHzxm are obtained from such waveguides under a 50/125 um MMF, indicating the potential of this technology.

Submitted 1 February, 2017; v1 submitted 6 November, 2016; originally announced November 2016.

Comments: 3 pages, 2 figures, IEEE Optical Interconnects Conference (OIC), paper MD2, 2015

arXiv:1309.2975 [pdf, ps, other]

doi 10.1371/journal.pone.0101271

These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure

Authors: Qingpeng Zhang, Jason Pell, Rosangela Canino-Koning, Adina Chuang Howe, C. Titus Brown

Abstract: K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix arrays, and trie structures, khmer relies entirely on a simple probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits online updating and retrieval of k-mer counts in memory which is necessary to support online k-mer analysis algorithms. On sparse data sets this data structure is considerably more memory efficient than any exact data structure. In exchange, the use of a Count-Min Sketch introduces a systematic overcount for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we analyze the speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers. We also compare the performance of khmer to several other k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC, Turtle and KAnalyze. Finally, we examine the effectiveness of profiling sequencing error, k-mer abundance trimming, and digital normalization of reads in the context of high khmer false positive rates. khmer is implemented in C++ wrapped in a Python interface, offers a tested and robust API, and is freely available under the BSD license at github.com/ged-lab/khmer.

Submitted 14 July, 2014; v1 submitted 11 September, 2013; originally announced September 2013.

Journal ref: PLoS One. 2014 Jul 25;9(7):e101271

arXiv:1309.1116 [pdf]

doi 10.1364/OE.21.027057

Suppression of amplitude-to-phase noise conversion in balanced optical-microwave phase detectors

Authors: Maurice Lessing, Helen S. Margolis, C. Tom A. Brown, Patrick Gill, Giuseppe Marra

Abstract: We demonstrate an amplitude-to-phase (AM-PM) conversion coefficient for a balanced optical-microwave phase detector (BOM-PD) of 0.001 rad, corresponding to AM-PM induced phase noise 60 dB below the single-sideband relative intensity noise of the laser. This enables us to generate 8 GHz microwave signals from a commercial Er-fibre comb with a single-sideband residual phase noise of -131 dBc/Hz at 1 Hz offset frequency and -148 dBc/Hz at 1 kHz offset frequency.

Submitted 4 September, 2013; originally announced September 2013.

arXiv:1303.2411 [pdf, other]

RNA-Seq Mapping Errors When Using Incomplete Reference Transcriptomes of Vertebrates

Authors: Alexis Black Pyrkosz, Hans Cheng, C. Titus Brown

Abstract: Whole transcriptome sequencing is increasingly being used as a functional genomics tool to study non-model organisms. However, when the reference transcriptome used to calculate differential expression is incomplete, significant error in the inferred expression levels can result. In this study, we use simulated reads generated from real transcriptomes to determine the accuracy of read mapping, and measure the error resulting from using an incomplete transcriptome. We show that the two primary sources of counting error are 1) alternative splice variants that share reads and 2) missing transcripts from the reference. Alternative splice variants increase the false positive rate of mapping while incomplete reference transcriptomes decrease the true positive rate, leading to inaccurate transcript expression levels. Grouping transcripts by gene or read sharing (similar to mapping to a reference genome) significantly decreases false positives, but only by improving the reference transcriptome itself can the missing transcript problem be addressed. We also demonstrate that employing different mapping software does not yield substantial increases in accuracy on simulated data. Finally, we show that read lengths or insert sizes must increase past 1kb to resolve mapping ambiguity.

Submitted 10 March, 2013; originally announced March 2013.

arXiv:1303.2223 [pdf, other]

khmer: Working with Big Data in Bioinformatics

Authors: Eric McDonald, C. Titus Brown

Abstract: We introduce design and optimization considerations for the 'khmer' package.

Submitted 9 March, 2013; originally announced March 2013.

Comments: Invited chapter for forthcoming book on Performance of Open Source Applications

arXiv:1212.2832 [pdf, other]

Assembling large, complex environmental metagenomes

Authors: Adina Chuang Howe, Janet Jansson, Stephanie A. Malfatti, Susannah G. Tringe, James M. Tiedje, C. Titus Brown

Abstract: The large volumes of sequencing data required to sample complex environments deeply pose new challenges to sequence analysis approaches. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires significant computational resources. We apply two pre-assembly filtering approaches, digital normalization and partitioning, to make large metagenome assemblies more computationally tractable. Using a human gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes from matched Iowa corn and native prairie soils. The predicted functional content and phylogenetic origin of the assembled contigs indicate significant taxonomic differences despite similar function. The assembly strategies presented are generic and can be extended to any metagenome; full source code is freely available under a BSD license.

Submitted 28 December, 2012; v1 submitted 12 December, 2012; originally announced December 2012.

Comments: Includes supporting information

arXiv:1212.0159 [pdf, other]

Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets

Authors: Adina Chuang Howe, Jason Pell, Rosangela Canino-Koning, Rachel Mackelprang, Susannah Tringe, Janet Jansson, James M. Tiedje, C. Titus Brown

Abstract: Sequencing errors and biases in metagenomic datasets affect coverage-based assemblies and are often ignored during analysis. Here, we analyze read connectivity in metagenomes and identify the presence of problematic and likely a-biological connectivity within metagenome assembly graphs. Specifically, we identify highly connected sequences which join a large proportion of reads within each real metagenome. These sequences show position-specific bias in shotgun reads, suggestive of sequencing artifacts, and are only minimally incorporated into contigs by assembly. The removal of these sequences prior to assembly results in similar assembly content for most metagenomes and enables the use of graph partitioning to decrease assembly memory and time requirements.

Submitted 1 December, 2012; originally announced December 2012.

arXiv:1210.7042 [pdf, ps, other]

doi 10.1063/1.4773990

2μm Solid-State Laser Mode-locked By Single-Layer Graphene

Authors: A. A. Lagatsky, Z. Sun, T. S. Kulmala, R. S. Sundaram, S. Milana, F. Torrisi, O. L. Antipov, Y. Lee, J. H. Ahn, C. T. A. Brown, W. Sibbett, A. C. Ferrari

Abstract: We report a 2μm ultrafast solid-state Tm:Lu2O3 laser, mode-locked by single-layer graphene, generating transform-limited ~410fs pulses, with a spectral width ~11.1nm at 2067nm. The maximum average output power is 270mW, at a pulse repetition frequency of 110MHz. This is a convenient high-power transform-limited laser at 2μm for various applications, such as laser surgery and material processing.

Submitted 25 October, 2012; originally announced October 2012.

Journal ref: Appl. Phys. Lett. 102, 013113 (2013)

arXiv:1210.0530 [pdf, ps, other]

doi 10.1371/journal.pbio.1001745

Best Practices for Scientific Computing

Authors: Greg Wilson, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, Katy Huff, Ian M. Mitchell, Mark Plumbley, Ben Waugh, Ethan P. White, Paul Wilson

Abstract: Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software.

Submitted 26 September, 2013; v1 submitted 30 September, 2012; originally announced October 2012.

Comments: 18 pages

Journal ref: PLOS Biology 12(1): e1001745, Jan 2014

arXiv:1203.4802 [pdf, other]

A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data

Authors: C. Titus Brown, Adina Howe, Qingpeng Zhang, Alexis B. Pyrkosz, Timothy H. Brom

Abstract: Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell genomes, and metagenomes has enabled investigation of a wide range of organisms and ecosystems. However, sampling variation in short-read data sets and high sequencing error rates of modern sequencers present many new computational challenges in data interpretation. These challenges have led to the development of new classes of mapping tools and de novo assemblers. These algorithms are challenged by the continued improvement in sequencing throughput. We here describe digital normalization, a single-pass computational algorithm that systematizes coverage in shotgun sequencing data sets, thereby decreasing sampling variation, discarding redundant data, and removing the majority of errors. Digital normalization substantially reduces the size of shotgun data sets and decreases the memory and time requirements for de novo sequence assembly, all without significantly impacting content of the generated contigs. We apply digital normalization to the assembly of microbial genomic data, amplified single-cell genomic data, and transcriptomic data. Our implementation is freely available for use and modification.

Submitted 21 May, 2012; v1 submitted 21 March, 2012; originally announced March 2012.

arXiv:1112.4193 [pdf, other]

doi 10.1073/pnas.1121464109

Scaling metagenome sequence assembly with probabilistic de Bruijn graphs

Authors: Jason Pell, Arend Hintze, Rosangela Canino-Koning, Adina Howe, James M. Tiedje, C. Titus Brown

Abstract: Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly.

Submitted 29 June, 2012; v1 submitted 18 December, 2011; originally announced December 2011.

arXiv:adap-org/9503004 [pdf, ps]

Abundance Distributions in Artificial Life and Stochastic Models: "Age and Area" revisited

Authors: C. Adami, C. T. Brown, M. Haggerty

Abstract: Using an artificial system of self-replicating strings, we show a correlation between the age of a genotype and its abundance that reflects a punctuated rather than gradual picture of evolution, as suggested long ago by Willis. In support of this correlation, we measure genotype abundance distributions and find universal coefficients. Finally, we propose a simple stochastic model which describes the dynamics of equilibrium periods and which correctly predicts most of the observed distributions.

Submitted 29 March, 1995; originally announced March 1995.

Comments: 12 p., tar-compressed uuencoded postscript incl. figures, Proc. of ECAL 95 conference, to appear

arXiv:adap-org/9405003 [pdf, ps]

Evolutionary Learning in the 2D Artificial Life System "Avida"

Authors: Chris Adami, C. Titus Brown

Abstract: We present a new tierra-inspired artificial life system with local interactions and two-dimensional geometry, based on an update mechanism akin to that of 2D cellular automata. We find that the spatial geometry is conducive to the development of diversity and thus improves adaptive capabilities. We also demonstrate the adaptive strength of the system by breeding cells with simple computational abilities, and study the dependence of this adaptability on mutation rate and population size.

Submitted 16 May, 1994; originally announced May 1994.

Comments: 5 p., postscript with figures (unpack with uufiles), to appear in the Proc. of "Artificial Life IV", MIT Press

Report number: MAP-173

Showing 1–16 of 16 results for author: Brown, C T