Search | arXiv e-print repository

arXiv:2406.11937 [pdf, other]

Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter

Authors: M. Aamir, B. Acar, G. Adamov, T. Adams, C. Adloff, S. Afanasiev, C. Agrawal, C. Agrawal, A. Ahmad, H. A. Ahmed, S. Akbar, N. Akchurin, B. Akgul, B. Akgun, R. O. Akpinar, E. Aktas, A. AlKadhim, V. Alexakhin, J. Alimena, J. Alison, A. Alpana, W. Alshehri, P. Alvarez Dominguez, M. Alyari, C. Amendola, et al. (550 additional authors not shown)

Abstract: A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 with a prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.

Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Prepared for submission to JINST

arXiv:2308.01837 [pdf, other]

doi 10.1016/j.nima.2023.168922

Reduction of light output of plastic scintillator tiles during irradiation at cold temperatures and in low-oxygen environments

Authors: B. Kronheim, A. Belloni, T. K. Edberg, S. C. Eno, C. Howe, C. Palmer, C. Papageorgakis, M. Paranjpe, S. Sriram

Abstract: The advent of the silicon photomultiplier has allowed the development of highly segmented calorimeters using plastic scintillator as the active medium, with photodetectors embedded in the calorimeter, in dimples in the plastic. To reduce the photodetector's dark current and radiation damage, the high granularity calorimeter designed for the high luminosity upgrade of the CMS detector at CERN's Large Hadron Collider will be operated at a temperature of about -30$^\circ$C. Due to flammability considerations, a low oxygen environment is being considered. However, the radiation damage to the plastic scintillator during irradiation in this operating environment needs to be considered. In this paper, we present measurements of the relative decrease of light output during irradiation of small plastic scintillator tiles read out by silicon photomultipliers. The irradiations were performed using a $^{60}\mathrm{Co}$ source both to produce the tiles' light and as a source of ionizing irradiation at dose rates of 0.3, 1.3, and $1.6\,$Gy/hr, temperatures of -30, -15, -5, and 0$^\circ$C, and with several different oxygen concentrations in the surrounding atmosphere. The effect of the material used to wrap the tile was also studied. Substantial temporary damage, which annealed when the sample was warmed, was seen during the low-temperature irradiations, regardless of the oxygen concentration and wrapping material. The relative light loss was largest with 3M$^{\tiny \textrm{TM}}$ Enhanced Specular Reflector Film wrapping and smallest with no wrapping, although due to the substantially higher light yield with wrapping, the final light output is largest with wrapping. The light loss was less at warmer temperatures. Damage with $3\%$ oxygen was similar to that in standard atmosphere. Evidence of a plateau in the radical density was seen for the 0$^\circ$C data.

Submitted 3 August, 2023; originally announced August 2023.

Journal ref: Nucl. Instrum. Methods Phys. Res. A 1059 (2024) 168922

arXiv:2307.09448 [pdf]

Ultrafast In vivo Transient Absorption Spectroscopy

Authors: Tomi K. Baikie, Darius Kosmützky, Joshua M. Lawrence, Victor Gray, Christoph Schnedermann, Robin Horton, Joel D. Collins, Hitesh Medipally, Bartosz Witek, Marc M. Nowaczyk, Jenny Zhang, Laura Wey, Christopher J. Howe, Akshay Rao

Abstract: Transient absorption (TA) spectroscopy has proved fundamental to our understanding of energy and charge transfer in biological systems, allowing measurements of photoactive proteins on sub-picosecond timescales. Recently, ultrafast TA spectroscopy has been applied in vivo, providing sub-picosecond measurements of photosynthetic light harvesting and electron transfer processes within living photosynthetic microorganisms. The analysis of the resultant data is hindered by the number of different photoactive pigments and the associated complexity of photoactive reaction schemes within living cells. Here we show how in vivo ultrafast TA spectroscopy can be applied to a diverse array of organisms from the tree of life, both photosynthetic and non-photosynthetic. We have developed a series of software tools for performing global, lifetime and target analysis of in vivo TA datasets. These advances establish in vivo TA spectroscopy as a versatile technique for studying energy and charge transfer in living systems.

Submitted 18 July, 2023; originally announced July 2023.

Comments: 39 pages

arXiv:2207.00530 [pdf]

The Target Study: A Conceptual Model and Framework for Measuring Disparity

Authors: John W. Jackson, Yea-Jen Hsu, Raquel C. Greer, Romsai T. Boonyasai, Chanelle J. Howe

Abstract: We present a conceptual model to measure disparity--the target study--where social groups may be similarly situated (i.e., balanced) on allowable covariates. Our model, based on a sampling design, does not intervene to assign social group membership or alter allowable covariates. To address non-random sample selection, we extend our model to generalize or transport disparity or to assess disparity after an intervention on eligibility-related variables that eliminates forms of collider-stratification. To avoid bias from differential timing of enrollment, we aggregate time-specific study results by balancing calendar time of enrollment across social groups. To provide a framework for emulating our model, we discuss study designs, data structures, and G-computation and weighting estimators. We compare our sampling-based model to prominent decomposition-based models used in healthcare and algorithmic fairness. We provide R code for all estimators and apply our methods to measure health system disparities in hypertension control using electronic medical records.

Submitted 16 March, 2024; v1 submitted 1 July, 2022; originally announced July 2022.

Comments: Completely re-written for a clearer and more formal presentation with added results for generalizability and transportability and a more detailed comparison to alternative models

arXiv:2201.13370 [pdf]

doi 10.1038/s41586-023-05763-9

Photosynthesis re-wired on the pico-second timescale

Authors: Tomi K. Baikie, Laura T. Wey, Hitesh Medipally, Erwin Reisner, Marc M. Nowaczyk, Richard H. Friend, Christopher J. Howe, Christoph Schnedermann, Akshay Rao, Jenny Z. Zhang

Abstract: Photosystems II and I (PSII and PSI) are the reaction centre complexes that drive the light reactions of photosynthesis. PSII performs light-driven water oxidation (quantum efficiencies and catalysis rates of up to 80% and 1000 $e^{-}\text{s}^{-1}$, respectively) and PSI further photo-energises the harvested electrons (quantum efficiencies of ~100%). The impressive performance of the light harvesting components of photosynthesis has motivated extensive biological, artificial and biohybrid approaches to re-wire photosynthesis to enable higher efficiencies and new reaction pathways, such as H2 evolution or alternative CO2 fixation. To date these approaches have focussed on charge extraction at the terminal electron quinones of PSII and terminal iron-sulfur clusters of PSI. Ideally electron extraction would be possible immediately from the photoexcited reaction centres to enable the greatest thermodynamic gains. However, this was believed to be impossible because the reaction centres are buried around 4 nm within PSII and 5 nm within PSI from the cytoplasmic face. Here, we demonstrate using in vivo ultrafast transient absorption (TA) spectroscopy that it is possible to extract electrons directly from photoexcited PSI and PSII, using both live cyanobacterial cells and isolated photosystems, with the exogenous electron mediator 2,6-dichloro-1,4-benzoquinone (DCBQ). We postulate that DCBQ can oxidise peripheral chlorophyll pigments participating in highly delocalised charge transfer (CT) states after initial photoexcitation. Our results open new avenues to study and re-wire photosynthesis for bioenergy and semi-artificial photosynthesis.

Submitted 7 February, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

arXiv:2110.13142 [pdf, other]

doi 10.1109/MSP.2021.3123557

Light-Field Microscopy for optical imaging of neuronal activity: when model-based methods meet data-driven approaches

Authors: Pingfan Song, Herman Verinaz Jadan, Carmel L. Howe, Amanda J. Foust, Pier Luigi Dragotti

Abstract: Understanding how networks of neurons process information is one of the key challenges in modern neuroscience. A necessary step to achieve this goal is to be able to observe the dynamics of large populations of neurons over a large area of the brain. Light-field microscopy (LFM), a type of scanless microscope, is a particularly attractive candidate for high-speed three-dimensional (3D) imaging. It captures volumetric information in a single snapshot, allowing volumetric imaging at video frame-rates. Specific features of imaging neuronal activity using LFM call for the development of novel machine learning approaches that fully exploit priors embedded in physics and optics models. Signal processing theory and wave-optics theory could play a key role in filling this gap, and contribute to novel computational methods with enhanced interpretability and generalization by integrating model-driven and data-driven approaches. This paper is devoted to a comprehensive survey of state-of-the-art computational methods for LFM, with a focus on model-based and data-driven approaches.

Submitted 24 October, 2021; originally announced October 2021.

Comments: 20 pages, 9 figures, article accepted by IEEE Signal Processing Magazine

arXiv:2103.06164 [pdf, other]

Model-inspired Deep Learning for Light-Field Microscopy with Application to Neuron Localization

Authors: Pingfan Song, Herman Verinaz Jadan, Carmel L. Howe, Peter Quicke, Amanda J. Foust, Pier Luigi Dragotti

Abstract: Light-field microscopes are able to capture spatial and angular information of incident light rays. This allows reconstructing 3D locations of neurons from a single snap-shot. In this work, we propose a model-inspired deep learning approach to perform fast and robust 3D localization of sources using light-field microscopy images. This is achieved by developing a deep network that efficiently solves a convolutional sparse coding (CSC) problem to map Epipolar Plane Images (EPI) to corresponding sparse codes. The network architecture is designed systematically by unrolling the convolutional Iterative Shrinkage and Thresholding Algorithm (ISTA) while the network parameters are learned from a training dataset. Such principled design enables the deep network to leverage both domain knowledge implied in the model, as well as new parameters learned from the data, thereby combining advantages of model-based and learning-based methods. Practical experiments on localization of mammalian neurons from light-fields show that the proposed approach simultaneously provides enhanced performance, interpretability and efficiency.

Submitted 10 March, 2021; originally announced March 2021.

Comments: 5 pages, 6 figures, ICASSP 2021
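
Note on the ISTA step named in the abstract above: the following is a minimal sketch of plain (non-convolutional, non-learned) ISTA for the sparse coding problem min_x 0.5*||Ax - y||^2 + lam*||x||_1. It is not the authors' network, which unrolls a convolutional variant and learns its parameters from data. The function names, the dense list-of-lists matrix, and the fixed step size are illustrative assumptions; the step is assumed to be no larger than the reciprocal of the largest eigenvalue of A^T A.

```python
def soft_threshold(values, threshold):
    """Shrink each entry toward zero by `threshold` (the proximal step for the L1 penalty)."""
    return [max(abs(v) - threshold, 0.0) * (1.0 if v > 0 else -1.0) for v in values]


def matvec(matrix, vector):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def ista(A, y, lam, step, n_iter=200):
    """Plain ISTA: a gradient step on the data term, then soft-thresholding.

    Assumption: `step` <= 1 / (largest eigenvalue of A^T A), chosen by the caller.
    """
    A_transposed = [list(col) for col in zip(*A)]
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [p - t for p, t in zip(matvec(A, x), y)]
        gradient = matvec(A_transposed, residual)
        x = soft_threshold([xi - step * g for xi, g in zip(x, gradient)], step * lam)
    return x


# Hypothetical toy problem: with an identity dictionary, ISTA reduces to
# soft-thresholding the observation itself.
sparse_code = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.1], lam=0.5, step=1.0)
```

In an unrolled network, each loop iteration becomes one layer and quantities such as the step size and threshold become trainable, which is the design idea the abstract describes.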

arXiv:1909.01183 [pdf, other]

doi 10.1051/0004-6361/201935574

The Solar Orbiter SPICE instrument -- An extreme UV imaging spectrometer

Authors: The SPICE Consortium: M. Anderson, T. Appourchaux, F. Auchère, R. Aznar Cuadrado, J. Barbay, F. Baudin, S. Beardsley, K. Bocchialini, B. Borgo, D. Bruzzi, E. Buchlin, G. Burton, V. Blüchel, M. Caldwell, S. Caminade, M. Carlsson, W. Curdt, J. Davenne, J. Davila, C. E. DeForest, G. Del Zanna, D. Drummond, J. Dubau, et al. (66 additional authors not shown)

Abstract: The Spectral Imaging of the Coronal Environment (SPICE) instrument is a high-resolution imaging spectrometer operating at extreme ultraviolet (EUV) wavelengths. In this paper, we present the concept, design, and pre-launch performance of this facility instrument on the ESA/NASA Solar Orbiter mission. The goal of this paper is to give prospective users a better understanding of the possible types of observations, the data acquisition, and the sources that contribute to the instrument's signal. The paper discusses the science objectives, with a focus on the SPICE-specific aspects, before presenting the instrument's design, including optical, mechanical, thermal, and electronics aspects. This is followed by a characterisation and calibration of the instrument's performance. The paper concludes with descriptions of the operations concept and data processing. The performance measurements of the various instrument parameters meet the requirements derived from the mission's science objectives. The SPICE instrument is ready to perform measurements that will provide vital contributions to the scientific success of the Solar Orbiter mission.

Submitted 3 September, 2019; originally announced September 2019.

Comments: A&A, accepted 19 August 2019; 26 pages, 25 figures

Journal ref: A&A 642, A14 (2020)

arXiv:1712.09100 [pdf, other]

doi 10.1038/s41467-018-03320-x

Porous translucent electrodes enhance current generation from photosynthetic biofilms

Authors: Tobias Wenzel, Daniel Haertter, Paolo Bombelli, Christopher J. Howe, Ullrich Steiner

Abstract: We tested the enhancement of electrical current generated from photosynthetically active bacteria by use of electrodes with porosity on the nano- and micrometer length-scale. For two cyanobacteria on structured indium-tin-oxide electrodes, current generation was increased by two orders of magnitude and the photo-response was substantially faster compared to non-porous anodes. These properties highlight porosity as an important design strategy for electrochemical bio-interfaces. The role of porosity on different length scales was studied systematically which revealed that the main performance enhancement was caused by the increased surface area of the electrodes. More complex microstructured architectures which spanned biofilms as translucent 3D scaffolds provided additional advantage in the presence of microbial direct electron transfer (DET). The absence of a clear DET contribution in both studied cyanobacteria, Synechocystis and Nostoc, raises questions about the role of conductive cellular components previously found in both organisms.

Submitted 25 December, 2017; originally announced December 2017.

Comments: 11 pages manuscript and 6 pages supplementary material, 18 figures overall

arXiv:1705.09435 [pdf, other]

Deep Learning for Lung Cancer Detection: Tackling the Kaggle Data Science Bowl 2017 Challenge

Authors: Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Huiling Chen, Jie Lin, Babar Nazir, Cen Chen, Tse Chiang Howe, Zeng Zeng, Vijay Chandrasekhar

Abstract: We present a deep learning framework for computer-aided lung cancer diagnosis. Our multi-stage framework detects nodules in 3D lung CAT scans, determines if each nodule is malignant, and finally assigns a cancer probability based on these results. We discuss the challenges and advantages of our framework. In the Kaggle Data Science Bowl 2017, our framework ranked 41st out of 1972 teams.

Submitted 26 May, 2017; originally announced May 2017.

arXiv:1508.01631 [pdf]

Surfactant-aided exfoliation of molybdenum disulphide for ultrafast pulse generation through edge-state saturable absorption

Authors: Richard C. T. Howe, Robert I. Woodward, Guohua Hu, Zongyin Yang, Edmund J. R. Kelleher, Tawfique Hasan

Abstract: We use liquid phase exfoliation to produce dispersions of molybdenum disulphide (MoS2) nanoflakes in aqueous surfactant solutions. The chemical structures of the bile salt surfactants play a crucial role in the exfoliation and stabilization of MoS2. The resultant MoS2 dispersions are heavily enriched in single and few (<6) layer flakes with large edge to surface area ratio. We use the dispersions to fabricate free-standing polymer composite wide-band saturable absorbers to develop mode-locked and Q-switched fibre lasers, tunable from 1535-1565 and 1030-1070 nm, respectively. We attribute this sub-bandgap optical absorption and its nonlinear saturation behaviour to edge-mediated states introduced within the material band-gap of the exfoliated MoS2 nanoflakes.

Submitted 11 December, 2015; v1 submitted 7 August, 2015; originally announced August 2015.

Comments: 6 pages, 5 figures

arXiv:1507.03188 [pdf, other]

Yb- and Er-doped fiber laser Q-switched with an optically uniform, broadband WS2 saturable absorber

Authors: M. Zhang, G. Hu, G. Hu, R. C. T. Howe, L. Chen, Z. Zheng, T. Hasan

Abstract: We demonstrate a ytterbium (Yb) and an erbium (Er)-doped fiber laser Q-switched by a solution processed, optically uniform, few-layer tungsten disulfide saturable absorber (WS2-SA). Nonlinear optical absorption of the WS2-SA in the sub-bandgap region, attributed to the edge-induced states, is characterized by 3.1% and 4.9% modulation depths with 1.38 and 3.83 MW/cm2 saturation intensities at 1030 and 1558 nm, respectively. By integrating the optically uniform WS2-SA in the Yb- and Er-doped laser cavities, we obtain self-starting Q-switched pulses with microsecond duration and kilohertz repetition rates at 1030 and 1558 nm. Our work demonstrates broadband sub-bandgap saturable absorption of a single, solution processed WS2-SA, providing new potential efficacy for WS2 in ultrafast photonic applications.

Submitted 12 July, 2015; originally announced July 2015.

arXiv:1503.08003 [pdf, other]

doi 10.1364/OE.23.020051

Wideband saturable absorption in few-layer molybdenum diselenide (MoSe2) for Q-switching Yb-, Er- and Tm-doped fiber lasers

Authors: R. I. Woodward, R. C. T. Howe, T. H. Runcorn, G. Hu, F. Torrisi, E. J. R. Kelleher, T. Hasan

Abstract: We fabricate a free-standing molybdenum diselenide (MoSe2) saturable absorber by embedding liquid-phase exfoliated few-layer MoSe2 flakes into a polymer film. The MoSe2-polymer composite is used to Q-switch fiber lasers based on ytterbium (Yb), erbium (Er) and thulium (Tm) gain fiber, producing trains of microsecond-duration pulses with kilohertz repetition rates at 1060 nm, 1566 nm and 1924 nm, respectively. Such operating wavelengths correspond to sub-bandgap saturable absorption in MoSe2, which is explained in the context of edge-states, building upon studies of other semiconducting transition metal dichalcogenide (TMD)-based saturable absorbers. Our work adds few-layer MoSe2 to the growing catalog of TMDs with remarkable optical properties, which offer new opportunities for photonic devices.

Submitted 12 June, 2015; v1 submitted 27 March, 2015; originally announced March 2015.

Journal ref: Opt. Express 23, 20051 (2015)

arXiv:1411.5948 [pdf, other]

doi 10.1002/aenm.201401299

A High Power-Density Mediator-Free Microfluidic Biophotovoltaic Device for Cyanobacterial Cells

Authors: Paolo Bombelli, Thomas Müller, Therese W. Herling, Christopher J. Howe, Tuomas P. J. Knowles

Abstract: Biophotovoltaics has emerged as a promising technology for generating renewable energy since it relies on living organisms as inexpensive, self-repairing and readily available catalysts to produce electricity from an abundant resource - sunlight. The efficiency of biophotovoltaic cells, however, has remained significantly lower than that achievable through synthetic materials. Here, we devise a platform to harness the large power densities afforded by miniaturised geometries. To this effect, we have developed a soft-lithography approach for the fabrication of microfluidic biophotovoltaic devices that do not require membranes or mediators. Synechocystis sp. PCC 6803 cells were injected and allowed to settle on the anode, permitting the physical proximity between cells and electrode required for mediator-free operation. We demonstrate power densities of above 100 mW/m2 for a chlorophyll concentration of 100 μM under white light, a high value for biophotovoltaic devices without extrinsic supply of additional energy.

Submitted 21 November, 2014; originally announced November 2014.

Comments: 9 pages, 7 figures, including supporting material. appears in Advanced Energy Materials, in print, 2014

arXiv:1309.2975 [pdf, ps, other]

doi 10.1371/journal.pone.0101271

These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure

Authors: Qingpeng Zhang, Jason Pell, Rosangela Canino-Koning, Adina Chuang Howe, C. Titus Brown

Abstract: K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix arrays, and trie structures, khmer relies entirely on a simple probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits online updating and retrieval of k-mer counts in memory which is necessary to support online k-mer analysis algorithms. On sparse data sets this data structure is considerably more memory efficient than any exact data structure. In exchange, the use of a Count-Min Sketch introduces a systematic overcount for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we analyze the speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers. We also compare the performance of khmer to several other k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC, Turtle and KAnalyze. Finally, we examine the effectiveness of profiling sequencing error, k-mer abundance trimming, and digital normalization of reads in the context of high khmer false positive rates. khmer is implemented in C++ wrapped in a Python interface, offers a tested and robust API, and is freely available under the BSD license at github.com/ged-lab/khmer.

Submitted 14 July, 2014; v1 submitted 11 September, 2013; originally announced September 2013.

Journal ref: PLoS One. 2014 Jul 25;9(7):e101271
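
Note on the Count-Min Sketch named in the abstract above: the following is a minimal sketch of the data structure applied to k-mer counting, written to illustrate why counts can be overestimated but never underestimated. It does not use or reproduce the khmer API; the class name, the table sizes, and the choice of salted BLAKE2b hashes are illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: `depth` hash rows of `width` counters each (sizes are assumptions)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent-looking hash per row, obtained by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(item.encode(), salt=row.to_bytes(8, "little")).digest()
            yield row, int.from_bytes(digest[:8], "little") % self.width

    def add(self, item):
        for row, col in self._cells(item):
            self.tables[row][col] += 1

    def count(self, item):
        # Collisions only ever inflate a cell, so the minimum over rows is the
        # tightest available estimate and is never below the true count.
        return min(self.tables[row][col] for row, col in self._cells(item))


def count_kmers(sequence, k, sketch):
    """Stream every length-k substring of `sequence` into the sketch."""
    for start in range(len(sequence) - k + 1):
        sketch.add(sequence[start:start + k])


sketch = CountMinSketch()
count_kmers("ACGTACGT", 4, sketch)
acgt_estimate = sketch.count("ACGT")  # true count is 2; the estimate is at least 2
```

Only the counter tables are stored, not the k-mers themselves, which matches the trade-off the abstract describes: memory is fixed in advance, and the price is a one-sided overcount that grows as the tables fill.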

arXiv:1212.2832 [pdf, other]

Assembling large, complex environmental metagenomes

Authors: Adina Chuang Howe, Janet Jansson, Stephanie A. Malfatti, Susannah G. Tringe, James M. Tiedje, C. Titus Brown

Abstract: The large volumes of sequencing data required to sample complex environments deeply pose new challenges to sequence analysis approaches. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires significant computational resources. We apply two pre-assembly filtering approaches, digital normalization and partitioning, to make large metagenome assemblies more computationally tractable. Using a human gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes from matched Iowa corn and native prairie soils. The predicted functional content and phylogenetic origin of the assembled contigs indicate significant taxonomic differences despite similar function. The assembly strategies presented are generic and can be extended to any metagenome; full source code is freely available under a BSD license.

Submitted 28 December, 2012; v1 submitted 12 December, 2012; originally announced December 2012.

Comments: Includes supporting information

arXiv:1212.0159 [pdf, other]

Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets

Authors: Adina Chuang Howe, Jason Pell, Rosangela Canino-Koning, Rachel Mackelprang, Susannah Tringe, Janet Jansson, James M. Tiedje, C. Titus Brown

Abstract: Sequencing errors and biases in metagenomic datasets affect coverage-based assemblies and are often ignored during analysis. Here, we analyze read connectivity in metagenomes and identify the presence of problematic and likely a-biological connectivity within metagenome assembly graphs. Specifically, we identify highly connected sequences which join a large proportion of reads within each real metagenome. These sequences show position-specific bias in shotgun reads, suggestive of sequencing artifacts, and are only minimally incorporated into contigs by assembly. The removal of these sequences prior to assembly results in similar assembly content for most metagenomes and enables the use of graph partitioning to decrease assembly memory and time requirements.

Submitted 1 December, 2012; originally announced December 2012.

Showing 1–17 of 17 results for author: Howe, C