-
A compact model of Escherichia coli core and biosynthetic metabolism
Authors:
Marco Corrao,
Hai He,
Wolfram Liebermeister,
Elad Noor
Abstract:
Metabolic models condense biochemical knowledge about organisms in a structured and standardised way. As large-scale network reconstructions are readily available for many organisms of interest, genome-scale models are being widely used among modellers and engineers. However, these large models can be difficult to analyse and visualise, and occasionally generate hard-to-interpret or even biologically unrealistic predictions. Out of the thousands of enzymatic reactions in a typical bacterial metabolism, only a few hundred comprise the metabolic pathways essential to produce energy carriers and biosynthetic precursors. These pathways carry relatively high flux, are central to maintaining and reproducing the cell, and provide precursors and energy to engineered metabolic pathways. Here, focusing on these central metabolic subsystems, we present a manually-curated medium-scale model of energy and biosynthesis metabolism for the well-studied prokaryote Escherichia coli K-12 MG1655. The model is a sub-network of the most recent genome-scale reconstruction, iML1515, and comes with an updated layer of database annotations, as well as a range of metabolic maps for visualisation. We enriched the stoichiometric network with extensive biological information and quantitative data, enhancing the scope and applicability of the model. In addition, here we assess the properties of this model in relation to its genome-scale parent and demonstrate the use of the network and supporting data in various scenarios, including enzyme-constrained flux balance analysis, elementary flux mode analysis, and thermodynamic analysis. Overall, we believe this model holds the potential to become a reference medium-scale metabolic model for E. coli.
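Thermodynamic analysis of a stoichiometric model usually starts from the relation ΔG' = ΔG'° + RT ln Q, where Q is the reaction quotient of metabolite concentrations; a reaction can carry forward flux only when ΔG' < 0. The Python sketch below evaluates this for a single reaction. It is a generic illustration, not the authors' analysis pipeline: the reaction A → B, its ΔG'° and the concentrations are made-up values, and ideal dilute solutions with concentrations in mol/L are assumed.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol*K)


def transformed_gibbs_energy(dg0_prime, substrates, products, temperature=298.15):
    """Return dG' = dG'0 + R*T*ln(Q) in kJ/mol.

    substrates / products map a metabolite name to a tuple
    (concentration in mol/L, stoichiometric coefficient).
    """
    ln_q = sum(n * math.log(c) for c, n in products.values())
    ln_q -= sum(n * math.log(c) for c, n in substrates.values())
    return dg0_prime + R * temperature * ln_q


# Hypothetical reaction A -> B with dG'0 = +10 kJ/mol (illustrative numbers only).
dg = transformed_gibbs_energy(10.0, {"A": (1e-2, 1)}, {"B": (1e-4, 1)})
print(f"dG' = {dg:.2f} kJ/mol; forward direction feasible: {dg < 0}")
```

Although ΔG'° is unfavourable in this toy case, keeping the product 100-fold below the substrate makes the forward direction feasible; constraints of this kind are what thermodynamic analysis adds on top of the stoichiometric network.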
Submitted 24 June, 2024;
originally announced June 2024.
-
FGBERT: Function-Driven Pre-trained Gene Language Model for Metagenomics
Authors:
ChenRui Duan,
Zelin Zang,
Yongjie Xu,
Hang He,
Zihan Liu,
Zijia Song,
Ju-Sheng Zheng,
Stan Z. Li
Abstract:
Metagenomic data, comprising mixed multi-species genomes, are prevalent in diverse environments like oceans and soils, significantly impacting human health and ecological functions. However, current research relies on K-mer representations, limiting the capture of structurally relevant gene contexts. To address these limitations and further our understanding of complex relationships between metagenomic sequences and their functions, we introduce a protein-based gene representation as a context-aware and structure-relevant tokenizer. Our approach includes Masked Gene Modeling (MGM) for gene group-level pre-training, providing insights into inter-gene contextual information, and Triple Enhanced Metagenomic Contrastive Learning (TEM-CL) for gene-level pre-training to model gene sequence-function relationships. MGM and TEM-CL constitute our novel metagenomic language model FGBERT, pre-trained on 100 million metagenomic sequences. We demonstrate the superiority of our proposed FGBERT on eight datasets.
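Masked Gene Modeling follows the masked-language-modeling idea: hide some tokens in a gene group and train the model to recover them from the surrounding genes. The sketch below shows only the masking step, assuming each gene has already been mapped to one token. The 15% default rate is borrowed from BERT and is an assumption, the function name `mask_gene_tokens` and the gene names are hypothetical, and the BERT refinement of sometimes substituting random or unchanged tokens is omitted.

```python
import random

MASK = "[MASK]"


def mask_gene_tokens(genes, mask_rate=0.15, seed=0):
    """Mask a fraction of gene tokens in one gene group.

    Returns the corrupted sequence and {position: original token},
    i.e. the targets a model would be trained to recover.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(genes)))
    positions = sorted(rng.sample(range(len(genes)), n_mask))
    corrupted = list(genes)
    targets = {}
    for i in positions:
        targets[i] = corrupted[i]
        corrupted[i] = MASK
    return corrupted, targets


# Hypothetical gene group: each token stands for one protein-coding gene.
group = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG", "geneH"]
corrupted, targets = mask_gene_tokens(group, mask_rate=0.25)
print(corrupted)
print(targets)
```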
Submitted 24 February, 2024;
originally announced February 2024.
-
Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning
Authors:
Xiaoyu Chen,
Changde Du,
Qiongyi Zhou,
Huiguang He
Abstract:
The human brain can easily focus on one speaker and suppress others in scenarios such as a cocktail party. Recently, researchers found that auditory attention can be decoded from electroencephalogram (EEG) data. However, most existing deep learning methods struggle to exploit prior knowledge of different views (that is, attended speech and EEG are task-related views) and extract unsatisfactory representations. Inspired by Broadbent's filter model, we decode auditory attention in a multi-view paradigm and extract the most relevant and important information by utilizing the missing view. Specifically, we propose an auditory attention decoding (AAD) method based on a multi-view VAE with task-related multi-view contrastive (TMC) learning. Employing TMC learning in the multi-view VAE makes it possible to use the missing view to accumulate prior knowledge of different views into the fused representation and to extract an approximate task-related representation. We examine our method on two popular AAD datasets and demonstrate its superiority by comparing it to the state-of-the-art method.
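The abstract describes TMC learning only at a high level. As a hedged stand-in, the sketch below shows a generic InfoNCE contrastive loss that pulls each EEG embedding toward the embedding of its own attended speech and away from the other speech segments in the batch. The embeddings, the cosine similarity and the temperature of 0.1 are assumptions, and this is not the paper's TMC objective.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(eeg_embs, speech_embs, temperature=0.1):
    """Mean InfoNCE loss: EEG embedding i should match speech embedding i
    (its attended speech) rather than any other speech embedding in the batch."""
    total = 0.0
    for i, z in enumerate(eeg_embs):
        logits = [cosine(z, s) / temperature for s in speech_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(eeg_embs)


# Toy 2-D embeddings (hypothetical): matched pairs point in similar directions.
eeg = [[1.0, 0.1], [0.1, 1.0]]
speech_aligned = [[0.9, 0.0], [0.0, 0.9]]
speech_swapped = [[0.0, 0.9], [0.9, 0.0]]
print(info_nce(eeg, speech_aligned), info_nce(eeg, speech_swapped))
```

The loss is small when each EEG segment is closest to its own speech segment and large when the pairing is swapped, which is the signal a contrastive objective uses to align the two views.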
Submitted 8 August, 2023;
originally announced August 2023.
-
SuperMask: Generating High-resolution object masks from multi-view, unaligned low-resolution MRIs
Authors:
Hanxue Gu,
Hongyu He,
Roy Colglazier,
Jordan Axelrod,
Robert French,
Maciej A Mazurowski
Abstract:
Three-dimensional segmentation in magnetic resonance images (MRI), which reflects the true shape of the objects, is challenging since high-resolution isotropic MRIs are rare and typical MRIs are anisotropic, with the out-of-plane dimension having a much lower resolution. A potential remedy to this issue lies in the fact that often multiple sequences are acquired on different planes. However, in practice, these sequences are not orthogonal to each other, limiting the applicability of many previous solutions to reconstruct higher-resolution images from multiple lower-resolution ones. We propose a weakly-supervised deep learning-based solution to generating high-resolution masks from multiple low-resolution images. Our method combines segmentation and unsupervised registration networks by introducing two new regularizations to make registration and segmentation reinforce each other. Finally, we introduce a multi-view fusion method to generate high-resolution target object masks. The experimental results on two datasets show the superiority of our methods. Importantly, the advantage of not using high-resolution images in the training process makes our method applicable to a wide variety of MRI segmentation tasks.
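To make the multi-view fusion idea concrete, here is a deliberately naive baseline, not the paper's learned method: two probability maps, each coarse along a different axis, are upsampled by nearest neighbour and averaged pixel-wise before thresholding. The sketch assumes the views are already registered and axis-aligned (the abstract stresses that real sequences are not orthogonal, which is why the paper adds a registration network), works in 2-D for brevity, and uses hypothetical function names and data.

```python
def upsample_rows(mask, factor):
    """Nearest-neighbour upsampling along the low-resolution row axis."""
    return [list(row) for row in mask for _ in range(factor)]


def upsample_cols(mask, factor):
    """Nearest-neighbour upsampling along the low-resolution column axis."""
    return [[v for v in row for _ in range(factor)] for row in mask]


def fuse(prob_maps, threshold=0.5):
    """Average co-registered probability maps pixel-wise and threshold to a binary mask."""
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    return [[int(sum(p[r][c] for p in prob_maps) / len(prob_maps) >= threshold)
             for c in range(cols)] for r in range(rows)]


# Hypothetical 4x4 ground truth: a 2x2 square in the middle.
# View A is 2x4 (row pairs averaged); view B is 4x2 (column pairs averaged).
view_a = [[0.0, 0.5, 0.5, 0.0],
          [0.0, 0.5, 0.5, 0.0]]
view_b = [[0.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.0, 0.0]]
fused = fuse([upsample_rows(view_a, 2), upsample_cols(view_b, 2)])
for row in fused:
    print(row)
```

Either view alone, thresholded, smears the square along its coarse axis; combining two views with different coarse axes recovers it in this idealised case.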
Submitted 13 March, 2023;
originally announced March 2023.
-
Extrapolative Controlled Sequence Generation via Iterative Refinement
Authors:
Vishakh Padmakumar,
Richard Yuanzhe Pang,
He He,
Ankur P. Parikh
Abstract:
We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training. This task is of significant importance in automated design, especially drug discovery, where the goal is to design novel proteins that are better (e.g., more stable) than existing sequences. Thus, by definition, the target sequences and their attribute values are out of the training distribution, posing challenges to existing methods that aim to directly generate the target sequence. Instead, in this work, we propose Iterative Controlled Extrapolation (ICE) which iteratively makes local edits to a sequence to enable extrapolation. We train the model on synthetically generated sequence pairs that demonstrate small improvement in the attribute value. Results on one natural language task (sentiment analysis) and two protein engineering tasks (ACE2 stability and AAV fitness) show that ICE considerably outperforms state-of-the-art approaches despite its simplicity. Our code and models are available at: https://github.com/vishakhpk/iter-extrapolation.
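The core loop of iterative local editing can be pictured as simple hill climbing: propose a small edit, keep it only if the attribute improves, and repeat, so the sequence can drift beyond the starting attribute range one step at a time. In ICE the edits come from a trained generator; in this sketch they are random single-residue substitutions, and the scorer `toy_score` (fraction of hydrophobic residues) is a hypothetical stand-in for a real property such as stability.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HYDROPHOBIC = set("AILMFVW")


def toy_score(seq):
    """Hypothetical attribute: fraction of hydrophobic residues (a stand-in, not stability)."""
    return sum(ch in HYDROPHOBIC for ch in seq) / len(seq)


def iterative_refine(seq, score, steps=200, seed=0):
    """Repeatedly propose a single-position edit and keep it only if the score improves."""
    rng = random.Random(seed)
    best, best_score = seq, score(seq)
    trajectory = [best_score]
    for _ in range(steps):
        i = rng.randrange(len(best))
        candidate = best[:i] + rng.choice(ALPHABET) + best[i + 1:]
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
            trajectory.append(best_score)
    return best, trajectory


start = "GSTNQDEKRH"  # hypothetical starting sequence with score 0.0
final, trajectory = iterative_refine(start, toy_score)
print(final, trajectory[-1])
```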
Submitted 7 June, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
PGCN: Pyramidal Graph Convolutional Network for EEG Emotion Recognition
Authors:
Ming **,
Enwei Zhu,
Changde Du,
Huiguang He,
**peng Li
Abstract:
Emotion recognition is essential in the diagnosis and rehabilitation of various mental diseases. In the last decade, electroencephalogram (EEG)-based emotion recognition has been intensively investigated due to its promising accuracy and reliability, and the graph convolutional network (GCN) has become a mainstream model for decoding emotions from EEG signals. However, the electrode relationship, especially long-range electrode dependencies across the scalp, may be underutilized by GCNs, although such relationships have been proven to be important in emotion recognition. Their small receptive field makes shallow GCNs aggregate only local nodes; on the other hand, stacking too many layers leads to over-smoothing. To solve these problems, we propose the pyramidal graph convolutional network (PGCN), which aggregates features at three levels: local, mesoscopic, and global. First, we construct a vanilla GCN based on the 3D topological relationships of electrodes, which is used to integrate second-order local features. Second, we construct several mesoscopic brain regions based on prior knowledge and employ mesoscopic attention to sequentially calculate the virtual mesoscopic centers, focusing on the functional connections of mesoscopic brain regions. Finally, we fuse the node features and their 3D positions to construct a numerical relationship adjacency matrix that integrates structural and functional connections from a global perspective. Experimental results on three public datasets indicate that PGCN enhances relationship modelling across the scalp and achieves state-of-the-art performance in both subject-dependent and subject-independent scenarios. Meanwhile, PGCN makes an effective trade-off between enhancing network depth and receptive fields while suppressing the ensuing over-smoothing. Our code is publicly accessible at https://github.com/**minbox/PGCN.
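The "local" level builds on the vanilla GCN propagation rule H' = ReLU(Â H W) with Â = D^-1/2 (A + I) D^-1/2. The sketch below shows that rule on a hypothetical three-electrode chain, illustrating why one layer only reaches direct neighbours and two layers reach second-order neighbours. It does not reproduce PGCN's mesoscopic or global levels; the graph, features and weights are made up.

```python
import math


def normalized_adjacency(adj):
    """Symmetric GCN normalisation: D^-1/2 (A + I) D^-1/2, with D the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(a_norm, h, w):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]


# Hypothetical 3-electrode chain 0 - 1 - 2 with a 1-D feature and identity weight.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a_norm = normalized_adjacency(adj)
h0 = [[1.0], [0.0], [0.0]]  # only electrode 0 carries signal
w = [[1.0]]
h1 = gcn_layer(a_norm, h0, w)  # electrode 2 still 0: one step reaches direct neighbours only
h2 = gcn_layer(a_norm, h1, w)  # electrode 2 now > 0: two steps reach second-order neighbours
print(h1, h2)
```

Repeating the step many times makes node features increasingly similar, which is the over-smoothing the abstract refers to when layers are stacked too deeply.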
Submitted 5 February, 2023;
originally announced February 2023.
-
Multi-view Multi-label Fine-grained Emotion Decoding from Human Brain Activity
Authors:
Kaicheng Fu,
Changde Du,
Shengpei Wang,
Huiguang He
Abstract:
Decoding emotional states from human brain activity plays an important role in brain-computer interfaces. Existing emotion decoding methods still have two main limitations: first, they decode only a single emotion category from a brain activity pattern, and the decoded categories are coarse-grained, which is inconsistent with the complexity of human emotional expression; second, they ignore the discrepancy in emotion expression between the left and right hemispheres of the human brain. In this paper, we propose a novel multi-view multi-label hybrid model for fine-grained emotion decoding (up to 80 emotion categories) that can learn expressive neural representations and predict multiple emotional states simultaneously. Specifically, the generative component of our hybrid model is parametrized by a multi-view variational auto-encoder, in which we regard the brain activity of the left and right hemispheres and their difference as three distinct views, and use a product-of-experts mechanism in its inference network. The discriminative component of our hybrid model is implemented by a multi-label classification network with an asymmetric focal loss. For more accurate emotion decoding, we first adopt a label-aware module for learning emotion-specific neural representations and then model the dependency of emotional states with a masked self-attention mechanism. Extensive experiments on two visually evoked emotional datasets show the superiority of our method.
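For diagonal Gaussian experts, a product of experts has a closed form: precisions add, and the combined mean is the precision-weighted mean of the expert means. The sketch below applies that identity to three hypothetical view posteriors (left hemisphere, right hemisphere, difference). Whether the paper includes a standard-normal prior expert is not stated in the abstract, so the `prior` flag is an assumption, and the function name is hypothetical.

```python
def gaussian_product_of_experts(mus, variances, prior=True):
    """Combine diagonal Gaussian experts N(mu_k, var_k) by precision weighting.

    mus and variances are lists over experts of per-dimension lists.
    If prior is True, a standard-normal expert N(0, 1) is also included.
    """
    dims = len(mus[0])
    out_mu, out_var = [], []
    for d in range(dims):
        precision = 1.0 if prior else 0.0  # N(0, 1) contributes precision 1 and mean 0
        weighted_mean = 0.0
        for mu, var in zip(mus, variances):
            precision += 1.0 / var[d]
            weighted_mean += mu[d] / var[d]
        out_var.append(1.0 / precision)
        out_mu.append(weighted_mean / precision)
    return out_mu, out_var


mu, var = gaussian_product_of_experts(
    mus=[[1.0], [3.0], [2.0]],          # left, right, difference views (hypothetical)
    variances=[[1.0], [1.0], [0.5]],
    prior=False)
print(mu, var)
```

The combined variance is smaller than any single expert's, and the most confident expert (lowest variance) has the largest pull on the combined mean.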
Submitted 26 October, 2022;
originally announced November 2022.
-
Modeling Diverse Chemical Reactions for Single-step Retrosynthesis via Discrete Latent Variables
Authors:
Huarui He,
Jie Wang,
Yunfei Liu,
Feng Wu
Abstract:
Single-step retrosynthesis is the cornerstone of retrosynthesis planning, which is a crucial task for computer-aided drug discovery. The goal of single-step retrosynthesis is to identify the possible reactants that lead to the synthesis of the target product in one reaction. By representing organic molecules as canonical strings, existing sequence-based retrosynthetic methods treat the product-to-reactant retrosynthesis as a sequence-to-sequence translation problem. However, most of them struggle to identify diverse chemical reactions for a desired product because of their deterministic inference, which contradicts the fact that many compounds can be synthesized through various reaction types with different sets of reactants. In this work, we aim to increase reaction diversity and generate various reactants using discrete latent variables. We propose a novel sequence-based approach, namely RetroDVCAE, which incorporates conditional variational autoencoders into single-step retrosynthesis and associates discrete latent variables with the generation process. Specifically, RetroDVCAE uses the Gumbel-Softmax distribution to approximate the categorical distribution over potential reactions and generates multiple sets of reactants with the variational decoder. Experiments demonstrate that RetroDVCAE outperforms state-of-the-art baselines on both a benchmark dataset and a homemade dataset. Both quantitative and qualitative results show that RetroDVCAE can model the multi-modal distribution over reaction types and produce diverse reactant candidates.
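The Gumbel-Softmax distribution mentioned above relaxes a categorical sample into a differentiable one: add Gumbel noise -log(-log u) to each logit and apply a softmax with temperature τ. The sketch below implements that standard construction in plain Python; the three "reaction type" logits are hypothetical, and nothing here reflects RetroDVCAE's actual number of latent categories or its decoder.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for l in logits:
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        noisy.append((l - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits over three latent reaction types.
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
sample = gumbel_softmax(logits, tau=0.5, rng=random.Random(1))
print([round(p, 3) for p in sample])
```

The argmax of such samples follows the categorical distribution softmax(logits) exactly (the Gumbel-max trick), while lower values of τ push each relaxed sample closer to one-hot.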
Submitted 10 August, 2022;
originally announced August 2022.
-
A systematic evaluation of methods for cell phenotype classification using single-cell RNA sequencing data
Authors:
Xiaowen Cao,
Li Xing,
Elham Majd,
Hua He,
Junhua Gu,
Xuekui Zhang
Abstract:
Background: Single-cell RNA sequencing (scRNA-seq) yields valuable insights into gene expression and gives critical information about the cellular composition of complex tissues. In the analysis of single-cell RNA sequencing data, cell subtypes are often annotated manually, which is time-consuming and irreproducible. Garnett is a cell-type annotation software based on the elastic net method. Besides cell-type annotation, supervised machine learning methods can also be applied to predict other cell phenotypes from genomic data. Despite the popularity of such applications, no existing study has systematically investigated the performance of these supervised algorithms on scRNA-seq data sets of various sizes.
Methods and Results: This study evaluates 13 popular supervised machine learning algorithms for classifying cell phenotypes, using published real and simulated data sets with diverse numbers of cells. The benchmark contained two parts. In the first part, we used real data sets to assess the popular supervised algorithms' computing speed and cell phenotype classification performance. Classification performance was evaluated using AUC statistics, F1-score, precision, recall, and false-positive rate. In the second part, we evaluated gene selection performance using published simulated data sets with a known list of real genes.
Conclusion: The study outcomes showed that ElasticNet with interactions performed best in small and medium data sets. NB was another appropriate method for medium data sets. In large data sets, XGB performed excellently. Ensemble algorithms were not significantly superior to individual machine learning methods. Adding interactions to ElasticNet can help, and the improvement was significant in small data sets.
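For reference, the evaluation metrics listed in the benchmark can be computed from a binary confusion matrix, and AUC from ranking scores (the probability that a random positive cell outscores a random negative one, ties counting one half). This is a minimal sketch for a single binary phenotype; how the study averaged metrics across multiple classes is not stated in the abstract, and the labels and scores below are made up.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and false-positive rate for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def auc(y_true, scores):
    """ROC AUC via pairwise ranking: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions and classifier scores for 8 cells.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.8, 0.3, 0.6, 0.2, 0.1, 0.7]
print(binary_metrics(y_true, y_pred), auc(y_true, scores))
```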
Submitted 1 October, 2021;
originally announced October 2021.
-
OpenHI2 -- Open source histopathological image platform
Authors:
Pargorn Puttapirat,
Haichuan Zhang,
**gyi Deng,
Yuxin Dong,
Jiangbo Shi,
Hongyu He,
Zeyu Gao,
Chunbao Wang,
Xiangrong Zhang,
Chen Li
Abstract:
Transition from conventional to digital pathology requires a new category of biomedical informatics infrastructure that can facilitate delicate pathological routines. Pathological diagnoses are sensitive to many external factors and are known to be subjective. Only systems that meet the strict requirements of pathology would be able to run alongside pathological routines and eventually digitize the field, and the developed platform should comply with existing pathological routines and international standards. Currently, a number of available software tools can perform histopathological tasks, including virtual slide viewing, annotation, and basic image analysis; however, none of them can serve as a digital platform for pathology. Here we describe OpenHI2, an enhanced version of the Open Histopathological Image platform, which is capable of supporting all basic pathological tasks and file formats and is ready to be deployed in medical institutions on a standard server environment or cloud computing infrastructure. In this paper, we also describe the development decisions for the platform and propose solutions to overcome technical challenges so that OpenHI2 can be used as a platform for histopathological images. Further additions can be made to the platform since each component is modularized and fully documented. OpenHI2 is free, open-source, and available at https://gitlab.com/BioAI/OpenHI.
Submitted 15 January, 2020;
originally announced January 2020.
-
Multi-View Broad Learning System for Primate Oculomotor Decision Decoding
Authors:
Zhenhua Shi,
Xiaomo Chen,
Changming Zhao,
He He,
Veit Stuphorn,
Dongrui Wu
Abstract:
Multi-view learning improves learning performance by utilizing multi-view data: data collected from multiple sources, or feature sets extracted from the same data source. This approach is suitable for primate brain state decoding using cortical neural signals, because the complementary components of simultaneously recorded neural signals, local field potentials (LFPs) and action potentials (spikes), can be treated as two views. In this paper, we extended the broad learning system (BLS), a recently proposed wide neural network architecture, from single-view learning to multi-view learning, and validated its performance in decoding monkeys' oculomotor decisions from medial frontal LFPs and spikes. We demonstrated that medial frontal LFPs and spikes in non-human primates do contain complementary information about the oculomotor decision, and that the proposed multi-view BLS is a more effective approach for decoding the oculomotor decision than several classical and state-of-the-art single-view and multi-view learning approaches.
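A broad learning system is typically built from randomly initialised feature nodes, nonlinear enhancement nodes computed from them, and a linear readout solved in closed form by ridge regression, W = (AᵀA + λI)⁻¹Aᵀy with A = [Z | H]. The sketch below is a generic, dependency-free illustration of that structure with two views. How the paper's multi-view BLS combines views is not specified in the abstract; concatenating per-view feature nodes is our assumption, as are the node counts, λ = 0.1, the hypothetical helper names and the made-up "LFP" and "spike" data. Sparse-autoencoder fine-tuning and incremental updates used in some BLS variants are omitted.

```python
import math
import random


def make_layer(in_dim, n_nodes, rng):
    """Random, untrained affine map: (weights, biases) drawn uniformly from [-1, 1]."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(n_nodes)]
    biases = [rng.uniform(-1, 1) for _ in range(n_nodes)]
    return weights, biases


def apply_layer(layer, x, activation):
    weights, biases = layer
    return [[activation(sum(w * v for w, v in zip(w_row, row)) + b)
             for w_row, b in zip(weights, biases)] for row in x]


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def ridge(a, y, lam):
    """Closed-form output weights w = (A^T A + lam*I)^-1 A^T y."""
    k = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * yi for row, yi in zip(a, y)) for i in range(k)]
    return solve(ata, aty)


def multiview_nodes(views, feature_layers, enhancement_layer):
    """Linear feature nodes per view, concatenated, plus tanh enhancement nodes: A = [Z | H]."""
    per_view = [apply_layer(layer, x, lambda t: t) for layer, x in zip(feature_layers, views)]
    z = [[v for part in parts for v in part] for parts in zip(*per_view)]
    h = apply_layer(enhancement_layer, z, math.tanh)
    return [zr + hr for zr, hr in zip(z, h)]


# Hypothetical toy data: 6 trials, a 2-feature "LFP" view and a 2-feature "spike" view.
rng = random.Random(0)
lfp = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(6)]
spikes = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(6)]
y = [1, -1, 1, -1, 1, -1]  # made-up binary decision labels
feature_layers = [make_layer(2, 3, rng), make_layer(2, 3, rng)]
enhancement_layer = make_layer(6, 4, rng)
A = multiview_nodes([lfp, spikes], feature_layers, enhancement_layer)
w = ridge(A, y, lam=0.1)
predictions = [sum(a * wi for a, wi in zip(row, w)) for row in A]
```

Only the readout weights are trained, which is why BLS-style models fit quickly; new data are mapped through the same stored random layers before applying `w`.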
Submitted 2 July, 2020; v1 submitted 16 August, 2019;
originally announced August 2019.
-
Transfer Learning for Brain-Computer Interfaces: A Euclidean Space Data Alignment Approach
Authors:
He He,
Dongrui Wu
Abstract:
Objective: This paper targets a major challenge in developing practical EEG-based brain-computer interfaces (BCIs): how to cope with individual differences so that better learning performance can be obtained for a new subject, with minimum or even no subject-specific data? Methods: We propose a novel approach to align EEG trials from different subjects in the Euclidean space to make them more similar, and hence improve the learning performance for a new subject. Our approach has three desirable properties: 1) it aligns the EEG trials directly in the Euclidean space, and any signal processing, feature extraction and machine learning algorithms can then be applied to the aligned trials; 2) its computational cost is very low; and 3) it is unsupervised and does not need any label information from the new subject. Results: Both offline and simulated online experiments on motor imagery classification and event-related potential classification verified that our proposed approach outperformed a state-of-the-art Riemannian space data alignment approach, and several approaches without data alignment. Conclusion: The proposed Euclidean space EEG data alignment approach can greatly facilitate transfer learning in BCIs. Significance: Our proposed approach is effective, efficient, and easy to implement. It could be an essential pre-processing step for EEG-based BCIs.
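Euclidean alignment is commonly described as follows: for one subject, compute the reference matrix R̄ as the arithmetic mean of XᵢXᵢᵀ over that subject's trials Xᵢ (channels × samples), then replace each trial by R̄^(-1/2)Xᵢ, so that the subject's mean covariance becomes the identity. The abstract itself does not give this formula, so treat it as our assumption about the method. The sketch below is unsupervised, as the abstract stresses (no labels are used), and writes a small Jacobi eigensolver by hand only to stay dependency-free; in practice a linear-algebra library would be used. The trial data are random and hypothetical.

```python
import math
import random


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(row) for row in zip(*x)]


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) where column k of V is the eigenvector for eigenvalue k.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def euclidean_alignment(trials):
    """Return R^(-1/2) X_i for each trial X_i (channels x samples), with R = mean_i X_i X_i^T."""
    n_ch = len(trials[0])
    r = [[0.0] * n_ch for _ in range(n_ch)]
    for x in trials:
        cov = matmul(x, transpose(x))
        for i in range(n_ch):
            for j in range(n_ch):
                r[i][j] += cov[i][j] / len(trials)
    eigvals, eigvecs = jacobi_eigh(r)
    r_inv_sqrt = [[sum(eigvecs[i][k] * eigvecs[j][k] / math.sqrt(eigvals[k]) for k in range(n_ch))
                   for j in range(n_ch)] for i in range(n_ch)]
    return [matmul(r_inv_sqrt, x) for x in trials]


# Hypothetical subject: 5 trials, 3 channels, 50 samples, unequal channel scales.
rng = random.Random(0)
trials = [[[rng.gauss(0, scale) for _ in range(50)] for scale in (1.0, 2.0, 0.5)]
          for _ in range(5)]
aligned = euclidean_alignment(trials)
```

Applying the same procedure to each subject separately gives every subject an identity mean covariance, which is what makes their trials more comparable before any downstream classifier is trained.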
Submitted 2 April, 2019; v1 submitted 8 August, 2018;
originally announced August 2018.
-
Map-based cloning of the gene Pm21 that confers broad spectrum resistance to wheat powdery mildew
Authors:
Huagang He,
Shanying Zhu,
Yaoyong Ji,
Zhengning Jiang,
Renhui Zhao,
Tongde Bie
Abstract:
Common wheat (Triticum aestivum L.) is one of the most important cereal crops. Wheat powdery mildew caused by Blumeria graminis f. sp. tritici (Bgt) is a continuing threat to wheat production. The Pm21 gene, originating from Dasypyrum villosum, confers high resistance to all known Bgt races and has been widely applied in wheat breeding in China. In this research, we identify Pm21 as a typical coiled-coil, nucleotide-binding site, leucine-rich repeat gene by an integrated strategy of resistance gene analog (RGA)-based cloning via comparative genomics, physical and genetic mapping, BSMV-induced gene silencing (BSMV-VIGS), large-scale mutagenesis and genetic transformation.
Submitted 17 August, 2017;
originally announced August 2017.
-
Sharing deep generative representation for perceived image reconstruction from human brain activity
Authors:
Changde Du,
Changying Du,
Huiguang He
Abstract:
Decoding human brain activity via functional magnetic resonance imaging (fMRI) has gained increasing attention in recent years. While encouraging results have been reported in brain state classification tasks, reconstructing the details of human visual experience remains difficult. Two main challenges that hinder the development of effective models are the perplexing fMRI measurement noise and the high dimensionality of limited data instances. Existing methods generally suffer from one or both of these issues and yield unsatisfactory results. In this paper, we tackle this problem by casting the reconstruction of visual stimulus as the Bayesian inference of missing view in a multiview latent variable model. Sharing a common latent representation, our joint generative model of external stimulus and brain response is not only "deep" in extracting nonlinear features from visual images, but also powerful in capturing correlations among voxel activities of fMRI recordings. The nonlinearity and deep structure endow our model with strong representation ability, while the correlations of voxel activities are critical for suppressing noise and improving prediction. We devise an efficient variational Bayesian method to infer the latent variables and the model parameters. To further improve the reconstruction accuracy, the latent representations of test instances are enforced to be close to those of their neighbours from the training set via posterior regularization. Experiments on three fMRI recording datasets demonstrate that our approach can more accurately reconstruct visual stimuli.
Submitted 10 July, 2017; v1 submitted 25 April, 2017;
originally announced April 2017.
-
Photostimulation activates restorable fragmentation of single mitochondrion by initiating oxide flashes
Authors:
Yintao Wang,
Hao He,
Shaoyang Wang,
Yaohui Liu,
Minglie Hu,
Youjia Cao,
Chingyue Wang
Abstract:
Mitochondrial research is important for understanding ageing, apoptosis, and mitochondrial diseases. In previous works, mitochondria were usually stimulated indirectly by proapoptotic drugs to study mitochondrial development, an approach that lacks controllability as well as spatial and temporal resolution. These chemicals, or even genetic techniques regulating mitochondrial dynamics, may also activate other inter- or intra-cellular processes simultaneously. Here we demonstrate a photostimulation method at the single-mitochondrion level using a tightly focused femtosecond laser that can precisely activate restorable fragmentation of mitochondria, which soon recover their original tubular structure after tens of seconds. In this process, a series of mitochondrial reactive oxygen species (mROS) flashes are observed and found to be critical to mitochondrial fragmentation. Meanwhile, transient openings of mitochondrial permeability transition pores (mPTP), suggested by oscillations of mitochondrial membrane potential, contribute to the scavenging of redundant mROS and the recovery of fragmented mitochondria. These results demonstrate photostimulation as an active, precise and controllable method for the study of mitochondrial oxidative and morphological dynamics or related fields.
Submitted 17 February, 2015; v1 submitted 16 February, 2015;
originally announced February 2015.
-
Functionalized nanopore-embedded electrodes for rapid DNA sequencing
Authors:
Haiying He,
Ralph H. Scheicher,
Ravindra Pandey,
Alexandre Reily Rocha,
Stefano Sanvito,
Anton Grigoriev,
Rajeev Ahuja,
Shashi P. Karna
Abstract:
The determination of a patient's DNA sequence can, in principle, reveal an increased risk of falling ill with particular diseases [1,2] and help to design "personalized medicine" [3]. Moreover, statistical studies and comparison of genomes [4] of a large number of individuals are crucial for the analysis of mutations [5] and hereditary diseases, paving the way to preventive medicine [6]. DNA sequencing is, however, currently still a very time-consuming and expensive task [4], consisting of pre-processing steps, the actual sequencing using the Sanger method, and post-processing in the form of data analysis [7]. Here we propose a new approach that relies on functionalized nanopore-embedded electrodes to achieve an unambiguous distinction of the four nucleic acid bases in the DNA sequencing process. This represents a significant improvement over previously studied designs [8,9], which cannot reliably distinguish all four bases of DNA. The transport properties of the setup we investigated, computed with state-of-the-art density functional theory together with the non-equilibrium Green's function method, lead to current responses that differ by at least one order of magnitude for different bases and can thus provide a much more robust read-out of the base sequence. The implementation of our proposed setup could thus lead to a viable protocol for rapid DNA sequencing with significant consequences for the future of genome-related research in particular and health care in general.
Submitted 29 August, 2007;
originally announced August 2007.