-
Evaluating representation learning on the protein structure universe
Authors:
Arian R. Jamasb,
Alex Morehead,
Chaitanya K. Joshi,
Zuobai Zhang,
Kieran Didi,
Simon V. Mathis,
Charles Harris,
Jian Tang,
Jianlin Cheng,
Pietro Lio,
Tom L. Blundell
Abstract:
We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representation and their usefulness in capturing functional relationships for downstream tasks. We find that: (1) large-scale pretraining on AlphaFold structures and auxiliary tasks consistently improve the performance of both rotation-invariant and equivariant GNNs, and (2) more expressive equivariant GNNs benefit from pretraining to a greater extent compared to invariant models. We aim to establish a common ground for the machine learning and computational biology communities to rigorously compare and advance protein structure representation learning. Our open-source codebase reduces the barrier to entry for working with large protein structure datasets by providing: (1) storage-efficient dataloaders for large-scale structural databases including AlphaFoldDB and ESM Atlas, as well as (2) utilities for constructing new tasks from the entire PDB. ProteinWorkshop is available at: github.com/a-r-j/ProteinWorkshop.
Submitted 19 June, 2024;
originally announced June 2024.
-
RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design
Authors:
Rishabh Anand,
Chaitanya K. Joshi,
Alex Morehead,
Arian R. Jamasb,
Charles Harris,
Simon V. Mathis,
Kieran Didi,
Bryan Hooi,
Pietro Liò
Abstract:
We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design
Submitted 19 June, 2024;
originally announced June 2024.
-
Deep Learning for Protein-Ligand Docking: Are We There Yet?
Authors:
Alex Morehead,
Nabin Giri,
Jian Liu,
Jianlin Cheng
Abstract:
The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of docking methods within the practical context of (1) using predicted (apo) protein structures for docking (e.g., for broad applicability); (2) docking multiple ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for pocket generalization). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for practical protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL docking methods for apo-to-holo protein-ligand docking and protein-ligand structure generation using both single and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that all recent DL docking methods but one fail to generalize to multi-ligand protein targets and also that template-based docking algorithms perform equally well or better for multi-ligand docking as recent single-ligand DL docking methods, suggesting areas of improvement for future work. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.
Submitted 7 July, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Towards Joint Sequence-Structure Generation of Nucleic Acid and Protein Complexes with SE(3)-Discrete Diffusion
Authors:
Alex Morehead,
Jeffrey Ruffolo,
Aadyot Bhatnagar,
Ali Madani
Abstract:
Generative models of macromolecules carry abundant and impactful implications for industrial and biomedical efforts in protein engineering. However, existing methods are currently limited to modeling protein structures or sequences, independently or jointly, without regard to the interactions that commonly occur between proteins and other macromolecules. In this work, we introduce MMDiff, a generative model that jointly designs sequences and structures of nucleic acid and protein complexes, independently or in complex, using joint SE(3)-discrete diffusion noise. Such a model has important implications for emerging areas of macromolecular design including structure-based transcription factor design and design of noncoding RNA sequences. We demonstrate the utility of MMDiff through a rigorous new design benchmark for macromolecular complex generation that we introduce in this work. Our results demonstrate that MMDiff is able to successfully generate micro-RNA and single-stranded DNA molecules while being modestly capable of joint modeling DNA and RNA molecules in interaction with multi-chain protein complexes. Source code: https://github.com/Profluent-Internships/MMDiff.
Submitted 21 December, 2023;
originally announced January 2024.
-
gRNAde: Geometric Deep Learning for 3D RNA inverse design
Authors:
Chaitanya K. Joshi,
Arian R. Jamasb,
Ramon Viñas,
Charles Harris,
Simon V. Mathis,
Alex Morehead,
Rishabh Anand,
Pietro Liò
Abstract:
Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. Under the hood, gRNAde is a multi-state Graph Neural Network that generates candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. [2010], gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent RNA polymerase ribozyme structure. Open source code: https://github.com/chaitjo/geometric-rna-design
Submitted 25 May, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Geometry-Complete Diffusion for 3D Molecule Generation and Optimization
Authors:
Alex Morehead,
Jianlin Cheng
Abstract:
Denoising diffusion probabilistic models (DDPMs) have pioneered new state-of-the-art results in disciplines such as computer vision and computational biology for diverse tasks ranging from text-guided image generation to structure-guided protein design. Along this latter line of research, methods have recently been proposed for generating 3D molecules using equivariant graph neural networks (GNNs) within a DDPM framework. However, such methods are unable to learn important geometric properties of 3D molecules, as they adopt molecule-agnostic and non-geometric GNNs as their 3D graph denoising networks, which notably hinders their ability to generate valid large 3D molecules. In this work, we address these gaps by introducing the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation, which outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings for the QM9 dataset and the larger GEOM-Drugs dataset, respectively, and generates more novel and unique unconditional 3D molecules for the QM9 dataset compared to previous methods. Importantly, we demonstrate that the geometry-complete denoising process of GCDM learned for 3D molecule generation enables the model to generate a significant proportion of valid and energetically-stable large molecules at the scale of GEOM-Drugs, whereas previous methods fail to do so with the features they learn. Additionally, we show that extensions of GCDM can not only effectively design 3D molecules for specific protein pockets but also that GCDM's geometric features can be repurposed to consistently optimize the geometry and chemical composition of existing 3D molecules for molecular stability and property specificity, demonstrating new versatility of molecular diffusion models. Our source code and data are freely available at https://github.com/BioinfoMachineLearning/Bio-Diffusion.
Submitted 24 May, 2024; v1 submitted 8 February, 2023;
originally announced February 2023.
-
Geometry-Complete Perceptron Networks for 3D Molecular Graphs
Authors:
Alex Morehead,
Jianlin Cheng
Abstract:
The field of geometric deep learning has had a profound impact on the development of innovative and powerful graph neural network architectures. Disciplines such as computer vision and computational biology have benefited significantly from such methodological advances, which has led to breakthroughs in scientific domains such as protein structure prediction and design. In this work, we introduce GCPNet, a new geometry-complete, SE(3)-equivariant graph neural network designed for 3D molecular graph representation learning. Rigorous experiments across four distinct geometric tasks demonstrate that GCPNet's predictions (1) for protein-ligand binding affinity achieve a statistically significant correlation of 0.608, more than 5% greater than current state-of-the-art methods; (2) for protein structure ranking achieve statistically significant target-local and dataset-global correlations of 0.616 and 0.871, respectively; (3) for Newtonian many-body systems modeling achieve a task-averaged mean squared error less than 0.01, more than 15% better than current methods; and (4) for molecular chirality recognition achieve a state-of-the-art prediction accuracy of 98.7%, better than any other machine learning method to date. The source code, data, and instructions to train new models or reproduce our results are freely available at https://github.com/BioinfoMachineLearning/GCPNet.
Submitted 26 April, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
DRLComplex: Reconstruction of protein quaternary structures using deep reinforcement learning
Authors:
Elham Soltanikazemi,
Raj S. Roy,
Farhan Quadir,
Nabin Giri,
Alex Morehead,
Jianlin Cheng
Abstract:
Predicted inter-chain residue-residue contacts can be used to build the quaternary structure of protein complexes from scratch. However, only a small number of methods have been developed to reconstruct protein quaternary structures using predicted inter-chain contacts. Here, we present an agent-based self-learning method based on deep reinforcement learning (DRLComplex) to build protein complex structures using inter-chain contacts as distance constraints. We rigorously tested DRLComplex on two standard datasets of homodimeric and heterodimeric protein complexes (i.e., the CASP-CAPRI homodimer and Std_32 heterodimer datasets) using both true and predicted interchain contacts as inputs. Utilizing true contacts as input, DRLComplex achieved high average TM-scores of 0.9895 and 0.9881 and a low average interface RMSD (I_RMSD) of 0.2197 and 0.92 on the two datasets, respectively. When predicted contacts are used, the method achieves TM-scores of 0.73 and 0.76 for homodimers and heterodimers, respectively. Our experiments find that the accuracy of reconstructed quaternary structures depends on the accuracy of the contact predictions. Compared to other optimization methods for reconstructing quaternary structures from inter-chain contacts, DRLComplex performs similar to an advanced gradient descent method and better than a Markov Chain Monte Carlo simulation method and a simulated annealing-based method, validating the effectiveness of DRLComplex for quaternary reconstruction of protein complexes.
Submitted 26 May, 2022;
originally announced May 2022.
-
DProQ: A Gated-Graph Transformer for Protein Complex Structure Assessment
Authors:
Xiao Chen,
Alex Morehead,
Jian Liu,
Jianlin Cheng
Abstract:
Proteins interact to form complexes to carry out essential biological functions. Computational methods have been developed to predict the structures of protein complexes. However, an important challenge in protein complex structure prediction is to estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. We challenge this significant task with DProQ, which introduces a gated neighborhood-modulating Graph Transformer (GGT) designed to predict the quality of 3D protein complex structures. Notably, we incorporate node and edge gates within a novel Graph Transformer framework to control information flow during graph message passing. We train and evaluate DProQ on four newly-developed datasets that we make publicly available in this work. Our rigorous experiments demonstrate that DProQ achieves state-of-the-art performance in ranking protein complex structures.
Submitted 21 May, 2022;
originally announced May 2022.
-
EGR: Equivariant Graph Refinement and Assessment of 3D Protein Complex Structures
Authors:
Alex Morehead,
Xiao Chen,
Tianqi Wu,
Jian Liu,
Jianlin Cheng
Abstract:
Protein complexes are macromolecules essential to the functioning and well-being of all living organisms. As the structure of a protein complex, in particular its region of interaction between multiple protein subunits (i.e., chains), has a notable influence on the biological function of the complex, computational methods that can quickly and effectively be used to refine and assess the quality of a protein complex's 3D structure can directly be used within a drug discovery pipeline to accelerate the development of new therapeutics and improve the efficacy of future vaccines. In this work, we introduce the Equivariant Graph Refiner (EGR), a novel E(3)-equivariant graph neural network (GNN) for multi-task structure refinement and assessment of protein complexes. Our experiments on new, diverse protein complex datasets, all of which we make publicly available in this work, demonstrate the state-of-the-art effectiveness of EGR for atomistic refinement and assessment of protein complexes and outline directions for future work in the field. In doing so, we establish a baseline for future studies in macromolecular refinement and structure analysis.
Submitted 24 May, 2022; v1 submitted 20 May, 2022;
originally announced May 2022.
-
A Region-Based Deep Learning Approach to Automated Retail Checkout
Authors:
Maged Shoman,
Armstrong Aboah,
Alex Morehead,
Ye Duan,
Abdulateef Daud,
Yaw Adu-Gyamfi
Abstract:
Automating the product checkout process at conventional retail stores is a task poised to have large impacts on society generally speaking. Towards this end, reliable deep learning models that enable automated product counting for fast customer checkout can make this goal a reality. In this work, we propose a novel, region-based deep learning approach to automate product counting using a customized YOLOv5 object detection pipeline and the DeepSORT algorithm. Our results on challenging, real-world test videos demonstrate that our method can generalize its predictions to a sufficient level of accuracy and with a fast enough runtime to warrant deployment to real-world commercial settings. Our proposed method won 4th place in the 2022 AI City Challenge, Track 4, with an F1 score of 0.4400 on experimental validation data.
Submitted 18 April, 2022;
originally announced April 2022.
-
Semi-Supervised Graph Learning Meets Dimensionality Reduction
Authors:
Alex Morehead,
Watchanan Chantapakul,
Jianlin Cheng
Abstract:
Semi-supervised learning (SSL) has recently received increased attention from machine learning researchers. By enabling effective propagation of known labels in graph-based deep learning (GDL) algorithms, SSL is poised to become an increasingly used technique in GDL in the coming years. However, there are currently few explorations in the graph-based SSL literature on exploiting classical dimensionality reduction techniques for improved label propagation. In this work, we investigate the use of dimensionality reduction techniques such as PCA, t-SNE, and UMAP to see their effect on the performance of graph neural networks (GNNs) designed for semi-supervised propagation of node labels. Our study makes use of benchmark semi-supervised GDL datasets such as the Cora and Citeseer datasets to allow meaningful comparisons of the representations learned by each algorithm when paired with a dimensionality reduction technique. Our comprehensive benchmarks and clustering visualizations quantitatively and qualitatively demonstrate that, under certain conditions, employing a priori and a posteriori dimensionality reduction to GNN inputs and outputs, respectively, can simultaneously improve the effectiveness of semi-supervised node label propagation and node clustering. Our source code is freely available on GitHub.
Submitted 23 March, 2022;
originally announced March 2022.
-
Geometric Transformers for Protein Interface Contact Prediction
Authors:
Alex Morehead,
Chen Chen,
Jianlin Cheng
Abstract:
Computational methods for predicting the interface contacts between proteins come highly sought after for drug discovery as they can significantly advance the accuracy of alternative approaches, such as protein-protein docking, protein function analysis tools, and other computational methods for protein bioinformatics. In this work, we present the Geometric Transformer, a novel geometry-evolving graph transformer for rotation and translation-invariant protein interface contact prediction, packaged within DeepInteract, an end-to-end prediction pipeline. DeepInteract predicts partner-specific protein interface contacts (i.e., inter-protein residue-residue contacts) given the 3D tertiary structures of two proteins as input. In rigorous benchmarks, DeepInteract, on challenging protein complex targets from the 13th and 14th CASP-CAPRI experiments as well as Docking Benchmark 5, achieves 14% and 1.1% top L/5 precision (L: length of a protein unit in a complex), respectively. In doing so, DeepInteract, with the Geometric Transformer as its graph-based backbone, outperforms existing methods for interface contact prediction in addition to other graph-based neural network backbones compatible with DeepInteract, thereby validating the effectiveness of the Geometric Transformer for learning rich relational-geometric features for downstream tasks on 3D protein structures.
Submitted 4 March, 2022; v1 submitted 5 October, 2021;
originally announced October 2021.
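The "top L/5 precision" metric quoted in the abstract above can be illustrated with a minimal sketch. This is an assumption-laden illustration, not DeepInteract's evaluation code: the function name `top_l_over_k_precision`, the dictionary input format, and the choice of which chain length to pass as `length` are all hypothetical.

```python
def top_l_over_k_precision(scores, true_contacts, length, divisor=5):
    """Fraction of the top L/divisor scored residue pairs that are true contacts.

    scores: dict mapping (i, j) residue-index pairs to predicted contact scores
            (hypothetical input format chosen for this sketch).
    true_contacts: set of (i, j) pairs that are real inter-protein contacts.
    length: L, the length of a protein unit in the complex.
    """
    # Number of predictions to keep; at least one so the ratio is defined.
    k = max(1, length // divisor)
    # Rank candidate pairs by predicted score, highest first, and keep the top k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / k


# Toy example: L = 10, so the top 10 // 5 = 2 predictions are evaluated.
toy_scores = {(0, 1): 0.9, (0, 2): 0.8, (1, 3): 0.1}
toy_truth = {(0, 1), (1, 3)}
toy_precision = top_l_over_k_precision(toy_scores, toy_truth, length=10)
```

In the toy example, only one of the two highest-scoring pairs is a true contact, so the precision is one half.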
-
DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction
Authors:
Alex Morehead,
Chen Chen,
Ada Sedova,
Jianlin Cheng
Abstract:
How and where proteins interface with one another can ultimately impact the proteins' functions along with a range of other biological processes. As such, precise computational methods for protein interface prediction (PIP) come highly sought after as they could yield significant advances in drug discovery and design as well as protein function analysis. However, the traditional benchmark dataset for this task, Docking Benchmark 5 (DB5), contains only a modest 230 complexes for training, validating, and testing different machine learning algorithms. In this work, we expand on a dataset recently introduced for this task, the Database of Interacting Protein Structures (DIPS), to present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for geometric deep learning of protein interfaces. The previous version of DIPS contains only the Cartesian coordinates and types of the atoms comprising a given protein complex, whereas DIPS-Plus now includes a plethora of new residue-level features including protrusion indices, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid, giving researchers a large, well-curated feature bank for training protein interface prediction methods. We demonstrate through rigorous benchmarks that training an existing state-of-the-art (SOTA) model for PIP on DIPS-Plus yields SOTA results, surpassing the performance of all other models trained on residue-level and atom-level encodings of protein complexes to date.
Submitted 6 October, 2021; v1 submitted 6 June, 2021;
originally announced June 2021.
-
KEPLER's First Rocky Planet: Kepler-10b
Authors:
Natalie M. Batalha,
William J. Borucki,
Stephen T. Bryson,
Lars A. Buchhave,
Douglas A. Caldwell,
Jorgen Christensen-Dalsgaard,
David Ciardi,
Edward W. Dunham,
Francois Fressin,
Thomas N. Gautier III,
Ronald L. Gilliland,
Michael R. Haas,
Steve B. Howell,
Jon M. Jenkins,
Hans Kjeldsen,
David G. Koch,
David W. Latham,
Jack J. Lissauer,
Geoffrey W. Marcy,
Jason F. Rowe,
Dimitar D. Sasselov,
Sara Seager,
Jason H. Steffen,
Guillermo Torres,
Gibor S. Basri
, et al. (27 additional authors not shown)
Abstract:
NASA's Kepler Mission uses transit photometry to determine the frequency of earth-size planets in or near the habitable zone of Sun-like stars. The mission reached a milestone toward meeting that goal: the discovery of its first rocky planet, Kepler-10b. Two distinct sets of transit events were detected: 1) a 152 +/- 4 ppm dimming lasting 1.811 +/- 0.024 hours with ephemeris T[BJD]=2454964.57375+N*0.837495 days and 2) a 376 +/- 9 ppm dimming lasting 6.86 +/- 0.07 hours with ephemeris T[BJD]=2454971.6761+N*45.29485 days. Statistical tests on the photometric and pixel flux time series established the viability of the planet candidates triggering ground-based follow-up observations. Forty precision Doppler measurements were used to confirm that the short-period transit event is due to a planetary companion. The parent star is bright enough for asteroseismic analysis. Photometry was collected at 1-minute cadence for >4 months from which we detected 19 distinct pulsation frequencies. Modeling the frequencies resulted in precise knowledge of the fundamental stellar properties. Kepler-10 is a relatively old (11.9 +/- 4.5 Gyr) but otherwise Sun-like Main Sequence star with Teff=5627 +/- 44 K, Mstar=0.895 +/- 0.060 Msun, and Rstar=1.056 +/- 0.021 Rsun. Physical models simultaneously fit to the transit light curves and the precision Doppler measurements yielded tight constraints on the properties of Kepler-10b that speak to its rocky composition: Mpl=4.56 +/- 1.29 Mearth, Rpl=1.416 +/- 0.036 Rearth, and density=8.8 +/- 2.9 gcc. Kepler-10b is the smallest transiting exoplanet discovered to date.
Submitted 3 February, 2011;
originally announced February 2011.
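The Kepler-10b figures quoted in the abstract above can be cross-checked with a short calculation. The sketch below is an illustration under stated assumptions: the Earth mass and radius constants are standard approximate values not given in the abstract, "gcc" is read as g/cm^3, and the function names are hypothetical. It computes the bulk density implied by the quoted mass and radius, and evaluates the linear ephemeris T = T0 + N * P for the short-period transit.

```python
import math

# Approximate reference constants (assumed; not taken from the abstract).
EARTH_MASS_KG = 5.972e24
EARTH_RADIUS_M = 6.371e6


def bulk_density_g_cm3(mass_earth, radius_earth):
    """Mean density of a sphere, given mass and radius in Earth units."""
    mass_g = mass_earth * EARTH_MASS_KG * 1e3        # kg -> g
    radius_cm = radius_earth * EARTH_RADIUS_M * 1e2  # m -> cm
    volume_cm3 = (4.0 / 3.0) * math.pi * radius_cm ** 3
    return mass_g / volume_cm3


def transit_time_bjd(epoch_bjd, period_days, n):
    """Mid-transit time of the n-th transit after the reference epoch."""
    return epoch_bjd + n * period_days


# Central values quoted for Kepler-10b: 4.56 Earth masses, 1.416 Earth radii.
density = bulk_density_g_cm3(4.56, 1.416)

# Short-period ephemeris quoted in the abstract.
next_transit = transit_time_bjd(2454964.57375, 0.837495, 1)
```

With these constants the computed density falls well inside the quoted 8.8 +/- 2.9 range; small differences from the central value depend on the Earth reference radius adopted.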