-
Machine and deep learning methods for predicting 3D genome organization
Authors:
Brydon P. G. Wall,
My Nguyen,
J. Chuck Harrell,
Mikhail G. Dozmorov
Abstract:
Three-Dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, Topologically Associating Domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even in single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequence information (k-mers, Transcription Factor Binding Site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
Submitted 4 March, 2024;
originally announced March 2024.
-
Magnitude-squared coherence: A powerful tool for disentangling Doppler planet discoveries from stellar activity
Authors:
Sarah E. Dodson-Robinson,
Victor Ramirez Delgado,
Justin Harrell,
Charlotte Haley
Abstract:
If Doppler searches for Earth-mass, habitable planets are to succeed, observers must be able to identify and model out stellar activity signals. Here we demonstrate how to diagnose activity signals by calculating the magnitude-squared coherence $\hat{C}^2_{xy}(f)$ between an activity indicator time series $x_t$ and the radial velocity (RV) time series $y_t$. Since planets only cause modulation in RV, not in activity indicators, a high value of $\hat{C}^2_{xy}(f)$ indicates that the signal at frequency $f$ has a stellar origin. We use Welch's method to measure coherence between activity indicators and RVs in archival observations of GJ 581, alpha Cen B, and GJ 3998. High RV-H$\alpha$ coherence at the frequency of GJ 3998 b, and high RV-S index coherence at the frequency of GJ 3998 c, indicate that the planets may actually be stellar signals. We also replicate previous results showing that GJ 581 d and g are rotation harmonics and demonstrate that alpha Cen B has activity signals that are not associated with rotation. Welch's power spectrum estimates have cleaner spectral windows than Lomb-Scargle periodograms, improving our ability to estimate rotation periods. We find that the rotation period of GJ 581 is 132 days, with no evidence of differential rotation. Welch's method may yield unacceptably large bias for datasets with $N < 75$ observations and works best on datasets with $N > 100$. Tapering the time-domain data can reduce the bias of Welch's power spectrum estimator, but observers should not apply tapers to datasets with extremely uneven observing cadence. A software package for calculating magnitude-squared coherence and Welch's power spectrum estimates is available on GitHub.
Submitted 31 January, 2022;
originally announced January 2022.
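The coherence diagnostic described in the abstract above can be illustrated with a minimal sketch. It is not the authors' software package: it assumes evenly sampled data (real RV time series are unevenly sampled), and the synthetic series, the segment length of 64, the 50% overlap, and the function names `hann`, `segment_dfts`, and `msc` are all illustrative assumptions. Both synthetic series share one periodic signal, so the estimated coherence should be high at that frequency and low elsewhere.

```python
import cmath
import math
import random


def hann(n):
    """Hann taper of length n (periodic form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def segment_dfts(series, seg_len, step, taper):
    """Tapered DFT (bins 0..seg_len//2) of each overlapping segment."""
    spectra = []
    for start in range(0, len(series) - seg_len + 1, step):
        seg = series[start:start + seg_len]
        mean = sum(seg) / seg_len
        seg = [(v - mean) * w for v, w in zip(seg, taper)]
        spectra.append([
            sum(v * cmath.exp(-2j * math.pi * k * t / seg_len)
                for t, v in enumerate(seg))
            for k in range(seg_len // 2 + 1)
        ])
    return spectra


def msc(x, y, seg_len=64):
    """Welch-style magnitude-squared coherence with 50% segment overlap."""
    step = seg_len // 2
    taper = hann(seg_len)
    xs = segment_dfts(x, seg_len, step, taper)
    ys = segment_dfts(y, seg_len, step, taper)
    coherence = []
    for k in range(seg_len // 2 + 1):
        sxx = sum(abs(a[k]) ** 2 for a in xs)          # auto-spectrum of x
        syy = sum(abs(b[k]) ** 2 for b in ys)          # auto-spectrum of y
        sxy = sum(a[k].conjugate() * b[k] for a, b in zip(xs, ys))
        coherence.append(abs(sxy) ** 2 / (sxx * syy) if sxx and syy else 0.0)
    return coherence


# Synthetic stand-ins (assumptions): x plays the activity indicator and
# y the RV series; both contain a signal at 0.125 cycles per sample.
rng = random.Random(0)
n_obs = 2048
x = [math.sin(2 * math.pi * 0.125 * t) + rng.gauss(0, 0.5)
     for t in range(n_obs)]
y = [0.8 * math.sin(2 * math.pi * 0.125 * t + 0.5) + rng.gauss(0, 0.5)
     for t in range(n_obs)]

coh = msc(x, y, seg_len=64)
shared_bin = 8  # 0.125 cycles/sample * 64 samples per segment
print(f"coherence at shared frequency: {coh[shared_bin]:.2f}")
print(f"coherence at a noise-only bin: {coh[20]:.2f}")
```

In this toy setting a high coherence at the shared frequency plays the role of a signal that appears in both the activity indicator and the RVs, which the abstract interprets as evidence of a stellar origin.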
-
The EXPRES Stellar Signals Project II. State of the Field in Disentangling Photospheric Velocities
Authors:
Lily L. Zhao,
Debra A. Fischer,
Eric B. Ford,
Alex Wise,
Michaël Cretignier,
Suzanne Aigrain,
Oscar Barragan,
Megan Bedell,
Lars A. Buchhave,
João D. Camacho,
Heather M. Cegla,
Jessi Cisewski-Kehe,
Andrew Collier Cameron,
Zoe L. de Beurs,
Sally Dodson-Robinson,
Xavier Dumusque,
João P. Faria,
Christian Gilbertson,
Charlotte Haley,
Justin Harrell,
David W. Hogg,
Parker Holzer,
Ancy Anna John,
Baptiste Klein,
Marina Lafarga,
et al. (18 additional authors not shown)
Abstract:
Measured spectral shifts due to intrinsic stellar variability (e.g., pulsations, granulation) and activity (e.g., spots, plages) are the largest source of error for extreme precision radial velocity (EPRV) exoplanet detection. Several methods are designed to disentangle stellar signals from true center-of-mass shifts due to planets. The EXPRES Stellar Signals Project (ESSP) presents a self-consistent comparison of 22 different methods tested on the same extreme-precision spectroscopic data from EXPRES. Methods derived new activity indicators, constructed models for mapping an indicator to the needed RV correction, or separated out shape- and shift-driven RV components. Since no ground truth is known when using real data, relative method performance is assessed using the total and nightly scatter of returned RVs and agreement between the results of different methods. Nearly all submitted methods return a lower RV RMS than classic linear decorrelation, but no method is yet consistently reducing the RV RMS to sub-meter-per-second levels. There is a concerning lack of agreement between the RVs returned by different methods. These results suggest that continued progress in this field necessitates increased interpretability of methods, high-cadence data to capture stellar signals at all timescales, and continued tests like the ESSP using consistent data sets with more advanced metrics for method performance. Future comparisons should make use of various well-characterized data sets -- such as solar data or data with known injected planetary and/or stellar signals -- to better understand method performance and whether planetary signals are preserved.
Submitted 25 January, 2022;
originally announced January 2022.
-
K2, Spitzer, and TESS Transits of Four Sub-Neptune Exoplanets
Authors:
Alison Duck,
Caleb K. Harada,
Justin Harrell,
Ryan R. A. Morris,
Edward Williams,
Ian Crossfield,
Michael Werner,
Drake Deming
Abstract:
We present new Spitzer transit observations of four K2 transiting sub-Neptunes: K2-36c, K2-79b, K2-167b, and K2-212b. We derive updated orbital ephemerides and radii for these planets based on a joint analysis of the Spitzer, TESS, and K2 photometry. We use the EVEREST pipeline to provide improved K2 photometry, by detrending instrumental noise and K2's pointing jitter. We used a pixel-level decorrelation method on the Spitzer observations to reduce instrumental systematic effects. We modeled the effect of possible blended eclipsing binaries, seeking to validate these planets via the achromaticity of the transits (K2 versus Spitzer). However, we find that Spitzer's signal-to-noise ratio for these small planets is insufficient to validate them via achromaticity. Nevertheless, by jointly fitting radii between K2 and Spitzer observations, we were able to independently confirm the K2 radius measurements. Due to the long time baseline between the K2 and Spitzer observations, we were also able to increase the precision of the orbital periods compared to K2 observations alone. The improvement is a factor of 3 for K2-36c, and more than an order of magnitude for the remaining planets. Considering possible JWST observations in January 2023, previous 1$\sigma$ uncertainties in transit times for these planets range from 74 to 434 minutes, but we have reduced them to the range of 8 to 23 minutes.
Submitted 19 August, 2021;
originally announced August 2021.
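The link between period precision and future transit-time uncertainty mentioned in the abstract above can be sketched with standard error propagation for a linear ephemeris, $T(E) = T_0 + P E$. The sketch below assumes independent uncertainties on $T_0$ and $P$ (it neglects their covariance), and every numerical value in it is hypothetical rather than taken from the paper; the function name `transit_time_sigma` is likewise illustrative.

```python
import math

MINUTES_PER_DAY = 24 * 60


def transit_time_sigma(sigma_t0, sigma_period, epochs):
    """1-sigma uncertainty of a predicted transit time, `epochs` orbits
    after the reference transit, for the linear ephemeris T = T0 + P * E.
    Assumes independent errors on T0 and P (covariance neglected)."""
    return math.sqrt(sigma_t0 ** 2 + (epochs * sigma_period) ** 2)


# Hypothetical inputs (assumptions, not values from the paper):
sigma_t0_days = 0.001      # uncertainty on the reference transit time
epochs_elapsed = 292       # orbits between reference epoch and prediction
old_sigma_period = 1e-3    # period uncertainty from a short baseline (days)
new_sigma_period = 1e-4    # period uncertainty from a long baseline (days)

before = transit_time_sigma(sigma_t0_days, old_sigma_period, epochs_elapsed)
after = transit_time_sigma(sigma_t0_days, new_sigma_period, epochs_elapsed)
print(f"short baseline: {before * MINUTES_PER_DAY:.0f} min")
print(f"long baseline:  {after * MINUTES_PER_DAY:.0f} min")
```

Because the period term grows linearly with the number of elapsed orbits, it dominates predictions made years after the reference transit, which is why a longer observing baseline tightens predicted transit times so strongly.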
-
Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
Authors:
Alex Gittens,
Aditya Devarakonda,
Evan Racah,
Michael Ringenburg,
Lisa Gerhardt,
Jey Kottalam,
Jialin Liu,
Kristyn Maschhoff,
Shane Canon,
Jatin Chhugani,
Pramod Sharma,
Jiyan Yang,
James Demmel,
Jim Harrell,
Venkat Krishnamurthy,
Michael W. Mahoney,
Prabhat
Abstract:
We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling and bioimaging. The data matrices are tall and skinny, which enables the algorithms to map conveniently into Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
Submitted 20 September, 2016; v1 submitted 5 July, 2016;
originally announced July 2016.
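One way to see why tall-and-skinny matrices suit a data-parallel model, as the abstract above notes, is that PCA can be driven by small per-partition summaries. The sketch below is an assumption-laden illustration in plain Python, not the authors' Spark or C+MPI code: each row block is mapped to a count, column sums, and a small Gram matrix, the summaries are reduced by addition, and the leading principal component is then found by power iteration on the resulting covariance matrix. The toy data and all function names are hypothetical.

```python
import math
from functools import reduce


def partition_stats(rows):
    """Map step: summarize one row block as (count, column sums, Gram)."""
    d = len(rows[0])
    sums = [0.0] * d
    gram = [[0.0] * d for _ in range(d)]
    for row in rows:
        for i in range(d):
            sums[i] += row[i]
            for j in range(d):
                gram[i][j] += row[i] * row[j]
    return len(rows), sums, gram


def merge(a, b):
    """Reduce step: summaries of disjoint row blocks simply add."""
    count = a[0] + b[0]
    sums = [p + q for p, q in zip(a[1], b[1])]
    gram = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return count, sums, gram


def covariance(stats):
    """Sample covariance of the columns from the merged summaries."""
    n, sums, gram = stats
    d = len(sums)
    return [[(gram[i][j] - sums[i] * sums[j] / n) / (n - 1)
             for j in range(d)] for i in range(d)]


def top_component(cov, iterations=200):
    """Leading eigenvector of a small symmetric matrix by power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Toy tall-and-skinny matrix (assumption): 100 rows, 3 columns, with most
# variance along the direction (1, 2, 0).
rows = [[i - 49.5, 2 * (i - 49.5), 0.1 if i % 2 else -0.1]
        for i in range(100)]
blocks = [rows[k:k + 25] for k in range(0, len(rows), 25)]  # 4 partitions

stats = reduce(merge, map(partition_stats, blocks))
pc1 = top_component(covariance(stats))
print("leading principal component:", [round(c, 4) for c in pc1])
```

The design point is that each partition only ever emits a d-by-d summary, so the amount of data exchanged between workers depends on the number of columns rather than the number of rows.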