CryoRL: Reinforcement Learning Enables Efficient Cryo-EM Data Collection
Authors: Quanfu Fan, Yilai Li, Yuguang Yao, John Cohn, Sijia Liu, Seychelle M. Vos, Michael A. Cianfrocco
Abstract:
Single-particle cryo-electron microscopy (cryo-EM) has become one of the mainstream structural biology techniques because of its ability to determine high-resolution structures of dynamic biomolecules. However, cryo-EM data acquisition remains expensive and labor-intensive, requiring substantial expertise. Structural biologists need a more efficient and objective method to collect the best data in a limited time frame. In this work, we formulate the cryo-EM data collection task as an optimization problem whose goal is to maximize the total number of good images taken within a specified period. We show that reinforcement learning offers an effective way to plan cryo-EM data collection, successfully navigating heterogeneous cryo-EM grids. The approach we developed, cryoRL, performs better than average users at data collection under similar settings.
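One way to write down the objective stated in the abstract is sketched below. All symbols are assumptions introduced here for illustration and are not taken from the paper: $h_1,\dots,h_n$ is the ordered sequence of exposure targets visited, $q(h_i)\in\{0,1\}$ indicates whether the image taken at $h_i$ is good, $c(h_{i-1},h_i)$ is the time spent moving to and imaging $h_i$, and $T$ is the specified time budget.

```latex
\max_{n,\; h_1,\dots,h_n} \; \sum_{i=1}^{n} q(h_i)
\quad \text{subject to} \quad
\sum_{i=1}^{n} c(h_{i-1}, h_i) \le T
```

Under this reading, a reinforcement learning agent would choose each next target $h_i$ so that the expected count of good images is high while the accumulated time stays within $T$; the paper's actual reward and state design may differ.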
Submitted 15 April, 2022;
originally announced April 2022.
A cross-study analysis of drug response prediction in cancer cell lines
Authors: Fangfang Xia, Jonathan Allen, Prasanna Balaprakash, Thomas Brettin, Cristina Garcia-Cardona, Austin Clyde, Judith Cohn, James Doroshow, Xiaotian Duan, Veronika Dubinkina, Yvonne Evrard, Ya Ju Fan, Jason Gans, Stewart He, Pinyi Lu, Sergei Maslov, Alexander Partin, Maulik Shukla, Eric Stahlberg, Justin M. Wozniak, Hyunseung Yoo, George Zaki, Yitan Zhu, Rick Stevens
Abstract:
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross validation within a single study to assess model accuracy. While an essential first step, cross validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: NCI60, CTRP, GDSC, CCLE and gCSI. Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies, and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.
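The cross-study protocol described above (train on one study, score on every study) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the study names, the toy one-feature data, the linear model, and the choice of R² as the metric are hypothetical and are not the paper's data, models, or results.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_score(ys, preds):
    """Coefficient of determination of preds against ys."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def cross_study_matrix(studies):
    """Train on each study and score on every study.

    Returns a dict mapping (train_name, test_name) to R^2.
    Diagonal entries are in-sample fits, not cross validation.
    """
    scores = {}
    for train_name, (train_x, train_y) in studies.items():
        slope, intercept = fit_line(train_x, train_y)
        for test_name, (test_x, test_y) in studies.items():
            preds = [slope * x + intercept for x in test_x]
            scores[(train_name, test_name)] = r2_score(test_y, preds)
    return scores


# Hypothetical toy studies: (feature values, responses).
# study_C has a constant offset, standing in for an assay difference.
TOY_STUDIES = {
    "study_A": ([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]),
    "study_B": ([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]),
    "study_C": ([0, 1, 2, 3, 4], [3, 5, 7, 9, 11]),
}

if __name__ == "__main__":
    for pair, score in sorted(cross_study_matrix(TOY_STUDIES).items()):
        print(pair, round(score, 3))
```

In this toy setup a model transfers perfectly between the two studies that share a response scale and degrades on the shifted one, which mirrors in miniature the abstract's point that assay differences can limit generalizability across studies.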
Submitted 13 August, 2021; v1 submitted 18 April, 2021;
originally announced April 2021.