Partial Product Aware Machine Learning on DNA-Encoded Libraries
Authors:
Polina Binder,
Meghan Lawler,
LaShadric Grady,
Neil Carlson,
Sumudu Leelananda,
Svetlana Belyanskaya,
Joe Franklin,
Nicolas Tilmans,
Henri Palacci
Abstract:
DNA encoded libraries (DELs) are used for rapid large-scale screening of small molecules against a protein target. These combinatorial libraries are built through several cycles of chemistry and DNA ligation, producing large sets of DNA-tagged molecules. Training machine learning models on DEL data has been shown to be effective at predicting molecules of interest dissimilar from those in the orig…
▽ More
DNA encoded libraries (DELs) are used for rapid large-scale screening of small molecules against a protein target. These combinatorial libraries are built through several cycles of chemistry and DNA ligation, producing large sets of DNA-tagged molecules. Training machine learning models on DEL data has been shown to be effective at predicting molecules of interest dissimilar from those in the original DEL. Machine learning chemical property prediction approaches rely on the assumption that the property of interest is linked to a single chemical structure. In the context of DNA-encoded libraries, this is equivalent to assuming that every chemical reaction fully yields the desired product. However, in practice, multi-step chemical synthesis sometimes generates partial molecules. Each unique DNA tag in a DEL therefore corresponds to a set of possible molecules. Here, we leverage reaction yield data to enumerate the set of possible molecules corresponding to a given DNA tag. This paper demonstrates that training a custom GNN on this richer dataset improves accuracy and generalization performance.
△ Less
Submitted 16 May, 2022;
originally announced May 2022.
Accurate and Interpretable Machine Learning for Transparent Pricing of Health Insurance Plans
Authors:
Rohun Kshirsagar,
Li-Yen Hsu,
Vatshank Chaturvedi,
Charles H. Greenberg,
Matthew McClelland,
Anushadevi Mohan,
Wideet Shende,
Nicolas P. Tilmans,
Renzo Frigato,
Min Guo,
Ankit Chheda,
Meredith Trotter,
Shonket Ray,
Arnold Lee,
Miguel Alvarado
Abstract:
Health insurance companies cover half of the United States population through commercial employer-sponsored health plans and pay 1.2 trillion US dollars every year to cover medical expenses for their members. The actuary and underwriter roles at a health insurance company serve to assess which risks to take on and how to price those risks to ensure profitability of the organization. While Bayesian…
▽ More
Health insurance companies cover half of the United States population through commercial employer-sponsored health plans and pay 1.2 trillion US dollars every year to cover medical expenses for their members. The actuary and underwriter roles at a health insurance company serve to assess which risks to take on and how to price those risks to ensure profitability of the organization. While Bayesian hierarchical models are the current standard in the industry to estimate risk, interest in machine learning as a way to improve upon these existing methods is increasing. Lumiata, a healthcare analytics company, ran a study with a large health insurance company in the United States. We evaluated the ability of machine learning models to predict the per member per month cost of employer groups in their next renewal period, especially those groups who will cost less than 95\% of what an actuarial model predicts (groups with "concession opportunities"). We developed a sequence of two models, an individual patient-level and an employer-group-level model, to predict the annual per member per month allowed amount for employer groups, based on a population of 14 million patients. Our models performed 20\% better than the insurance carrier's existing pricing model, and identified 84\% of the concession opportunities. This study demonstrates the application of a machine learning system to compute an accurate and fair price for health insurance products and analyzes how explainable machine learning models can exceed actuarial models' predictive accuracy while maintaining interpretability.
△ Less
Submitted 27 February, 2021; v1 submitted 23 September, 2020;
originally announced September 2020.