Similarity Downselection: A Python implementation of a heuristic search algorithm for finding the set of the n most dissimilar items with an application in conformer sampling
Authors:
Felicity F. Nielson,
Sean M. Colby,
Ryan S. Renslow,
Thomas O. Metz
Abstract:
Finding the set of the n items most dissimilar from each other out of a larger population becomes increasingly difficult and computationally expensive as either n or the population size grows large. Finding the set of the n most dissimilar items is different than simply sorting an array of numbers because there exists a pairwise relationship between each item and all other items in the population.…
▽ More
Finding the set of the n items most dissimilar from each other out of a larger population becomes increasingly difficult and computationally expensive as either n or the population size grows large. Finding the set of the n most dissimilar items is different than simply sorting an array of numbers because there exists a pairwise relationship between each item and all other items in the population. For instance, if you have a set of the most dissimilar n=4 items, one or more of the items from n=4 might not be in the set n=5. An exact solution would have to search all possible combinations of size n in the population, exhaustively. We present an open-source software called similarity downselection (SDS), written in Python and freely available on GitHub. SDS implements a heuristic algorithm for quickly finding the approximate set(s) of the n most dissimilar items. We benchmark SDS against a Monte Carlo method, which attempts to find the exact solution through repeated random sampling. We show that for SDS to find the set of n most dissimilar conformers, our method is not only orders of magnitude faster, but is also more accurate than running the Monte Carlo for 1,000,000 iterations, each searching for set sizes n=3-7 out of a population of 50,000. We also benchmark SDS against the exact solution for example small populations, showing SDS produces a solution close to the exact solution in these instances.
△ Less
Submitted 6 May, 2021;
originally announced May 2021.
Exploring the impacts of conformer selection methods on ion mobility collision cross section predictions
Authors:
Felicity F. Nielson,
Sean M. Colby,
Dennis G. Thomas,
Ryan S. Renslow,
Thomas O. Metz
Abstract:
The prediction of structure dependent molecular properties, such as collision cross sections as measured using ion mobility spectrometry, are crucially dependent on the selection of the correct population of molecular conformers. Here, we report an in-depth evaluation of multiple conformation selection techniques, including simple averaging, Boltzmann weighting, lowest energy selection, low energy…
▽ More
The prediction of structure dependent molecular properties, such as collision cross sections as measured using ion mobility spectrometry, are crucially dependent on the selection of the correct population of molecular conformers. Here, we report an in-depth evaluation of multiple conformation selection techniques, including simple averaging, Boltzmann weighting, lowest energy selection, low energy threshold reductions, and similarity reduction. Generating 50,000 conformers each for 18 molecules, we used the In Silico Chemical Library Engine (ISiCLE) to calculate the collision cross sections for the entire dataset. First, we employed Monte Carlo simulations to understand the variability between conformer structures as generated using simulated annealing. Then we employed Monte Carlo simulations to the aforementioned conformer selection techniques applied on the simulated molecular property - the ion mobility collision cross section. Based on our analyses, we found Boltzmann weighting to be a good tradeoff between precision and theoretical accuracy. Combining multiple techniques revealed that energy thresholds and root-mean-squared deviation-based similarity reductions can save considerable computational expense while maintaining property prediction accuracy. Molecular dynamic conformer generation tools like AMBER can continue to generate new lowest energy conformers even after tens of thousands of generations, decreasing precision between runs. This reduced precision can be ameliorated and theoretical accuracy increased by running density functional theory geometry optimization on carefully selected conformers.
△ Less
Submitted 14 October, 2020;
originally announced October 2020.