-
CHA2: CHemistry Aware Convex Hull Autoencoder Towards Inverse Molecular Design
Authors:
Mohammad Sajjad Ghaemi,
Hang Hu,
Anguang Hu,
Hsu Kiang Ooi
Abstract:
Optimizing molecular design and discovering novel chemical structures to meet certain objectives, such as a high quantitative estimate of drug-likeness (QED) score, is NP-hard due to the vast combinatorial design space of discrete molecular structures, which makes it nearly impossible to explore the entire search space comprehensively for de novo structures with properties of interest. To address this challenge, reducing the intractable search space to a lower-dimensional latent volume makes it more feasible to examine molecular candidates via inverse design. Autoencoders are suitable deep learning techniques, equipped with an encoder that maps the discrete molecular structure into a latent space and a decoder that maps latent points back to molecular structures. The continuity of the latent space, which characterizes the discrete chemical structures, provides a flexible representation for inverse design in order to discover novel molecules. However, exploring this latent space requires certain insights to generate new structures. We propose using a convex hull surrounding the top molecules in terms of high QEDs to capture a tight subspace in the latent representation as an efficient way to reveal novel molecules with high QEDs. We demonstrate the effectiveness of our suggested method by using QM9 as a training dataset along with the Self-Referencing Embedded Strings (SELFIES) representation to calibrate the autoencoder in order to carry out inverse molecular design that yields novel chemical structures.
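The convex-hull idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how points are drawn from the hull, so the approach below is an assumption: it takes the latent vectors of the top-scoring molecules and forms random convex combinations of them, which by definition lie inside their convex hull. The function names `top_k_by_score` and `sample_in_convex_hull` are hypothetical, and the encoder, decoder, and QED scoring are left out.

```python
import random


def top_k_by_score(latents, scores, k):
    """Return the k latent vectors with the highest scores (e.g. QED)."""
    ranked = sorted(zip(scores, latents), key=lambda pair: pair[0], reverse=True)
    return [latent for _, latent in ranked[:k]]


def sample_in_convex_hull(points, rng):
    """Draw one point inside the convex hull of `points`.

    Assumption: weights are drawn uniformly from the simplex
    (Dirichlet(1, ..., 1)) by normalizing exponential draws. Any
    non-negative weights summing to 1 give a point inside the hull.
    """
    raw = [rng.expovariate(1.0) for _ in points]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2-D "latent vectors" with made-up scores, for illustration only.
    latents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    scores = [0.9, 0.8, 0.7, 0.1]
    hull_points = top_k_by_score(latents, scores, k=3)
    candidate = sample_in_convex_hull(hull_points, rng)
    print(candidate)  # a latent point that a decoder could map to a molecule
```

In a full pipeline, each sampled latent point would be passed through the trained decoder and the resulting molecule re-scored; that part depends on the specific autoencoder and is not shown here.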
Submitted 21 February, 2023;
originally announced February 2023.
-
Machine learning for the prediction of safe and biologically active organophosphorus molecules
Authors:
Hang Hu,
Hsu Kiang Ooi,
Mohammad Sajjad Ghaemi,
Anguang Hu
Abstract:
Drug discovery is a complex process with a large molecular space to be considered. By constraining the search space, fragment-based drug design is an approach that can effectively sample the chemical space of interest. Here we propose a framework of Recurrent Neural Networks (RNN) with an attention model to sample the chemical space of organophosphorus molecules using the fragment-based approach. The framework is trained with a ZINC dataset that is screened for high drug-likeness scores. The goal is to predict molecules with modes of biological action similar to those of organophosphorus pesticides or chemical warfare agents, yet less toxic to humans. The generated molecules contain a starting fragment of PO2F but have a bulky hydrocarbon side chain that limits their binding effectiveness to the targeted protein.
Submitted 21 February, 2023;
originally announced February 2023.
-
Spectral Top-Down Recovery of Latent Tree Models
Authors:
Yariv Aizenbud,
Ariel Jaffe,
Meng Wang,
Amber Hu,
Noah Amsel,
Boaz Nadler,
Joseph T. Chang,
Yuval Kluger
Abstract:
Modeling the distribution of high-dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, separately recover the structure of multiple, possibly random, subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop Spectral Top-Down Recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a non-random way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler merging procedure of the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.
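The partitioning step described here, splitting nodes by the Fiedler vector of a Laplacian, can be sketched generically. This is not the STDR algorithm itself: the abstract does not specify which similarity matrix is used, so the example below assumes an arbitrary symmetric weight matrix with zero diagonal and uses the ordinary graph Laplacian L = D - W. The function name `fiedler_partition` is hypothetical, and the eigenvector is found with a simple shifted power iteration so that only the standard library is needed.

```python
import math


def fiedler_partition(weights, iterations=500):
    """Split nodes into two groups by the sign of the Fiedler vector.

    `weights` is a symmetric matrix of non-negative similarities with a
    zero diagonal (an assumed input, not the matrix used by STDR).
    The Fiedler vector is the eigenvector of L = D - W for the
    second-smallest eigenvalue. It is approximated by power iteration on
    (shift * I - L) while projecting out the constant vector.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree.
    shift = 2.0 * max(degrees)
    v = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant (eigenvalue 0) part
        w = [
            (shift - degrees[i]) * v[i]
            + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    negative = [i for i in range(n) if v[i] < 0.0]
    non_negative = [i for i in range(n) if v[i] >= 0.0]
    return negative, non_negative


if __name__ == "__main__":
    # Two tightly connected triples joined by one weak edge (2 -- 3).
    toy = [
        [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
        [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
        [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
        [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
    ]
    print(fiedler_partition(toy))
```

A top-down method would apply such a split recursively to each group until the subsets are small enough for a direct tree-recovery algorithm; in practice a linear-algebra library would replace the hand-written power iteration.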
Submitted 7 December, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.
-
Oxygen depletion hypothesis remains controversial: a mathematical model of oxygen depletion during FLASH radiation
Authors:
Ankang Hu,
Rui Qiu,
Zhen Wu,
Chunyan Li,
Hui Zhang,
Junli Li
Abstract:
Background: Experiments have reported low normal tissue toxicities during FLASH radiation, but the mechanism has not been elucidated. Several hypotheses have been proposed to explain the mechanism. The oxygen depletion hypothesis has been introduced and mostly studied qualitatively. Methods: We present a computational model to describe the time-dependent change of oxygen concentration in the tissue. The kinetic equation of the model is solved numerically using the finite difference method. The model is used to analyze the FLASH effect with the oxygen depletion hypothesis, and brain tissue is chosen as an example. Results: The oxygen distribution is determined by the oxygen consumption rate of the tissue and the distance between capillaries. The change of oxygen concentration with time after radiation has been found to follow a negative exponential function, and the time constant is determined by the distance between capillaries. When the dose rate is high enough, the same dose results in the same change of oxygen concentration regardless of dose rate. The analysis of the FLASH effect in brain tissue based on this model does not support the oxygen depletion hypothesis as an explanation. Conclusions: The oxygen depletion hypothesis remains controversial because oxygen in most normal tissues cannot be depleted by FLASH radiation according to the mathematical analysis with this model and experiments on the expression and distribution of the hypoxia-inducible factors.
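To make the finite-difference approach concrete, here is a minimal sketch under strong assumptions. The abstract does not state the kinetic equation, so the example uses a generic one-dimensional diffusion equation with a constant consumption term between two capillaries held at a fixed oxygen concentration, in dimensionless units. The function names, parameter values, and the uniform depletion applied as an initial condition are all hypothetical and are not taken from the paper.

```python
def steady_state(n_points, length, diffusivity, consumption, boundary):
    """Analytic steady state of D * c'' - q = 0 with c = boundary at both ends."""
    dx = length / (n_points - 1)
    return [
        boundary - consumption / (2.0 * diffusivity) * (i * dx) * (length - i * dx)
        for i in range(n_points)
    ]


def simulate_recovery(profile, length, diffusivity, consumption, t_end):
    """Explicit finite-difference integration of dc/dt = D * c'' - q.

    The two end points represent capillaries and are held fixed.
    Assumed toy model, not the kinetic equation from the paper.
    """
    n = len(profile)
    dx = length / (n - 1)
    # The explicit scheme is stable for dt <= dx^2 / (2 * D).
    dt = 0.4 * dx * dx / diffusivity
    steps = int(round(t_end / dt))
    c = list(profile)
    for _ in range(steps):
        new = c[:]
        for i in range(1, n - 1):
            laplacian = (c[i - 1] - 2.0 * c[i] + c[i + 1]) / (dx * dx)
            new[i] = c[i] + dt * (diffusivity * laplacian - consumption)
        c = new
    return c


if __name__ == "__main__":
    rest = steady_state(21, 1.0, 1.0, 2.0, 1.0)
    # Hypothetical radiation pulse: uniform drop of 0.3 between the capillaries.
    depleted = [rest[0]] + [value - 0.3 for value in rest[1:-1]] + [rest[-1]]
    for t_end in (0.0, 0.1, 0.5, 2.0):
        profile = simulate_recovery(depleted, 1.0, 1.0, 2.0, t_end)
        print(t_end, rest[10] - profile[10])  # remaining deficit at the midpoint
```

In this toy model the slowest-decaying mode relaxes with time constant L^2 / (pi^2 D), where L is the capillary spacing, which is one simple way a recovery time can depend on the distance between capillaries.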
Submitted 29 January, 2020;
originally announced January 2020.
-
PyBioNetFit and the Biological Property Specification Language
Authors:
Eshan D. Mitra,
Ryan Suderman,
Joshua Colvin,
Alexander Ionkov,
Andrew Hu,
Herbert M. Sauro,
Richard G. Posner,
William S. Hlavacek
Abstract:
In systems biology modeling, important steps include model parameterization, uncertainty quantification, and evaluation of agreement with experimental observations. To help modelers perform these steps, we developed the software PyBioNetFit. PyBioNetFit is designed for parameterization, and also supports uncertainty quantification, checking models against known system properties, and solving design problems. PyBioNetFit introduces the Biological Property Specification Language (BPSL) for the formal declaration of system properties. BPSL allows qualitative data to be used alone or in combination with quantitative data for parameterization, model checking, and design. PyBioNetFit performs parameterization with parallelized metaheuristic optimization algorithms (differential evolution, particle swarm optimization, scatter search) that work directly with existing model definition standards: BioNetGen Language (BNGL) and Systems Biology Markup Language (SBML). We demonstrate PyBioNetFit's capabilities by solving 31 example problems, including the challenging problem of parameterizing a model of cell cycle control in yeast. We benchmark PyBioNetFit's parallelization efficiency on computer clusters, using up to 288 cores. Finally, we demonstrate the model checking and design applications of PyBioNetFit and BPSL by analyzing a model of therapeutic interventions in autophagy signaling.
Submitted 18 March, 2019;
originally announced March 2019.