-
Adjoint Sensitivity Analysis on Multi-Scale Bioprocess Stochastic Reaction Network
Authors:
Keilung Choy,
Wei Xie
Abstract:
Motivated by the pressing challenges in digital twin development for biomanufacturing systems, we introduce an adjoint sensitivity analysis (SA) approach to expedite the learning of mechanistic model parameters. In this paper, we consider enzymatic stochastic reaction networks representing a multi-scale bioprocess mechanistic model that allows us to integrate disparate data from diverse production processes and to leverage information from existing macro-kinetic and genome-scale models. To support forward prediction and backward reasoning, we develop a convergent adjoint SA algorithm that studies how perturbations of model parameters and inputs (e.g., the initial state) propagate through enzymatic reaction networks and impact output trajectory predictions. This SA provides a sample-efficient and interpretable way to assess the sensitivities between inputs and outputs while accounting for their causal dependencies. Our empirical study underscores the resilience of these sensitivities and, through them, provides a deeper understanding of the regulatory mechanisms behind bioprocesses.
Submitted 28 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
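The abstract above describes adjoint SA only at a high level. As a rough, hypothetical illustration of the adjoint idea (not the authors' algorithm, which targets stochastic enzymatic networks), the Python sketch below discretizes a deterministic toy chain A -> B -> C with forward Euler and obtains the gradient of the final product C(T) with respect to both rate constants and the initial state in a single backward sweep. The model, the rate values, and the function names `forward` and `adjoint_gradients` are assumptions made for this example.

```python
# Hypothetical toy model: forward-Euler mass-action chain A -> B -> C.
# Illustrates discrete adjoint sensitivity analysis; not the paper's algorithm.


def rhs(x, k):
    """Mass-action rates for A -> B (rate k[0]) and B -> C (rate k[1])."""
    a, b, _ = x
    return [-k[0] * a, k[0] * a - k[1] * b, k[1] * b]


def jac_x(k):
    """Jacobian of rhs w.r.t. the state (A, B, C); constant for this linear model."""
    return [[-k[0], 0.0, 0.0], [k[0], -k[1], 0.0], [0.0, k[1], 0.0]]


def jac_k(x):
    """Jacobian of rhs w.r.t. the rate constants (k1, k2)."""
    a, b, _ = x
    return [[-a, 0.0], [a, -b], [0.0, b]]


def forward(x0, k, h, n_steps):
    """Forward Euler; keeps the whole trajectory for the backward sweep."""
    traj = [list(x0)]
    for _ in range(n_steps):
        x = traj[-1]
        traj.append([xi + h * fi for xi, fi in zip(x, rhs(x, k))])
    return traj


def adjoint_gradients(x0, k, h, n_steps):
    """Return J = C(T), dJ/dk and dJ/dx0 using one backward (adjoint) sweep."""
    traj = forward(x0, k, h, n_steps)
    lam = [0.0, 0.0, 1.0]  # lambda_N = dJ/dx_N for J = C at the final step
    grad_k = [0.0, 0.0]
    jx = jac_x(k)
    for n in range(n_steps - 1, -1, -1):
        jk = jac_k(traj[n])
        # dJ/dk += lambda_{n+1}^T * h * df/dk(x_n)
        for j in range(2):
            grad_k[j] += h * sum(lam[i] * jk[i][j] for i in range(3))
        # lambda_n = lambda_{n+1}^T * (I + h * df/dx(x_n))
        lam = [lam[j] + h * sum(lam[i] * jx[i][j] for i in range(3)) for j in range(3)]
    return traj[-1][2], grad_k, lam  # lam now holds dJ/dx0


# Usage: compare adjoint gradients against central finite differences.
x0, k, h, n = [1.0, 0.0, 0.0], [0.8, 0.3], 0.01, 500
c_final, g_k, g_x0 = adjoint_gradients(x0, k, h, n)
eps = 1e-6
fd_k = []
for j in range(2):
    kp, km = list(k), list(k)
    kp[j] += eps
    km[j] -= eps
    fd_k.append((forward(x0, kp, h, n)[-1][2] - forward(x0, km, h, n)[-1][2]) / (2 * eps))
print("adjoint dC/dk:", g_k)
print("finite-diff  :", fd_k)
```

One backward sweep yields all parameter and initial-state sensitivities at once, whereas finite differences need extra forward runs for every parameter; that cost difference is what usually motivates adjoint methods for models with many parameters.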
-
Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing Process
Authors:
Fuqiang Cheng,
Wei Xie,
Hua Zheng
Abstract:
Biomanufacturing innovation relies on efficient Design of Experiments (DoE) to optimize processes and product quality. Traditional DoE methods, which ignore the underlying bioprocessing mechanisms, often lack interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach for digital twin model calibration. In this study, we consider a multi-scale mechanistic model of the cell culture process, also known as a Biological System-of-Systems (Bio-SoS). The model has a modular design composed of sub-models, which allows us to integrate data across various production processes. To calibrate the Bio-SoS digital twin, we evaluate the mean squared error of model prediction and develop a computational approach to quantify the impact of the parameter estimation error of individual sub-models on the prediction accuracy of the digital twin, which can guide sample-efficient and interpretable DoEs.
Submitted 28 June, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Accelerating Discovery of Novel and Bioactive Ligands With Pharmacophore-Informed Generative Models
Authors:
Weixin Xie,
Jianhang Zhang,
Qin Xie,
Chaojun Gong,
Youjun Xu,
Luhua Lai,
Jianfeng Pei
Abstract:
Deep generative models have advanced significantly and can accelerate drug discovery by generating bioactive chemicals against desired targets. Nevertheless, most generated compounds that have been validated for potent bioactivity exhibit unsatisfactory levels of structural novelty, thereby providing limited inspiration to human medicinal chemists. The challenge for generative models lies in producing compounds that are both bioactive and novel, rather than merely making minor modifications to known actives present in the training set. Recognizing the utility of pharmacophores in facilitating scaffold hopping, we developed TransPharmer, an innovative generative model that integrates ligand-based interpretable pharmacophore fingerprints with a generative pre-trained transformer (GPT) for de novo molecule generation. TransPharmer demonstrates superior performance across tasks involving unconditioned distribution learning, de novo generation, and scaffold elaboration under pharmacophoric constraints. Its distinct exploration mode within the local chemical space makes it particularly useful for scaffold hopping, producing compounds that are structurally novel yet pharmaceutically related. The efficacy of TransPharmer is validated through two case studies involving the dopamine receptor D2 (DRD2) and polo-like kinase 1 (PLK1). Notably, in the case of PLK1, three out of four synthesized designed compounds exhibit submicromolar activities, with the most potent one, IIP0943, demonstrating a potency of 5.1 nM. Featuring a new scaffold of 4-(benzo[b]thiophen-7-yloxy)pyrimidine, IIP0943 also exhibits high selectivity for PLK1. These results demonstrate that TransPharmer is a powerful tool for the discovery of novel and bioactive ligands.
Submitted 2 January, 2024;
originally announced January 2024.
-
DiffDTM: A conditional structure-free framework for bioactive molecules generation targeted for dual proteins
Authors:
Lei Huang,
Zheng Yuan,
Huihui Yan,
Rong Sheng,
Lin**g Liu,
Fuzhou Wang,
Weidun Xie,
Nanjun Chen,
Fei Huang,
Songfang Huang,
Ka-Chun Wong,
Yaoyun Zhang
Abstract:
Advances in deep generative models shed light on de novo molecule generation with desired properties. However, molecule generation for dual protein targets still faces formidable challenges, including the need for protein 3D structure data for model training, auto-regressive sampling, and model generalization to unseen targets. Here, we propose DiffDTM, a novel conditional structure-free deep generative model based on a diffusion model for dual-target molecule generation that addresses these issues. Specifically, DiffDTM receives protein sequences and molecular graphs as inputs instead of protein and molecular conformations, and it incorporates an information fusion module to achieve conditional generation in a one-shot manner. We have conducted comprehensive multi-view experiments demonstrating that DiffDTM can generate drug-like, synthetically accessible, novel molecules with high binding affinity toward specific dual proteins, outperforming state-of-the-art (SOTA) models on multiple evaluation metrics. Furthermore, we used DiffDTM to generate molecules targeting dopamine receptor D2 and 5-hydroxytryptamine receptor 1A as new antipsychotics. The experimental results indicate that DiffDTM can be easily plugged into unseen dual targets to generate bioactive molecules, addressing both the scarcity of active-molecule data for training and the need to retrain when encountering new targets.
Submitted 24 June, 2023;
originally announced June 2023.
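DiffDTM builds on a diffusion model. As a generic reminder of what the forward (noising) half of such a model computes, the sketch below implements the standard DDPM closed form x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps with a linear beta schedule. It operates on a plain list of continuous features; DiffDTM's actual handling of molecular graphs, its conditioning on protein sequences, and its learned denoising network are not shown, and the schedule values are common defaults assumed here rather than taken from the paper.

```python
import math
import random


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T increasing linearly (assumed defaults)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    """abar_t = prod_{s <= t} (1 - beta_s): fraction of signal variance kept at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in one shot."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
abar = cumulative_alphas(linear_beta_schedule())
x0 = [1.0, -0.5, 0.3]  # hypothetical continuous features
print("t=10 :", q_sample(x0, 10, abar, rng))   # still close to x0
print("t=999:", q_sample(x0, 999, abar, rng))  # nearly pure Gaussian noise
```

The closed form lets training jump to any noise level t directly instead of simulating every intermediate step; a generative model is then trained to reverse this process.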
-
Stochastic Biological System-of-Systems Modelling for iPSC Culture
Authors:
Hua Zheng,
Sarah W. Harcum,
**xiang Pei,
Wei Xie
Abstract:
Large-scale manufacturing of induced pluripotent stem cells (iPSCs) is essential for cell therapies and regenerative medicines. Yet, iPSCs form large cell aggregates in suspension bioreactors, resulting in insufficient nutrient supply and extra metabolic waste build-up for the cells located at the core. Since subtle changes in the micro-environment can lead to a heterogeneous cell population, a novel Biological System-of-Systems (Bio-SoS) framework is proposed to model cell-to-cell interactions, spatial and metabolic heterogeneity, and cell response to micro-environmental variation. Building on a stochastic metabolic reaction network, aggregation kinetics, and reaction-diffusion mechanisms, the Bio-SoS model characterizes causal interdependencies at the individual cell, aggregate, and cell population levels. Its modular design enables data integration and improves predictions for different monolayer and aggregate culture processes. In addition, a variance decomposition analysis is derived to quantify the impact of factors (e.g., aggregate size) on the heterogeneity of cell product health and quality.
Submitted 11 October, 2023; v1 submitted 28 May, 2023;
originally announced May 2023.
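The variance decomposition mentioned in the abstract can be illustrated, in its simplest one-factor form, by the law of total variance: Var(Y) = E[Var(Y | A)] + Var(E[Y | A]), where the second term is the part of the response variability explained by the factor A. The sketch below applies this to made-up data with aggregate size as the factor and a viability-like response; the data, the three size classes, and the function name are hypothetical, and the paper's derivation for the full Bio-SoS model is likely more involved.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def variance_decomposition(factor, response):
    """Split the population variance of `response` into within- and between-group parts."""
    groups = defaultdict(list)
    for level, y in zip(factor, response):
        groups[level].append(y)
    n = len(response)
    grand_mean = fmean(response)
    # E[Var(Y | A)]: average within-group variance, weighted by group size
    within = sum(len(ys) * pvariance(ys) for ys in groups.values()) / n
    # Var(E[Y | A]): spread of the group means around the grand mean
    between = sum(len(ys) * (fmean(ys) - grand_mean) ** 2 for ys in groups.values()) / n
    total = pvariance(response)
    return total, within, between, between / total


# Hypothetical data: viability-like response for three aggregate-size classes.
sizes = ["small"] * 4 + ["medium"] * 4 + ["large"] * 4
viability = [0.95, 0.93, 0.96, 0.94, 0.90, 0.88, 0.91, 0.89, 0.80, 0.76, 0.83, 0.78]
total, within, between, share = variance_decomposition(sizes, viability)
print(f"total={total:.6f} within={within:.6f} between={between:.6f} share={share:.2%}")
```

The `share` value is the fraction of heterogeneity attributable to aggregate size in this toy data; the remaining within-group part would have to be explained by other factors.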
-
Stochastic Molecular Reaction Queueing Network Modeling for In Vitro Transcription Process
Authors:
Keqi Wang,
Wei Xie,
Hua Zheng
Abstract:
To facilitate a rapid response to pandemic threats, this paper focuses on developing a mechanistic simulation model for the in vitro transcription (IVT) process, a crucial step in mRNA vaccine manufacturing. To enhance production and support Industry 4.0, the model is proposed to improve the prediction and analysis of the IVT enzymatic reaction network. It incorporates a novel stochastic molecular reaction queueing network with a regulatory kinetic model characterizing the effect of bioprocess state variables on reaction rates. The empirical study demonstrates that the proposed model performs promisingly under different production conditions and could offer potential improvements in mRNA product quality and yield.
Submitted 21 June, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
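The paper's stochastic molecular reaction queueing network is not spelled out in the abstract. As a loosely related, hypothetical illustration of simulating a stochastic reaction system whose rates depend on a bioprocess state variable, the sketch below runs Gillespie's direct-method stochastic simulation algorithm on a toy IVT-like system: each transcription event consumes a fixed block of NTPs at a rate that saturates in the NTP level (a Michaelis-Menten-style regulatory term), and mRNA degrades at a first-order rate. This swaps in plain SSA rather than a queueing formulation; the reactions, rate laws, and parameter values are all assumptions.

```python
import random


def gillespie_ivt(ntp0, k_tx, k_m, k_deg, t_end, ntp_per_rna=100, seed=0):
    """Gillespie direct method for a toy IVT system (hypothetical rate laws).

    Reactions:
      transcription: NTP -> NTP - ntp_per_rna, RNA -> RNA + 1
      degradation:   RNA -> RNA - 1
    Returns a list of (time, ntp, rna) states recorded at each event.
    """
    rng = random.Random(seed)
    t, ntp, rna = 0.0, ntp0, 0
    history = [(t, ntp, rna)]
    while True:
        # Regulatory term: transcription slows as the NTP pool is depleted.
        a_tx = k_tx * ntp / (k_m + ntp) if ntp >= ntp_per_rna else 0.0
        a_deg = k_deg * rna
        a0 = a_tx + a_deg
        if a0 == 0.0:
            break  # no reaction can fire any more
        t += rng.expovariate(a0)  # exponential waiting time to the next event
        if t > t_end:
            break
        if rng.random() * a0 < a_tx:  # choose which reaction fires
            ntp -= ntp_per_rna
            rna += 1
        else:
            rna -= 1
        history.append((t, ntp, rna))
    return history


traj = gillespie_ivt(ntp0=50_000, k_tx=5.0, k_m=10_000, k_deg=0.05, t_end=120.0)
t_last, ntp_last, rna_last = traj[-1]
print(f"{len(traj) - 1} events; t={t_last:.1f}, NTP left={ntp_last}, mRNA={rna_last}")
```

Running many seeds and summarizing the final mRNA counts gives a distribution of yields rather than a single deterministic value, which is the main reason to use a stochastic simulator for low-copy molecular processes.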
-
Structure-Function Dynamics Hybrid Modeling: RNA Degradation
Authors:
Hua Zheng,
Wei Xie,
Paul Whitford,
Ailun Wang,
Chunsheng Fang,
Wandi Xu
Abstract:
RNA structure and functional dynamics play fundamental roles in controlling biological systems. Molecular dynamics simulation, which can characterize interactions at an atomistic level, can advance the understanding of new drug discovery, manufacturing, and delivery mechanisms. However, such simulation is computationally infeasible for supporting the development of a digital twin for enzymatic reaction network mechanism learning and for end-to-end bioprocess design and control. Thus, we create a hybrid ("mechanistic + machine learning") model characterizing the interdependence of RNA structure and functional dynamics from atomistic to macroscopic levels. To assess the proposed modeling strategy, in this paper we consider RNA degradation, a critical process in cellular biology that affects gene expression. The empirical study on RNA lifetime prediction demonstrates the promising performance of the proposed multi-scale bioprocess hybrid modeling strategy.
Submitted 17 June, 2023; v1 submitted 6 May, 2023;
originally announced May 2023.
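RNA lifetime is often summarized with a first-order decay model, N(t) = N0 exp(-k t), whose half-life is ln 2 / k and whose mean lifetime is 1 / k. The sketch below fits k by least squares on log-abundance. It is only a purely mechanistic baseline of the kind a hybrid model might build on, not the structure-function hybrid model of the paper, and the time points and abundances are synthetic.

```python
import math


def fit_first_order_decay(times, abundance):
    """Least-squares fit of log N(t) = log N0 - k t.

    Returns (k, half_life, mean_lifetime, n0_hat).
    """
    logs = [math.log(a) for a in abundance]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(logs) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
    sxx = sum((t - t_bar) ** 2 for t in times)
    k = -sxy / sxx  # the slope of log-abundance is -k
    n0_hat = math.exp(y_bar + k * t_bar)  # intercept = y_bar - slope * t_bar
    return k, math.log(2) / k, 1.0 / k, n0_hat


# Synthetic, noise-free example: true k = 0.25 per hour, N0 = 100.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
abundance = [100.0 * math.exp(-0.25 * t) for t in times]
k, half_life, lifetime, n0 = fit_first_order_decay(times, abundance)
print(f"k={k:.4f}/h, half-life={half_life:.3f} h, mean lifetime={lifetime:.3f} h")
```

A hybrid strategy would typically replace the single constant k with a rate predicted from structural features, keeping this kinetic form as the mechanistic backbone.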
-
Metabolic Regulatory Network Kinetic Modeling with Multiple Isotopic Tracers for iPSCs
Authors:
Keqi Wang,
Wei Xie,
Sarah W. Harcum
Abstract:
The rapidly expanding market for regenerative medicines and cell therapies highlights the need to advance the understanding of cellular metabolism and improve the prediction of the cell culture production process for human induced pluripotent stem cells (iPSCs). In this paper, a metabolic kinetic model was developed to characterize the underlying mechanisms of the iPSC culture process; it can predict cell response to environmental perturbation and support process control. The model focuses on the central carbon metabolic network, including glycolysis, the pentose phosphate pathway (PPP), the tricarboxylic acid (TCA) cycle, and amino acid metabolism, which plays a crucial role in supporting iPSC proliferation. Heterogeneous measurements of extracellular metabolites and multiple isotopic tracers collected under multiple conditions were used to learn metabolic regulatory mechanisms. Systematic cross-validation confirmed that the model provides reliable predictions of cellular metabolism and culture process dynamics under various culture conditions. Thus, the developed mechanistic kinetic model can support process control strategies that select optimal cell culture conditions at different times, ensure cell product functionality, and facilitate large-scale manufacturing of regenerative medicines and cell therapies.
Submitted 25 October, 2023; v1 submitted 29 April, 2023;
originally announced May 2023.
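As a minimal, hypothetical example of the kind of kinetic building block such a model contains, the sketch below integrates a single Michaelis-Menten glycolytic flux (glucose -> 2 lactate) for a fixed cell density using a classical fourth-order Runge-Kutta step. The 1:2 glucose-to-lactate stoichiometry is the textbook ratio; the rate constants, the constant cell density, the units, and the function names are assumptions, and the paper's network (PPP, TCA cycle, amino acids, isotopic tracers) is far richer.

```python
def mm_rate(vmax, km, s):
    """Michaelis-Menten specific rate (per cell) at substrate concentration s."""
    return vmax * s / (km + s)


def simulate_glycolysis(glc0, lac0, cells, vmax, km, t_end, dt=0.01):
    """RK4 integration of dG/dt = -v, dL/dt = 2v with v = cells * MM(G)."""

    def rhs(g, l):
        v = cells * mm_rate(vmax, km, g)
        return -v, 2.0 * v  # one glucose yields two lactate

    g, l = glc0, lac0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(g, l)
        k2 = rhs(g + 0.5 * dt * k1[0], l + 0.5 * dt * k1[1])
        k3 = rhs(g + 0.5 * dt * k2[0], l + 0.5 * dt * k2[1])
        k4 = rhs(g + dt * k3[0], l + dt * k3[1])
        g += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        l += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return g, l


# Hypothetical units: mM for metabolites, 1e6 cells/mL for density, hours for time.
glc, lac = simulate_glycolysis(glc0=20.0, lac0=0.0, cells=2.0, vmax=0.5, km=1.0, t_end=24.0)
print(f"glucose={glc:.3f} mM, lactate={lac:.3f} mM")
print("carbon check G + L/2:", glc + lac / 2)  # stays at the initial 20.0 up to rounding
```

The carbon check is a cheap sanity test worth keeping in larger kinetic models: any stoichiometric invariant of the network should be conserved by the integrator.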
-
From Discovery to Production: Challenges and Novel Methodologies for Next Generation Biomanufacturing
Authors:
Wei Xie,
Giulia Pedrielli
Abstract:
The increasingly pressing demand for novel drugs (e.g., gene therapies for personalized cancer care, ever-evolving vaccines) with unprecedented levels of personalization has put remarkable pressure on the traditionally long time required by pharma R&D and manufacturing to go from design to production of new products. The revolution has already brought important changes in the technologies used within the industry. In fact, practitioners are increasingly moving away from the classical paradigm of large-scale batch production toward continuous biomanufacturing with flexible and modular design, which is further supported by recent technological advances in single-use equipment. In contrast to long design processes, low product variability (one-size-fits-all), and highly rigid systems, modern pharma players are trying to answer the question: can we bring design and process control up to the speed that novel production technologies give us to quickly set up a flexible production run?
In this tutorial, we present key challenges and potential solutions from the world of operations research that can help answer this question. We first present technical challenges and novel methods for the design of next-generation drugs, followed by process modeling and control approaches to successfully and efficiently manufacture them.
Submitted 28 June, 2022; v1 submitted 8 May, 2022;
originally announced May 2022.
-
Self-Supervised Graph Transformer on Large-Scale Molecular Data
Authors:
Yu Rong,
Yatao Bian,
Tingyang Xu,
Weiyang Xie,
Ying Wei,
Wenbing Huang,
Junzhou Huang
Abstract:
Obtaining informative representations of molecules is a crucial prerequisite for AI-driven drug design and discovery. Recent research abstracts molecules as graphs and employs Graph Neural Networks (GNNs) for molecular representation learning. Nevertheless, two issues impede the use of GNNs in real scenarios: (1) insufficient labeled molecules for supervised training; (2) poor generalization to newly synthesized molecules. To address both, we propose a novel framework, GROVER, which stands for Graph Representation frOm self-superVised mEssage passing tRansformer. With carefully designed self-supervised tasks at the node, edge, and graph levels, GROVER can learn rich structural and semantic information about molecules from enormous amounts of unlabelled molecular data. Moreover, to encode such complex information, GROVER integrates Message Passing Networks into a Transformer-style architecture to deliver a class of more expressive molecular encoders. The flexibility of GROVER allows it to be trained efficiently on large-scale molecular datasets without requiring any supervision, making it immune to the two issues mentioned above. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules, the biggest GNN and the largest training dataset in molecular representation learning. We then leverage the pre-trained GROVER for molecular property prediction followed by task-specific fine-tuning, where we observe a huge improvement (more than 6% on average) over current state-of-the-art methods on 11 challenging benchmarks. The insight we gained is that well-designed self-supervision losses and highly expressive pre-trained models have significant potential for boosting performance.
Submitted 28 October, 2020; v1 submitted 18 June, 2020;
originally announced July 2020.
-
Multi-View Graph Neural Networks for Molecular Property Prediction
Authors:
Hehuan Ma,
Yatao Bian,
Yu Rong,
Wenbing Huang,
Tingyang Xu,
Weiyang Xie,
Geyan Ye,
Junzhou Huang
Abstract:
The crux of molecular property prediction is to generate meaningful representations of the molecules. One promising route is to exploit the molecular graph structure through Graph Neural Networks (GNNs). It is well known that both atoms and bonds significantly affect the chemical properties of a molecule, so an expressive model should be able to exploit both node (atom) and edge (bond) information simultaneously. Guided by this observation, we present the Multi-View Graph Neural Network (MV-GNN), a multi-view message passing architecture that enables more accurate predictions of molecular properties. In MV-GNN, we introduce a shared self-attentive readout component and a disagreement loss to stabilize the training process. This readout component also renders the whole architecture interpretable. We further boost the expressive power of MV-GNN by proposing a cross-dependent message passing scheme that enhances information communication between the two views, which results in the MV-GNN^cross variant. Lastly, we theoretically justify the expressiveness of the two proposed models in terms of distinguishing non-isomorphic graphs. Extensive experiments demonstrate that MV-GNN models achieve remarkably superior performance over state-of-the-art models on a variety of challenging benchmarks. Meanwhile, visualizations of node importance are consistent with prior knowledge, which confirms the interpretability of MV-GNN models.
Submitted 12 June, 2020; v1 submitted 17 May, 2020;
originally announced May 2020.
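To illustrate what exploiting node (atom) and edge (bond) information through two views can mean, the sketch below runs one untrained round of node-view message passing and one round of directed edge-view message passing (in the style of directed message passing networks) on a three-atom graph, reads each view out by mean pooling, and computes a squared disagreement between the two readouts. Scalar features, the absence of learned weights and attention, and all function names are simplifying assumptions; this is not the MV-GNN architecture itself, only the two message-flow patterns it combines.

```python
def node_view(adj, h):
    """One node-view step: each atom adds its neighbours' features to its own."""
    return {v: h[v] + sum(h[u] for u in adj[v]) for v in adj}


def edge_view(adj, h):
    """One directed edge-view step, then aggregation of incoming bond messages.

    m[(u, v)] starts as h[u]; after the update it is h[u] plus the messages
    entering u from every neighbour except v (no immediate back-flow).
    """
    m = {(u, v): h[u] for u in adj for v in adj[u]}
    m_new = {(u, v): h[u] + sum(m[(w, u)] for w in adj[u] if w != v) for (u, v) in m}
    return {v: sum(m_new[(u, v)] for u in adj[v]) for v in adj}


def mean_readout(emb):
    return sum(emb.values()) / len(emb)


# Hypothetical heavy-atom graph C-C-O with atomic numbers as scalar features.
adj = {0: [1], 1: [0, 2], 2: [1]}
h = {0: 6.0, 1: 6.0, 2: 8.0}
r_node = mean_readout(node_view(adj, h))
r_edge = mean_readout(edge_view(adj, h))
disagreement = (r_node - r_edge) ** 2  # a penalty pushing the two views to agree
print(r_node, r_edge, disagreement)
```

In a trained model each step would apply learned transformations and nonlinearities, and the agreement penalty would act on the two views' predictions; the message-routing logic, however, is the same.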
-
Supporting Regularized Logistic Regression Privately and Efficiently
Authors:
Wenfa Li,
Hongzhe Liu,
Peng Yang,
Wei Xie
Abstract:
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, the social sciences, information technology, and other fields. These domains often involve data on human subjects that are subject to strict privacy regulations. Increasing concerns over data privacy make it more and more difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a machine learning model widely used across disciplines that has nevertheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as research consortia or networks as widely seen in genetics, epidemiology, the social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method that leverages distributed computing and strong cryptography to provide comprehensive protection of both individual-level and summary data. Extensive empirical evaluation on several studies validated the privacy guarantees, efficiency, and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications in various disciplines, including genetic and biomedical studies, smart grids, network analysis, etc.
Submitted 30 September, 2015;
originally announced October 2015.
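The abstract does not detail the cryptographic protocol. As one generic, hypothetical way to combine distributed computing with cryptography in this setting, the sketch below trains L2-regularized logistic regression by gradient descent in which each institution computes its gradient locally and only additive secret shares of a fixed-point encoding leave the site; summing the aggregators' totals reveals just the pooled gradient. Additive secret sharing, the modulus, the fixed-point scale, and all function names are assumptions swapped in for illustration, not the authors' method, and the sketch omits authentication, defenses against malicious parties, and any leakage from the released model.

```python
import math
import secrets

Q = 2**61 - 1  # prime modulus for additive secret sharing (assumed choice)
SCALE = 10**6  # fixed-point scale for encoding real-valued gradients


def encode(x):
    return round(x * SCALE) % Q


def decode(v):
    v = v if v <= Q // 2 else v - Q  # map back to a signed value
    return v / SCALE


def share(value, n_parties):
    """Split value into n_parties uniformly random shares that sum to it mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def secure_sum(vectors, n_aggregators=3):
    """Each site secret-shares its vector; each aggregator adds the shares it
    receives, so no single aggregator sees any site's gradient in the clear."""
    dim = len(vectors[0])
    totals = [[0] * dim for _ in range(n_aggregators)]
    for vec in vectors:
        for j, x in enumerate(vec):
            for a, s in enumerate(share(encode(x), n_aggregators)):
                totals[a][j] = (totals[a][j] + s) % Q
    return [decode(sum(t[j] for t in totals) % Q) for j in range(dim)]


def local_gradient(w, X, y):
    """Gradient of the summed logistic loss on one site's data (labels 0/1)."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xj in enumerate(xi):
            g[j] += (p - yi) * xj
    return g


def train(sites, dim, lam=0.1, lr=0.5, iters=200):
    """Gradient descent on mean logistic loss + (lam / 2) * ||w||^2."""
    n_total = sum(len(y) for _, y in sites)
    w = [0.0] * dim
    for _ in range(iters):
        g = secure_sum([local_gradient(w, X, y) for X, y in sites])
        w = [wj - lr * (gj / n_total + lam * wj) for wj, gj in zip(w, g)]
    return w


# Two hypothetical institutions; the first column is an intercept term.
site_a = ([[1.0, 2.0], [1.0, -1.5], [1.0, 1.0]], [1, 0, 1])
site_b = ([[1.0, -2.0], [1.0, 0.5], [1.0, -0.5]], [0, 1, 0])
w = train([site_a, site_b], dim=2)
print("model weights:", w)
```

Because secret sharing is exact modulo Q, the pooled gradient differs from the plaintext sum only by fixed-point rounding, so the privacy layer barely changes the optimization trajectory.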