-
Text-Guided Molecule Generation with Diffusion Language Model
Authors:
Haisong Gong,
Qiang Liu,
Shu Wu,
Liang Wang
Abstract:
Text-guided molecule generation is a task where molecules are generated to match specific textual descriptions. Recently, most existing SMILES-based molecule generation methods rely on an autoregressive architecture. In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods. TGM-DLM updates token embeddings within the SMILES string collectively and iteratively, using a two-phase diffusion generation process. The first phase optimizes embeddings from random noise, guided by the text description, while the second phase corrects invalid SMILES strings to form valid molecular representations. We demonstrate that TGM-DLM outperforms MolT5-Base, an autoregressive model, without the need for additional data resources. Our findings underscore the remarkable effectiveness of TGM-DLM in generating coherent and precise molecules with specific properties, opening new avenues in drug discovery and related scientific domains. Code will be released at: https://github.com/Deno-V/tgm-dlm.
Submitted 20 February, 2024;
originally announced February 2024.
-
Early feasibility of an embedded bi-directional brain-computer interface for ambulation
Authors:
Jeffrey Lim,
Po T. Wang,
Wonjoon Sohn,
Claudia Serrano-Amenos,
Mina Ibrahim,
Derrick Lin,
Shravan Thaploo,
Susan J. Shaw,
Michelle Armacost,
Hui Gong,
Brian Lee,
Darrin Lee,
Richard A. Andersen,
Payam Heydari,
Charles Y. Liu,
Zoran Nenadic,
An H. Do
Abstract:
Current treatments for paraplegia induced by spinal cord injury (SCI) are often limited by the severity of the injury. The accompanying loss of sensory and motor functions often results in reliance on wheelchairs, which in turn causes reduced quality of life and increased risk of co-morbidities. While brain-computer interfaces (BCIs) for ambulation have shown promise in restoring or replacing lower extremity motor functions, none so far have simultaneously implemented sensory feedback functions. Additionally, many existing BCIs for ambulation rely on bulky external hardware that makes them ill-suited for non-research settings. Here, we present an embedded bi-directional BCI (BDBCI) that restores motor function by enabling neural control over a robotic gait exoskeleton (RGE) and delivers sensory feedback via direct cortical electrical stimulation (DCES) in response to RGE leg swing. A first demonstration with this system was performed with a single subject implanted with electrocorticography electrodes, achieving an average lag-optimized cross-correlation of $0.80 \pm 0.08$ between cues and decoded states over 5 runs.
Submitted 18 February, 2024;
originally announced February 2024.
-
Decoding of the Walking States and Step Rates from Cortical Electrocorticogram Signals
Authors:
Po T. Wang,
Colin M. McCrimmon,
Susan J. Shaw,
Hui Gong,
Luis A. Chui,
Payam Heydari,
Charles Y. Liu,
An H. Do,
Zoran Nenadic
Abstract:
Brain-computer interfaces (BCIs) have shown promising results in restoring motor function to individuals with spinal cord injury. These systems have traditionally focused on the restoration of upper extremity function; however, the lower extremities have received relatively little attention. Early feasibility studies used noninvasive electroencephalogram (EEG)-based BCIs to restore walking function to people with paraplegia. However, the limited spatiotemporal resolution of EEG signals restricted the application of these BCIs to elementary gait tasks, such as the initiation and termination of walking. To restore more complex gait functions, BCIs must accurately decode additional degrees of freedom from brain signals. In this study, we used subdurally recorded electrocorticogram (ECoG) signals from able-bodied subjects to design a decoder capable of predicting the walking state and step rate information. We recorded ECoG signals from the motor cortices of two individuals as they walked on a treadmill at different speeds. Our offline analysis demonstrated that the state information could be decoded from >16 minutes of ECoG data with an unprecedented accuracy of 99.8%. Additionally, using a Bayesian filter approach, we achieved an average correlation coefficient between the decoded and true step rates of 0.934. When combined, these decoders may yield decoding accuracies sufficient to safely operate present-day walking prostheses.
Submitted 14 April, 2021;
originally announced April 2021.
-
Exploring the Regulatory Function of the N-terminal Domain of SARS-CoV-2 Spike Protein Through Molecular Dynamics Simulation
Authors:
Yao Li,
Tong Wang,
Juanrong Zhang,
Bin Shao,
Haipeng Gong,
Yusong Wang,
Siyuan Liu,
Tie-Yan Liu
Abstract:
SARS-CoV-2 is the virus that caused the COVID-19 pandemic. Early viral infection is mediated by the SARS-CoV-2 homo-trimeric Spike (S) protein with its receptor binding domains (RBDs) in the receptor-accessible state. We performed molecular dynamics simulation on the S protein with a focus on the function of its N-terminal domains (NTDs). Our study reveals that the NTD acts as a "wedge" and plays a crucial regulatory role in the conformational changes of the S protein. The complete RBD structural transition is allowed only when the neighboring NTD, which typically acts as a wedge and blocks the RBD's movements, detaches and swings away. Based on this NTD "wedge" model, we propose the NTD-RBD interface as a potential drug target.
Submitted 6 January, 2021;
originally announced January 2021.
-
Predicting the real-valued distances between residue pairs for proteins
Authors:
Wenze Ding,
Haipeng Gong
Abstract:
Predicting protein structure from the amino acid sequence has been a challenge with theoretical and practical significance in biophysics. Despite recent progress driven by improved residue-residue contact prediction, contact-based structure prediction has gradually reached a performance ceiling. New methods have been proposed to predict the residue-residue distance, but all of them simplify the real-valued distance prediction into a multiclass classification problem. Here we show a regression-based distance prediction method, which adopts a generative adversarial network to capture the delicate geometric relationship between residue pairs and can thus predict the continuous, real-valued residue-residue distance satisfactorily. The predicted residue distance map allows rapid structure modeling by the CNS suite, and the constructed models reach at least the same level of quality as other state-of-the-art protein structure prediction methods when tested on available CASP13 targets. Moreover, this method can be used directly for the structure prediction of membrane proteins without transfer learning.
Submitted 18 December, 2019; v1 submitted 12 December, 2019;
originally announced December 2019.
-
Improved fragment-based movement with LRFragLib for all-atom Ab initio protein folding
Authors:
Tong Wang,
Haipeng Gong,
Eugene I. Shakhnovich
Abstract:
Fragment-based assembly has been widely used in ab initio protein folding simulation, where it can effectively reduce the conformational space and thus accelerate sampling. The efficiency of fragment-based movement and the quality of the fragment library determine whether the folding process can traverse the free energy landscape to the global minimum and help the protein reach a near-native folded state. We designed an improved fragment-based movement, "fragmove", which substitutes multiple backbone dihedral angles in every simulation step. This movement strategy is derived from the fragment library generated by LRFragLib, an effective fragment detection algorithm using a logistic regression model. We show in replica exchange Monte Carlo (REMC) simulation that "fragmove", when compared with a set of existing movements in REMC, significantly improves the accuracy of predicted secondary and tertiary structure by 11.24% and 17.98%, respectively, and reaches energy minima that are lower by 5.72%. Our results demonstrate that this improved movement guides proteins faster to low-energy regions of conformational space and improves folding efficiency and predicted model accuracy.
Submitted 2 June, 2019;
originally announced June 2019.
-
AmoebaContact and GDFold: a new pipeline for rapid prediction of protein structures
Authors:
Wenzhi Mao,
Wenze Ding,
Haipeng Gong
Abstract:
Native contacts between residues can be predicted from the amino acid sequence of proteins, and the predicted contact information can assist de novo protein structure prediction. Here, we present a novel pipeline consisting of a residue contact predictor, AmoebaContact, and a contact-assisted folder, GDFold, for rapid protein structure prediction. Unlike mainstream contact predictors that utilize human-designed neural networks, AmoebaContact adopts a set of network architectures that are found to be optimal for contact prediction through automatic searching and predicts the residue contacts at a series of cutoffs. Unlike conventional contact-assisted folders that only use top-scored contact pairs, GDFold considers all residue pairs from the prediction results of AmoebaContact in a differentiable loss function and optimizes the atom coordinates using the gradient descent algorithm. The combination of AmoebaContact and GDFold allows quick reconstruction of the protein structure, with model quality comparable to state-of-the-art protein structure prediction methods.
Submitted 28 May, 2019;
originally announced May 2019.
-
DeepPicker: a Deep Learning Approach for Fully Automated Particle Picking in Cryo-EM
Authors:
Feng Wang,
Huichao Gong,
Gaochao Liu,
Meijing Li,
Chuangye Yan,
Tian Xia,
Xueming Li,
Jianyang Zeng
Abstract:
Particle picking is a time-consuming step in single-particle analysis and often requires significant intervention from users, which has become a bottleneck for future automated electron cryo-microscopy (cryo-EM). Here we report a deep learning framework, called DeepPicker, to address this problem and fill the current gaps toward a fully automated cryo-EM pipeline. DeepPicker employs a novel cross-molecule training strategy to capture common features of particles from previously analyzed micrographs, and thus does not require any human intervention during particle picking. Tests on recently published cryo-EM data of three complexes have demonstrated that our deep-learning-based scheme can successfully accomplish human-level particle picking and identify a sufficient number of particles that are comparable to those picked manually by human experts. These results indicate that DeepPicker can provide a practically useful tool to significantly reduce the time and manual effort spent in single-particle analysis and thus greatly facilitate high-resolution cryo-EM structure determination.
Submitted 6 May, 2016;
originally announced May 2016.