-
ControlMol: Adding Substructure Control To Molecule Diffusion Models
Authors:
Qi Zhengyang,
Liu Zi**g,
Zhang Jiying,
Cao He,
Li Yu
Abstract:
Designing new molecules is an important task in the field of pharmaceuticals. Due to the vast design space of molecules, generating molecules conditioned on a specific sub-structure relevant to a particular function or therapeutic target is a crucial task in computer-aided drug design. In this paper, we present ControlMol, which adds sub-structure control to molecule generation with diffusion models. Unlike previous methods, which view this task as inpainting or conditional generation, we adapt the idea of ControlNet to conditional molecule generation and make adaptive adjustments to a pre-trained diffusion model. We apply our method to both 2D and 3D molecule generation tasks. Conditioned on randomly partitioned sub-structure data, our method outperforms previous methods by generating more valid and diverse molecules. The method is easy to implement and can be quickly applied to a variety of pre-trained molecule generation models.
Submitted 22 April, 2024;
originally announced May 2024.
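The ControlNet idea the ControlMol abstract builds on can be illustrated with a single layer. This is a minimal sketch, not ControlMol's code: it assumes one linear layer stands in for a block of the pre-trained denoising network, and the names `ControlledLayer`, `zero_proj`, and `forward` are hypothetical. Because the projection starts at zero, the wrapped model initially reproduces the pre-trained output exactly, and training only gradually injects the sub-structure condition.

```python
import copy
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise vector addition."""
    return [ai + bi for ai, bi in zip(a, b)]


class ControlledLayer:
    """ControlNet-style wrapper around one frozen linear layer (illustrative only).

    The frozen pre-trained weights stay untouched; a trainable copy receives the
    input plus the sub-structure condition, and its output passes through a
    zero-initialized projection before being added back to the frozen output.
    """

    def __init__(self, frozen_weights: Matrix):
        self.frozen = frozen_weights                    # pre-trained, never updated
        self.trainable = copy.deepcopy(frozen_weights)  # trainable copy of the block
        n = len(frozen_weights)
        self.zero_proj = [[0.0] * n for _ in range(n)]  # "zero convolution" analogue

    def forward(self, x: Vector, condition: Vector) -> Vector:
        base = matvec(self.frozen, x)
        control = matvec(self.trainable, add(x, condition))
        return add(base, matvec(self.zero_proj, control))
```

With `layer = ControlledLayer([[1.0, 2.0], [0.5, -1.0]])`, `layer.forward([1.0, 1.0], [3.0, -2.0])` returns the frozen layer's own output `[3.0, -0.5]` until `zero_proj` is trained away from zero.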
-
ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training
Authors:
Le Zhuo,
Zewen Chi,
Minghao Xu,
Heyan Huang,
Heqi Zheng,
Conghui He,
Xian-Ling Mao,
Wentao Zhang
Abstract:
We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs where the natural language text is interspersed with an arbitrary number of proteins. In addition, we propose the protein-as-word language modeling approach to train ProtLLM. By developing a specialized protein vocabulary, we equip the model with the capability to predict not just natural language but also proteins from a vast pool of candidates. Additionally, we construct a large-scale interleaved protein-text dataset, named InterPT, for pre-training. This dataset comprehensively encompasses both (1) structured data sources like protein annotations and (2) unstructured data sources like biological research papers, thereby endowing ProtLLM with crucial knowledge for understanding proteins. We evaluate ProtLLM on classic supervised protein-centric tasks and explore its novel protein-language applications. Experimental results demonstrate that ProtLLM not only achieves superior performance against protein-specialized baselines on protein-centric tasks but also induces zero-shot and in-context learning capabilities on protein-language tasks.
Submitted 27 February, 2024;
originally announced March 2024.
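One way to picture "protein-as-word" prediction is a single output layer that scores ordinary words and whole proteins in the same softmax. The sketch below assumes that reading; the abstract does not give ProtLLM's actual scoring function, and `predict_next`, the toy vocabularies, and the labels `PROT_A`/`PROT_B` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_next(hidden: Vector,
                 word_vocab: Dict[str, Vector],
                 protein_vocab: Dict[str, Vector]) -> Tuple[str, float]:
    """Score words and proteins in one shared output space; return the top entry."""
    labels = list(word_vocab) + list(protein_vocab)
    vectors = list(word_vocab.values()) + list(protein_vocab.values())
    probs = softmax([dot(hidden, v) for v in vectors])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]
```

Because proteins are ordinary entries of the output vocabulary, the same decoding step can emit either a word or a protein, which is the capability the abstract describes.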
-
The Brain-Inspired Decoder for Natural Visual Image Reconstruction
Authors:
Wenyi Li,
Shengjie Zheng,
Yufan Liao,
Rongqi Hong,
Weiliang Chen,
Chenggnag He,
Xiaojian Li
Abstract:
Decoding images from brain activity has been a challenging problem. Advances in deep learning now provide tools to address it. Image decoding aims to map neural spike trains to low-level visual features and to the high-level semantic information space. Recently, a few studies have decoded images from spike trains; however, these studies pay less attention to the foundations of neuroscience, and few have incorporated receptive fields into visual image reconstruction. In this paper, we propose a deep learning neural network architecture with biological properties to reconstruct visual images from spike trains. To our knowledge, our method is the first to integrate a receptive field property matrix into the loss function. Our model is an end-to-end decoder from neural spike trains to images. We not only incorporate Gabor filters into the auto-encoder used to generate images but also propose a loss function with receptive field properties. We evaluated our decoder on two datasets containing macaque primary visual cortex neural spikes and salamander retinal ganglion cell (RGC) spikes. Our results show that our method can effectively combine receptive field features to reconstruct images, providing a new approach to visual reconstruction based on neural information.
Submitted 18 July, 2022;
originally announced July 2022.
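The two biological ingredients named in the abstract, Gabor filters and a receptive-field-aware loss, can be sketched as follows. The Gabor kernel is the standard real-valued 2D form; the loss is an assumption (a squared error weighted pixel-wise by a receptive-field map), since the abstract does not give the paper's formula, and `rf_weighted_mse` is a hypothetical name.

```python
import math
from typing import List

Grid = List[List[float]]


def gabor_kernel(size: int, sigma: float, theta: float, lam: float,
                 gamma: float = 0.5, psi: float = 0.0) -> Grid:
    """Real part of a 2D Gabor filter sampled on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave runs along orientation theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lam + psi))
        kernel.append(row)
    return kernel


def rf_weighted_mse(pred: Grid, target: Grid, rf_weights: Grid) -> float:
    """Squared error weighted by a receptive-field map (hypothetical loss form)."""
    num = den = 0.0
    for p_row, t_row, w_row in zip(pred, target, rf_weights):
        for p, t, w in zip(p_row, t_row, w_row):
            num += w * (p - t) ** 2
            den += w
    return num / den
```

Under this weighting, reconstruction errors at pixels outside the recorded neurons' receptive fields (weight 0) do not contribute to the loss.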
-
A Spiking Neural Network based on Neural Manifold for Augmenting Intracortical Brain-Computer Interface Data
Authors:
Shengjie Zheng,
Wenyi Li,
Lang Qian,
Chenggang He,
Xiaojian Li
Abstract:
Brain-computer interfaces (BCIs) transform neural signals in the brain into instructions to control external devices. However, sufficient training data are difficult to obtain, and the available data are limited. With the advent of advanced machine learning methods, the capability of brain-computer interfaces has been enhanced like never before; however, these methods require a large amount of training data, so the limited data available must be augmented. Here, we use spiking neural networks (SNNs) as data generators. SNNs are touted as the next generation of neural networks and are considered one of the algorithms oriented toward general artificial intelligence because they borrow their neural information processing from biological neurons. We use the SNN to generate neural spike information that is bio-interpretable and conforms to the intrinsic patterns in the original neural data. Experiments show that the model can directly synthesize new spike trains, which in turn improves the generalization ability of the BCI decoder. Both the input and the output of the spiking neural model are spike information, making this a brain-inspired intelligence approach that can be better integrated with BCIs in the future.
Submitted 26 March, 2022;
originally announced April 2022.
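The abstract does not specify which spiking neuron model the generator uses. As an illustration of how an SNN turns input into spike trains, here is the leaky integrate-and-fire (LIF) neuron, the most common SNN building block; all parameter values are illustrative assumptions.

```python
from typing import List


def lif_spike_train(currents: List[float], tau: float = 10.0, dt: float = 1.0,
                    v_rest: float = 0.0, v_thresh: float = 1.0,
                    v_reset: float = 0.0) -> List[int]:
    """Simulate one leaky integrate-and-fire neuron; return a 0/1 spike train."""
    v = v_rest
    spikes = []
    for i_t in currents:
        # Forward-Euler step of tau * dv/dt = -(v - v_rest) + I(t)
        v += dt / tau * (-(v - v_rest) + i_t)
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset  # reset the membrane potential after a spike
        else:
            spikes.append(0)
    return spikes
```

With a constant input of 2.0 the neuron fires regularly, first at the seventh step; with a constant input of 0.5 the membrane potential settles below threshold and the train stays silent.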
-
Rapid and Accurate Detection of SARS-CoV-2 Mutations using a Cas12a-based Sensing Platform
Authors:
C He,
C Lin,
G Mo,
B Xi,
A Li,
D Huang,
Y Wan,
F Chen,
Y Liang,
Q Zuo,
W Xu,
D Feng,
G Zhang,
L Han,
C Ke,
H Du,
L Huang
Abstract:
The increasing prevalence of SARS-CoV-2 variants with spike mutations has raised concerns owing to higher transmission rates, disease severity, and escape from neutralizing antibodies. Rapid and accurate detection of SARS-CoV-2 variants provides crucial information concerning the outbreaks of SARS-CoV-2 variants and possible lines of transmission. This information is vital for infection prevention and control. We used a Cas12a-based RT-PCR combined with CRISPR on-site rapid detection system (RT-CORDS) platform to detect key mutations in SARS-CoV-2 variants, such as the 69/70 deletion, N501Y, and D614G. We used type-specific CRISPR RNAs (crRNAs) to identify wild-type (crRNA-W) and mutant (crRNA-M) sequences of SARS-CoV-2. We successfully differentiated mutant variants from wild-type SARS-CoV-2 with a sensitivity of $10^{-17}$ M (approximately 6 copies/$\mu$L). After a simple RT-PCR, the Cas12a/crRNA reaction took just 10 min using a fluorescence reporting system. In addition, a sensitivity of $10^{-16}$ M could be achieved when lateral flow strips were used as readouts. The accuracy of RT-CORDS for SARS-CoV-2 variant detection was 100% consistent with the sequencing data. In conclusion, using the RT-CORDS platform, we accurately, sensitively, specifically, and rapidly detected SARS-CoV-2 variants. This method may be used in clinical diagnosis.
Submitted 25 October, 2021;
originally announced October 2021.
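The equivalence between $10^{-17}$ M and roughly 6 copies/$\mu$L in the abstract follows from Avogadro's constant. The snippet below reproduces that unit conversion; it is an arithmetic check, not part of the assay, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
UL_PER_L = 1e6            # microlitres per litre


def molar_to_copies_per_ul(molarity: float) -> float:
    """Convert a concentration in mol/L to molecules (copies) per microlitre."""
    return molarity * AVOGADRO / UL_PER_L
```

`molar_to_copies_per_ul(1e-17)` gives about 6.02 copies/$\mu$L, matching the fluorescence readout figure, and the lateral-flow sensitivity of $10^{-16}$ M corresponds to about 60 copies/$\mu$L.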
-
A machine learning approach to using Quality-of-Life patient scores in guiding prostate radiation therapy dosing
Authors:
Zhijian Yang,
Daniel Olszewski,
Chujun He,
Giulia Pintea,
Jun Lian,
Tom Chou,
Ronald Chen,
Blerta Shtylla
Abstract:
Thanks to advancements in diagnosis and treatment, prostate cancer patients have high long-term survival rates. Currently, an important goal is to preserve quality-of-life during and after treatment. The relationship between the radiation a patient receives and the subsequent side effects he experiences is complex and difficult to model or predict. Here, we use machine learning algorithms and statistical models to explore the connection between radiation treatment and post-treatment gastro-urinary function. Since only a limited number of patient datasets are currently available, we used image flipping and curvature-based interpolation methods to generate more data in order to leverage transfer learning. Using interpolated and augmented data, we trained a convolutional autoencoder network to obtain near-optimal starting points for the weights. A convolutional neural network then analyzed the relationship between patient-reported quality-of-life and radiation. We also used analysis of variance and logistic regression to explore organ sensitivity to radiation and develop dosage thresholds for each organ region. Our findings show no connection between the bladder and quality-of-life scores. However, we found a connection between radiation applied to posterior and anterior rectal regions and changes in quality-of-life. Finally, we estimated radiation therapy dosage thresholds for each organ. Our analysis connects machine learning methods with organ sensitivity, thus providing a framework for informing cancer patient care using patient-reported quality-of-life metrics.
Submitted 21 May, 2020;
originally announced May 2020.
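The abstract names analysis of variance as one of the tools for exploring organ sensitivity. A textbook one-way ANOVA F statistic is sketched below; the grouping (for example, quality-of-life change split by dose bin for one organ region) is a hypothetical illustration, not the paper's design. In practice a library routine such as `scipy.stats.f_oneway` also returns the p-value.

```python
from typing import List


def one_way_anova_f(groups: List[List[float]]) -> float:
    """Return the one-way ANOVA F statistic for k groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within
```

For the toy groups `[1, 2, 3]`, `[4, 5, 6]`, `[7, 8, 9]`, the between-group mean square is 27 and the within-group mean square is 1, so F = 27; a large F indicates that group means differ more than within-group noise would explain.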
-
Unsupervised Representations of Pollen in Bright-Field Microscopy
Authors:
Chloe He,
Gerard Glowacki,
Alexis Gkantiragas
Abstract:
We present the first unsupervised deep learning method for pollen analysis using bright-field microscopy. Using a modest dataset of 650 images of pollen grains collected from honey, we achieve family-level identification of pollen. We embed images of pollen grains into a low-dimensional latent space and compare Euclidean and Riemannian metrics on these spaces for clustering. We propose this system for automated analysis of pollen and other microscopic biological structures for which only small or unlabelled datasets are available.
Submitted 5 August, 2019;
originally announced August 2019.
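The abstract contrasts Euclidean and Riemannian metrics on the latent space without saying which Riemannian metric is used. A common choice for autoencoder latent spaces is the pull-back metric induced by the decoder, under which a latent path is measured by the length of its image in output space. The sketch assumes that choice; `pullback_path_length` and the toy decoders are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pullback_path_length(decoder: Callable[[Vector], Vector],
                         z_a: Vector, z_b: Vector, steps: int = 100) -> float:
    """Approximate length of the straight latent path z_a -> z_b in decoder space.

    Decodes evenly spaced points on the segment and sums the distances between
    consecutive decoded points. The straight segment need not be a geodesic, so
    its length is an upper bound on the Riemannian distance between z_a and z_b.
    """
    points = []
    for i in range(steps + 1):
        t = i / steps
        z = [a + t * (b - a) for a, b in zip(z_a, z_b)]
        points.append(decoder(z))
    return sum(euclidean(p, q) for p, q in zip(points, points[1:]))
```

With an identity decoder both notions agree (the path from `[0, 0]` to `[3, 4]` has length 5), while a curved decoder such as `z -> [z0, z0**2]` stretches the unit latent segment to about 1.48, which is how the two metrics can cluster the same embeddings differently.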
-
Honey Authentication with Machine Learning Augmented Bright-Field Microscopy
Authors:
Chloe He,
Alexis Gkantiragas,
Gerard Glowacki
Abstract:
Honey has been collected and used by humankind as both a food and a medicine for thousands of years. However, in the modern economy, honey has become subject to mislabelling and adulteration, making it the third most faked food product in the world. The international scale of fraudulent honey has had both economic and environmental ramifications. In this paper, we propose a novel method of identifying fraudulent honey using machine learning augmented microscopy.
Submitted 28 December, 2018;
originally announced January 2019.
-
Convolutional Neural Networks for Automated Annotation of Cellular Cryo-Electron Tomograms
Authors:
Muyuan Chen,
Wei Dai,
Ying Sun,
Darius Jonasch,
Cynthia Y He,
Michael F. Schmid,
Wah Chiu,
Steven J Ludtke
Abstract:
Cellular Electron Cryotomography (CryoET) offers the ability to look inside cells and observe macromolecules frozen in action. A primary challenge for this technique is identifying and extracting the molecular components within the crowded cellular environment. We introduce a method using neural networks to dramatically reduce the time and human effort required for subcellular annotation and feature extraction. Subsequent subtomogram classification and averaging yield in-situ structures of molecular components of interest.
Submitted 11 June, 2017; v1 submitted 19 January, 2017;
originally announced January 2017.
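The last step in the abstract, averaging subtomograms at annotated positions, can be illustrated in its simplest form: cut equal-sized boxes around picked coordinates and average them voxel by voxel. This is a sketch under strong assumptions; real subtomogram averaging also aligns each particle (rotational and translational search) and classifies them first, which is omitted here, and the function names are hypothetical.

```python
from typing import List, Tuple

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]


def extract_box(volume: Volume, center: Tuple[int, int, int], half: int) -> Volume:
    """Cut a cubic sub-volume of side 2*half+1 around center (z, y, x)."""
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    for c, n in zip(center, dims):
        if c - half < 0 or c + half >= n:
            raise ValueError("box extends past the volume edge")
    cz, cy, cx = center
    return [[row[cx - half:cx + half + 1]
             for row in plane[cy - half:cy + half + 1]]
            for plane in volume[cz - half:cz + half + 1]]


def average_subtomograms(volume: Volume, centers: List[Tuple[int, int, int]],
                         half: int) -> Volume:
    """Voxel-wise mean of boxes extracted at the picked coordinates (no alignment)."""
    boxes = [extract_box(volume, c, half) for c in centers]
    size = 2 * half + 1
    return [[[sum(b[z][y][x] for b in boxes) / len(boxes)
              for x in range(size)]
             for y in range(size)]
            for z in range(size)]
```

Averaging many copies of the same structure suppresses the independent noise in each box, which is why the coordinates produced by automated annotation feed directly into structure determination.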
-
Prevalence of algorithm-based qualitative (ABQ) method osteoporotic vertebral fracture in elderly Chinese men and women with reference to semi-quantitative (SQ) method: Mr. Os and Ms Os. (Hong Kong) studies
Authors:
Xian Jun Zeng,
Min Deng,
Yi Xiang Wang,
James F. Griffith,
Lai Chang He,
Anthony W. L. Kwok,
Jason C. S. Leung,
Timothy Kwok,
** Chung Leung
Abstract:
Introduction: This study evaluated the algorithm-based qualitative (ABQ) method for vertebral fracture (VF) evaluation with reference to the semi-quantitative (SQ) method and bone mineral density (BMD) measurement. Methods: Mr. OS (Hong Kong) and Ms. OS (Hong Kong) represent the first large-scale cohort studies on bone health in elderly Chinese men and women. The current study compared Genant's SQ method and the ABQ method in these two cohorts. Based on quantitative measurement, fractures detected by the ABQ method were additionally classified by severity into grade-1, grade-2, and grade-3 according to SQ's deformity criteria. The radiographs of 1,954 elderly Chinese men (mean age: 72.3 years) and 1,953 elderly Chinese women (mean age: 72.5 years) were evaluated. Results: According to ABQ, grade-1, -2, and -3 VFs accounted for 1.89%, 1.74%, and 2.25% in men, and 3.33%, 3.07%, and 5.53% in women. In men and women, respectively, 15.7% (35/223) and 34.5% (48/139) of vertebrae with SQ grade-1 deformity were ABQ(+, with fracture). In men and women, respectively, 89.7% (35/39) and 66.7% (48/72) of vertebrae with ABQ grade-1 fracture had SQ grade-1 deformity. For grade-1 change, SQ(-, negative without fracture) & ABQ(+, positive with vertebral cortex line fracture) subjects tended to have a lower BMD than SQ(+) & ABQ(-) subjects. Among subjects with SQ grade-2 deformity, those who were also ABQ(+) tended to have a lower BMD than those who were ABQ(-). In all grades, SQ(-) & ABQ(-) subjects tended to have the highest BMD, while SQ(+) & ABQ(+) subjects tended to have the lowest BMD. Conclusion: The ABQ method may be more sensitive than the SQ method to VFs associated with mildly lower BMD.
Submitted 10 November, 2016;
originally announced November 2016.
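The cross-tabulated percentages in the Results can be checked against the counts given in parentheses. In each sex, the same vertebrae (35 in men, 48 in women) are both SQ grade-1 and ABQ-positive; dividing that overlap by the SQ grade-1 total or by the ABQ grade-1 total gives the two different percentages. The helper name below is hypothetical.

```python
def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# SQ grade-1 deformities that were also ABQ-positive (overlap / SQ grade-1 total)
sq1_abq_pos = {"men": percent(35, 223), "women": percent(48, 139)}
# ABQ grade-1 fractures that also had SQ grade-1 deformity (overlap / ABQ grade-1 total)
abq1_sq1 = {"men": percent(35, 39), "women": percent(48, 72)}
```

The results, 15.7% and 34.5% for the first ratio and 89.7% and 66.7% for the second, match the reported values.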
-
GMOL: An Interactive Tool for 3D Genome Structure Visualization
Authors:
Jackson Nowotny,
Avery Wells,
Lingfei Xu,
Renzhi Cao,
Tuan Trieu,
Chenfeng He,
Jianlin Cheng
Abstract:
It has been shown that genome spatial structures largely affect both genome activity and DNA function. Knowing this, many researchers are currently attempting to accurately model genome structures. Despite these increased efforts, there is still a shortage of tools dedicated to visualizing the genome. A tool that can accurately visualize the genome can aid researchers by highlighting structural relationships that may not be obvious when examining the sequence information alone. Here we present a desktop application, known as GMOL, designed to effectively visualize genome tertiary structures at multiple scales so that researchers may better analyze their genomic data. GMOL was developed based upon our multi-scale approach that allows a user to zoom in and out between six separate levels within the genome. These six scales are full genome, chromosome, loci, fiber, nucleosome, and nucleotide. To store the data of the different scales, a new file format, known as GSS, was created. With GMOL, a user can choose any unit at any scale and scale it up or down to visualize its structure and retrieve corresponding genome sequences from either Ensembl or a local database. Users can also interactively manipulate and measure the whole genome structure and extract static images and machine-readable data files in PDB format from the multi-scale structure. Through GMOL's unique features and functions, including the multi-scale method that lets users not only visualize genome tertiary structure but also measure it, researchers will be able to better understand and analyze genome structure models and the impact their structural relations have on genome activity and DNA function.
Submitted 23 July, 2015;
originally announced July 2015.
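The six-level zoom described in the GMOL abstract amounts to navigating an ordered hierarchy of scales. A minimal sketch of that navigation follows; the enum and `zoom` function are hypothetical illustrations, not GMOL's implementation, and the GSS file format is not modeled because its layout is not described here.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The six GMOL scales, ordered from coarsest to finest."""
    GENOME = 0
    CHROMOSOME = 1
    LOCUS = 2
    FIBER = 3
    NUCLEOSOME = 4
    NUCLEOTIDE = 5


def zoom(current: Scale, steps: int) -> Scale:
    """Move `steps` levels finer (positive) or coarser (negative), clamped to the range."""
    target = max(Scale.GENOME, min(Scale.NUCLEOTIDE, current + steps))
    return Scale(target)
```

For example, `zoom(Scale.GENOME, 2)` lands on `Scale.LOCUS`, and zooming past either end simply stays at the full genome or the nucleotide level.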