Search | arXiv e-print repository

Exploring the Potential of Large Language Models in Graph Generation

Authors: Yang Yao, Xin Wang, Zeyang Zhang, Yijian Qin, Ziwei Zhang, Xu Chu, Yuekui Yang, Wenwu Zhu, Hong Mei

Abstract: Large language models (LLMs) have achieved great success in many fields, and recent works have studied exploring LLMs for graph discriminative tasks such as node classification. However, the abilities of LLMs for graph generation remain unexplored in the literature. Graph generation requires the LLM to generate graphs with given properties, which has valuable real-world applications such as drug discovery, while tends to be more challenging. In this paper, we propose LLM4GraphGen to explore the ability of LLMs for graph generation with systematical task designs and extensive experiments. Specifically, we propose several tasks tailored with comprehensive experiments to address key questions regarding LLMs' understanding of different graph structure rules, their ability to capture structural type distributions, and their utilization of domain knowledge for property-based graph generation. Our evaluations demonstrate that LLMs, particularly GPT-4, exhibit preliminary abilities in graph generation tasks, including rule-based and distribution-based generation. We also observe that popular prompting methods, such as few-shot and chain-of-thought prompting, do not consistently enhance performance. Besides, LLMs show potential in generating molecules with specific properties. These findings may serve as foundations for designing good LLMs based models for graph generation and provide valuable insights and further research.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2310.15488 [pdf, other]

doi 10.1088/1367-2630/ad345d

Reputation-based synergy and discounting mechanism promotes cooperation

Authors: Wenqiang Zhu, Xin Wang, Chaoqian Wang, Longzhao Liu, Hongwei Zheng, Shaoting Tang

Abstract: A good group reputation often facilitates more efficient synergistic teamwork in production activities. Here we translate this simple motivation into a reputation-based synergy and discounting mechanism in the public goods game. Specifically, the reputation type of a group, either good or bad determined by a reputation threshold, modifies the nonlinear payoff structure described by a unified reputation impact factor. Results show that this reputation-based incentive mechanism could effectively promote cooperation compared with linear payoffs, despite the coexistence of synergy and discounting effects. Notably, the complicated interactions between reputation impact and reputation threshold result in a sharp phase transition from full cooperation to full defection. We also find that the presence of a few discounting groups could increase the average payoffs of cooperators, leading to an interesting phenomenon that when the reputation threshold is raised, the gap between the average payoffs of cooperations and defectors increases while the overall payoff decreases. Our work provides important insights into facilitating cooperation in social groups.

Submitted 5 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

Journal ref: New J. Phys. 26 (2024) 033046

arXiv:2208.13994 [pdf]

HiGNN: Hierarchical Informative Graph Neural Networks for Molecular Property Prediction Equipped with Feature-Wise Attention

Authors: Weimin Zhu, Yi Zhang, DuanCheng Zhao, Jianrong Xu, Ling Wang

Abstract: Elucidating and accurately predicting the druggability and bioactivities of molecules plays a pivotal role in drug design and discovery and remains an open challenge. Recently, graph neural networks (GNN) have made remarkable advancements in graph-based molecular property prediction. However, current graph-based deep learning methods neglect the hierarchical information of molecules and the relationships between feature channels. In this study, we propose a well-designed hierarchical informative graph neural networks framework (termed HiGNN) for predicting molecular property by utilizing a co-representation learning of molecular graphs and chemically synthesizable BRICS fragments. Furthermore, a plug-and-play feature-wise attention block is first designed in HiGNN architecture to adaptively recalibrate atomic features after the message passing phase. Extensive experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark datasets. In addition, we devise a molecule-fragment similarity mechanism to comprehensively investigate the interpretability of HiGNN model at the subgraph level, indicating that HiGNN as a powerful deep learning tool can help chemists and pharmacists identify the key components of molecules for designing better molecules with desired properties or functions. The source code is publicly available at https://github.com/idruglab/hignn.

Submitted 30 August, 2022; originally announced August 2022.

arXiv:2206.11769 [pdf, other]

Single-phase deep learning in cortico-cortical networks

Authors: Will Greedy, Heng Wei Zhu, Joseph Pemberton, Jack Mellor, Rui Ponte Costa

Abstract: The error-backpropagation (backprop) algorithm remains the most common solution to the credit assignment problem in artificial neural networks. In neuroscience, it is unclear whether the brain could adopt a similar strategy to correctly modify its synapses. Recent models have attempted to bridge this gap while being consistent with a range of experimental observations. However, these models are either unable to effectively backpropagate error signals across multiple layers or require a multi-phase learning process, neither of which are reminiscent of learning in the brain. Here, we introduce a new model, Bursting Cortico-Cortical Networks (BurstCCN), which solves these issues by integrating known properties of cortical networks namely bursting activity, short-term plasticity (STP) and dendrite-targeting interneurons. BurstCCN relies on burst multiplexing via connection-type-specific STP to propagate backprop-like error signals within deep cortical networks. These error signals are encoded at distal dendrites and induce burst-dependent plasticity as a result of excitatory-inhibitory top-down inputs. First, we demonstrate that our model can effectively backpropagate errors through multiple layers using a single-phase learning process. Next, we show both empirically and analytically that learning in our model approximates backprop-derived gradients. Finally, we demonstrate that our model is capable of learning complex image classification tasks (MNIST and CIFAR-10). Overall, our results suggest that cortical features across sub-cellular, cellular, microcircuit and systems levels jointly underlie single-phase efficient deep learning in the brain.

Submitted 24 October, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

Comments: Accepted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022). 22 pages, 9 figures, 5 tables

arXiv:2201.09647 [pdf, other]

AlphaFold Accelerates Artificial Intelligence Powered Drug Discovery: Efficient Discovery of a Novel Cyclin-dependent Kinase 20 (CDK20) Small Molecule Inhibitor

Authors: Feng Ren, Xiao Ding, Min Zheng, Mikhail Korzinkin, Xin Cai, Wei Zhu, Alexey Mantsyzov, Alex Aliper, Vladimir Aladinskiy, Zhongying Cao, Shanshan Kong, Xi Long, Bonnie Hei Man Liu, Yingtao Liu, Vladimir Naumov, Anastasia Shneyderman, Ivan V. Ozerov, Ju Wang, Frank W. Pun, Alan Aspuru-Guzik, Michael Levitt, Alex Zhavoronkov

Abstract: The AlphaFold computer program predicted protein structures for the whole human genome, which has been considered as a remarkable breakthrough both in artificial intelligence (AI) application and structural biology. Despite the varying confidence level, these predicted structures still could significantly contribute to structure-based drug design of novel targets, especially the ones with no or limited structural information. In this work, we successfully applied AlphaFold in our end-to-end AI-powered drug discovery engines constituted of a biocomputational platform PandaOmics and a generative chemistry platform Chemistry42, to identify a first-in-class hit molecule of a novel target without an experimental structure starting from target selection towards hit identification in a cost- and time-efficient manner. PandaOmics provided the targets of interest and Chemistry42 generated the molecules based on the AlphaFold predicted structure, and the selected molecules were synthesized and tested in biological assays. Through this approach, we identified a small molecule hit compound for CDK20 with a Kd value of 8.9 +/- 1.6 uM (n = 4) within 30 days from target selection and after only synthesizing 7 compounds. Based on the available data, the second round of AI-powered compound generation was conducted and through which, a more potent hit molecule, ISM042-2-048, was discovered with a Kd value of 210.0 +/- 42.4 nM (n = 2), within 30 days and after synthesizing 6 compounds from the discovery of the first hit ISM042-2-001. To the best of our knowledge, this is the first reported small molecule targeting CDK20 and more importantly, this work is the first demonstration of AlphaFold application in the hit identification process in early drug discovery.

Submitted 12 February, 2022; v1 submitted 21 January, 2022; originally announced January 2022.

Comments: 9 pages, 6 figures

arXiv:2007.10316 [pdf, other]

Auxiliary Diagnosing Coronary Stenosis Using Machine Learning

Authors: Weijun Zhu, Fengyuan Lu, Xiaoyu Yang, En Li

Abstract: How to accurately classify and diagnose whether an individual has Coronary Stenosis (CS) without invasive physical examination? This problem has not been solved satisfactorily. To this end, the four machine learning (ML) algorithms, i.e., Boosted Tree (BT), Decision Tree (DT), Logistic Regression (LR) and Random Forest (RF) are employed in this paper. First, eleven features including basic information of an individual, symptoms and results of routine physical examination are selected, as well as one label is specified, indicating whether an individual suffers from different severity of coronary artery stenosis or not. On the basis of it, a sample set is constructed. Second, each of these four ML algorithms learns from the sample set to obtain the corresponding optimal classified results, respectively. The experimental results show that: RF performs better than other three algorithms, and the former algorithm classifies whether an individual has CS with an accuracy of 95.7% (=90/94).

Submitted 7 September, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

Comments: 6 pages, 3 figures, 4 tables

arXiv:1803.11062 [pdf, other]

Analyzing DNA Hybridization via machine learning

Authors: Weijun Zhu

Abstract: In DNA computing, it is impossible to decide whether a specific hybridization among complex DNA molecules is effective or not within acceptable time. In order to address this common problem, we introduce a new method based on the machine learning technique. First, a sample set is employed to train the Boosted Tree (BT) algorithm, and the corresponding model is obtained. Second, this model is used to predict classification results of molecular hybridizations. The experiments show that the average accuracy of the new method is over 94.2%, and its average efficiency is over 90839 times higher than that of the existing method. These results indicate that the new method can quickly and accurately determine the biological effectiveness of molecular hybridization for a given DNA design.

Submitted 2 July, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

Comments: 11 pages, 5 figures

arXiv:1707.00625 [pdf]

The Strong Cell-based Hydrogen Peroxide Generation Triggered by Cold Atmospheric Plasma

Authors: Dayun Yan, Haitao Cui, Wei Zhu, Annie Talbot, Lijie Grace Zhang, Jonathan H. Sherman, Michael Keidar

Abstract: Hydrogen peroxide (H2O2) is an important signaling molecule in cancer cells. However, the significant secretion of H2O2 by cancer cells have been rarely observed. Cold atmospheric plasma (CAP) is a near room temperature ionized gas composed of neutral particles, charged particles, reactive species, and electrons. Here, we first demonstrated that breast cancer cells and pancreatic adenocarcinoma cells generated micromolar level H2O2 during just 1 min of direct CAP treatment on these cells. The cell-based H2O2 generation is affected by the medium volume, the cell confluence, as well as the discharge voltage. The application of cold atmospheric plasma (CAP) in the cancer treatment has been intensively investigated over the past decade. Several cellular responses to the CAP treatment have been observed including the consumption of the CAP-originated reactive species, the rise of intracellular reactive oxygen species, the damage on DNA and mitochondria, as well as the activation of apoptotic events. This is a new previously unknown cellular response to CAP, which provides a new prospective to understand the interaction between CAP and cells.

Submitted 28 May, 2017; originally announced July 2017.

arXiv:1302.5507 [pdf]

doi 10.1371/journal.pone.0065632

SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

Authors: Ruibang Luo, Thomas Wong, Jianqiao Zhu, Chi-Man Liu, Edward Wu, Lap-Kei Lee, Haoxiang Lin, Wenjuan Zhu, David W. Cheung, Hing-Fung Ting, Siu-Ming Yiu, Chang Yu, Yingrui Li, Ruiqiang Li, Tak-Wah Lam

Abstract: To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed in trade of accuracy and sensitivity. SOAP3-dp, through leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, GEM and GPU-based aligners including BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. Transcending its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Real data evaluation using human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports BAM file format and provides a scoring scheme same as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.

Submitted 23 March, 2013; v1 submitted 22 February, 2013; originally announced February 2013.

Comments: 21 pages, 6 figures, submitted to PLoS ONE, additional files available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE". Comments most welcome

Showing 1–9 of 9 results for author: Zhu, W