3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information

Taojie Kuang
Peng Cheng Laboratory
South China University of Technology
[email protected]
Yiming Ren
Peng Cheng Laboratory
[email protected]
Zhixiang Ren
Peng Cheng Laboratory
[email protected]
Corresponding author
Abstract

Molecular property prediction is crucial for early drug candidate screening and optimization. Although deep learning-based methods have advanced considerably, they often fall short in fully leveraging 3D spatial information. Specifically, current molecular encoding techniques tend to extract spatial information inadequately, leading to ambiguous representations in which a single representation may correspond to multiple distinct molecules. Moreover, existing molecular modeling methods focus predominantly on the most stable 3D conformations, neglecting other viable conformations present in reality. To address these issues, we propose 3D-Mol, a novel approach designed for more accurate spatial structure representation. It deconstructs molecules into three hierarchical graphs to better extract geometric information. Additionally, 3D-Mol leverages contrastive learning for pretraining on 20 million unlabeled molecules: conformations that share the same topological structure are treated as weighted positive pairs and those with different topological structures as weighted negative pairs, with weights based on the similarity of their 3D conformation descriptors and fingerprints. We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks, where it achieves the best results on 5 of them.

1 Introduction

Molecular property prediction can effectively accelerate drug discovery by prioritizing promising compounds, streamlining drug development and increasing success rates. Moreover, it contributes to the comprehension of structure-activity relationships by demonstrating the influence of particular features on molecular interactions and other biological effects. Recently, deep learning methods have significantly advanced molecular property prediction, providing enhanced accuracy and deeper insights into complex molecular behaviors. The integration of 3D molecular information, which offers a comprehensive view of molecular structures, significantly enhances the model’s understanding of molecular properties and interactions. However, because experiments are expensive and time-consuming, labeled data is scarce, which significantly constrains the capacity of deep learning methods to extract 3D spatial information.
To fully exploit the knowledge in unlabeled data, numerous self-supervised learning methods have been proposed to improve molecular property prediction. For example, early work[1, 2, 3] applied self-supervised learning to data represented in the Simplified Molecular Input Line Entry System (SMILES)[4]. However, SMILES does not adequately represent the topological structure of a molecule, making it difficult to obtain reliable results. In parallel, various self-supervised learning methods based on molecular graphs[5, 6, 7, 8, 9, 10, 11] encode the topological structure of a molecule but neglect its critical three-dimensional spatial information, even though different 3D structures may lead to dissimilar molecular properties despite sharing the same 2D molecular topology. As shown in Figure 1, Thalidomide, a sedative used to treat morning sickness in pregnant women in the 1950s, has two distinct 3D structures, R-Thalidomide and S-Thalidomide. The former has the desired drug effects, while the latter has been implicated in teratogenesis. Recently, several works[12, 13, 14, 15, 16, 17] that utilize molecular 3D structures have been introduced. However, limited by their pretraining methods, they have not fully learned the 3D spatial information in unlabeled data. Specifically, these methods focus only on the most stable (lowest-energy) 3D conformations, neglecting other existing conformations. Therefore, it is imperative to develop an approach that comprehensively captures 3D information through both its pretraining strategy and its encoding technique.
To address these issues, we propose a novel framework, 3D-Mol, for molecular representation and property prediction. We employ three graphs to hierarchically represent the atom-bond, bond-angle, and dihedral information of a molecule, integrating information from these hierarchies through a message-passing strategy to obtain a comprehensive molecular representation. Moreover, using a vast amount of unlabeled data, we design a novel self-supervised method, weighted contrastive learning, to pretrain our molecular encoder alongside the geometric approach from GearNet[18].

Figure 1: Geometric difference leads to diverse properties. Thalidomide exists in two distinct 3D stereoisomeric forms, known as R-Thalidomide and S-Thalidomide. These two molecules can be represented by the same SMILES, but they have significantly dissimilar properties. The former is recognized for its therapeutic properties, while the latter has been implicated in teratogenesis.

In the proposed contrastive learning, conformations derived from the same SMILES are considered weighted positive pairs, while those derived from different SMILES are treated as weighted negative pairs, with weights indicating 3D conformation descriptor/fingerprint similarity. The molecular encoder is then finetuned on downstream tasks to predict molecular properties. Finally, we compare our approach with several state-of-the-art (SOTA) baselines on 7 molecular property prediction benchmarks[19], where our method achieves the best results on 5 benchmarks. In summary, our main contributions are as follows:
• We propose a novel molecular embedding method based on hierarchical graph representation to thoroughly extract the 3D spatial structural features of a molecule.
• We improve the contrastive learning approach with 3D conformational information by treating conformations with the same SMILES as positive pairs and those with different SMILES as negative pairs, while keeping weights that indicate the 3D conformation descriptor and fingerprint similarity.
• We evaluate 3D-Mol on various molecular property prediction benchmarks, showing that our model can significantly outperform existing competitive models on multiple tests.

2 Related Work

Due to the unique nature of molecular representation and the scarcity of labeled molecular data, existing work generally follows two strategies to enhance the performance of molecular property prediction. The first develops a novel molecular encoder tailored to molecular data for efficient molecular information extraction. The second focuses on fully harnessing the potential of unlabeled data, typically by devising a dedicated pretraining approach to pretrain the molecular encoder on a large amount of unlabeled data. Details of the key components for each strategy are listed below.

2.1 Molecular Representation and Encoder

Molecular representation and encoding are essential for accurate property prediction, which is vital in applications like molecular design and drug discovery. Some early works[20, 21] learned representations from chemical fingerprints (FP), such as ECFP[22] and MACCS[23]. Other works learned representations from molecular descriptors, such as SMILES. Inspired by mature NLP models, SMILES-BERT[24] used SMILES to extract molecular representations by applying the BERT[25] pretraining strategy. However, these methods depend on feature engineering and fail to capture the complete topological structure information of a molecule.
Since a molecular graph is a natural representation of a molecule and conveys topological information, several studies in recent years have embraced it as a means of molecular representation. GG-NN[5], DMPNN[6], and DeepAtomicCharge[26] employed a message passing strategy for molecular property prediction. AttentiveFP[9] used a graph attention network to aggregate and update node information. MP-GNN[27] merged a specific-scale graph neural network (GNN) and an element-specific GNN, capturing various atomic interactions of multiphysical representations at different scales. MGCN[28] designed a graph convolution network to capture multilevel quantum interactions from the conformation and spatial information of a molecule.
The works mentioned above focus on 2D molecular representation, which might miss crucial chemical details[29] and prove insufficient for accurate molecular property prediction[30]. Recently, some studies have attempted to enhance performance by modeling 3D molecular structure. Significant efforts have been made to employ 3D voxel-based representations for understanding molecular structures. Stepniewska et al. [31] used 3D convolutions to estimate the binding affinity of ligand-receptor complexes. libmolgrid [32] provided a library representing 3D molecular structures as multidimensional voxelized grids. OctSurf [33] used an octree-based representation to describe the interaction between protein pockets and ligands. In addition to voxel-based representations, several methods have been developed to embed 3D molecular information directly into GNNs. SGCN[15] applied different weights according to atomic distances during the GCN-based message passing process. SchNet[12] modeled complex atomic interactions using Gaussian radial basis functions for potential energy surface prediction to accelerate the exploration of chemical space. DimeNet[13] proposed directional message passing to fully utilize directional information within a molecule. GEM[16] developed a novel geometrically enhanced molecular representation learning method and employed a specifically designed geometry-based GNN structure. However, these methods do not fully exploit the 3D structural information of a molecule, such as dihedral angles.

2.2 Self-supervised Learning on Molecule

Self-supervised learning, with its substantial success in various research domains, has inspired numerous molecular property prediction studies. Influenced by methods such as BERT[25] and GPT[34], these studies employ this approach to efficiently harness large volumes of unlabeled data for pretraining. For one-dimensional data, SMILES is frequently used to extract molecular representations in the pretraining stage. SMILES2Vec[1] employed an RNN to extract features from SMILES. ChemBERTa[3] followed RoBERTa[35] by employing masked language modeling as a pretraining task, predicting masked tokens to restore the original sentence, which helped models understand sequence semantics. SMILES Transformer[36] used a SMILES string as input to produce a temporary embedding, which is then restored to the original input by a decoder.
As the topological information of molecular graphs received greater attention, numerous pretraining methods focused on molecular graph data have been proposed. N-gram graph[8] used the n-gram method from NLP to extract representations of molecules. PretrainGNN[7] proposed a new pretraining strategy, including node-level and graph-level self-supervised pretraining tasks. GraphCL[37], MOCL[38], and MolCLR[10] performed molecular contrastive learning via GNNs by proposing new molecular graph augmentation methods. MPG[39] and GROVER[11] focused on node-level and graph-level representations and designed corresponding pretraining tasks at both levels. iMolCLR[40], Sugar[41] and ReLMole[42] focused on the substructures of a molecule and designed substructure pretraining tasks using substructure information.
With the 3D structure information of a molecule proven to boost molecular property prediction, recent works have focused on pretraining tasks that target this information. 3DGCN[43] introduced a relative position matrix that includes 3D positions between atoms to ensure translational invariance during convolution. GraphMVP[44] proposed an SSL method involving contrastive learning and generative learning between 3D and 2D molecular views. GEM[16] proposed a self-supervised framework using molecular geometric information by constructing a new bond-angle graph, where the chemical bonds within a molecule are considered as nodes and the angle formed between two bonds is considered as the edge. Uni-Mol[17] employed a transformer model to extract molecular representations by predicting atom distances. However, these works utilize only the most stable 3D conformation, thereby overlooking other conformations that exist in the real world.

3 Method

This section outlines the creation of 3D-Mol, a framework designed for 3D structural molecular property prediction, focusing on hierarchical graph-based molecular representation and the strategic weighting of contrastive pairs. Figure 2 provides an overview, with subsequent parts delving into specifics.

3.1 Molecular Encoder

3.1.1 Hierarchical Graph

Molecular raw data is represented by SMILES in most molecular databases. To extract spatial structure information from a molecule, we use RDKit[45] to transform the SMILES representation into 3D molecular conformations. To fully extract the 3D structural information, we decompose each molecular conformation into three hierarchical graphs, denoted as $Mol=\{G_{a-b},G_{b-a},G_{d-a}\}$. The atom-bond graph, commonly used as a 2D molecular graph, is represented as $G_{a-b}=\{V,E,P_{atom},P_{bond}\}$, where $V$ is the set of atoms and $E$ is the set of bonds. $P_{atom}\in \mathbb{R}^{|V|\times d_{atom}}$ are the attributes of atoms, and $d_{atom}$ is the number of atom attributes. $P_{bond}\in \mathbb{R}^{|E|\times d_{bond}}$ are the attributes of bonds, and $d_{bond}$ is the number of bond attributes. The bond-angle graph is represented as $G_{b-a}=\{E,P,Ang_{\theta}\}$, where $P$ is the set of planes, each comprising 3 connected atoms, and $Ang_{\theta}$ is the set of corresponding bond angles $\theta$. The dihedral-angle graph is represented as $G_{d-a}=\{P,D,Ang_{\phi}\}$.

Figure 2: The overview of the 3D-Mol model framework. a) In the pretraining stage, we employ weighted contrastive learning to effectively pretrain our model. In addition to using the mask strategy for graph data augmentation, we consider conformations from the same SMILES as positive pairs, while the weight represents their 3D conformation descriptor similarity. Conversely, distinct topological structures are treated as negative pairs, and the weight is dependent on fingerprint differences. b) In the finetuning stage, we use one well-pretrained encoder model to refine our approach across diverse downstream datasets through supervised learning.

The attributes of a plane are the attributes of its 3 connected atoms and the corresponding bonds. $D$ represents the set of pairs of connected planes that share a bond. $Ang_{\phi}$ represents the set of corresponding dihedral angles $\phi$. Together, these three graphs represent an actual molecule and help our encoder learn 3D structural information.
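The decomposition described above can be illustrated with a minimal sketch that takes atom coordinates and a bond list (for example, as produced by a conformer generator) and enumerates planes and dihedral pairs. The function names, the dictionary layout, and the restriction of $D$ to proper torsion chains a-b-c-d (two planes sharing the central bond b-c) are assumptions made for illustration and are not taken from the 3D-Mol implementation.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bond_angle(pu, pv, pw):
    """Angle theta at the central atom v between bonds v-u and v-w (radians)."""
    a, b = _sub(pu, pv), _sub(pw, pv)
    cos_t = _dot(a, b) / (_norm(a) * _norm(b))
    return math.acos(max(-1.0, min(1.0, cos_t)))

def dihedral_angle(pa, pb, pc, pd):
    """Signed torsion phi about bond b-c for the chain a-b-c-d (radians)."""
    b1, b2, b3 = _sub(pb, pa), _sub(pc, pb), _sub(pd, pc)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of planes abc and bcd
    b2_len = _norm(b2)
    m1 = _cross(n1, tuple(x / b2_len for x in b2))
    return math.atan2(_dot(m1, n2), _dot(n1, n2))

def build_hierarchical_graphs(coords, bonds):
    """Return the atom-bond, bond-angle and dihedral-angle levels (hypothetical layout)."""
    neighbors = {i: set() for i in range(len(coords))}
    for u, v in bonds:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Bond-angle level: a plane (u, v, w) has central atom v bonded to u and w.
    planes = {}
    for v in neighbors:
        nbrs = sorted(neighbors[v])
        for i, u in enumerate(nbrs):
            for w in nbrs[i + 1:]:
                planes[(u, v, w)] = bond_angle(coords[u], coords[v], coords[w])

    # Dihedral level (assumption): planes (a, b, c) and (b, c, d) sharing bond b-c.
    dihedrals = {}
    for b, c in bonds:
        for a in neighbors[b] - {c}:
            for d in neighbors[c] - {b}:
                if a != d:
                    dihedrals[(a, b, c, d)] = dihedral_angle(
                        coords[a], coords[b], coords[c], coords[d])

    return {"atom_bond": {"atoms": list(range(len(coords))), "bonds": list(bonds)},
            "bond_angle": planes,
            "dihedral_angle": dihedrals}

# Toy planar four-atom chain in an anti (trans) arrangement; coordinates are made up.
coords = [(-1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, -1.0, 0.0)]
bonds = [(0, 1), (1, 2), (2, 3)]
graphs = build_hierarchical_graphs(coords, bonds)
```

For this toy chain the sketch yields two planes and one dihedral pair; in a real pipeline the coordinates would come from the generated conformation rather than being written by hand.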

3.1.2 Attribute Embedding

The 3D information of a molecule, such as bond lengths and the angles between bonds, carries key chemical information. First, we convert these spatial characteristics into latent vectors. Following previous work[14], we employ radial basis function (RBF) layers to encode the different geometric factors:

$$F^k_l = \exp\left(-\beta^k_l\left(\exp(-l)-\mu^k_l\right)^2\right)\cdot W^k_l \tag{1}$$

where $F^k_l$ is the $k$-th dimension of the feature of bond length $l$, and $\mu^k_l$ and $\beta^k_l$ are the corresponding center and width, respectively. We set $\mu^k_l = 0.1k$ and $\beta^k_l = 10$. Similarly, the $k$-th dimensions $F^k_{\theta}$ and $F^k_{\phi}$ of the features of the bond angle $\theta$ and the dihedral angle $\phi$ are computed as:

$$F^k_{\theta} = \exp\left(-\beta^k_{\theta}\left(-\theta-\mu^k_{\theta}\right)^2\right)\cdot W^k_{\theta} \tag{2}$$
$$F^k_{\phi} = \exp\left(-\beta^k_{\phi}\left(-\phi-\mu^k_{\phi}\right)^2\right)\cdot W^k_{\phi} \tag{3}$$

where $\mu^k_{\theta}$ and $\mu^k_{\phi}$ are the centers for bond angles and dihedral angles, respectively, which set the peak of each function, and $\beta^k_{\theta}$ and $\beta^k_{\phi}$ are the corresponding widths, which determine the spread of the RBF. The numerical values for these centers are set at $\pi/K$, where $K$ is the number of feature dimensions.
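As a concrete illustration of Eq. (1), the following minimal sketch expands a single bond length into an RBF feature vector with $\mu^k_l = 0.1k$ and $\beta^k_l = 10$. The number of centers, the choice to index $k$ from 0, and the all-ones placeholder for the learnable weights $W^k_l$ are assumptions made only for this example.

```python
import math

def rbf_bond_length(length, num_centers=20, beta=10.0, weights=None):
    """Expand one bond length into RBF features following Eq. (1).

    num_centers and the zero-based index k are illustrative assumptions;
    weights stands in for the learnable W_l^k and defaults to all ones.
    """
    if weights is None:
        weights = [1.0] * num_centers
    x = math.exp(-length)  # Eq. (1) applies the RBF to exp(-l), not to l itself
    return [math.exp(-beta * (x - 0.1 * k) ** 2) * weights[k]
            for k in range(num_centers)]

# A length of ln(2) gives exp(-l) = 0.5, so the center mu = 0.5 (k = 5) responds most.
features = rbf_bond_length(math.log(2.0), num_centers=10)
peak_index = max(range(len(features)), key=lambda k: features[k])
```

Each feature dimension peaks when $\exp(-l)$ coincides with its center and decays at a rate set by the width, so nearby bond lengths produce similar but distinguishable vectors.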

Figure 3: The overview of the 3D-Mol encoder layer. The 3D-Mol encoder layer comprises three steps. First, employing a message passing strategy, nodes in each graph exchange messages with their connected edges, leading to the updating of edge and node latent vectors. Second, the edge latent vector from the lower-level graph is transmitted to the higher-level graph as part of the node latent vector. Finally, the iteration is performed $n$ times to derive the $n$-th node latent vector, from which we extract the molecular latent vectors.

For the other attributes of atoms and bonds, we represent them with $P_{atom}$ and $P_{bond}$ and embed them with the word embedding function. The initial features of atoms and bonds are represented as $F^0_{atom}$ and $F^0_{bond}$, respectively.

3.1.3 Graph Embedding

To embed the molecular hierarchical graph, we employ a message passing strategy in $\{G^i_{a-b},G^i_{b-a},G^i_{d-a}\}$. For the $i$-th layer in 3D-Mol, the information of these graphs is updated by graph neural networks. The overview is shown in Figure 3, and the details are as follows:
First, we use $GNN^i_{a-b}$ to aggregate the atom and bond latent vectors in $G^i_{a-b}$. Given an atom $v$, its representation vector $F^i_v$ is formalized by:

$$a^{i,a-b}_v = Agg^{(i)}_{a-b}\left(\{F^{i-1}_v, F^{i-1}_u, F^{i-1}_{uv} \mid u\in N(v)\}\right) \tag{4}$$
$$F^i_v = Comb^{(k)}_{a-b,n}\left(F^{i-1}_v, a^{i,a-b}_v\right) \tag{5}$$
$$F^{i,temp}_{uv} = Comb^{(k)}_{a-b,e}\left(F^{i-1}_{uv}, F^{i-1}_u, F^{i-1}_v\right) \tag{6}$$

where $N(v)$ is the set of neighbors of atom $v$ in $G^i_{a-b}$, and $Agg^{(i)}_{a-b}$ is the aggregation function for aggregating messages from the atom neighborhood. $Comb^{(k)}_{a-b,n}$ and $Comb^{(k)}_{a-b,e}$ are the update functions for the latent vectors of atoms and bonds, respectively. $a^{i,a-b}_v$ is the aggregated information from the neighboring atoms and the corresponding bonds. $F^{i,temp}_{uv}$ is the temporary latent vector of bond $uv$ in the $i$-th layer and forms part of the bond latent vectors in $G^i_{b-a}$.
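The atom-bond update in Eqs. (4)-(6) can be sketched in a few lines. The text here does not fix the concrete form of the aggregation and update functions, so this example assumes plain elementwise sums for all of them; `atom_bond_layer` and `vec_add` are hypothetical names, and a real encoder would use learnable functions instead.

```python
def vec_add(*vectors):
    """Elementwise sum of equally sized feature vectors."""
    return [sum(xs) for xs in zip(*vectors)]

def atom_bond_layer(atom_feats, bond_feats):
    """One atom-bond step with sum-based Agg and Comb (an assumption).

    atom_feats: {atom: vector}; bond_feats: {(u, v): vector} with u < v.
    Returns the updated atom vectors and the temporary bond vectors.
    """
    dim = len(next(iter(atom_feats.values())))
    neighbors = {v: [] for v in atom_feats}
    for u, v in bond_feats:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def bond(u, v):
        return bond_feats[(min(u, v), max(u, v))]

    new_atoms = {}
    for v, f_v in atom_feats.items():
        # Eq. (4): aggregate each neighbor atom u together with the bond uv.
        a_v = [0.0] * dim
        for u in neighbors[v]:
            a_v = vec_add(a_v, atom_feats[u], bond(u, v))
        # Eq. (5): combine the previous atom vector with the aggregated message.
        new_atoms[v] = vec_add(f_v, a_v)

    # Eq. (6): temporary bond vector built from the bond and its two end atoms.
    temp_bonds = {(u, v): vec_add(f_uv, atom_feats[u], atom_feats[v])
                  for (u, v), f_uv in bond_feats.items()}
    return new_atoms, temp_bonds

# Toy three-atom chain 0-1-2 with two-dimensional features (values are made up).
atom_feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
bond_feats = {(0, 1): [0.5, 0.5], (1, 2): [0.0, 2.0]}
new_atoms, temp_bonds = atom_bond_layer(atom_feats, bond_feats)
```

The second return value corresponds to $F^{i,temp}_{uv}$, which is the quantity handed to the bond-angle graph.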
Then, we use $GNN^i_{b-a}$ to aggregate the bond and plane vectors in $G^i_{b-a}$. Given a bond $uv$, its latent vector $F^i_{uv}$ is formalized by:

$$a^{i,b-a}_{uv} = Agg^{(i)}_{b-a}\left(\{F^{i-1}_{uv}, F^{i-1}_{vw}, F^{i-1}_{uvw} \mid u\in N(v) \wedge w\in N(v) \wedge u\neq w\}\right) \tag{7}$$
$$F^i_{uv} = Comb^{(k)}_{b-a,n}\left(F^{i-1}_{uv}, F^{i,temp}_{uv}, a^{i,b-a}_{uv}\right) \tag{8}$$
$$F^{i-1,temp}_{uvw} = Comb^{(k)}_{b-a,e}\left(F^{i-1}_{uvw}, F^{i-1}_{uv}, F^{i-1}_{vw}\right) \tag{9}$$

where $\mathrm{Agg}^{(i)}_{b-a}$ is the aggregation function that gathers messages from the bond neighborhood, and $\mathrm{Comb}^{(k)}_{b-a,n}$ and $\mathrm{Comb}^{(k)}_{b-a,e}$ are the update functions for the bond and plane latent vectors, respectively. $a^{i,b-a}_{uv}$ is the aggregated information from the neighboring bonds and the corresponding bond angles. $F^{i-1,\mathrm{temp}}_{uvw}$ is the temporary latent vector of plane $uvw$ in the $i$-th layer and is part of the plane latent vectors in $G^{i}_{d-a}$.
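As a rough illustration of Eqs. (7) and (8), the following minimal Python sketch performs one bond-angle update step on plain lists. It assumes a plain sum for the aggregation and an element-wise sum for the update, whereas the model uses learned functions. The function name `bond_angle_update` and the dictionary layout are hypothetical and are not taken from the 3D-Mol code.

```python
def bond_angle_update(neighbors, bond_feats, angle_feats, temp_bond_feats):
    """One simplified bond-angle message-passing step (sketch of Eqs. 7-8).

    neighbors:       {atom: set of bonded atoms}
    bond_feats:      {(u, v): vector} for every directed bond u-v
    angle_feats:     {(u, v, w): vector} for the angle between bonds u-v and v-w
    temp_bond_feats: {(u, v): vector} temporary bond vectors F^{i,temp}_{uv}
    """
    updated = {}
    for (u, v), f_uv in bond_feats.items():
        agg = [0.0] * len(f_uv)
        # Eq. 7 (assumed sum aggregation): visit every bond v-w that shares
        # atom v with bond u-v, excluding w == u.
        for w in neighbors[v]:
            if w == u:
                continue
            f_vw = bond_feats[(v, w)]
            f_uvw = angle_feats[(u, v, w)]
            agg = [a + b + c for a, b, c in zip(agg, f_vw, f_uvw)]
        # Eq. 8 (assumed element-wise sum instead of a learned Comb function).
        f_temp = temp_bond_feats[(u, v)]
        updated[(u, v)] = [p + t + a for p, t, a in zip(f_uv, f_temp, agg)]
    return updated


# Toy example: a bent three-atom molecule, central atom 0 bonded to atoms 1 and 2.
neighbors = {0: {1, 2}, 1: {0}, 2: {0}}
bond_feats = {(0, 1): [1.0, 0.0], (1, 0): [1.0, 0.0],
              (0, 2): [0.0, 1.0], (2, 0): [0.0, 1.0]}
angle_feats = {(1, 0, 2): [0.5, 0.5], (2, 0, 1): [0.5, 0.5]}
temp_bond_feats = {bond: [0.0, 0.0] for bond in bond_feats}

updated = bond_angle_update(neighbors, bond_feats, angle_feats, temp_bond_feats)
print(updated[(1, 0)])  # [1.5, 1.5]: own vector plus bond 0-2 and angle 1-0-2
```

Bonds that end at a terminal atom, such as (0, 1) here, have no neighboring bond at that atom and therefore keep their previous vector in this sketch.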
After processing $G^{i}_{b-a}$, we use $GNN^{i}_{d-a}$ to aggregate the plane latent vectors in $G^{i}_{d-a}$. Given a plane constructed by nodes $u$, $v$, $w$ and bonds $uv$, $vw$, its latent vector $F^{i}_{uvw}$ is formalized by:

$$a^{i,d-a}_{uvw} = \mathrm{Agg}^{(i)}_{d-a}\left(\left\{F^{i-1}_{uvw}, F^{i-1}_{vwh}, F^{i-1}_{uvwh} \mid u \in N(v) \cap v \in N(w) \cap w \in N(h) \cap u \neq v \neq w \neq h\right\}\right) \tag{10}$$
$$F^{i}_{uvw} = \mathrm{Comb}^{(k)}_{d-a,n}\left(F^{i-1}_{uvw}, F^{i,\mathrm{temp}}_{uvw}, a^{i}_{uvw}\right) \tag{11}$$

where $\mathrm{Agg}^{(i)}_{d-a}$ is the aggregation function that gathers messages from the plane neighborhood, $\mathrm{Comb}^{(k)}_{d-a,n}$ is the update function for the plane latent vector, and $a^{i}_{uvw}$ is the aggregated information from the neighboring planes and the corresponding dihedral angles.
The atom representation vectors at the final iteration are pooled by the Readout function to obtain the molecular graph representation vector $F_{\mathrm{mol}}$, which is formalized as:

$$F_{\mathrm{mol}} = \mathrm{Readout}\left(\left\{F^{n}_{u} \mid u \in V\right\}\right) \tag{12}$$

where $F^{n}$ is the output of the last 3D-Mol layer. The molecular latent vector $F_{\mathrm{mol}}$ is used to predict molecular properties.
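The text here does not state which pooling the Readout function uses. As a minimal sketch, assuming mean pooling (one common permutation-invariant choice), Eq. (12) could look like the following; the name `mean_readout` is hypothetical.

```python
def mean_readout(atom_vectors):
    """Average final-layer atom vectors into one molecule-level vector.

    Mean pooling is an assumption for illustration; any permutation-invariant
    function (sum, max, attention pooling) would fit the same role.
    """
    if not atom_vectors:
        raise ValueError("a molecule needs at least one atom")
    n_atoms = len(atom_vectors)
    dim = len(atom_vectors[0])
    return [sum(vec[d] for vec in atom_vectors) / n_atoms for d in range(dim)]


atoms = [[1.0, 2.0], [3.0, 4.0]]
f_mol = mean_readout(atoms)
print(f_mol)  # [2.0, 3.0]
```

Because the result does not depend on the order of the atoms, the same molecule yields the same vector regardless of how its atoms are indexed.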

3.2 Pretrain Strategy

To improve the performance of the 3D-Mol encoder, we employ contrastive learning for pretraining, treating conformations with identical topological structures as weighted positive pairs and contrasting ones as negatives, as shown in Figure 4. Inspired by GearNet[18], we also combine our pretraining method with self-supervised tasks based on physicochemical and geometric properties.
Our objective is to facilitate the learning of the consistency and differences between molecular 3D conformations. To accomplish this, we employ weighted contrastive learning using a batch of molecular representations, with the loss function defined as follows:

$$L^{\mathrm{conf}}_{i,j} = -\log \frac{\exp\left(w^{\mathrm{conf}}_{i,j}\, \mathrm{sim}(F_i, F^{mk}_j)/\tau\right)}{\sum_{k=1}^{2N} \mathbb{1}\{k \neq i\}\, \exp\left(w^{\mathrm{fp}}_{i,k}\, \mathrm{sim}(F_i, F_k)/\tau\right)} \tag{13}$$
$$w^{\mathrm{conf}}_{i,j} = \lambda_{\mathrm{conf}} \cdot \mathrm{Sim}_{dsp}(Dsp_i, Dsp_j) \tag{14}$$
$$w^{\mathrm{fp}}_{i,k} = 1 - \lambda_{\mathrm{fp}} \cdot \mathrm{Sim}_{FP}(Mconf_i, Mconf_k) \tag{15}$$

where $Mconf_i$ and $Mconf_j$ denote two conformations with the same SMILES, $F_i$ is the latent vector extracted from $Mconf_i$, and $\mathrm{sim}()$ measures the similarity between latent vectors. The similarity of a positive pair is weighted by the coefficient $w^{\mathrm{conf}}_{i,j}$, which is computed from the similarity between $Dsp_i$ and $Dsp_j$, the 3D conformation descriptors of $Mconf_i$ and $Mconf_j$.

Figure 4: Overview of weighted positive and negative pairs, using the molecule Oc1ccc(cc1)CC[NH3+] as the example. a) Low-weight positive pair: two conformations with the same SMILES but significant structural differences (e.g., chirality, geometric angles). b) High-weight positive pair: conformations with the same SMILES and only minor differences, such as slight dihedral angle variations. c) Low-weight negative pair: conformations from different SMILES with similar scaffolds that differ only by a small functional group, such as an -OH group. d) High-weight negative pair: conformations from different SMILES that differ greatly in both scaffold and functional groups.

$\mathrm{Sim}_{dsp}()$ evaluates the similarity between 3D conformation descriptors, and $\lambda_{\mathrm{conf}} \in [0,1]$ is the hyperparameter that determines the scale of the penalty for the similarity between two conformations. $\tau$ is the temperature parameter. In addition to using different conformations as positive pairs, we also employ node masking as a molecular data augmentation strategy: we randomly select 15% of the atoms and mask them together with their corresponding bonds. The latent vector of the masked conformation $Mconf_j$ is denoted as $F^{mk}_j$.
The similarity between two latent vectors $F_i$ and $F_k$ from a negative pair ($Mconf_i$, $Mconf_k$) is weighted by the coefficient $w^{\mathrm{fp}}_{i,k}$, which is computed from the molecular fingerprint similarity between $Mconf_i$ and $Mconf_k$. $\mathrm{Sim}_{FP}()$ evaluates the similarity between molecular fingerprints, and $\lambda_{\mathrm{fp}} \in [0,1]$ is the hyperparameter that determines the scale of the penalty for faulty negatives. The details of the 3D conformation descriptors and fingerprints are given in Appendix A.
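To make the weighting in Eqs. (13) to (15) concrete, here is a minimal sketch for a single anchor conformation. It assumes cosine similarity for $\mathrm{sim}()$ and a temperature of 0.1, neither of which is specified in this section; the function names and the toy vectors are hypothetical. The default of 0.5 for both $\lambda$ values follows the setup reported in Section 4.1.1.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (assumed choice for sim())."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def conf_weight(descriptor_sim, lam_conf=0.5):
    """Eq. 14: positive-pair weight from 3D descriptor similarity."""
    return lam_conf * descriptor_sim


def fp_weight(fingerprint_sim, lam_fp=0.5):
    """Eq. 15: negative-pair weight; similar fingerprints get a smaller weight."""
    return 1.0 - lam_fp * fingerprint_sim


def weighted_contrastive_loss(anchor, positive_masked, others, w_conf, w_fp, tau=0.1):
    """Eq. 13 for one anchor F_i.

    positive_masked: F_j^{mk}, latent vector of the masked positive conformation
    others:          latent vectors F_k for all k != i in the batch
    w_fp:            one weight per entry of `others`
    """
    numerator = math.exp(w_conf * cosine(anchor, positive_masked) / tau)
    denominator = sum(
        math.exp(w * cosine(anchor, f_k) / tau) for w, f_k in zip(w_fp, others)
    )
    return -math.log(numerator / denominator)


anchor = [1.0, 0.0]
others = [[0.0, 1.0], [-1.0, 0.0]]
w_fp = [fp_weight(0.2), fp_weight(0.0)]
loss_close = weighted_contrastive_loss(anchor, [0.9, 0.1], others, conf_weight(0.9), w_fp)
loss_far = weighted_contrastive_loss(anchor, [0.1, 0.9], others, conf_weight(0.9), w_fp)
print(loss_close < loss_far)  # True: a positive closer to the anchor gives a lower loss
```

Note how `fp_weight` shrinks toward $1 - \lambda_{\mathrm{fp}}$ for near-identical fingerprints, so a likely faulty negative contributes less to the denominator than a clearly different molecule.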
Since physicochemical and geometric information has been demonstrated to be important for molecular property prediction, we also employ geometry tasks as the pretraining method. For bond angle and dihedral angle prediction, we sample adjacent atoms to better capture local structural information. Since angular values are more sensitive to errors in protein structures than distances, we use discretized values for prediction. The following are the loss functions for the local geometry task:

$$L^{l}_{i,j} = \left(f_{l}(Fn^{mk}_{n,i}, Fn^{mk}_{n,j}) - l_{i,j}\right)^{2} \tag{16}$$
$$L^{\theta}_{i,j,k} = \mathrm{CE}\left(f_{\theta}(Fn^{mk}_{n,i}, Fn^{mk}_{n,j}, Fn^{mk}_{n,k}),\ \mathrm{bin}(\theta_{i,j,k})\right) \tag{17}$$
$$L^{\phi}_{i,j,k,p} = \mathrm{CE}\left(f_{\phi}(Fn^{mk}_{n,i}, Fn^{mk}_{n,j}, Fn^{mk}_{n,k}, Fn^{mk}_{n,p}),\ \mathrm{bin}(\phi_{i,j,k,p})\right) \tag{18}$$

where $f_{\phi}()$, $f_{\theta}()$ and $f_{l}()$ are the MLPs for the local geometry tasks, and $L^{l}_{i,j}$, $L^{\theta}_{i,j,k}$ and $L^{\phi}_{i,j,k,p}$ are the loss functions for each task. $\mathrm{CE}()$ is the cross-entropy loss, and $\mathrm{bin}()$ discretizes the bond angle and dihedral angle. $Fn^{mk}_{n,i}$ is the latent vector of node $i$ after masking the corresponding sampled items in each task.
In addition to the aforementioned pretraining tasks, we capture global molecular information by using the masked molecular latent vectors for fingerprint (FP) prediction and atom distance prediction. The following are the loss functions for the global geometry task:
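The discretized angle targets in Eqs. (17) and (18) can be illustrated with a short sketch. It assumes uniform bins over $[0, \pi]$ and raw logits coming from the MLP head; the bin count, the angle range, and the function names are hypothetical, since this section does not specify them.

```python
import math


def bin_angle(angle, num_bins, max_angle=math.pi):
    """Map an angle in [0, max_angle] to an integer class index (uniform bins)."""
    if not 0.0 <= angle <= max_angle:
        raise ValueError("angle outside the assumed range")
    index = int(angle / max_angle * num_bins)
    # The upper boundary itself falls into the last bin.
    return min(index, num_bins - 1)


def cross_entropy(logits, target):
    """Cross-entropy of unnormalized logits against one integer class label."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


num_bins = 8                               # hypothetical bin count
target = bin_angle(math.pi / 2, num_bins)  # a 90 degree bond angle
logits = [0.0] * num_bins                  # an untrained head: uniform prediction
loss = cross_entropy(logits, target)
print(target)  # 4
```

With uniform logits the loss equals $\log(\text{num\_bins})$, which is the value an untrained classification head would start from.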

$$L^{FP}_{i} = \mathrm{BCE}\left(f_{FP}(F^{mk}),\ FP_{i}\right) \tag{19}$$
$$L^{dist}_{i,j} = \left(f_{dist}(Fn^{mk}_{n,i}, Fn^{mk}_{n,j}) - dist_{i,j}\right)^{2} \tag{20}$$

where $f_{FP}()$ and $f_{dist}()$ are the MLPs for the global geometry tasks, and $L^{FP}_{i}$ and $L^{dist}_{i,j}$ are the loss functions for each task. $\mathrm{BCE}()$ is the binary cross-entropy loss, and $F^{mk}$ is the latent vector of the masked molecule.
At the end of the pretraining stage, we combine the individual loss functions into a single training objective through an uncertainty-weighted sum. The following is the final loss function:

$$\begin{aligned} L^{final} = {} & L^{FP}/\sigma_{FP}^{2} + L^{dist}/\sigma_{dist}^{2} + L^{\phi}/\sigma_{\phi}^{2} + L^{\theta}/\sigma_{\theta}^{2} + L^{l}/\sigma_{l}^{2} + L^{conf}/\sigma_{conf}^{2} \\ & + \log\sigma_{FP} + \log\sigma_{dist} + \log\sigma_{\phi} + \log\sigma_{\theta} + \log\sigma_{l} + \log\sigma_{conf} \end{aligned} \tag{21}$$

where $L^{final}$ is the final loss used to train the encoder in the pretraining stage, and $\sigma_{FP}$, $\sigma_{dist}$, $\sigma_{\phi}$, $\sigma_{\theta}$, $\sigma_{l}$ and $\sigma_{conf}$ are the uncertainties associated with each loss component, representing the model's confidence in each loss term. Because each loss component has its own uncertainty term, the model can dynamically adjust the influence of each component according to its confidence in the respective predictions, which facilitates balanced optimization across diverse molecular features, from spatial arrangements to angular orientations.
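A minimal sketch of the uncertainty-weighted sum in Eq. (21) follows. It assumes each $\sigma$ is stored as $s = \log\sigma$, a common parameterization that keeps $\sigma$ positive, and it treats the per-task losses as already averaged scalars. The function name and the example values are hypothetical.

```python
import math


def uncertainty_weighted_loss(losses, log_sigmas):
    """Eq. 21: sum over tasks of L_t / sigma_t^2 + log(sigma_t).

    losses:     {task name: scalar loss value}
    log_sigmas: {task name: s_t = log(sigma_t)}  (assumed parameterization)
    """
    total = 0.0
    for task, loss in losses.items():
        s = log_sigmas[task]
        # 1 / sigma^2 equals exp(-2 * s), and log(sigma) equals s.
        total += loss * math.exp(-2.0 * s) + s
    return total


tasks = ["FP", "dist", "phi", "theta", "l", "conf"]
losses = {task: 1.0 for task in tasks}   # toy values, not measured results
log_sigmas = {task: 0.0 for task in tasks}  # sigma = 1 for every task
total = uncertainty_weighted_loss(losses, log_sigmas)
print(total)  # 6.0: with sigma = 1 the objective is the plain sum of the losses
```

For a fixed loss value $L$, the term $L/\sigma^{2} + \log\sigma$ is minimized at $\sigma^{2} = 2L$, so a task with a persistently large loss is given a larger $\sigma$ and therefore a smaller effective weight.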

4 Experiment

In this section, we conduct experiments on 7 benchmark datasets in MoleculeNet[19] to demonstrate the effectiveness of our method for molecular property prediction. We use a large amount of unlabeled data and our pretrain strategy to pretrain our encoder, then use the downstream task to finetune the well-pretrained model and predict the molecular property. We compare it with a variety of SOTA methods and conduct several ablation studies to confirm the effectiveness of our method.

4.1 Datasets and Setup

4.1.1 Pretrain

We use 20 million unlabeled molecules to pretrain 3D-Mol. The unlabeled data is extracted from ZINC20[46] and PubChem[47], both of which are publicly accessible databases containing drug-like compounds. The raw data obtained from ZINC20 and PubChem is provided in SMILES format. To convert SMILES into molecular conformations for the pretraining stage, we use RDKit, a versatile Python cheminformatics package, and employ its ETKDG method, which generates realistic 3D conformations by integrating experimental torsion data with geometric principles. To ensure consistency with prior research[17, 16], we randomly select 90% of these samples for training and set aside the remaining 10% for evaluation. For our model, we use the Adam optimizer with a learning rate of 1e-3. The batch size is set to 256 for pretraining and 32 for finetuning. The hidden size of all models is 256. The geometric embedding dimension K is 64, and the number of angle domains is 8. The hyperparameters $\lambda_{\mathrm{conf}}$ and $\lambda_{\mathrm{fp}}$ are both set to 0.5. The details of the pretraining environment are in Appendix B.

Table 1: Benchmarking the 3D-Mol and other pretraining methods. We compare the performance on the 7 molecular property prediction tasks, marking the best results in bold and underlining the second best.
Classification: ROC-AUC %, higher is better ↑. Regression: RMSE, lower is better ↓. Subscripts give the standard deviation.

| Datasets | BACE | SIDER | Tox21 | ToxCast | ESOL | FreeSolv | Lipophilicity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Task type | Classification | Classification | Classification | Classification | Regression | Regression | Regression |
| # Molecules | 1513 | 1427 | 7831 | 8597 | 1128 | 643 | 4200 |
| # Tasks | 1 | 27 | 12 | 617 | 1 | 1 | 1 |
| N-GramRF | $0.779_{0.015}$ | $0.668_{0.007}$ | $0.743_{0.004}$ | -- | $1.074_{0.107}$ | $2.688_{0.085}$ | $0.812_{0.028}$ |
| N-GramXGB | $0.791_{0.013}$ | $0.655_{0.007}$ | $0.758_{0.009}$ | -- | $1.083_{0.107}$ | $5.061_{0.744}$ | $2.072_{0.030}$ |
| PretrainGNN | $0.845_{0.007}$ | $0.627_{0.008}$ | $0.781_{0.006}$ | $0.657_{0.006}$ | $1.100_{0.006}$ | $2.764_{0.002}$ | $0.739_{0.003}$ |
| 3D Infomax | $0.797_{0.015}$ | $0.606_{0.008}$ | $0.644_{0.011}$ | $0.745_{0.007}$ | $0.894_{0.028}$ | $2.337_{0.107}$ | $0.695_{0.012}$ |
| GraphMVP | $0.812_{0.009}$ | $0.639_{0.012}$ | $0.759_{0.005}$ | $0.631_{0.004}$ | $1.029_{0.033}$ | -- | $0.681_{0.010}$ |
| $\mathrm{GROVER_{base}}$ | $0.826_{0.007}$ | $0.648_{0.006}$ | $0.743_{0.001}$ | $0.654_{0.004}$ | $0.983_{0.090}$ | $2.176_{0.052}$ | $0.817_{0.008}$ |
| $\mathrm{GROVER_{large}}$ | $0.810_{0.014}$ | $0.654_{0.001}$ | $0.735_{0.001}$ | $0.653_{0.005}$ | $0.895_{0.017}$ | $2.272_{0.051}$ | $0.823_{0.010}$ |
| MolCLR | $0.824_{0.009}$ | $0.589_{0.014}$ | $0.750_{0.002}$ | -- | $1.271_{0.040}$ | $2.594_{0.249}$ | $0.691_{0.004}$ |
GEMGEM\rm GEMroman_GEM 0.8560.011subscript0.8560.011\rm 0.856_{0.011}0.856 start_POSTSUBSCRIPT 0.011 end_POSTSUBSCRIPT 0.6720.004subscript0.6720.004\rm\textbf{0.672}_{0.004}0.672 start_POSTSUBSCRIPT 0.004 end_POSTSUBSCRIPT 0.7810.005subscript0.7810.005\rm 0.781_{0.005}0.781 start_POSTSUBSCRIPT 0.005 end_POSTSUBSCRIPT 0.6920.004subscript0.6920.004\rm 0.692_{0.004}0.692 start_POSTSUBSCRIPT 0.004 end_POSTSUBSCRIPT 0.7980.029subscript0.7980.029\rm 0.798_{0.029}0.798 start_POSTSUBSCRIPT 0.029 end_POSTSUBSCRIPT 1.8770.094subscript1.8770.094\rm 1.877_{0.094}1.877 start_POSTSUBSCRIPT 0.094 end_POSTSUBSCRIPT 0.6600.008subscript0.6600.008\rm 0.660_{0.008}0.660 start_POSTSUBSCRIPT 0.008 end_POSTSUBSCRIPT
UniUni\rm Uniroman_Uni-MolMol\rm Molroman_Mol 0.857¯0.005subscript¯0.8570.005\rm\underline{0.857}_{0.005}under¯ start_ARG 0.857 end_ARG start_POSTSUBSCRIPT 0.005 end_POSTSUBSCRIPT 0.659¯0.013subscript¯0.6590.013\rm\underline{0.659}_{0.013}under¯ start_ARG 0.659 end_ARG start_POSTSUBSCRIPT 0.013 end_POSTSUBSCRIPT 0.7960.006subscript0.7960.006\rm\textbf{0.796}_{0.006}0.796 start_POSTSUBSCRIPT 0.006 end_POSTSUBSCRIPT 0.696¯0.001subscript¯0.6960.001\rm\underline{0.696}_{0.001}under¯ start_ARG 0.696 end_ARG start_POSTSUBSCRIPT 0.001 end_POSTSUBSCRIPT 0.788¯0.029subscript¯0.7880.029\rm\underline{0.788}_{0.029}under¯ start_ARG 0.788 end_ARG start_POSTSUBSCRIPT 0.029 end_POSTSUBSCRIPT 1.620¯0.035subscript¯1.6200.035\rm\underline{1.620}_{0.035}under¯ start_ARG 1.620 end_ARG start_POSTSUBSCRIPT 0.035 end_POSTSUBSCRIPT 0.603¯0.010subscript¯0.6030.010\rm\underline{0.603}_{0.010}under¯ start_ARG 0.603 end_ARG start_POSTSUBSCRIPT 0.010 end_POSTSUBSCRIPT
3D3D\rm 3D3 roman_D-MolMol\rm Molroman_Mol 0.8720.004subscript0.8720.004\rm\textbf{0.872}_{0.004}0.872 start_POSTSUBSCRIPT 0.004 end_POSTSUBSCRIPT 0.6580.003subscript0.6580.003\rm 0.658_{0.003}0.658 start_POSTSUBSCRIPT 0.003 end_POSTSUBSCRIPT 0.792¯0.003subscript¯0.7920.003\rm\underline{0.792}_{0.003}under¯ start_ARG 0.792 end_ARG start_POSTSUBSCRIPT 0.003 end_POSTSUBSCRIPT 0.7010.003subscript0.7010.003\rm\textbf{0.701}_{0.003}0.701 start_POSTSUBSCRIPT 0.003 end_POSTSUBSCRIPT 0.7820.008subscript0.7820.008\rm\textbf{0.782}_{0.008}0.782 start_POSTSUBSCRIPT 0.008 end_POSTSUBSCRIPT 1.6170.050subscript1.6170.050\rm\textbf{1.617}_{0.050}1.617 start_POSTSUBSCRIPT 0.050 end_POSTSUBSCRIPT 0.6000.015subscript0.6000.015\rm\textbf{0.600}_{0.015}0.600 start_POSTSUBSCRIPT 0.015 end_POSTSUBSCRIPT

4.1.2 Finetune

We use 7 molecular property prediction datasets from MoleculeNet to demonstrate the effectiveness of 3D-Mol. These datasets span biophysics, physical chemistry, and physiology. The details of the datasets are as follows:

  • BACE. The BACE dataset provides both quantitative (IC50) and qualitative (binary label) binding results for a set of inhibitors targeting human β-secretase 1 (BACE-1).

  • Tox21. The Tox21 initiative aims to advance toxicology practices in the 21st century and has created a public database containing qualitative toxicity measurements for 12 biological targets, including nuclear receptors and stress response pathways.

  • ToxCast. ToxCast, an initiative related to Tox21, offers a comprehensive collection of toxicology data obtained through in vitro high-throughput screening. It includes information from over 600 experiments and covers a large library of compounds.

  • SIDER. The SIDER database is a compilation of marketed drugs and their associated adverse drug reactions (ADRs), categorized into 27 system organ classes.

  • ESOL. The ESOL dataset is a small collection of water solubility data, providing the log solubility in mol per liter for common organic small molecules.

  • FreeSolv. The FreeSolv database offers experimental and calculated hydration free energy values for small molecules dissolved in water.

  • Lipo. Lipophilicity is a crucial characteristic of drug molecules that affects their membrane permeability and solubility. The Lipo dataset contains experimental data on the octanol/water distribution coefficient (logD at pH 7.4).

Following previous studies, we partition each dataset into training, validation, and test sets in an 80/10/10 ratio using scaffold splitting. This method groups molecules by their core structures so that each set contains distinct chemical scaffolds, in contrast to random splitting, which assigns molecules without regard to structural similarity. The model is therefore tested on novel chemical entities, which gives a more stringent evaluation of its generalization capabilities. We report the mean and standard deviation over 3 random seeds. The details of the finetuning settings are in Appendix C.
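
The scaffold-based partition described above can be sketched as follows. This is a minimal illustration rather than the implementation used in our experiments: it assumes each molecule already comes with a scaffold identifier (for example, a Bemis-Murcko scaffold SMILES computed beforehand), and the function name `scaffold_split`, its parameters, and the largest-group-first assignment order are assumptions made for the example.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices into train/valid/test sets by scaffold.

    `scaffolds` holds one scaffold identifier per molecule (hypothetical
    input, e.g. precomputed Bemis-Murcko scaffold SMILES strings).
    """
    # Group molecule indices that share the same scaffold.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Assign the largest scaffold groups first (assumed ordering);
    # groups of equal size keep their first-seen order.
    ordered = sorted(groups.values(), key=len, reverse=True)

    n_total = len(scaffolds)
    train_cutoff = frac_train * n_total
    valid_cutoff = (frac_train + frac_valid) * n_total

    train, valid, test = [], [], []
    for group in ordered:
        # A whole group always lands in a single set, so no scaffold
        # is shared between training, validation, and test data.
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Toy example: 10 molecules with 4 distinct scaffolds.
train, valid, test = scaffold_split(["A"] * 5 + ["B"] * 3 + ["C", "D"])
```

Because whole scaffold groups are assigned to one set, the realized proportions only approximate 80/10/10 when some groups are large.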

4.2 Metric

In alignment with previous research, we employ the area under the receiver operating characteristic curve (ROC-AUC) as our evaluation metric for classification datasets. ROC-AUC is a prevalent and reliable measure for gauging the effectiveness of binary classification models. For regression datasets, we apply root-mean-squared-error (RMSE) as our assessment metric, which is a standard for evaluating the accuracy of regression models in predicting continuous variables.
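
For concreteness, the two metrics and the aggregation over seeds can be written in a few lines of plain Python. This is a sketch under stated assumptions, not our evaluation code: the function names are illustrative, ROC-AUC is computed through its pairwise-ranking interpretation for a single binary task, and the sample standard deviation is an assumed choice for summarizing seeds.

```python
import math
import statistics


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive scored above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


def rmse(targets, predictions):
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def summarize_over_seeds(per_seed_scores):
    """Mean and sample standard deviation of one metric across seeds (assumed convention)."""
    return statistics.mean(per_seed_scores), statistics.stdev(per_seed_scores)


# Toy values, not results from the paper.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
mean, std = summarize_over_seeds([0.80, 0.82, 0.84])
```

The pairwise form is quadratic in the number of samples and is meant only to make the definition explicit; a rank-based implementation is preferable for large datasets.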

4.3 Result

4.3.1 Overall performance

To validate the efficacy of our method, we compare it with several baseline methods. The baseline methods are as follows: N-Gram[8] builds a graph representation from node embeddings constructed over short walks. PretrainGNN[7] implements several types of self-supervised learning tasks. 3D Infomax[48] maximizes the mutual information between learned 3D summary vectors and the representations of a GNN. MolCLR[10] is a 2D-2D view contrastive learning model that uses atom masking, bond deletion, and subgraph removal. GraphMVP[44] uses 2D-3D view contrastive learning. GROVER[11] focuses on node-level and graph-level representations, with corresponding pretraining tasks for each level.

Table 2: Benchmarking the 3D-Mol encoder and other non-pretraining methods. We compare the performance on the 7 molecular property prediction tasks, marking the best results in bold and underlining the second best.
Classification (ROC-AUC % higher is better ↑) Regression (RMSE, lower is better ↓)
Datasets BACE SIDER Tox21 ToxCast ESOL FreeSolv Lipophilicity
# Molecules 1513 1427 7831 8597 1128 643 4200
# Tasks 1 27 12 617 1 1 1
DMPNN $0.809_{0.006}$ $0.570_{0.007}$ $0.759_{0.007}$ $0.655_{0.003}$ $1.050_{0.008}$ $2.082_{0.082}$ $0.683_{0.016}$
AttentiveFP $0.784_{0.000}$ $0.606_{0.032}$ $0.761_{0.005}$ $0.637_{0.002}$ $0.877_{0.029}$ $2.073_{0.183}$ $0.721_{0.001}$
MGCN $0.734_{0.008}$ $0.587_{0.019}$ $0.741_{0.006}$ -- -- -- --
SGCN -- $0.559_{0.005}$ $0.766_{0.002}$ $0.657_{0.003}$ $1.629_{0.001}$ $2.363_{0.050}$ $1.021_{0.013}$
HMGNN -- $\underline{0.615}_{0.005}$ $0.768_{0.002}$ $0.672_{0.001}$ $1.390_{0.073}$ $2.123_{0.179}$ $2.116_{0.473}$
DimeNet -- $0.612_{0.004}$ $\underline{0.774}_{0.006}$ $0.637_{0.004}$ $0.878_{0.023}$ $2.094_{0.118}$ $0.727_{0.019}$
GEM $\underline{0.828}_{0.012}$ $0.606_{0.010}$ $0.773_{0.007}$ $\underline{0.675}_{0.005}$ $\underline{0.832}_{0.010}$ $\underline{1.857}_{0.071}$ $\underline{0.666}_{0.015}$
3D-Mol$_{\text{w.o. pretrain}}$ $\mathbf{0.839}_{0.005}$ $\mathbf{0.648}_{0.013}$ $\mathbf{0.790}_{0.004}$ $\mathbf{0.695}_{0.007}$ $\mathbf{0.807}_{0.027}$ $\mathbf{1.667}_{0.037}$ $\mathbf{0.620}_{0.004}$

GEM[16] employed predictive geometry self-supervised learning schemes that leverage 3D molecular information.

Table 3: Ablation study. We study the performance of 3D-Mol in four scenarios: 3D-Mol, 3D-Mol without pretraining, 3D-Mol without contrastive learning weights, and 3D-Mol without the dihedral-angle graph. The best results are marked in bold and the second best are underlined.
Classification (ROC-AUC % higher is better ↑) Regression (RMSE, lower is better ↓)
Datasets BACE SIDER Tox21 ToxCast ESOL FreeSolv Lipophilicity
# Molecules 1513 1427 7831 8597 1128 643 4200
# Tasks 1 27 12 617 1 1 1
3D-Mol $\mathbf{0.872}_{0.004}$ $\mathbf{0.658}_{0.003}$ $\mathbf{0.792}_{0.003}$ $\mathbf{0.701}_{0.003}$ $\mathbf{0.782}_{0.008}$ $\mathbf{1.617}_{0.050}$ $\mathbf{0.600}_{0.015}$
3D-Mol$_{\text{w.o. pretrain}}$ $0.839_{0.005}$ $0.648_{0.013}$ $0.790_{0.004}$ $0.695_{0.007}$ $0.807_{0.027}$ $\underline{1.667}_{0.037}$ $0.620_{0.004}$
3D-Mol$_{\text{w.o. cl-weight}}$ $\underline{0.851}_{0.003}$ $0.645_{0.009}$ $0.786_{0.005}$ $0.696_{0.002}$ $\underline{0.795}_{0.016}$ $1.705_{0.038}$ $\underline{0.612}_{0.010}$
3D-Mol$_{\text{w.o. dihes-angle-graph}}$ $0.844_{0.004}$ $\underline{0.649}_{0.006}$ $\underline{0.791}_{0.005}$ $\underline{0.698}_{0.006}$ $0.812_{0.015}$ $1.782_{0.007}$ $\underline{0.612}_{0.005}$

Uni-Mol[17] extends the application scope and representation ability of molecular representation learning by using a transformer. Table 1 shows that methods based on 3D information generally outperform those based on 2D information in molecular modeling, with improved results across multiple datasets. Compared with current 3D-based approaches, our method achieves the best results on 5 datasets and the second best on Tox21. Notably, our method leads by a clear margin on the BACE dataset. This affirms the value of considering spatial configurations in predictive models and indicates our method’s potential to use this information for high-precision molecular property prediction. Our method leads on all datasets except some toxicity datasets (SIDER and Tox21). In these cases, its focus on geometric information rather than substructural information, which is crucial for toxicity prediction, suggests an avenue for further refinement.

4.3.2 Encoder performance

To validate the efficacy of the 3D-Mol encoder, we compare it with several baseline molecular encoders that do not employ pretraining. The baseline molecular encoders are as follows: DMPNN[6] employs a message passing scheme for molecular property prediction. AttentiveFP[9] is an attention-based GNN that incorporates graph-level information. MGCN[28] uses a hierarchical GNN to directly extract features from conformation and spatial information, followed by multilevel interactions. HMGNN[14] leverages global molecular representations through an attention mechanism. SGCN[15] applies different weights according to atomic distances during message passing. DimeNet[13] uses directional message passing to fully exploit directional information within a molecule. GEM[16] employs a message passing strategy to extract 3D molecular information. As shown in Table 2, the 3D-Mol encoder outperforms all baselines on both types of tasks, improving over the best baselines by 2% and 11% for classification and regression tasks, respectively, which we attribute to its incorporation of geometric parameters.

4.3.3 Ablation study

To validate the efficacy of our pretraining task, we study the performance of the 3D-Mol encoder in four scenarios: 3D-Mol, 3D-Mol without pretraining, 3D-Mol without contrastive learning weights, and 3D-Mol without the dihedral-angle graph. The results are shown in Table 3. Comparing 3D-Mol with 3D-Mol without pretraining, the former performs better on all datasets, demonstrating that our pretraining method improves encoder performance. Compared with the version without contrastive learning weights from fingerprints and 3D descriptors, 3D-Mol also performs better on all datasets. This improvement shows that weights derived from fingerprints and 3D descriptors effectively optimize our contrastive loss and enhance encoder performance. Similarly, comparing 3D-Mol with 3D-Mol without $G_{d-a}$, the former performs better on all datasets, indicating that the dihedral-angle graph contributes to encoder performance. Overall, our pretraining and modeling methods enhance the 3D-Mol encoder, as the model can more effectively learn the 3D structural information of molecules.

Figure 5: Case study of 3D information enhancement using 3D-Mol on the BACE task. This figure shows the prediction results for three molecules identified as BACE inhibitors, comparing 3D-Mol, which uses three-dimensional molecular information, with GIN, which relies on two-dimensional data. Each molecule is displayed with its name (Q27467123, SCHEMBL12917066, Q27455563). The results indicate that 3D-Mol accurately predicts BACE inhibition in all three cases, while GIN fails to do so.
Figure 6: Case study of 3D information enhancement using 3D-Mol on the FreeSolv task. This figure shows the predicted hydration free energy of three molecules, comparing 3D-Mol, which uses three-dimensional molecular information, with GIN, which relies on two-dimensional data. Each molecule is displayed with its name (2,3,4,5-Tetrachlorobiphenyl, p-Anisidine, 3-Nitroaniline). The results indicate that 3D-Mol accurately predicts hydration free energy in all three cases, while GIN fails to do so.

4.3.4 Case Study

In this case study, we compare the predictive capabilities of 3D-Mol, which uses three-dimensional molecular data, with those of GIN, which relies solely on two-dimensional information, focusing on how well each identifies inhibitors of the β-secretase 1 (BACE-1) enzyme, a crucial target for Alzheimer’s disease treatment. We analyze three molecules: Q27467123, which features a chlorinated aromatic ring that may form key halogen bonds within the BACE active site; SCHEMBL12917066, with a fluorinated aromatic ring that enhances molecular stacking interactions; and Q27455563, a molecule with a bicyclic structure that can potentially form multiple hydrogen bonds. As shown in Figure 5, 3D-Mol, which integrates these 3D conformations and potential interactions such as hydrogen bonding, hydrophobic contacts, and precise geometric fitting into the BACE active site, accurately predicts their inhibitory activities. In contrast, GIN cannot account for such 3D-dependent interactions and fails to recognize these molecules as potential inhibitors, which shows its limitation in capturing the spatial relationships that are crucial for binding. This case study highlights the importance of 3D spatial awareness in computational models for drug discovery, particularly for enzymes like BACE, where the precise alignment and interaction of molecules in three-dimensional space significantly influence their therapeutic efficacy.

Having validated 3D-Mol on the BACE dataset, we extend the analysis to the FreeSolv dataset, which is used to predict the hydration free energy of small molecules in water, a crucial determinant of solubility in drug discovery and molecular design. We present a case study of three molecules whose distinctive 3D structures influence their solvation behavior: 2,3,4,5-Tetrachlorobiphenyl, noted for its hydrophobicity due to chlorine substitution; p-Anisidine, which can engage in hydrogen bonding; and 3-Nitroaniline, where the nitro group affects solvation through electron withdrawal. Figure 6 compares the 3D-Mol and GIN predictions for these molecules. The FreeSolv values and RMSE scores indicate 3D-Mol’s superior ability to capture the impact of molecular geometry on solvation. This case study highlights the need for 3D information in the precise prediction of solvation-related molecular properties, reinforcing the utility of the 3D-Mol framework in computational chemistry.

5 Conclusion and Discussion

3D-Mol, the novel framework introduced in this paper, advances molecular property prediction by combining a hierarchical graph-based embedding with a contrastive learning component. This approach captures 3D molecular structures more comprehensively than existing encoders. With superior performance over multiple benchmark models, 3D-Mol shows strong potential for AI-assisted drug discovery and molecular design.

3D-Mol’s encoder distinctly outperforms all baselines, with performance improvements of 2% in classification and 11% in regression tasks compared to traditional and non-pretraining methods. These advancements are pivotal in drug discovery, as they promise to expedite the development of new treatments through more accurate molecular predictions. The primary challenge faced by 3D-Mol is the time-intensive generation of 3D conformations and the encoding of hierarchical graphs. Future work will focus on optimizing these processes to enhance the model’s efficiency and practical scalability, further solidifying its role in advancing computational chemistry and pharmaceutical development.

Acknowledgment

The research was supported by the Peng Cheng Laboratory and by Peng Cheng Laboratory Cloud-Brain.

References

  • [1] Garrett B. Goh, Nathan O. Hodas, Charles Siegel, and Abhinav Vishnu. SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties. 2017.
  • [2] Kexin Huang, Tianfan Fu, Lucas M Glass, Marinka Zitnik, Cao Xiao, and Jimeng Sun. Deeppurpose: a deep learning library for drug–target interaction prediction. Bioinformatics, 36(22-23):5545–5547, 2020.
  • [3] Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction. 2020.
  • [4] David Weininger. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28(1):31–36, 1988.
  • [5] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1263–1272. PMLR, 06–11 Aug 2017.
  • [6] Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, et al. Analyzing learned molecular representations for property prediction. Journal of chemical information and modeling, 59(8):3370–3388, 2019.
  • [7] Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec. Strategies for Pre-training Graph Neural Networks. 2019.
  • [8] Shengchao Liu, Mehmet F Demirel, and Yingyu Liang. N-gram graph: Simple unsupervised representation for graphs, with applications to molecules. Advances in neural information processing systems, 32, 2019.
  • [9] Zhaoping Xiong, Dingyan Wang, Xiaohong Liu, Feisheng Zhong, Xiaozhe Wan, Xutong Li, Zhaojun Li, Xiaomin Luo, Kaixian Chen, Hualiang Jiang, et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. Journal of medicinal chemistry, 63(16):8749–8760, 2019.
  • [10] Yuyang Wang, Jianren Wang, Zhonglin Cao, and Amir Barati Farimani. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence, 4(3):279–287, March 2022.
  • [11] Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large-scale molecular data. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 12559–12571. Curran Associates, Inc., 2020.
  • [12] Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller. Schnet: A continuous-filter convolutional neural network for modeling quantum interactions. Advances in neural information processing systems, 30, 2017.
  • [13] Johannes Gasteiger, Janek Groß, and Stephan Günnemann. Directional message passing for molecular graphs. arXiv preprint arXiv:2003.03123, 2020.
  • [14] Zeren Shui and George Karypis. Heterogeneous molecular graph neural networks for predicting molecule properties. In 2020 IEEE International Conference on Data Mining (ICDM), pages 492–500. IEEE, 2020.
  • [15] Tomasz Danel, Przemysław Spurek, Jacek Tabor, Marek Śmieja, Łukasz Struski, Agnieszka Słowik, and Łukasz Maziarka. Spatial graph convolutional networks. In Neural Information Processing: 27th International Conference, ICONIP 2020, Bangkok, Thailand, November 18–22, 2020, Proceedings, Part V, pages 668–675. Springer, 2020.
  • [16] Xiaomin Fang, Lihang Liu, Jieqiong Lei, Donglong He, Shanzhuo Zhang, Jingbo Zhou, Fan Wang, Hua Wu, and Haifeng Wang. Geometry-enhanced molecular representation learning for property prediction. Nature Machine Intelligence, 4(2):127–134, 2022.
  • [17] Gengmo Zhou, Zhifeng Gao, Qiankun Ding, Hang Zheng, Hongteng Xu, Zhewei Wei, Linfeng Zhang, and Guolin Ke. Uni-mol: a universal 3d molecular representation learning framework. 2023.
  • [18] Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, and Jian Tang. Protein representation learning by geometric structure pretraining. arXiv preprint arXiv:2203.06125, 2022.
  • [19] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. Moleculenet: a benchmark for molecular machine learning. Chemical science, 9(2):513–530, 2018.
  • [20] Adrià Cereto-Massagué, María José Ojeda, Cristina Valls, Miquel Mulero, Santiago Garcia-Vallvé, and Gerard Pujadas. Molecular fingerprint similarity search in virtual screening. Methods, 71:58–63, 2015.
  • [21] Connor W Coley, Regina Barzilay, William H Green, Tommi S Jaakkola, and Klavs F Jensen. Convolutional embedding of attributed molecular graphs for physical property prediction. Journal of chemical information and modeling, 57(8):1757–1772, 2017.
  • [22] David Rogers and Mathew Hahn. Extended-connectivity fingerprints. Journal of chemical information and modeling, 50(5):742–754, 2010.
  • [23] Joseph L Durant, Burton A Leland, Douglas R Henry, and James G Nourse. Reoptimization of mdl keys for use in drug discovery. Journal of chemical information and computer sciences, 42(6):1273–1280, 2002.
  • [24] Sheng Wang, Yuzhi Guo, Yuhong Wang, Hongmao Sun, and Junzhou Huang. Smiles-bert: large scale unsupervised pre-training for molecular property prediction. In Proceedings of the 10th ACM international conference on bioinformatics, computational biology and health informatics, pages 429–436, 2019.
  • [25] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018.
  • [26] Jike Wang, Dongsheng Cao, Cunchen Tang, Lei Xu, Qiaojun He, Bo Yang, Xi Chen, Huiyong Sun, and Tingjun Hou. Deepatomiccharge: a new graph convolutional network-based architecture for accurate prediction of atomic charges. Briefings in bioinformatics, 22(3):bbaa183, 2021.
  • [27] Xiao-Shuang Li, Xiang Liu, Le Lu, Xian-Sheng Hua, Ying Chi, and Kelin Xia. Multiphysical graph neural network (mp-gnn) for covid-19 drug design. Briefings in Bioinformatics, 23(4), 2022.
  • [28] Chengqiang Lu, Qi Liu, Chao Wang, Zhenya Huang, Peize Lin, and Lixin He. Molecular property prediction: A multilevel quantum interactions modeling perspective. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1052–1060, 2019.
  • [29] Zhuoran Qiao, Matthew Welborn, Animashree Anandkumar, Frederick R Manby, and Thomas F Miller. Orbnet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. The Journal of chemical physics, 153(12), 2020.
  • [30] Zhen Li, Mingjian Jiang, Shuang Wang, and Shugang Zhang. Deep learning methods for molecular representation and property prediction. Drug Discovery Today, page 103373, 2022.
  • [31] Marta M Stepniewska-Dziubinska, Piotr Zielenkiewicz, and Pawel Siedlecki. Development and evaluation of a deep learning model for protein–ligand binding affinity prediction. Bioinformatics, 34(21):3666–3674, 2018.
  • [32] Jocelyn Sunseri and David R Koes. Libmolgrid: graphics processing unit accelerated molecular gridding for deep learning applications. Journal of chemical information and modeling, 60(3):1079–1084, 2020.
  • [33] Qinqing Liu, Peng-Shuai Wang, Chunjiang Zhu, Blake Blumenfeld Gaines, Tan Zhu, Jinbo Bi, and Minghu Song. Octsurf: Efficient hierarchical voxel-based molecular surface representation for protein-ligand affinity prediction. Journal of Molecular Graphics and Modelling, 105:107865, 2021.
  • [34] Luciano Floridi and Massimo Chiriatti. Gpt-3: Its nature, scope, limits, and consequences. Minds and Machines, 30:681–694, 2020.
  • [35] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
  • [36] Shion Honda, Shoi Shi, and Hiroki R Ueda. Smiles transformer: Pre-trained molecular fingerprint for low data drug discovery. arXiv preprint arXiv:1911.04738, 2019.
  • [37] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. Graph contrastive learning with augmentations. Advances in neural information processing systems, 33:5812–5823, 2020.
  • [38] Mengying Sun, Jing Xing, Huijun Wang, Bin Chen, and Jiayu Zhou. Mocl: Data-driven molecular fingerprint via knowledge-aware contrastive learning from molecular graph. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 3585–3594, 2021.
  • [39] Pengyong Li, Jun Wang, Yixuan Qiao, Hao Chen, Yihuan Yu, Xiaojun Yao, Peng Gao, Guotong Xie, and Sen Song. An effective self-supervised framework for learning expressive molecular global representations to drug discovery. Briefings in Bioinformatics, 22(6):bbab109, 2021.
  • [40] Yuyang Wang, Rishikesh Magar, Chen Liang, and Amir Barati Farimani. Improving molecular contrastive learning via faulty negative mitigation and decomposed fragment contrast. Journal of Chemical Information and Modeling, 62(11):2713–2725, 2022.
  • [41] Qingyun Sun, Jianxin Li, Hao Peng, Jia Wu, Yuanxing Ning, Philip S Yu, and Lifang He. Sugar: Subgraph neural network with reinforcement pooling and self-supervised mutual information mechanism. In Proceedings of the Web Conference 2021, pages 2081–2091, 2021.
  • [42] Zewei Ji, Runhan Shi, Jiarui Lu, Fang Li, and Yang Yang. Relmole: Molecular representation learning based on two-level graph similarities. Journal of Chemical Information and Modeling, 62(22):5361–5372, 2022.
  • [43] Hyeoncheol Cho and Insung S Choi. Enhanced deep-learning prediction of molecular properties via augmentation of bond topology. ChemMedChem, 14(17):1604–1609, 2019.
  • [44] Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, and Jian Tang. Pre-training molecular graph representation with 3d geometry. arXiv preprint arXiv:2110.07728, 2021.
  • [45] Greg Landrum et al. Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum, 8, 2013.
  • [46] John J Irwin, Khanh G Tang, Jennifer Young, Chinzorig Dandarchuluun, Benjamin R Wong, Munkhzul Khurelbaatar, Yurii S Moroz, John Mayfield, and Roger A Sayle. Zinc20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling, 60(12):6065–6073, 2020.
  • [47] Yanli Wang, Jewen Xiao, Tugba O Suzek, Jian Zhang, Jiyao Wang, and Stephen H Bryant. Pubchem: a public information system for analyzing bioactivities of small molecules. Nucleic acids research, 37(suppl_2):W623–W633, 2009.
  • [48] Hannes Stärk, Dominique Beaini, Gabriele Corso, Prudencio Tossou, Christian Dallago, Stephan Günnemann, and Pietro Liò. 3d infomax improves gnns for molecular property prediction. In International Conference on Machine Learning, pages 20479–20502. PMLR, 2022.

Appendix A 3D Conformation Descriptor and Fingerprint

A.1 Fingerprint

In our study, we use molecular fingerprints, specifically Morgan fingerprints, to calculate the weights of negative pairs. A fingerprint is a compact numerical representation of a molecular structure and is widely used in computational chemistry. The Morgan fingerprint method iteratively updates each atom’s representation based on its chemical surroundings and encodes the result as a binary vector for the whole molecule. The similarity between the Morgan fingerprints of two molecules then determines the weight of the corresponding negative pair, which improves the model’s ability to differentiate molecular structures and contributes to its overall predictive performance.
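As a minimal sketch of the idea, the snippet below computes the Tanimoto similarity between two fingerprints and turns it into a weight for a negative pair. The fingerprints are toy sets of on-bit indices standing in for Morgan bit vectors (in practice they would come from a cheminformatics toolkit such as RDKit), and the rule `weight = 1 - similarity` is a hypothetical choice for illustration; this appendix does not state the exact weighting formula used in 3D-Mol.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def negative_pair_weight(bits_a: set[int], bits_b: set[int]) -> float:
    """Hypothetical weighting rule: more dissimilar pairs get a larger weight."""
    return 1.0 - tanimoto(bits_a, bits_b)


# Toy on-bit sets standing in for the Morgan fingerprints of three molecules.
fp_anchor = {1, 5, 9, 42}
fp_similar = {1, 5, 9, 77}       # shares three of its four bits with the anchor
fp_distant = {3, 8, 100, 512}    # shares no bits with the anchor

print(negative_pair_weight(fp_anchor, fp_similar))
print(negative_pair_weight(fp_anchor, fp_distant))
```

Under this assumed rule, a structurally similar molecule is pushed away less strongly than an unrelated one, which is the qualitative behavior the weighting is meant to provide.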

A.2 3D Conformation Descriptor

Molecular 3D conformation descriptors are computational tools that represent the three-dimensional arrangement of atoms within a molecule and capture its spatial geometry. They help explain how molecular shape influences chemical and biological properties, and they are widely used in fields such as drug design and materials science. The 3D-MoRSE descriptor (3D Molecule Representation of Structures based on Electron diffraction) is a 3D molecular descriptor derived from the equations used in electron diffraction studies; it encodes the spatial distribution of atoms through their pairwise interatomic distances. In our research, we employ 3D-MoRSE descriptors to measure the similarity of molecular 3D conformations, which lets us compare conformations directly and identify molecules that may behave similarly in biological or chemical settings. Such comparisons are valuable in drug discovery, where molecular similarity can guide the identification of new therapeutic compounds or the prediction of their activities.
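As a rough illustration of how such a descriptor can be computed and compared, the sketch below evaluates the standard 3D-MoRSE form I(s) = Σ_{i<j} A_i A_j sin(s·r_ij) / (s·r_ij) for integer scattering values s and compares two toy conformers. The coordinates, the use of 32 scattering values, the unit atomic weights A_i, and the cosine similarity measure are all assumptions made for illustration; this appendix does not specify the weighting scheme or the similarity function used in 3D-Mol.

```python
import math


def morse_descriptor(coords, weights=None, n_s=32):
    """3D-MoRSE vector for scattering values s = 0, 1, ..., n_s - 1."""
    n = len(coords)
    if weights is None:
        weights = [1.0] * n  # assumed unit atomic weights
    descriptor = []
    for s in range(n_s):
        total = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                x = s * math.dist(coords[i], coords[j])
                # sin(x) / x tends to 1 as x tends to 0
                term = 1.0 if x == 0.0 else math.sin(x) / x
                total += weights[i] * weights[j] * term
        descriptor.append(total)
    return descriptor


def cosine_similarity(u, v):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two toy conformers of the same four-atom chain (coordinates in angstroms).
extended = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]

d_extended = morse_descriptor(extended)
d_bent = morse_descriptor(bent)
print(cosine_similarity(d_extended, d_bent))
```

Because the descriptor depends only on interatomic distances, it is unchanged by rotating or translating a conformer, so it can be compared across conformers without aligning them first.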

Appendix B The contribution of pretraining method

Table 4: The contribution of the pretraining method. We study the performance of 3D-Mol in three scenarios: contrastive learning only, supervised pretraining only, and the complete pretraining method. The best results are marked in bold and the second best are underlined.

| Datasets | BACE | SIDER | Tox21 | ToxCast | ESOL | FreeSolv | Lipophilicity |
|---|---|---|---|---|---|---|---|
| Task type | Classification | Classification | Classification | Classification | Regression | Regression | Regression |
| Metric | ROC-AUC % (higher is better ↑) | ROC-AUC % (higher is better ↑) | ROC-AUC % (higher is better ↑) | ROC-AUC % (higher is better ↑) | RMSE (lower is better ↓) | RMSE (lower is better ↓) | RMSE (lower is better ↓) |
| # Molecules | 1513 | 1427 | 7831 | 8597 | 1128 | 643 | 4200 |
| # Tasks | 1 | 27 | 12 | 617 | 1 | 1 | 1 |
| Contrastive-Learning-Only | $0.847_{0.002}$ | $\underline{0.652}_{0.012}$ | $0.791_{0.008}$ | $0.693_{0.003}$ | $0.802_{0.036}$ | $1.682_{0.86}$ | $0.616_{0.023}$ |
| Supervised-Pretraining-Only | $\underline{0.862}_{0.006}$ | $0.647_{0.007}$ | $\mathbf{0.796}_{0.002}$ | $\underline{0.697}_{0.003}$ | $\underline{0.795}_{0.022}$ | $\underline{1.664}_{0.070}$ | $\underline{0.613}_{0.004}$ |
| Complete-Pretraining-Method | $\mathbf{0.872}_{0.004}$ | $\mathbf{0.658}_{0.003}$ | $\underline{0.792}_{0.003}$ | $\mathbf{0.701}_{0.003}$ | $\mathbf{0.782}_{0.008}$ | $\mathbf{1.617}_{0.050}$ | $\mathbf{0.600}_{0.015}$ |

In this section, we discuss the contributions of contrastive learning and supervised pretraining to our pretraining approach. We pretrained our model in three ways: contrastive learning only, supervised pretraining only, and the complete pretraining method, and compared their performance on 7 benchmark datasets. As Table 4 shows, the complete method achieves the best result on six of the seven datasets; the only exception is Tox21, where supervised pretraining alone is slightly better (0.796 versus 0.792). These findings indicate that both contrastive learning and supervised pretraining contribute positively to the model’s performance, and that combining them generally gives the best results.

Appendix C Finetuning Details

During finetuning for each downstream task, we run a random search over the hyper-parameters, select the best-performing setting on the validation set, and report its results on the test set. Table 5 lists the search range of each hyper-parameter.

Table 5: Hyper-parameter settings.

| Name | Description | Range |
|---|---|---|
| $lr_{MLP}$ | Initial learning rate for the MLP head | $\{0.004, 0.001, 0.0004\}$ |
| $lr_{ENC}$ | Initial learning rate for the pre-trained encoder | $\{0.001, 0.0004, 0.0001\}$ |
| $Epoch$ | Number of epochs in the finetuning stage | $\{60, 80, 100\}$ |
| $num_{layer}$ | Number of hidden layers in the MLP | $\{2, 3\}$ |
| $Dropout$ | Dropout ratio for the model | $\{0, 0.1, 0.2, 0.5\}$ |
| $Hidden_{size}$ | Size of hidden layers in the MLP | $\{32, 64, 128, 256\}$ |
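The random search over the ranges in Table 5 can be sketched as follows. The value lists are taken from Table 5; the dictionary keys, the function names, the number of trials, and the toy scoring function are hypothetical placeholders. In a real run, `evaluate` would finetune the model with the sampled configuration and return its validation metric.

```python
import random

# Search ranges from Table 5; the key names are illustrative.
SEARCH_SPACE = {
    "lr_mlp": [0.004, 0.001, 0.0004],
    "lr_enc": [0.001, 0.0004, 0.0001],
    "epoch": [60, 80, 100],
    "num_layer": [2, 3],
    "dropout": [0, 0.1, 0.2, 0.5],
    "hidden_size": [32, 64, 128, 256],
}


def sample_config(rng):
    """Draw one value per hyper-parameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def random_search(evaluate, n_trials, seed=0, higher_is_better=True):
    """Return the sampled configuration with the best validation score."""
    rng = random.Random(seed)
    best_config, best_score = None, None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if best_score is None:
            improved = True
        elif higher_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            best_config, best_score = config, score
    return best_config, best_score


def toy_validation_score(config):
    """Placeholder for finetuning and validation; not a real metric."""
    return -abs(config["dropout"] - 0.1) - abs(config["hidden_size"] - 128) / 1000


best_config, best_score = random_search(toy_validation_score, n_trials=20)
print(best_config, best_score)
```

For regression tasks evaluated with RMSE, the same loop applies with `higher_is_better=False`. The full grid contains 3 × 3 × 3 × 2 × 4 × 4 = 864 combinations, which is why sampling a subset is attractive.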

Appendix D Environment

CPU:
  • Architecture: x86_64
  • Number of CPUs: 96
  • Model: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz

GPU:
  • Type: Tesla V100-SXM2-32GB
  • Count: 8
  • Driver Version: 450.80.02
  • CUDA Version: 11.7

Software Environment:
  • Operating System: Ubuntu 20.04.6 LTS
  • Python Version: 3.10.9
  • Paddle Version: 2.4.2
  • PGL Version: 2.2.5
  • RDKit Version: 2023.3.2