Search | arXiv e-print repository

doi 10.1111/2041-210X.14229

Fossil Image Identification using Deep Learning Ensembles of Data Augmented Multiviews

Authors: Chengbin Hou, Xinyu Lin, Hanhui Huang, Sheng Xu, Junxuan Fan, Yukun Shi, Hairong Lv

Abstract: Identification of fossil species is crucial to evolutionary studies. Recent advances from deep learning have shown promising prospects in fossil image identification. However, the quantity and quality of labeled fossil images are often limited due to fossil preservation, conditioned sampling, and expensive and inconsistent label annotation by domain experts, which pose great challenges to training deep learning based image classification models. To address these challenges, we follow the idea of the wisdom of crowds and propose a multiview ensemble framework, which collects Original (O), Gray (G), and Skeleton (S) views of each fossil image reflecting its different characteristics to train multiple base models, and then makes the final decision via soft voting. Experiments on the largest fusulinid dataset with 2400 images show that the proposed OGS consistently outperforms baselines (using a single model for each view), and obtains superior or comparable performance compared to OOO (using three base models for three the same Original views). Besides, as the training data decreases, the proposed framework achieves more gains. While considering the identification consistency estimation with respect to human experts, OGS receives the highest agreement with the original labels of dataset and with the re-identifications of two human experts. The validation performance provides a quantitative estimation of consistency across different experts and genera. We conclude that the proposed framework can present state-of-the-art performance in the fusulinid fossil identification case study. This framework is designed for general fossil identification and it is expected to see applications to other fossil datasets in future work. The source code is publicly available at https://github.com/houchengbin/Fossil-Image-Identification to benefit future research in fossil image identification.

Submitted 1 February, 2024; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: published in Methods in Ecology and Evolution

Journal ref: Methods in Ecology and Evolution, 14, 3020-3034 (2023)

arXiv:2110.01866 [pdf, other]

doi 10.1016/j.physrep.2021.10.005

Social physics

Authors: Marko Jusup, Petter Holme, Kiyoshi Kanazawa, Misako Takayasu, Ivan Romic, Zhen Wang, Suncana Gecek, Tomislav Lipic, Boris Podobnik, Lin Wang, Wei Luo, Tin Klanjscek, Jingfang Fan, Stefano Boccaletti, Matjaz Perc

Abstract: Recent decades have seen a rise in the use of physics methods to study different societal phenomena. This development has been due to physicists venturing outside of their traditional domains of interest, but also due to scientists from other disciplines taking from physics the methods that have proven so successful throughout the 19th and the 20th century. Here we dub this field 'social physics' and pay our respect to intellectual mavericks who nurtured it to maturity. We do so by reviewing the current state of the art. Starting with a set of topics that are at the heart of modern human societies, we review research dedicated to urban development and traffic, the functioning of financial markets, cooperation as the basis for our evolutionary success, the structure of social networks, and the integration of intelligent machines into these networks. We then shift our attention to a set of topics that explore potential threats to society. These include criminal behaviour, large-scale migrations, epidemics, environmental challenges, and climate change. We end the coverage of each topic with promising directions for future research. Based on this, we conclude that the future for social physics is bright. Physicists studying societal phenomena are no longer a curiosity, but rather a force to be reckoned with. Notwithstanding, it remains of the utmost importance that we continue to foster constructive dialogue and mutual respect at the interfaces of different scientific disciplines.

Submitted 11 January, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: 359 pages, 78 figures; published in Physics Reports

Journal ref: Phys. Rep. 948, 1-148 (2022)

arXiv:2108.04682 [pdf, other]

ChemiRise: a data-driven retrosynthesis engine

Authors: Xiangyan Sun, Ke Liu, Yuquan Lin, Lingjie Wu, Haoming Xing, Minghong Gao, Ji Liu, Suocheng Tan, Zekun Ni, Qi Han, Junqiu Wu, Jie Fan

Abstract: We have developed an end-to-end, retrosynthesis system, named ChemiRise, that can propose complete retrosynthesis routes for organic compounds rapidly and reliably. The system was trained on a processed patent database of over 3 million organic reactions. Experimental reactions were atom-mapped, clustered, and extracted into reaction templates. We then trained a graph convolutional neural network-based one-step reaction proposer using template embeddings and developed a guiding algorithm on the directed acyclic graph (DAG) of chemical compounds to find the best candidate to explore. The atom-mapping algorithm and the one-step reaction proposer were benchmarked against previous studies and showed better results. The final product was demonstrated by retrosynthesis routes reviewed and rated by human experts, showing satisfying functionality and a potential productivity boost in real-life use cases.

Submitted 9 August, 2021; originally announced August 2021.

arXiv:2104.08961 [pdf, other]

A cross-study analysis of drug response prediction in cancer cell lines

Authors: Fangfang Xia, Jonathan Allen, Prasanna Balaprakash, Thomas Brettin, Cristina Garcia-Cardona, Austin Clyde, Judith Cohn, James Doroshow, Xiaotian Duan, Veronika Dubinkina, Yvonne Evrard, Ya Ju Fan, Jason Gans, Stewart He, Pinyi Lu, Sergei Maslov, Alexander Partin, Maulik Shukla, Eric Stahlberg, Justin M. Wozniak, Hyunseung Yoo, George Zaki, Yitan Zhu, Rick Stevens

Abstract: To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross validation within a single study to assess model accuracy. While an essential first step, cross validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: NCI60, CTRP, GDSC, CCLE and gCSI. Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies, and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.

Submitted 13 August, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

Comments: Accepted by Briefings in Bioinformatics

arXiv:2103.04162 [pdf, other]

Molecular modeling with machine-learned universal potential functions

Authors: Ke Liu, Zekun Ni, Zhenyu Zhou, Suocheng Tan, Xun Zou, Haoming Xing, Xiangyan Sun, Qi Han, Junqiu Wu, Jie Fan

Abstract: Molecular modeling is an important topic in drug discovery. Decades of research have led to the development of high quality scalable molecular force fields. In this paper, we show that neural networks can be used to train a universal approximator for energy potential functions. By incorporating a fully automated training process we have been able to train smooth, differentiable, and predictive potential functions on large-scale crystal structures. A variety of tests have also been performed to show the superiority and versatility of the machine-learned model.

Submitted 19 April, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

arXiv:2102.05236 [pdf, other]

A General Framework for Revealing Human Mind with auto-encoding GANs

Authors: Pan Wang, Rui Zhou, Shuo Wang, Ling Li, Wenjia Bai, Jialu Fan, Chunlin Li, Peter Childs, Yike Guo

Abstract: Addressing the question of visualising human mind could help us to find regions that are associated with observed cognition and responsible for expressing the elusive mental image, leading to a better understanding of cognitive function. The traditional approach treats brain decoding as a classification problem, reading the mind through statistical analysis of brain activity. However, human thought is rich and varied, that it is often influenced by more of a combination of object features than a specific type of category. For this reason, we propose an end-to-end brain decoding framework which translates brain activity into an image by latent space alignment. To find the correspondence from brain signal features to image features, we embedded them into two latent spaces with modality-specific encoders and then aligned the two spaces by minimising the distance between paired latent representations. The proposed framework was trained by simultaneous electroencephalogram and functional MRI data, which were recorded when the subjects were viewing or imagining a set of image stimuli. In this paper, we focused on implementing the fMRI experiment. Our experimental results demonstrated the feasibility of translating brain activity to an image. The reconstructed image matches image stimuli approximate in both shape and colour. Our framework provides a promising direction for building a direct visualisation to reveal human mind.

Submitted 9 February, 2021; originally announced February 2021.

arXiv:1906.02418 [pdf]

doi 10.1021/acsomega.9b01997

OnionNet: a multiple-layer inter-molecular contact based convolutional neural network for protein-ligand binding affinity prediction

Authors: Liangzhen Zheng, Jingrong Fan, Yuguang Mu

Abstract: Computational drug discovery provides an efficient tool helping large scale lead molecules screening. One of the major tasks of lead discovery is identifying molecules with promising binding affinities towards a target, a protein in general. The accuracies of current scoring functions which are used to predict the binding affinity are not satisfactory enough. Thus, machine learning (ML) or deep learning (DL) based methods have been developed recently to improve the scoring functions. In this study, a deep convolutional neural network (CNN) model (called OnionNet) is introduced and the features are based on rotation-free element-pair specific contacts between ligands and protein atoms, and the contacts were further grouped in different distance ranges to cover both the local and non-local interaction information between the ligand and the protein. The prediction power of the model is evaluated and compared with other scoring functions using the comparative assessment of scoring functions (CASF-2013) benchmark and the v2016 core set of PDBbind database. When compared to a previous CNN-based scoring function, our model shows improvements of 0.08 and 0.16 in the correlations (R) and standard deviations (SD) of regression, respectively, between the predicted binding affinities and the experimental measured binding affinities. The robustness of the model is further explored by predicting the binding affinities of the complexes generated from docking simulations instead of experimentally determined PDB structures.

Submitted 6 June, 2019; originally announced June 2019.

Comments: 29 pages, 6 figures

arXiv:1903.04921 [pdf]

Biomechanics of Collective Cell Migration in Cancer Progression -- Experimental and Computational Methods

Authors: Catalina-Paula Spatarelu, Hao Zhang, Dung Trung Nguyen, Xinyue Han, Ruchuan Liu, Qiaohang Guo, Jacob Notbohm, Jing Fan, Liyu Liu, Zi Chen

Abstract: Cell migration is essential for regulating many biological processes in physiological or pathological conditions, including embryonic development and cancer invasion. In vitro and in silico studies suggest that collective cell migration is associated with some biomechanical particularities, such as restructuring of extracellular matrix, stress and force distribution profiles, and reorganization of cytoskeleton. Therefore, the phenomenon could be understood by an in-depth study of cells' behavior determinants, including but not limited to mechanical cues from the environment and from fellow travelers. This review article aims to cover the recent development of experimental and computational methods for studying the biomechanics of collective cell migration during cancer progression and invasion. We also summarized the tested hypotheses regarding the mechanism underlying collective cell migration enabled by these methods. Together, the paper enables a broad overview on the methods and tools currently available to unravel the biophysical mechanisms pertinent to cell collective migration, as well as providing perspectives on future development towards eventually deciphering the key mechanisms behind the most lethal feature of cancer.

Submitted 12 March, 2019; originally announced March 2019.

arXiv:1803.06236 [pdf]

Chemi-net: a graph convolutional network for accurate drug property prediction

Authors: Ke Liu, Xiangyan Sun, Lei Jia, Jun Ma, Haoming Xing, Junqiu Wu, Hua Gao, Yax Sun, Florian Boulnois, Jie Fan

Abstract: Absorption, distribution, metabolism, and excretion (ADME) studies are critical for drug discovery. Conventionally, these tasks, together with other chemical property predictions, rely on domain-specific feature descriptors, or fingerprints. Following the recent success of neural networks, we developed Chemi-Net, a completely data-driven, domain knowledge-free, deep learning method for ADME property prediction. To compare the relative performance of Chemi-Net with Cubist, one of the popular machine learning programs used by Amgen, a large-scale ADME property prediction study was performed on-site at Amgen. The results showed that our deep neural network method improved current methods by a large margin. We foresee that the significantly increased accuracy of ADME prediction seen with Chemi-Net over Cubist will greatly accelerate drug discovery.

Submitted 21 March, 2018; v1 submitted 16 March, 2018; originally announced March 2018.

arXiv:1707.08381 [pdf]

Prediction of amino acid side chain conformation using a deep neural network

Authors: Ke Liu, Xiangyan Sun, Jun Ma, Zhenyu Zhou, Qilin Dong, Shengwen Peng, Junqiu Wu, Suocheng Tan, Günter Blobel, Jie Fan

Abstract: A deep neural network based architecture was constructed to predict amino acid side chain conformation with unprecedented accuracy. Amino acid side chain conformation prediction is essential for protein homology modeling and protein design. Current widely-adopted methods use physics-based energy functions to evaluate side chain conformation. Here, using a deep neural network architecture without physics-based assumptions, we have demonstrated that side chain conformation prediction accuracy can be improved by more than 25%, especially for aromatic residues compared with current standard methods. More strikingly, the prediction method presented here is robust enough to identify individual conformational outliers from high resolution structures in a protein data bank without providing its structural factors. We envisage that our amino acid side chain predictor could be used as a quality check step for future protein structure model validation and many other potential applications such as side chain assignment in Cryo-electron microscopy, crystallography model auto-building, protein folding and small molecule ligand docking.

Submitted 26 July, 2017; originally announced July 2017.

Showing 1–10 of 10 results for author: Fan, J