Search | arXiv e-print repository

arXiv:2004.01215 [pdf, other]

CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models

Authors: Vijil Chenthamarakshan, Payel Das, Samuel C. Hoffman, Hendrik Strobelt, Inkit Padhi, Kar Wai Lim, Benjamin Hoover, Matteo Manica, Jannis Born, Teodoro Laino, Aleksandra Mojsilovic

Abstract: The novel nature of SARS-CoV-2 calls for the development of efficient de novo drug design approaches. In this study, we propose an end-to-end framework, named CogMol (Controlled Generation of Molecules), for designing new drug-like small molecules targeting novel viral proteins with high affinity and off-target selectivity. CogMol combines adaptive pre-training of a molecular SMILES Variational Au… ▽ More The novel nature of SARS-CoV-2 calls for the development of efficient de novo drug design approaches. In this study, we propose an end-to-end framework, named CogMol (Controlled Generation of Molecules), for designing new drug-like small molecules targeting novel viral proteins with high affinity and off-target selectivity. CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme that uses guidance from attribute predictors trained on latent features. To generate novel and optimal drug-like molecules for unseen viral targets, CogMol leverages a protein-molecule binding affinity predictor that is trained using SMILES VAE embeddings and protein sequence embeddings learned unsupervised from a large corpus. CogMol framework is applied to three SARS-CoV-2 target proteins: main protease, receptor-binding domain of the spike protein, and non-structural protein 9 replicase. The generated candidates are novel at both molecular and chemical scaffold levels when compared to the training data. CogMol also includes insilico screening for assessing toxicity of parent molecules and their metabolites with a multi-task toxicity classifier, synthetic feasibility with a chemical retrosynthesis predictor, and target structure binding with docking simulations. Docking reveals favorable binding of generated molecules to the target protein structure, where 87-95 % of high affinity molecules showed docking free energy < -6 kcal/mol. When compared to approved drugs, the majority of designed compounds show low parent molecule and metabolite toxicity and high synthetic feasibility. In summary, CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity, and does not need target-dependent fine-tuning of the framework or target structure information. △ Less

Submitted 23 June, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

arXiv:1903.06661 [pdf, other]

GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection

Authors: Quoc Phong Nguyen, Kar Wai Lim, Dinil Mon Divakaran, Kian Hsiang Low, Mun Choon Chan

Abstract: This paper looks into the problem of detecting network anomalies by analyzing NetFlow records. While many previous works have used statistical models and machine learning techniques in a supervised way, such solutions have the limitations that they require large amount of labeled data for training and are unlikely to detect zero-day attacks. Existing anomaly detection solutions also do not provide… ▽ More This paper looks into the problem of detecting network anomalies by analyzing NetFlow records. While many previous works have used statistical models and machine learning techniques in a supervised way, such solutions have the limitations that they require large amount of labeled data for training and are unlikely to detect zero-day attacks. Existing anomaly detection solutions also do not provide an easy way to explain or identify attacks in the anomalous traffic. To address these limitations, we develop and present GEE, a framework for detecting and explaining anomalies in network traffic. GEE comprises of two components: (i) Variational Autoencoder (VAE) - an unsupervised deep-learning technique for detecting anomalies, and (ii) a gradient-based fingerprinting technique for explaining anomalies. Evaluation of GEE on the recent UGR dataset demonstrates that our approach is effective in detecting different anomalies as well as identifying fingerprints that are good representations of these various attacks. △ Less

Submitted 15 March, 2019; originally announced March 2019.

Comments: to appear in 2019 IEEE Conference on Communications and Network Security (CNS)

arXiv:1803.04654 [pdf, other]

Simulation and Calibration of a Fully Bayesian Marked Multidimensional Hawkes Process with Dissimilar Decays

Authors: Kar Wai Lim, Young Lee, Leif Hanlen, Hongbiao Zhao

Abstract: We propose a simulation method for multidimensional Hawkes processes based on superposition theory of point processes. This formulation allows us to design efficient simulations for Hawkes processes with differing exponentially decaying intensities. We demonstrate that inter-arrival times can be decomposed into simpler auxiliary variables that can be sampled directly, giving exact simulation with… ▽ More We propose a simulation method for multidimensional Hawkes processes based on superposition theory of point processes. This formulation allows us to design efficient simulations for Hawkes processes with differing exponentially decaying intensities. We demonstrate that inter-arrival times can be decomposed into simpler auxiliary variables that can be sampled directly, giving exact simulation with no approximation. We establish that the auxiliary variables provides information on the parent process for each event time. The algorithm correctness is shown by verifying the simulated intensities with their theoretical moments. A modular inference procedure consisting of Gibbs samplers through the auxiliary variable augmentation and adaptive rejection sampling is presented. Finally, we compare our proposed simulation method against existing methods, and find significant improvement in terms of algorithm speed. Our inference algorithm is used to discover the strengths of mutually excitations in real dark networks. △ Less

Submitted 13 March, 2018; originally announced March 2018.

Comments: 24 pages, long version of ACML paper with supplementary material

Journal ref: In Proceedings of the 6th Asian Conference on Machine Learning (ACML), pp. 238-253. 2016

arXiv:1609.06831 [pdf, other]

Hawkes Processes with Stochastic Excitations

Authors: Young Lee, Kar Wai Lim, Cheng Soon Ong

Abstract: We propose an extension to Hawkes processes by treating the levels of self-excitation as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochas… ▽ More We propose an extension to Hawkes processes by treating the levels of self-excitation as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochastic processes, and propose a hybrid Markov chain Monte Carlo approach for model fitting. Our sampling procedure scales linearly with the number of required events and does not require stationarity of the point process. A modular inference procedure consisting of a combination between Gibbs and Metropolis Hastings steps is put forward. We recover expectation maximization as a special case. Our general approach is illustrated for contagion following geometric Brownian motion and exponential Langevin dynamics. △ Less

Submitted 22 September, 2016; originally announced September 2016.

Comments: Copy of ICML paper

Journal ref: Proceedings of The 33rd International Conference on Machine Learning (ICML), pp. 79-88. JMLR. 2016

arXiv:1609.06826 [pdf, other]

Bibliographic Analysis with the Citation Network Topic Model

Authors: Kar Wai Lim, Wray Buntine

Abstract: Bibliographic analysis considers author's research areas, the citation network and paper content among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents using a non-parametric extension of a combination of the Poisson mixed-topic link model and the author-topic model. We propose a novel and efficient inference… ▽ More Bibliographic analysis considers author's research areas, the citation network and paper content among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents using a non-parametric extension of a combination of the Poisson mixed-topic link model and the author-topic model. We propose a novel and efficient inference algorithm for the model to explore subsets of research publications from CiteSeerX. Our model demonstrates improved performance in both model fitting and a clustering task compared to several baselines. △ Less

Submitted 22 September, 2016; originally announced September 2016.

Comments: A copy of ACML paper. arXiv admin note: substantial text overlap with arXiv:1609.06532

Journal ref: Proceedings of the Sixth Asian Conference on Machine Learning (ACML), pp. 142-158. JMLR. 2014

arXiv:1609.06783 [pdf, other]

doi 10.1016/j.ijar.2016.07.007

Nonparametric Bayesian Topic Modelling with the Hierarchical Pitman-Yor Processes

Authors: Kar Wai Lim, Wray Buntine, Changyou Chen, Lan Du

Abstract: The Dirichlet process and its extension, the Pitman-Yor process, are stochastic processes that take probability distributions as a parameter. These processes can be stacked up to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for the use of these processes in this hierarchical context, and apply them to latent variable models for text analytics. In… ▽ More The Dirichlet process and its extension, the Pitman-Yor process, are stochastic processes that take probability distributions as a parameter. These processes can be stacked up to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for the use of these processes in this hierarchical context, and apply them to latent variable models for text analytics. In particular, we propose a general framework for designing these Bayesian models, which are called topic models in the computer science community. We then propose a specific nonparametric Bayesian topic model for modelling text from social media. We focus on tweets (posts on Twitter) in this article due to their ease of access. We find that our nonparametric model performs better than existing parametric models in both goodness of fit and real world applications. △ Less

Submitted 21 September, 2016; originally announced September 2016.

Comments: Preprint for International Journal of Approximate Reasoning

Journal ref: International Journal of Approximate Reasoning, Volume 78, pp. 172-191. Elsevier. 2016

arXiv:1609.06532 [pdf, other]

doi 10.1007/s10994-016-5554-z

Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network

Authors: Kar Wai Lim, Wray Buntine

Abstract: Bibliographic analysis considers the author's research areas, the citation network and the paper content among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents, using a nonparametric extension of a combination of the Poisson mixed-topic link model and the author-topic model. This gives rise to the Citation Net… ▽ More Bibliographic analysis considers the author's research areas, the citation network and the paper content among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents, using a nonparametric extension of a combination of the Poisson mixed-topic link model and the author-topic model. This gives rise to the Citation Network Topic Model (CNTM). We propose a novel and efficient inference algorithm for the CNTM to explore subsets of research publications from CiteSeerX. The publication datasets are organised into three corpora, totalling to about 168k publications with about 62k authors. The queried datasets are made available online. In three publicly available corpora in addition to the queried datasets, our proposed model demonstrates an improved performance in both model fitting and document clustering, compared to several baselines. Moreover, our model allows extraction of additional useful knowledge from the corpora, such as the visualisation of the author-topics network. Additionally, we propose a simple method to incorporate supervision into topic modelling to achieve further improvement on the clustering task. △ Less

Submitted 21 September, 2016; originally announced September 2016.

Comments: Preprint for Journal Machine Learning

Journal ref: Machine Learning 103(2):185-213, 2016

Showing 1–7 of 7 results for author: Lim, K W