
Representing Molecules as Random Walks Over Interpretable Grammars

Authors: Michael Sun, Minghao Guo, Weize Yuan, Veronika Thost, Crystal Elaine Owens, Aristotle Franklin Grosz, Sharvaa Selvan, Katelyn Zhou, Hassan Mohiuddin, Benjamin J Pedretti, Zachary P Smith, Jie Chen, Wojciech Matusik

Abstract: Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representing and reasoning over such molecules in terms of graph grammars that explicitly describe the hierarchical design space featuring motifs to be the design basis. We present a novel representation in the form of random walks over the design space, which facilitates both molecule generation and property prediction. We demonstrate clear advantages over existing methods in terms of performance, efficiency, and synthesizability of predicted molecules, and we provide detailed insights into the method's chemical interpretability.

Submitted 2 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2306.09111 [pdf, other]

Enhanced Sampling with Machine Learning: A Review

Authors: Shams Mehdi, Zachary Smith, Lukas Herron, Ziyue Zou, Pratyush Tiwary

Abstract: Molecular dynamics (MD) enables the study of physical systems with excellent spatiotemporal resolution but suffers from severe time-scale limitations. To address this, enhanced sampling methods have been developed to improve exploration of configurational space. However, implementing these is challenging and requires domain expertise. In recent years, integration of machine learning (ML) techniques in different domains has shown promise, prompting their adoption in enhanced sampling as well. Although ML is often employed in various fields primarily due to its data-driven nature, its integration with enhanced sampling is more natural with many common underlying synergies. This review explores the merging of ML and enhanced MD by presenting different shared viewpoints. It offers a comprehensive overview of this rapidly evolving field, which can be difficult to stay updated on. We highlight successful strategies like dimensionality reduction, reinforcement learning, and flow-based methods. Finally, we discuss open problems at the exciting ML-enhanced MD interface.

Submitted 16 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Submitted as invited article to Annual Review of Physical Chemistry vol 75; updated formatting issues

arXiv:2201.08686 [pdf, other]

Modelling Agent-Skipping Attacks in Message Forwarding Protocols

Authors: Zach Smith, Hugo Jonker, Sjouke Mauw, Hyunwoo Lee

Abstract: Message forwarding protocols are protocols in which a chain of agents handles transmission of a message. Each agent forwards the received message to the next agent in the chain. For example, TLS middleboxes act as intermediary agents in TLS, adding functionality such as filtering or compressing data. In such protocols, an attacker may attempt to bypass one or more intermediary agents. Such an agent-skipping attack can violate the security requirements of the protocol. Using the multiset rewriting model in the symbolic setting, we construct a comprehensive framework of such path protocols. In particular, we introduce a set of security goals related to path integrity: the notion that a message faithfully travels through participants in the order intended by the initiating agent. We perform a security analysis of several such protocols, highlighting key attacks on modern protocols.

Submitted 21 January, 2022; originally announced January 2022.

arXiv:2110.03136 [pdf, other]

The Gromov-Hausdorff distance between ultrametric spaces: its structure and computation

Authors: Facundo Mémoli, Zane Smith, Zhengchao Wan

Abstract: The Gromov-Hausdorff distance ($d_\mathrm{GH}$) provides a natural way of quantifying the dissimilarity between two given metric spaces. It is known that computing $d_\mathrm{GH}$ between two finite metric spaces is NP-hard, even in the case of finite ultrametric spaces which are highly structured metric spaces in the sense that they satisfy the so-called \emph{strong triangle inequality}. Ultrametric spaces naturally arise in many applications such as hierarchical clustering, phylogenetics, genomics, and even linguistics. By exploiting the special structures of ultrametric spaces, (1) we identify a one parameter family $\{d_\mathrm{GH}^{(p)}\}_{p\in[1,\infty]}$ of distances defined in a flavor similar to the Gromov-Hausdorff distance on the collection of finite ultrametric spaces, and in particular $d_\mathrm{GH}^{(1)} =d_\mathrm{GH}$. The extreme case when $p=\infty$, which we also denote by $u_\mathrm{GH}$, turns out to be an ultrametric on the collection of ultrametric spaces. Whereas for all $p\in[1,\infty)$, $d_\mathrm{GH}^{(p)}$ yields NP-hard problems, we prove that surprisingly $u_\mathrm{GH}$ can be computed in polynomial time. The proof is based on a structural theorem for $u_\mathrm{GH}$ established in this paper; (2) inspired by the structural theorem for $u_\mathrm{GH}$, and by carefully leveraging properties of ultrametric spaces, we also establish a structural theorem for $d_\mathrm{GH}$ when restricted to ultrametric spaces.
This structural theorem allows us to identify special families of ultrametric spaces on which $d_\mathrm{GH}$ is computationally tractable. These families are determined by properties related to the doubling constant of metric space. Based on these families, we devise a fixed-parameter tractable (FPT) algorithm for computing the exact value of $d_\mathrm{GH}$ between ultrametric spaces. We believe ours is the first such algorithm to be identified.

Submitted 6 October, 2021; originally announced October 2021.

arXiv:2011.00317 [pdf, other]

Capture times in the Bridge-burning Cops and Robbers game

Authors: Rebekah Herrman, Peter van Hintum, Stephen G. Z. Smith

Abstract: In this paper, we consider a variant of the cops and robbers game on a graph, introduced by Kinnersley and Peterson, in which every time the robber uses an edge, it is removed from the graph, known as bridge-burning cops and robbers. In particular, we study the maximum time it takes the cops to capture the robber.

Submitted 31 October, 2020; originally announced November 2020.

Comments: 16 pages, 3 figures

MSC Class: 05C57; 49N75; 91A24; 91A46; 91A05; 05C80

arXiv:1810.07793 [pdf, other]

The Wasserstein transform

Authors: Facundo Mémoli, Zane Smith, Zhengchao Wan

Abstract: We introduce the Wasserstein transform, a method for enhancing and denoising datasets defined on general metric spaces. The construction draws inspiration from Optimal Transportation ideas. We establish precise connections with the mean shift family of algorithms and establish the stability of both our method and mean shift under data perturbation.

Submitted 17 October, 2018; originally announced October 2018.

arXiv:1701.07243 [pdf, other]

Decoding Epileptogenesis in a Reduced State Space

Authors: François G. Meyer, Alexander M. Benison, Zachariah Smith, Daniel S. Barth

Abstract: We describe here the recent results of a multidisciplinary effort to design a biomarker that can actively and continuously decode the progressive changes in neuronal organization leading to epilepsy, a process known as epileptogenesis. Using an animal model of acquired epilepsy, we chronically record hippocampal evoked potentials elicited by an auditory stimulus. Using a set of reduced coordinates, our algorithm can identify universal smooth low-dimensional configurations of the auditory evoked potentials that correspond to distinct stages of epileptogenesis. We use a hidden Markov model to learn the dynamics of the evoked potential, as it evolves along these smooth low-dimensional subsets. We provide experimental evidence that the biomarker is able to exploit subtle changes in the evoked potential to reliably decode the stage of epileptogenesis and predict whether an animal will eventually recover from the injury, or develop spontaneous seizures.

Submitted 25 January, 2017; originally announced January 2017.

Showing 1–7 of 7 results for author: Smith, Z