Automated Inference of Graph Transformation Rules

Authors: Jakob L. Andersen, Akbar Davoodi, Rolf Fagerberg, Christoph Flamm, Walter Fontana, Juri Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: The explosion of data available in life sciences is fueling an increasing demand for expressive models and computational methods. Graph transformation is a model for dynamic systems with a large variety of applications. We introduce a novel method of graph transformation model construction, combining generative and dynamical viewpoints to give a fully automated data-driven model inference method. The method takes the input dynamical properties, given as a "snapshot" of the dynamics encoded by explicit transitions, and constructs a compatible model. The obtained model is guaranteed to be minimal, thus framing the approach as model compression (from a set of transitions into a set of rules). The compression is permissive to a lossy case, where the constructed model is allowed to exhibit behavior outside of the input transitions, thus suggesting a completion of the input dynamics. The task of graph transformation model inference is naturally highly challenging due to the combinatorics involved. We tackle the exponential explosion by proposing a heuristically minimal translation of the task into a well-established problem, set cover, for which highly optimized solutions exist. We further showcase how our results relate to Kolmogorov complexity expressed in terms of graph transformation.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2309.10629 [pdf, ps, other]

On the Realisability of Chemical Pathways

Authors: Jakob L. Andersen, Sissel Banke, Rolf Fagerberg, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: The exploration of pathways and alternative pathways that have a specific function is of interest in numerous chemical contexts. A framework for specifying and searching for pathways has previously been developed, but a focus on which of the many pathway solutions are realisable, or can be made realisable, is missing. Realisable here means that there actually exists some sequencing of the reactions of the pathway that will execute the pathway. We present a method for analysing the realisability of pathways based on the reachability question in Petri nets. For realisable pathways, our method also provides a certificate encoding an order of the reactions which realises the pathway. We present two extended notions of realisability of pathways, one of which is related to the concept of network catalysts. We exemplify our findings on the pentose phosphate pathway. Lastly, we discuss the relevance of our concepts for elucidating the choices often implicitly made when depicting pathways.

Submitted 19 September, 2023; originally announced September 2023.

Comments: Accepted in LNBI proceedings

arXiv:2308.12735 [pdf, other]

Reconciling Inconsistent Molecular Structures from Biochemical Databases

Authors: Casper Asbjørn Eriksen, Jakob Lykke Andersen, Rolf Fagerberg, Daniel Merkle

Abstract: Information on the structure of molecules, retrieved via biochemical databases, plays a pivotal role in various disciplines, such as metabolomics, systems biology, and drug discovery. However, no such database can be complete, and the chemical structure for a given compound is not necessarily consistent between databases. This paper presents StructRecon, a novel tool for resolving unique and correct molecular structures from database identifiers. StructRecon traverses the cross-links between database entries in different databases to construct what we call an identifier graph, which offers a more complete view of the total information available on a particular compound across all the databases. In order to reconcile discrepancies between databases, we first present an extensible model for chemical structure which supports multiple independent levels of detail, allowing standardisation of the structure to be applied iteratively. In some cases, our standardisation approach results in multiple structures for a given compound, in which case a random walk-based algorithm is used to select the most likely structure among incompatible alternates. We applied StructRecon to the EColiCore2 model, resolving a unique chemical structure for 85.11 % of identifiers. StructRecon is open-source and modular, which enables the potential support for more databases in the future.

Submitted 24 August, 2023; originally announced August 2023.

Comments: 14 pages, 4 figures, accepted at ISBRA 2023

arXiv:2201.04515 [pdf, other]

Representing catalytic mechanisms with rule composition

Authors: Jakob L. Andersen, Rolf Fagerberg, Christoph Flamm, Walter Fontana, Juri Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: Reaction mechanisms are often presented as sequences of elementary steps, such as codified by arrow pushing. We propose an approach for representing such mechanisms using graph transformation. In this framework, each elementary step is a rule for modifying a molecular graph and a mechanism is a sequence of such rules. To generate a compact representation of a multi-step reaction, we compose the rules of individual steps into a composite rule, providing a rigorous and fully automated approach to coarse-graining. While the composite rule retains the graphical conditions necessary for the execution of a mechanism, it also records information about transient changes not visible by comparing educts and products. By projecting the rule onto a single "overlay graph", we generalize Fujita's idea of an Imaginary Transition Structure from elementary reactions to composite reactions. The utility of the overlay graph construct is exemplified in the context of enzyme-catalyzed reactions. In a first application, we exploit mechanistic information in the Mechanism and Catalytic Site Atlas to construct overlay graphs of hydrolase reactions listed in the database. These graphs point at a spectrum of catalytic entanglement of enzyme and substrate, de-emphasizing the notion of a singular catalyst in favor of a collection of catalytic sites that can be distributed across enzyme and substrate. In a second application, we deploy composite rules to search the Rhea database for reactions of known or unknown mechanism that are, in principle, compatible with the mechanisms implied by the composite rules. We believe this work adds to the utility of graph-transformation formalisms in representing and reasoning about chemistry in an automated yet insightful fashion.

Submitted 25 August, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: Preprint

arXiv:2201.04360 [pdf, other]

Efficient Modular Graph Transformation Rule Application

Authors: Jakob L. Andersen, Rolf Fagerberg, Juri Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: Graph transformation formalisms have proven to be suitable tools for the modelling of chemical reactions. They are well established in theoretical studies and increasingly also in practical applications in chemistry. The latter is made feasible via the development of programming frameworks which make the formalisms executable. The application of such frameworks to large networks of chemical reactions, however, poses unique computational challenges. One such characteristic is the inherent combinatorial nature of the graphs involved. The graphs consist of many connected components, representing individual molecules. While the existing methods for implementing graph transformations can be applied to such graphs, the combinatorics of constructing graph matches quickly becomes a computational bottleneck as the size of the chemical reaction network grows. In this contribution, we develop a new method of enumerating graph matches during graph transformation rule application. The method is designed to improve performance in such scenarios and is based on constructing graph matches in an iterative, component-wise fashion which allows redundant applications to be detected early and pruned. We further extend the algorithm with an efficient heuristic based on local symmetries of the graphs, which allows us to detect and discard isomorphic applications early. Finally, we conduct chemical network generation experiments on real-life as well as synthetic data and compare against the state-of-the-art algorithm in the field.

Submitted 25 August, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: preprint

arXiv:2108.04077 [pdf, other]

doi 10.1089/CMB.2020.0548

Cayley Graphs of Semigroups Applied to Atom Tracking in Chemistry

Authors: Nikolai Nøjgaard, Walter Fontana, Marc Hellmuth, Daniel Merkle

Abstract: While atom tracking with isotope-labeled compounds is an essential and sophisticated wet-lab tool in order to, e.g., illuminate reaction mechanisms, there exists only a limited number of formal methods to approach the problem. Specifically when large (bio-)chemical networks are considered where reactions are stereo-specific, rigorous techniques are inevitable. We present an approach using the right Cayley graph of a monoid in order to track atoms concurrently through sequences of reactions and predict their potential location in product molecules. This can not only be used to systematically build hypotheses or reject reaction mechanisms (we will use the ANRORC mechanism "Addition of the Nucleophile, Ring Opening, and Ring Closure" as an example), but also to infer naturally occurring subsystems of (bio-)chemical systems. Our results include the analysis of the carbon traces within the TCA cycle and infer subsystems based on projections of the right Cayley graph onto a set of relevant atoms.

Submitted 9 August, 2021; originally announced August 2021.

arXiv:2107.03086 [pdf, other]

Defining Autocatalysis in Chemical Reaction Networks

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Autocatalysis is a deceptively simple concept, referring to the situation that a chemical species $X$ catalyzes its own formation. From the perspective of chemical kinetics, autocatalysts show a regime of super-linear growth. Given a chemical reaction network, however, it is not at all straightforward to identify species that are autocatalytic in the sense that there is a sub-network that takes $X$ as input and produces more than one copy of $X$ as output. The difficulty arises from the need to distinguish autocatalysis e.g. from the superposition of a cycle that consumes and produces equal amounts of $X$ and a pathway that produces $X$. To deal with this issue, a number of competing notions, such as exclusive autocatalysis and autocatalytic cycles, have been introduced. A closer inspection of concepts and their usage by different authors shows, however, that subtle differences in the definitions often make conceptually matching ideas difficult to bring together formally. In this contribution we make some of the available approaches comparable by translating them into a common formal framework that uses integer hyperflows as a basis to study autocatalysis in large chemical reaction networks. As an application we investigate the prevalence of autocatalysis in metabolic networks.

Submitted 7 July, 2021; originally announced July 2021.

arXiv:2106.02573 [pdf, other]

Rewriting Theory for the Life Sciences: A Unifying Theory of CTMC Semantics (Long version)

Authors: Nicolas Behr, Jean Krivine, Jakob L. Andersen, Daniel Merkle

Abstract: The Kappa biochemistry and the MØD organic chemistry frameworks are amongst the most intensely developed applications of rewriting-based methods in the life sciences to date. A typical feature of these types of rewriting theories is the necessity to implement certain structural constraints on the objects to be rewritten (a protein is empirically found to have a certain signature of sites, a carbon atom can form at most four bonds, ...). In this paper, we contribute a number of original developments that permit to implement a universal theory of continuous-time Markov chains (CTMCs) for stochastic rewriting systems. Our core mathematical concepts are a novel rule algebra construction for the relevant setting of rewriting rules with conditions, both in Double- and in Sesqui-Pushout semantics, augmented by a suitable stochastic mechanics formalism extension that permits to derive dynamical evolution equations for pattern-counting statistics. A second main contribution of our paper is a novel framework of restricted rewriting theories, which comprises a rule-algebra calculus under the restriction to so-called constraint-preserving completions of application conditions (for rules considered to act only upon objects of the underlying category satisfying a globally fixed set of structural constraints). This novel framework in turn renders a faithful encoding of bio- and organo-chemical rewriting in the sense of Kappa and MØD possible, which allows us to derive a rewriting-based formulation of reaction systems including a full-fledged CTMC semantics as instances of our universal CTMC framework. While offering an interesting new perspective and conceptual simplification of this semantics in the setting of Kappa, both the formal encoding and the CTMC semantics of organo-chemical reaction systems as motivated by the MØD framework are the first such results of their kind.

Submitted 4 June, 2021; originally announced June 2021.

Comments: 62 pages; long version of arXiv:2003.09395

MSC Class: 16B50; 60J27; 68Q42 (Primary) 60J28; 16B50; 05E99 (Secondary) ACM Class: F.4.2; G.3; G.2.2

arXiv:2102.03292 [pdf, other]

Graph Transformation for Enzymatic Mechanisms

Authors: Jakob L. Andersen, Rolf Fagerberg, Christoph Flamm, Walter Fontana, Juraj Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: Motivation: The design of enzymes is as challenging as it is consequential for making chemical synthesis in medical and industrial applications more efficient, cost-effective and environmentally friendly. While several aspects of this complex problem are computationally assisted, the drafting of catalytic mechanisms, i.e. the specification of the chemical steps, and hence intermediate states, that the enzyme is meant to implement, is largely left to human expertise. The ability to capture specific chemistries of multi-step catalysis in a fashion that enables its computational construction and design is therefore highly desirable and would equally impact the elucidation of existing enzymatic reactions whose mechanisms are unknown. Results: We use the mathematical framework of graph transformation to express the distinction between rules and reactions in chemistry. We derive about 1000 rules for amino acid side chain chemistry from the M-CSA database, a curated repository of enzymatic mechanisms. Using graph transformation we are able to propose hundreds of hypothetical catalytic mechanisms for a large number of unrelated reactions in the Rhea database. We analyze these mechanisms to find that they combine in chemically sound fashion individual steps from a variety of known multi-step mechanisms, showing that plausible novel mechanisms for catalysis can be constructed computationally.

Submitted 26 March, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

Comments: Preprint submitted to ISMB/ECCB 2021. Prototype implementation source code available at https://github.com/Nojgaard/mechsearch Live demo available at https://cheminf.imada.sdu.dk/mechsearch/ Supplementary material available at https://cheminf.imada.sdu.dk/preprints/ECCB-2021

arXiv:1911.00407 [pdf, other]

A Graph-Based Tool to Embed the π-Calculus into a Computational DPO Framework

Authors: Jakob Lykke Andersen, Marc Hellmuth, Daniel Merkle, Nikolai Nøjgaard, Marco Peressotti

Abstract: Graph transformation approaches have been successfully used to analyse and design chemical and biological systems. Here we build on top of a DPO framework, in which molecules are modelled as typed attributed graphs and chemical reactions are modelled as graph transformations. Edges and vertexes can be labelled with first-order terms, which can be used to encode, e.g., steric information of molecules. While targeted to chemical settings, the computational framework is intended to be very generic and applicable to the exploration of arbitrary spaces derived via iterative application of rewrite rules, such as process calculi like Milner's π-calculus. To illustrate the generality of the framework, we introduce EpiM: a tool for computing execution spaces of π-calculus processes. EpiM encodes π-calculus processes as typed attributed graphs and then exploits the existing DPO framework to compute their dynamics in the form of graphs where nodes are π-calculus processes and edges are reduction steps. EpiM takes advantage of the graph-based representation and facilities offered by the framework, like efficient isomorphism checking to prune the space without resorting to explicit structural equivalences. EpiM is available as an online Python-based tool.

Submitted 29 October, 2019; originally announced November 2019.

arXiv:1712.02594 [pdf, other]

Chemical Transformation Motifs - Modelling Pathways as Integer Hyperflows

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: We present an elaborate framework for formally modelling pathways in chemical reaction networks on a mechanistic level. Networks are modelled mathematically as directed multi-hypergraphs, with vertices corresponding to molecules and hyperedges to reactions. Pathways are modelled as integer hyperflows and we expand the network model by detailed routing constraints. In contrast to the more traditional approaches like Flux Balance Analysis or Elementary Mode analysis we insist on integer-valued flows. While this choice makes it necessary to solve possibly hard integer linear programs, it has the advantage that more detailed mechanistic questions can be formulated. It is thus possible to query networks for general transformation motifs, and to automatically enumerate optimal and near-optimal pathways. Similarities and differences between our work and traditional approaches in metabolic network analysis are discussed in detail. To demonstrate the applicability of the mathematical framework to real-life problems we first explore the design space of possible non-oxidative glycolysis pathways and show that recent manually designed pathways can be further optimised. We then use a model of sugar chemistry to investigate pathways in the autocatalytic formose process. A graph transformation-based approach is used to automatically generate the reaction networks of interest.

Submitted 7 December, 2017; originally announced December 2017.

arXiv:1711.08289 [pdf, other]

A Generic Framework for Engineering Graph Canonization Algorithms

Authors: Jakob L. Andersen, Daniel Merkle

Abstract: The state-of-the-art tools for practical graph canonization are all based on the individualization-refinement paradigm, and their difference is primarily in the choice of heuristics they include and in the actual tool implementation. It is thus not possible to make a direct comparison of how individual algorithmic ideas affect the performance on different graph classes. We present an algorithmic software framework that facilitates implementation of heuristics as independent extensions to a common core algorithm. It therefore becomes easy to perform a detailed comparison of the performance and behaviour of different algorithmic ideas. Implementations are provided of a range of algorithms for tree traversal, target cell selection, and node invariant, including choices from the literature and new variations. The framework readily supports extraction and visualization of detailed data from separate algorithm executions for subsequent analysis and development of new heuristics. Using collections of different graph classes we investigate the effect of varying the selections of heuristics, often revealing exactly which individual algorithmic choice is responsible for particularly good or bad performance. On several benchmark collections, including a newly proposed class of difficult instances, we additionally find that our implementation performs better than the current state-of-the-art tools.

Submitted 22 November, 2017; originally announced November 2017.

arXiv:1711.00504 [pdf, other]

Partial Homology Relations - Satisfiability in terms of Di-Cographs

Authors: Nikolai Nøjgaard, Nadia El-Mabrouk, Daniel Merkle, Nikolas Wieseke, Marc Hellmuth

Abstract: Directed cographs (di-cographs) play a crucial role in the reconstruction of evolutionary histories of genes based on homology relations which are binary relations between genes. A variety of methods based on pairwise sequence comparisons can be used to infer such homology relations (e.g. orthology, paralogy, xenology). They are satisfiable if the relations can be explained by an event-labeled gene tree, i.e., they can simultaneously co-exist in an evolutionary history of the underlying genes. Every gene tree is equivalently interpreted as a so-called cotree that entirely encodes the structure of a di-cograph. Thus, satisfiable homology relations must necessarily form a di-cograph. The inferred homology relations might not cover each pair of genes and thus, provide only partial knowledge on the full set of homology relations. Moreover, for particular pairs of genes, it might be known with a high degree of certainty that they are not orthologs (resp. paralogs, xenologs) which yields forbidden pairs of genes. Motivated by this observation, we characterize (partial) satisfiable homology relations with or without forbidden gene pairs, provide a quadratic-time algorithm for their recognition and for the computation of a cotree that explains the given relations.

Submitted 3 May, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

arXiv:1705.02179 [pdf, other]

Forbidden Time Travel: Characterization of Time-Consistent Tree Reconciliation Maps

Authors: Nikolai Nøjgaard, Manuela Geiß, Peter F. Stadler, Daniel Merkle, Nicolas Wieseke, Marc Hellmuth

Abstract: In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species tree S, relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. Here we investigate the mathematical structure of the event-labeled reconciliation problem with horizontal transfer. We investigate the issue of time-consistency for the event-labeled version of the reconciliation problem, provide a convenient axiomatic framework, and derive a complete characterization of time-consistent reconciliations. This characterization depends on certain weak conditions on the event-labeled gene trees that reflect conditions under which evolutionary events are observable at least in principle. We give an O(|V(T)|log(|V(S)|))-time algorithm to decide whether a time-consistent reconciliation map exists. It does not require the construction of explicit timing maps, but relies entirely on the comparably easy task of checking whether a small auxiliary graph is acyclic. The combinatorial characterization of time consistency and thus biologically feasible reconciliation is an important step towards the inference of gene family histories with horizontal transfer from orthology data, i.e., without presupposed gene and species trees. The fast algorithm to decide time consistency is useful in a broader context because it constitutes an attractive component for all tools that address tree reconciliation problems.

Submitted 5 May, 2017; originally announced May 2017.

ACM Class: G.2.2; G.2.3; F.2.2

arXiv:1701.09097 [pdf, other]

doi 10.1098/rsta.2016.0354

An Intermediate Level of Abstraction for Computational Systems Chemistry

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Computational techniques are required for narrowing down the vast space of possibilities to plausible prebiotic scenarios, since precise information on the molecular composition, the dominant reaction chemistry, and the conditions for that era are scarce. The exploration of large chemical reaction networks is a central aspect in this endeavour. While quantum chemical methods can accurately predict the structures and reactivities of small molecules, they are not efficient enough to cope with large-scale reaction systems. The formalization of chemical reactions as graph grammars provides a generative system, well grounded in category theory, at the right level of abstraction for the analysis of large and complex reaction networks. An extension of the basic formalism into the realm of integer hyperflows allows for the identification of complex reaction patterns, such as auto-catalysis, in large reaction networks using optimization techniques.

Submitted 31 January, 2017; originally announced January 2017.

arXiv:1604.06379 [pdf, other]

Automatic Inference of Graph Transformation Rules Using the Cyclic Nature of Chemical Reactions

Authors: Christoph Flamm, Daniel Merkle, Peter F. Stadler, Uffe Thorsen

Abstract: Graph transformation systems have the potential to be realistic models of chemistry, provided a comprehensive collection of reaction rules can be extracted from the body of chemical knowledge. A first key step for rule learning is the computation of atom-atom mappings, i.e., the atom-wise correspondence between products and educts of all published chemical reactions. This can be phrased as a maximum common edge subgraph problem with the constraint that transition states must have cyclic structure. We describe a search tree method well suited for small edit distance and an integer linear program best suited for general instances and demonstrate that it is feasible to compute atom-atom maps at large scales using a manually curated database of biochemical reactions as an example. In this context we address the network completion problem.

Submitted 21 April, 2016; originally announced April 2016.

Comments: ICGT 2016 : 9th International Conference on Graph Transformation, extended technical report

arXiv:1603.02481

A Software Package for Chemically Inspired Graph Transformation

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Chemical reaction networks can be automatically generated from graph grammar descriptions, where rewrite rules model reaction patterns. Because a molecule graph is connected and reactions in general involve multiple molecules, the rewriting must be performed on multisets of graphs. We present a general software package for this type of graph rewriting system, which can be used for modelling chemical systems. The package contains a C++ library with algorithms for working with transformation rules in the Double Pushout formalism, e.g., composition of rules and a domain specific language for programming graph language generation. A Python interface makes these features easily accessible. The package also has extensive procedures for automatically visualising not only graphs and rewrite rules, but also Double Pushout diagrams and graph languages in the form of directed hypergraphs. The software is available as an open source package, and interactive examples can be found on the accompanying webpage.

Submitted 21 April, 2016; v1 submitted 8 March, 2016; originally announced March 2016.

arXiv:1502.07555

Support for Eschenmoser's Glyoxylate Scenario

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: A core topic of research in prebiotic chemistry is the search for plausible synthetic routes that connect the building blocks of modern life such as sugars, nucleotides, amino acids, and lipids to "molecular food sources" that have likely been abundant on Early Earth. In a recent contribution, Albert Eschenmoser emphasised the importance of catalytic and autocatalytic cycles in establishing such abiotic synthesis pathways. The accumulation of intermediate products furthermore provides additional catalysts that allow pathways to change over time. We show here that generative models of chemical spaces based on graph grammars make it possible to study such phenomena in a systematic manner. In addition to reproducing the key steps of Eschenmoser's hypothesis paper, we discovered previously unexplored potentially autocatalytic pathways from HCN to glyoxylate. A cascading of autocatalytic cycles could efficiently re-route matter, distributed over the combinatorial complex network of HCN hydrolysation chemistry, towards a potential primordial metabolism. The generative approach also has its intrinsic limitations: the unsupervised expansion of the chemical space remains infeasible due to the exponential growth of possible molecules and reactions between them. Here in particular the combinatorial complexity of the HCN polymerisation and hydrolysation networks forms the computational bottleneck. As a consequence, guidance of the computational exploration by chemical experience is indispensable.

Submitted 26 February, 2015; originally announced February 2015.

arXiv:1406.6163

Group Communication Patterns for High Performance Computing in Scala

Authors: Felix P. Hargreaves, Daniel Merkle, Peter Schneider-Kamp

Abstract: We developed a Functional object-oriented Parallel framework (FooPar) for high-level high-performance computing in Scala. Central to this framework are Distributed Memory Parallel Data structures (DPDs), i.e., collections of data distributed in a shared nothing system together with parallel operations on these data. In this paper, we first present FooPar's architecture and the idea of DPDs and group communications. Then, we show how DPDs can be implemented elegantly and efficiently in Scala based on the Traversable/Builder pattern, unifying Functional and Object-Oriented Programming. We prove the correctness and safety of one communication algorithm and show how specification testing (via ScalaCheck) can be used to bridge the gap between proof and implementation. Furthermore, we show that the group communication operations of FooPar outperform those of the MPJ Express open source MPI-bindings for Java, both asymptotically and empirically. FooPar has already been shown to be capable of achieving close-to-optimal performance for dense matrix-matrix multiplication via JNI. In this article, we present results on a parallel implementation of the Floyd-Warshall algorithm in FooPar, achieving more than 94 % efficiency compared to the serial version on a cluster using 100 cores for matrices of dimension 38000 x 38000.

Submitted 24 June, 2014; originally announced June 2014.

arXiv:1309.7198

On the Complexity of Reconstructing Chemical Reaction Networks

Authors: Rolf Fagerberg, Christoph Flamm, Daniel Merkle, Philipp Peters, Peter F. Stadler

Abstract: The analysis of the structure of chemical reaction networks is crucial for a better understanding of chemical processes. Such networks are well described as hypergraphs. However, due to the available methods, analyses regarding network properties are typically made on standard graphs derived from the full hypergraph description, e.g., on the so-called species and reaction graphs. However, a reconstruction of the underlying hypergraph from these graphs is not necessarily unique. In this paper, we address the problem of reconstructing a hypergraph from its species and reaction graph and show NP-completeness of the problem in its Boolean formulation. Furthermore we study the problem empirically on random and real world instances in order to investigate its computational limits in practice.

Submitted 27 September, 2013; originally announced September 2013.

arXiv:1304.2550

FooPar: A Functional Object Oriented Parallel Framework in Scala

Authors: Felix P. Hargreaves, Daniel Merkle

Abstract: We present FooPar, an extension for highly efficient Parallel Computing in the multi-paradigm programming language Scala. Scala offers concise and clean syntax and integrates functional programming features. Our framework FooPar combines these features with parallel computing techniques. FooPar has a modular design and supports easy access to different communication backends for distributed memory architectures as well as high performance math libraries. In this article we use it to parallelize matrix-matrix multiplication and show its scalability by an isoefficiency analysis. In addition, results based on an empirical analysis on two supercomputers are given. We achieve close-to-optimal performance with respect to theoretical peak performance. Based on this result we conclude that FooPar allows full access to Scala's design features without suffering from performance drops when compared to implementations purely based on C and MPI.

Submitted 13 June, 2013; v1 submitted 9 April, 2013; originally announced April 2013.

arXiv:1302.4006

Generic Strategies for Chemical Space Exploration

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Computational approaches to exploring "chemical universes", i.e., very large, potentially infinite sets of compounds that can be constructed by a prescribed collection of reaction mechanisms, in practice suffer from a combinatorial explosion. It quickly becomes impossible to test, for all pairs of compounds in a rapidly growing network, whether they can react with each other. More sophisticated and efficient strategies are therefore required to construct very large chemical reaction networks. Undirected labeled graphs and graph rewriting are natural models of chemical compounds and chemical reactions. Borrowing the idea of partial evaluation from functional programming, we introduce partial applications of rewrite rules. Binding substrate to rules increases the number of rules but drastically prunes the substrate sets to which they might match, resulting in dramatically reduced resource requirements. At the same time, exploration strategies can be guided, e.g., based on restrictions on the product molecules to avoid the explicit enumeration of very unlikely compounds. To this end we introduce here a generic framework for the specification of exploration strategies in graph-rewriting systems. Using key examples of complex chemical networks from sugar chemistry and the realm of metabolic networks we demonstrate the feasibility of a high-level strategy framework. The ideas presented here can not only be used for a strategy-based chemical space exploration that corresponds closely to experimental results, but are much more general. In particular, the framework can be used to emulate higher-level transformation models such as illustrated in a small puzzle game.

Submitted 15 April, 2014; v1 submitted 16 February, 2013; originally announced February 2013.

arXiv:1208.3153

Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Modeling molecules as undirected graphs and chemical reactions as graph rewriting operations is a natural and convenient approach to modeling chemistry. Graph grammar rules are most naturally employed to model elementary reactions like merging, splitting, and isomerisation of molecules. It is often convenient, in particular in the analysis of larger systems, to summarize several subsequent reactions into a single composite chemical reaction. We use a generic approach for composing graph grammar rules to define chemically useful rule compositions. We iteratively apply these rule compositions to elementary transformations in order to automatically infer complex transformation patterns. This is useful for instance to understand the net effect of complex catalytic cycles such as the Formose reaction. The automatically inferred graph grammar rule is a generic representative that also covers the overall reaction pattern of the Formose cycle, namely two carbonyl groups that can react with a bound glycolaldehyde to a second glycolaldehyde. Rule composition also can be used to study polymerization reactions as well as more complicated iterative reaction schemes. Terpenes and the polyketides, for instance, form two naturally occurring classes of compounds of utmost pharmaceutical interest that can be understood as "generalized polymers" consisting of five-carbon (isoprene) and two-carbon units, respectively.

Submitted 16 August, 2012; v1 submitted 15 August, 2012; originally announced August 2012.

arXiv:1110.6051

Maximizing Output and Recognizing Autocatalysis in Chemical Reaction Networks is NP-Complete

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Background: A classical problem in metabolic design is to maximize the production of a desired compound in a given chemical reaction network by appropriately directing the mass flow through the network. Computationally, this problem is addressed as a linear optimization problem over the "flux cone". The prior construction of the flux cone is computationally expensive and no polynomial-time algorithms are known. Results: Here we show that the output maximization problem in chemical reaction networks is NP-complete. This statement remains true even if all reactions are monomolecular or bimolecular and if only a single molecular species is used as influx. As a corollary we show, furthermore, that the detection of autocatalytic species, i.e., types that can only be produced from the influx material when they are present in the initial reaction mixture, is an NP-complete computational problem. Conclusions: Hardness results on combinatorial problems and optimization problems are important to guide the development of computational tools for the analysis of metabolic networks in particular and chemical reaction networks in general. Our results indicate that efficient heuristics and approximate algorithms need to be employed for the analysis of large chemical networks since even conceptually simple flow problems are provably intractable.

Submitted 27 October, 2011; originally announced October 2011.

Showing 1–24 of 24 results for author: Merkle, D