Search | arXiv e-print repository

EdgeAlpha: Bringing Process Discovery to the Data Sources

Authors: Julia Andersen, Patrick Rathje, Olaf Landsiedel

Abstract: Process Mining is moving beyond mining traditional event logs and nowadays includes, for example, data sourced from sensors in the Internet of Things (IoT). The volume and velocity of data generated by such sensors make it increasingly challenging for traditional process discovery algorithms to store and mine such data in traditional event logs. Further, privacy considerations often prevent data collection at a central location in the first place. To address this challenge, this paper introduces EdgeAlpha, a distributed algorithm for process discovery operating directly on sensor nodes and edge devices on a stream of real-time event data. Based on the Alpha Miner, EdgeAlpha tracks each event and its predecessor and successor events directly on the sensor node where the event is sensed and recorded. From this local view, each node in EdgeAlpha derives a partial footprint matrix, which we then merge at a central location, whenever we query the system to compute a process model. EdgeAlpha enables (a) scalable mining, as a node, for each event, only interacts with its predecessors and, when queried, only exchanges aggregates, i.e., partial footprint matrices, with the central location and (b) privacy preserving process mining, as nodes only store their own as well as predecessor and successor events. On the Sepsis Cases event log, for example, a node queries on average 18.7% of all nodes. For the Hospital Log, we can even reduce the overall querying to 3.87% of the nodes.

Submitted 13 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.
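
The footprint merging described in the abstract above can be pictured with a minimal Python sketch. It is not the authors' implementation: it simulates the per-node view from complete traces instead of a real-time stream, and the names `local_follows`, `merge`, and `footprint` are hypothetical. Only the four standard Alpha Miner footprint relations are taken as given.

```python
def local_follows(traces, activity):
    """Directly-follows pairs visible to the (simulated) node sensing `activity`.

    Assumption: a node only keeps pairs in which its own activity appears,
    i.e. its predecessor and successor events.
    """
    pairs = set()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            if a == activity or b == activity:
                pairs.add((a, b))
    return pairs


def merge(partials):
    """Central merge step: union of all partial directly-follows views."""
    merged = set()
    for partial in partials:
        merged |= partial
    return merged


def footprint(follows, activities):
    """Alpha Miner footprint: '->' causal, '<-' reverse, '||' parallel, '#' unrelated."""
    fp = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in follows, (b, a) in follows
            fp[a, b] = "||" if ab and ba else "->" if ab else "<-" if ba else "#"
    return fp


# Toy log with two traces; activity names are illustrative only.
traces = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
activities = ["a", "b", "c", "d"]
partials = [local_follows(traces, act) for act in activities]
fp = footprint(merge(partials), activities)
print(fp["a", "b"], fp["b", "c"], fp["a", "d"])
```

Because each partial view is a set of pairs, the merge is a plain union, which is why only aggregates need to travel to the central location in this simplified picture.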

arXiv:2404.02692 [pdf, other]

Automated Inference of Graph Transformation Rules

Authors: Jakob L. Andersen, Akbar Davoodi, Rolf Fagerberg, Christoph Flamm, Walter Fontana, Juri Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: The explosion of data available in life sciences is fueling an increasing demand for expressive models and computational methods. Graph transformation is a model for dynamic systems with a large variety of applications. We introduce a novel method for graph transformation model construction, combining generative and dynamical viewpoints to give a fully automated data-driven model inference method. The method takes the input dynamical properties, given as a "snapshot" of the dynamics encoded by explicit transitions, and constructs a compatible model. The obtained model is guaranteed to be minimal, thus framing the approach as model compression (from a set of transitions into a set of rules). The compression is permissive to a lossy case, where the constructed model is allowed to exhibit behavior outside of the input transitions, thus suggesting a completion of the input dynamics. The task of graph transformation model inference is naturally highly challenging due to the combinatorics involved. We tackle the exponential explosion by proposing a heuristically minimal translation of the task into a well-established problem, set cover, for which highly optimized solutions exist. We further showcase how our results relate to Kolmogorov complexity expressed in terms of graph transformation.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Preprint
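
For readers unfamiliar with set cover, the target problem named in the abstract above, here is a minimal Python sketch of the classic greedy approximation. It is an illustration only: the abstract does not say which solver the authors use, the greedy strategy is not guaranteed to be minimal, and the rule and transition names are hypothetical.

```python
def greedy_set_cover(universe, candidates):
    """Pick candidate sets until every element of `universe` is covered.

    `candidates` maps a name (here: a hypothetical rule) to the set of
    elements (here: transitions) that it explains.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Take the candidate that explains the most still-uncovered elements.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("candidates do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


transitions = {1, 2, 3, 4, 5}
rules = {"r1": {1, 2, 3}, "r2": {2, 4}, "r3": {3, 4}, "r4": {4, 5}}
print(greedy_set_cover(transitions, rules))
```

In this reading, the chosen rules play the role of the compressed model and the covered elements play the role of the input transitions.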

arXiv:2312.00582 [pdf, other]

Design Patterns for Machine Learning Based Systems with Human-in-the-Loop

Authors: Jakob Smedegaard Andersen, Walid Maalej

Abstract: The development and deployment of systems using supervised machine learning (ML) remain challenging, mainly due to the limited reliability of prediction models and the lack of knowledge on how to effectively integrate human intelligence into automated decision-making. Human involvement in the ML process is a promising and powerful paradigm to overcome the limitations of pure automated predictions and improve the applicability of ML in practice. We compile a catalog of design patterns to guide developers in selecting and implementing suitable human-in-the-loop (HiL) solutions. Our catalog takes into consideration key requirements such as the cost of human involvement and model retraining. It includes four training patterns, four deployment patterns, and two orthogonal cooperation patterns.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2309.16285 [pdf, other]

A Framework to Assess Knowledge Graphs Accountability

Authors: Jennie Andersen, Sylvie Cazalens, Philippe Lamarre, Pierre Maillot

Abstract: Knowledge Graphs (KGs), and Linked Open Data in particular, enable the generation and exchange of more and more information on the Web. In order to use and reuse these data properly, the presence of accountability information is essential. Accountability requires specific and accurate information about people's responsibilities and actions. In this article, we define KGAcc, a framework dedicated to the assessment of RDF graphs accountability. It consists of accountability requirements and a measure of accountability for KGs. Then, we evaluate KGs from the LOD cloud and describe the results obtained. Finally, we compare our approach with data quality and FAIR assessment frameworks to highlight the differences.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 8 pages, to be published in: 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)

arXiv:2309.10629 [pdf, ps, other]

On the Realisability of Chemical Pathways

Authors: Jakob L. Andersen, Sissel Banke, Rolf Fagerberg, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: The exploration of pathways and alternative pathways that have a specific function is of interest in numerous chemical contexts. A framework for specifying and searching for pathways has previously been developed, but a focus on which of the many pathway solutions are realisable, or can be made realisable, is missing. Realisable here means that there actually exists some sequencing of the reactions of the pathway that will execute the pathway. We present a method for analysing the realisability of pathways based on the reachability question in Petri nets. For realisable pathways, our method also provides a certificate encoding an order of the reactions which realises the pathway. We present two extended notions of realisability of pathways, one of which is related to the concept of network catalysts. We exemplify our findings on the pentose phosphate pathway. Lastly, we discuss the relevance of our concepts for elucidating the choices often implicitly made when depicting pathways.

Submitted 19 September, 2023; originally announced September 2023.

Comments: Accepted in LNBI proceedings
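
The notion of realisability in the abstract above (some sequencing of the reactions exists that executes the pathway) can be illustrated with a small brute-force search in Python. This is a sketch under stated assumptions, not the authors' Petri net method: the function name `realise`, the species names, and the reaction encoding are hypothetical, and the search is exponential in the worst case.

```python
from collections import Counter


def realise(marking, reactions, counts):
    """Return a firing order that uses each reaction counts[name] times, or None.

    `marking` is a Counter of available molecules; `reactions` maps a name to
    a (consumed, produced) pair of dicts. The returned order acts as a
    certificate that the pathway can be executed.
    """
    if not any(counts.values()):
        return []
    for name, remaining in counts.items():
        if remaining == 0:
            continue
        consumed, produced = reactions[name]
        # A reaction can fire only if all its inputs are currently present.
        if all(marking[species] >= k for species, k in consumed.items()):
            nxt = marking.copy()
            nxt.subtract(consumed)
            nxt.update(produced)
            rest = realise(nxt, reactions, {**counts, name: remaining - 1})
            if rest is not None:
                return [name] + rest
    return None


# A two-step cycle that regenerates X while turning A into B.
cycle = {"r1": ({"X": 1, "A": 1}, {"Y": 1}), "r2": ({"Y": 1}, {"X": 1, "B": 1})}
print(realise(Counter({"A": 1}), cycle, {"r1": 1, "r2": 1}))          # no X available
print(realise(Counter({"A": 1, "X": 1}), cycle, {"r1": 1, "r2": 1}))  # X supplied up front
```

The second call succeeds only because one copy of X is supplied and later returned, which loosely mirrors how a pathway might be made realisable by providing a catalyst-like molecule.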

arXiv:2308.12735 [pdf, other]

Reconciling Inconsistent Molecular Structures from Biochemical Databases

Authors: Casper Asbjørn Eriksen, Jakob Lykke Andersen, Rolf Fagerberg, Daniel Merkle

Abstract: Information on the structure of molecules, retrieved via biochemical databases, plays a pivotal role in various disciplines, such as metabolomics, systems biology, and drug discovery. However, no such database can be complete, and the chemical structure for a given compound is not necessarily consistent between databases. This paper presents StructRecon, a novel tool for resolving unique and correct molecular structures from database identifiers. StructRecon traverses the cross-links between database entries in different databases to construct what we call an identifier graph, which offers a more complete view of the total information available on a particular compound across all the databases. In order to reconcile discrepancies between databases, we first present an extensible model for chemical structure which supports multiple independent levels of detail, allowing standardisation of the structure to be applied iteratively. In some cases, our standardisation approach results in multiple structures for a given compound, in which case a random walk-based algorithm is used to select the most likely structure among incompatible alternates. We applied StructRecon to the EColiCore2 model, resolving a unique chemical structure for 85.11% of identifiers. StructRecon is open-source and modular, which enables the potential support for more databases in the future.

Submitted 24 August, 2023; originally announced August 2023.

Comments: 14 pages, 4 figures, accepted at ISBRA 2023

arXiv:2306.09911 [pdf]

doi 10.1002/asi.24852

Uncited articles and their effect on the concentration of citations

Authors: Diego Kozlowski, Jens Peter Andersen, Vincent Larivière

Abstract: Empirical evidence demonstrates that citations received by scholarly publications follow a pattern of preferential attachment, resulting in a power-law distribution. Such asymmetry has sparked significant debate regarding the use of citations for research evaluation. However, a consensus has yet to be established concerning the historical trends in citation concentration. Are citations becoming more concentrated in a small number of articles? Or have recent geopolitical and technical changes in science led to more decentralized distributions? This ongoing debate stems from a lack of technical clarity in measuring inequality. Given the variations in citation practices across disciplines and over time, it is crucial to account for multiple factors that can influence the findings. This article explores how reference-based and citation-based approaches, uncited articles, citation inflation, the expansion of bibliometric databases, disciplinary differences, and self-citations affect the evolution of citation concentration. Our results indicate a decreasing trend in citation concentration, primarily driven by a decline in uncited articles, which, in turn, can be attributed to the growing significance of Asia and Europe. On the whole, our findings clarify current debates on citation concentration and show that, contrary to a widely-held belief, citations are increasingly scattered.

Submitted 16 June, 2023; originally announced June 2023.

Comments: 17 pages, 8 figures
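
To make concrete why uncited articles matter when measuring concentration, as the abstract above argues, here is a small Python sketch. The abstract does not name the inequality measure used in the paper, so the Gini coefficient below is an assumption chosen for illustration, and the citation counts are made up.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula over the sorted values.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


citations = [0, 0, 1, 3]                   # hypothetical counts, two uncited articles
cited_only = [c for c in citations if c > 0]
print(gini(citations), gini(cited_only))   # including vs. excluding uncited articles
```

On this toy data the coefficient drops from 0.625 to 0.25 once the uncited articles are excluded, which shows how strongly that single methodological choice can move a concentration estimate.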

arXiv:2212.02842 [pdf, other]

doi 10.1038/s41597-023-02173-4

VISEM-Tracking, a human spermatozoa tracking dataset

Authors: Vajira Thambawita, Steven A. Hicks, Andrea M. Storås, Thu Nguyen, Jorunn M. Andersen, Oliwia Witczak, Trine B. Haugen, Hugo L. Hammer, Pål Halvorsen, Michael A. Riegler

Abstract: A manual assessment of sperm motility requires microscopy observation, which is challenging due to the fast-moving spermatozoa in the field of view. To obtain correct results, manual evaluation requires extensive training. Therefore, computer-assisted sperm analysis (CASA) has become increasingly used in clinics. Despite this, more data is needed to train supervised machine learning approaches in order to improve accuracy and reliability in the assessment of sperm motility and kinematics. In this regard, we provide a dataset called VISEM-Tracking with 20 video recordings of 30 seconds (comprising 29,196 frames) of wet sperm preparations with manually annotated bounding-box coordinates and a set of sperm characteristics analyzed by experts in the domain. In addition to the annotated data, we provide unlabeled video clips for easy-to-use access and analysis of the data via methods such as self- or unsupervised learning. As part of this paper, we present baseline sperm detection performances using the YOLOv5 deep learning (DL) model trained on the VISEM-Tracking dataset. As a result, we show that the dataset can be used to train complex DL models to analyze spermatozoa.

Submitted 10 May, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Report number: Scientific Data volume 10

Journal ref: Sci Data 10, 260 (2023)

arXiv:2204.01334 [pdf, other]

Efficient, Uncertainty-based Moderation of Neural Networks Text Classifiers

Authors: Jakob Smedegaard Andersen, Walid Maalej

Abstract: To maximize the accuracy and increase the overall acceptance of text classifiers, we propose a framework for the efficient, in-operation moderation of classifiers' output. Our framework focuses on use cases in which F1-scores of modern Neural Networks classifiers (ca. 90%) are still inapplicable in practice. We suggest a semi-automated approach that uses prediction uncertainties to pass unconfident, probably incorrect classifications to human moderators. To minimize the workload, we limit the human moderated data to the point where the accuracy gains saturate and further human effort does not lead to substantial improvements. A series of benchmarking experiments based on three different datasets and three state-of-the-art classifiers show that our framework can improve the classification F1-scores by 5.1 to 11.2% (up to approx. 98 to 99%), while reducing the moderation load by up to 73.3% compared to a random moderation.

Submitted 4 April, 2022; originally announced April 2022.
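
The routing idea in the abstract above, passing unconfident classifications to human moderators, can be sketched in a few lines of Python. The abstract does not specify the uncertainty measure or how the cut-off is chosen, so predictive entropy, the fixed `threshold`, and the names `entropy` and `route` are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Predictive entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route(predictions, threshold):
    """Split items into automatically accepted and human-moderated ones.

    `predictions` is a list of (item_id, class_probabilities) pairs; items
    whose entropy exceeds `threshold` go to the moderators.
    """
    auto, human = [], []
    for item_id, probs in predictions:
        (human if entropy(probs) > threshold else auto).append(item_id)
    return auto, human


# Hypothetical classifier outputs for three texts.
predictions = [("t1", [0.98, 0.02]), ("t2", [0.55, 0.45]), ("t3", [0.90, 0.10])]
auto, human = route(predictions, threshold=0.5)
print(auto, human)
```

Lowering the threshold sends more items to moderators; the trade-off described in the abstract corresponds to stopping at the point where extra moderation no longer yields substantial accuracy gains.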

arXiv:2201.04515 [pdf, other]

Representing catalytic mechanisms with rule composition

Authors: Jakob L. Andersen, Rolf Fagerberg, Christoph Flamm, Walter Fontana, Juri Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: Reaction mechanisms are often presented as sequences of elementary steps, such as codified by arrow pushing. We propose an approach for representing such mechanisms using graph transformation. In this framework, each elementary step is a rule for modifying a molecular graph and a mechanism is a sequence of such rules. To generate a compact representation of a multi-step reaction, we compose the rules of individual steps into a composite rule, providing a rigorous and fully automated approach to coarse-graining. While the composite rule retains the graphical conditions necessary for the execution of a mechanism, it also records information about transient changes not visible by comparing educts and products. By projecting the rule onto a single "overlay graph", we generalize Fujita's idea of an Imaginary Transition Structure from elementary reactions to composite reactions. The utility of the overlay graph construct is exemplified in the context of enzyme-catalyzed reactions. In a first application, we exploit mechanistic information in the Mechanism and Catalytic Site Atlas to construct overlay graphs of hydrolase reactions listed in the database. These graphs point at a spectrum of catalytic entanglement of enzyme and substrate, de-emphasizing the notion of a singular catalyst in favor of a collection of catalytic sites that can be distributed across enzyme and substrate. In a second application, we deploy composite rules to search the Rhea database for reactions of known or unknown mechanism that are, in principle, compatible with the mechanisms implied by the composite rules. We believe this work adds to the utility of graph-transformation formalisms in representing and reasoning about chemistry in an automated yet insightful fashion.

Submitted 25 August, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: Preprint

arXiv:2201.04360 [pdf, other]

Efficient Modular Graph Transformation Rule Application

Authors: Jakob L. Andersen, Rolf Fagerberg, Juri Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: Graph transformation formalisms have proven to be suitable tools for the modelling of chemical reactions. They are well established in theoretical studies and increasingly also in practical applications in chemistry. The latter is made feasible via the development of programming frameworks which makes the formalisms executable. The application of such frameworks to large networks of chemical reactions, however, poses unique computational challenges. One such characteristic is the inherent combinatorial nature of the graphs involved. The graphs consist of many connected components, representing individual molecules. While the existing methods for implementing graph transformations can be applied to such graphs, the combinatorics of constructing graph matches quickly becomes a computational bottleneck as the size of the chemical reaction network grows. In this contribution, we develop a new method of enumerating graph matches during graph transformation rule application. The method is designed to improve performance in such scenarios and is based on constructing graph matches in an iterative, component-wise fashion which allows redundant applications to be detected early and pruned. We further extend the algorithm with an efficient heuristic based on local symmetries of the graphs, which allow us to detect and discard isomorphic applications early. Finally, we conduct chemical network generation experiments on real-life as well as synthetic data and compare against the state-of-the-art algorithm in the field.

Submitted 25 August, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: preprint

arXiv:2109.13644 [pdf, ps, other]

Clustering to the Fewest Clusters Under Intra-Cluster Dissimilarity Constraints

Authors: Jennie Andersen, Brice Chardin, Mohamed Tribak

Abstract: This paper introduces the equiwide clustering problem, where valid partitions must satisfy intra-cluster dissimilarity constraints. Unlike most existing clustering algorithms, equiwide clustering relies neither on density nor on a predefined number of expected classes, but on a dissimilarity threshold. Its main goal is to ensure an upper bound on the error induced by ultimately replacing any object with its cluster representative. Under this constraint, we then primarily focus on minimizing the number of clusters, along with potential sub-objectives. We argue that equiwide clustering is a sound clustering problem, and discuss its relationship with other optimization problems, existing and novel implementations as well as approximation strategies. We review and evaluate suitable clustering algorithms to identify trade-offs between the various practical solutions for this clustering problem.

Submitted 28 September, 2021; originally announced September 2021.

Journal ref: Proceedings of the 33rd IEEE International Conference on Tools with Artificial Intelligence, Nov 2021, Athens, Greece

arXiv:2107.03086 [pdf, other]

Defining Autocatalysis in Chemical Reaction Networks

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Autocatalysis is a deceptively simple concept, referring to the situation that a chemical species $X$ catalyzes its own formation. From the perspective of chemical kinetics, autocatalysts show a regime of super-linear growth. Given a chemical reaction network, however, it is not at all straightforward to identify species that are autocatalytic in the sense that there is a sub-network that takes $X$ as input and produces more than one copy of $X$ as output. The difficulty arises from the need to distinguish autocatalysis e.g. from the superposition of a cycle that consumes and produces equal amounts of $X$ and a pathway that produces $X$. To deal with this issue, a number of competing notions, such as exclusive autocatalysis and autocatalytic cycles, have been introduced. A closer inspection of concepts and their usage by different authors shows, however, that subtle differences in the definitions often make conceptually matching ideas difficult to bring together formally. In this contribution we make some of the available approaches comparable by translating them into a common formal framework that uses integer hyperflows as a basis to study autocatalysis in large chemical reaction networks. As an application we investigate the prevalence of autocatalysis in metabolic networks.

Submitted 7 July, 2021; originally announced July 2021.

arXiv:2106.02573 [pdf, other]

Rewriting Theory for the Life Sciences: A Unifying Theory of CTMC Semantics (Long version)

Authors: Nicolas Behr, Jean Krivine, Jakob L. Andersen, Daniel Merkle

Abstract: The Kappa biochemistry and the MØD organic chemistry frameworks are amongst the most intensely developed applications of rewriting-based methods in the life sciences to date. A typical feature of these types of rewriting theories is the necessity to implement certain structural constraints on the objects to be rewritten (a protein is empirically found to have a certain signature of sites, a carbon atom can form at most four bonds, ...). In this paper, we contribute a number of original developments that make it possible to implement a universal theory of continuous-time Markov chains (CTMCs) for stochastic rewriting systems. Our core mathematical concepts are a novel rule algebra construction for the relevant setting of rewriting rules with conditions, both in Double- and in Sesqui-Pushout semantics, augmented by a suitable stochastic mechanics formalism extension that makes it possible to derive dynamical evolution equations for pattern-counting statistics. A second main contribution of our paper is a novel framework of restricted rewriting theories, which comprises a rule-algebra calculus under the restriction to so-called constraint-preserving completions of application conditions (for rules considered to act only upon objects of the underlying category satisfying a globally fixed set of structural constraints). This novel framework in turn renders a faithful encoding of bio- and organo-chemical rewriting in the sense of Kappa and MØD possible, which allows us to derive a rewriting-based formulation of reaction systems including a full-fledged CTMC semantics as instances of our universal CTMC framework. While offering an interesting new perspective and conceptual simplification of this semantics in the setting of Kappa, both the formal encoding and the CTMC semantics of organo-chemical reaction systems as motivated by the MØD framework are the first such results of their kind.

Submitted 4 June, 2021; originally announced June 2021.

Comments: 62 pages; long version of arXiv:2003.09395

MSC Class: 16B50; 60J27; 68Q42 (Primary) 60J28; 16B50; 05E99 (Secondary) ACM Class: F.4.2; G.3; G.2.2

arXiv:2102.03292 [pdf, other]

Graph Transformation for Enzymatic Mechanisms

Authors: Jakob L. Andersen, Rolf Fagerberg, Christoph Flamm, Walter Fontana, Juraj Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard

Abstract: Motivation: The design of enzymes is as challenging as it is consequential for making chemical synthesis in medical and industrial applications more efficient, cost-effective and environmentally friendly. While several aspects of this complex problem are computationally assisted, the drafting of catalytic mechanisms, i.e. the specification of the chemical steps (and hence intermediate states) that the enzyme is meant to implement, is largely left to human expertise. The ability to capture specific chemistries of multi-step catalysis in a fashion that enables its computational construction and design is therefore highly desirable and would equally impact the elucidation of existing enzymatic reactions whose mechanisms are unknown. Results: We use the mathematical framework of graph transformation to express the distinction between rules and reactions in chemistry. We derive about 1000 rules for amino acid side chain chemistry from the M-CSA database, a curated repository of enzymatic mechanisms. Using graph transformation we are able to propose hundreds of hypothetical catalytic mechanisms for a large number of unrelated reactions in the Rhea database. We analyze these mechanisms to find that they combine in chemically sound fashion individual steps from a variety of known multi-step mechanisms, showing that plausible novel mechanisms for catalysis can be constructed computationally.

Submitted 26 March, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

Comments: Preprint submitted to ISMB/ECCB 2021. Prototype implementation source code available at https://github.com/Nojgaard/mechsearch Live demo available at https://cheminf.imada.sdu.dk/mechsearch/ Supplementary material available at https://cheminf.imada.sdu.dk/preprints/ECCB-2021

arXiv:2005.06303 [pdf]

doi 10.7554/eLife.58807

Meta-Research: COVID-19 medical papers have fewer women first authors than expected

Authors: Jens Peter Andersen, Mathias Wullum Nielsen, Nicole L. Simone, Resa E. Lewiss, Reshma Jagsi

Abstract: The COVID-19 pandemic has resulted in school closures and distancing requirements that have disrupted both work and family life for many. Concerns exist that these disruptions caused by the pandemic may not have influenced men and women researchers equally. Many medical journals have published papers on the pandemic, which were generated by researchers facing the challenges of these disruptions. Here we report the results of an analysis that compared the gender distribution of authors on 1,893 medical papers related to the pandemic with that on papers published in the same journals in 2019, for papers with first authors and last authors from the United States. Using mixed-effects regression models, we estimated that the proportion of COVID-19 papers with a woman first author was 19% lower than that for papers published in the same journals in 2019, while our comparisons for last authors and overall proportion of women authors per paper were inconclusive. A closer examination suggested that women's representation as first authors of COVID-19 research was particularly low for papers published in March and April 2020. Our findings are consistent with the idea that the research productivity of women, especially early-career women, has been affected more than the research productivity of men.

Submitted 11 June, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

Comments: Submitted to eLife. First revision

Journal ref: eLife 2020;9:e58807

arXiv:2003.07190 [pdf, ps, other]

On the parameterized complexity of 2-partitions

Authors: Jonas Bamse Andersen, Jørgen Bang-Jensen, Anders Yeo

Abstract: We give an FPT algorithm for deciding whether the vertex set of a digraph $D$ can be partitioned into two disjoint sets $V_1,V_2$ such that the digraph $D[V_1]$ induced by $V_1$ has a vertex that can reach all other vertices by directed paths, the digraph $D[V_2]$ has no vertex of in-degree zero and $|V_i|\geq k_i$, where $k_1,k_2$ are part of the input. This settles an open problem from [1,4].

Submitted 16 March, 2020; originally announced March 2020.

MSC Class: 05C20 ACM Class: F.2.2; G.2.2

arXiv:1911.00407 [pdf, other]

A Graph-Based Tool to Embed the π-Calculus into a Computational DPO Framework

Authors: Jakob Lykke Andersen, Marc Hellmuth, Daniel Merkle, Nikolai Nøjgaard, Marco Peressotti

Abstract: Graph transformation approaches have been successfully used to analyse and design chemical and biological systems. Here we build on top of a DPO framework, in which molecules are modelled as typed attributed graphs and chemical reactions are modelled as graph transformations. Edges and vertexes can be labelled with first-order terms, which can be used to encode, e.g., steric information of molecules. While targeted to chemical settings, the computational framework is intended to be very generic and applicable to the exploration of arbitrary spaces derived via iterative application of rewrite rules, such as process calculi like Milner's π-calculus. To illustrate the generality of the framework, we introduce EpiM: a tool for computing execution spaces of π-calculus processes. EpiM encodes π-calculus processes as typed attributed graphs and then exploits the existing DPO framework to compute their dynamics in the form of graphs where nodes are π-calculus processes and edges are reduction steps. EpiM takes advantage of the graph-based representation and facilities offered by the framework, like efficient isomorphism checking to prune the space without resorting to explicit structural equivalences. EpiM is available as an online Python-based tool.

Submitted 29 October, 2019; originally announced November 2019.

arXiv:1910.13327 [pdf, other]

Machine Learning-Based Analysis of Sperm Videos and Participant Data for Male Fertility Prediction

Authors: Steven A. Hicks, Jorunn M. Andersen, Oliwia Witczak, Vajira Thambawita, Pål Halvorsen, Hugo L. Hammer, Trine B. Haugen, Michael A. Riegler

Abstract: Methods for automatic analysis of clinical data are usually targeted towards a specific modality and do not make use of all relevant data available. In the field of male human reproduction, clinical and biological data are not used to their fullest potential. Manual evaluation of a semen sample using a microscope is time-consuming and requires extensive training. Furthermore, the validity of manual semen analysis has been questioned due to limited reproducibility and often high inter-personnel variation. The existing computer-aided sperm analyzer systems are not recommended for routine clinical use due to methodological challenges caused by the consistency of the semen sample. Thus, there is a need for an improved methodology. We use modern and classical machine learning techniques together with a dataset consisting of 85 videos of human semen samples and related participant data to automatically predict sperm motility. The techniques used include simple linear regression and more sophisticated methods using convolutional neural networks. Our results indicate that sperm motility prediction based on deep learning using sperm motility videos is rapid to perform and consistent. The algorithms performed worse when participant data was added. In conclusion, machine learning-based automatic analysis may become a valuable tool in male infertility investigation and research.

Submitted 29 October, 2019; originally announced October 2019.

Comments: Preprint, accepted by Nature Scientific Reports for publication 24.10.2019

arXiv:1812.11177 [pdf, other]

Degree Bounded Bottleneck Spanning Trees in Three Dimensions

Authors: Patrick J. Andersen, Charl J. Ras

Abstract: The geometric $δ$-minimum spanning tree problem ($δ$-MST) is the problem of finding a minimum spanning tree for a set of points in a normed vector space, such that no vertex in the tree has a degree which exceeds $δ$, and the sum of the lengths of the edges in the tree is minimum. The similarly defined geometric $δ$-minimum bottleneck spanning tree problem ($δ$-MBST) is the problem of finding a degree bounded spanning tree such that the length of the longest edge is minimum. For point sets that lie in the Euclidean plane, both of these problems have been shown to be NP-hard for certain specific values of $δ$. In this paper, we investigate the $δ$-MBST problem in $3$-dimensional Euclidean space and $3$-dimensional rectilinear space. We show that the problems are NP-hard for certain values of $δ$, and we provide inapproximability results for these cases. We also describe new approximation algorithms for solving these $3$-dimensional variants, and then analyse their worst-case performance.

Submitted 25 January, 2019; v1 submitted 28 December, 2018; originally announced December 2018.

Comments: 35 pages, 22 figures

MSC Class: 90C27

arXiv:1809.09348 [pdf, other]

Algorithms for Euclidean Degree Bounded Spanning Tree Problems

Authors: Patrick J. Andersen, Charl J. Ras

Abstract: Given a set of points in the Euclidean plane, the Euclidean $δ$-minimum spanning tree ($δ$-MST) problem is the problem of finding a spanning tree with maximum degree no more than $δ$ for the set of points such that the total length of its edges is minimum. Similarly, the Euclidean $δ$-minimum bottleneck spanning tree ($δ$-MBST) problem is the problem of finding a degree-bounded spanning tree for a set of points in the plane such that the length of the longest edge is minimum. When $δ\leq 4$, these two problems may yield disjoint sets of optimal solutions for the same set of points. In this paper, we perform computational experiments to compare the accuracies of a variety of heuristic and approximation algorithms for both these problems. We develop heuristics for these problems and compare them with existing algorithms. We also describe a new type of edge swap algorithm for these problems that outperforms all the algorithms we tested.

Submitted 25 September, 2018; originally announced September 2018.

Comments: 38 pages, 8 pages of appendices

MSC Class: 90C27

arXiv:1712.02594 [pdf, other]

Chemical Transformation Motifs - Modelling Pathways as Integer Hyperflows

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: We present an elaborate framework for formally modelling pathways in chemical reaction networks on a mechanistic level. Networks are modelled mathematically as directed multi-hypergraphs, with vertices corresponding to molecules and hyperedges to reactions. Pathways are modelled as integer hyperflows and we expand the network model by detailed routing constraints. In contrast to the more traditional approaches like Flux Balance Analysis or Elementary Mode analysis we insist on integer-valued flows. While this choice makes it necessary to solve possibly hard integer linear programs, it has the advantage that more detailed mechanistic questions can be formulated. It is thus possible to query networks for general transformation motifs, and to automatically enumerate optimal and near-optimal pathways. Similarities and differences between our work and traditional approaches in metabolic network analysis are discussed in detail. To demonstrate the applicability of the mathematical framework to real-life problems we first explore the design space of possible non-oxidative glycolysis pathways and show that recent manually designed pathways can be further optimised. We then use a model of sugar chemistry to investigate pathways in the autocatalytic formose process. A graph transformation-based approach is used to automatically generate the reaction networks of interest.

Submitted 7 December, 2017; originally announced December 2017.

arXiv:1711.08289 [pdf, other]

A Generic Framework for Engineering Graph Canonization Algorithms

Authors: Jakob L. Andersen, Daniel Merkle

Abstract: The state-of-the-art tools for practical graph canonization are all based on the individualization-refinement paradigm, and their difference is primarily in the choice of heuristics they include and in the actual tool implementation. It is thus not possible to make a direct comparison of how individual algorithmic ideas affect the performance on different graph classes. We present an algorithmic software framework that facilitates implementation of heuristics as independent extensions to a common core algorithm. It therefore becomes easy to perform a detailed comparison of the performance and behaviour of different algorithmic ideas. Implementations are provided of a range of algorithms for tree traversal, target cell selection, and node invariant, including choices from the literature and new variations. The framework readily supports extraction and visualization of detailed data from separate algorithm executions for subsequent analysis and development of new heuristics. Using collections of different graph classes we investigate the effect of varying the selections of heuristics, often revealing exactly which individual algorithmic choice is responsible for particularly good or bad performance. On several benchmark collections, including a newly proposed class of difficult instances, we additionally find that our implementation performs better than the current state-of-the-art tools.

Submitted 22 November, 2017; originally announced November 2017.

arXiv:1701.09097 [pdf, other]

doi 10.1098/rsta.2016.0354

An Intermediate Level of Abstraction for Computational Systems Chemistry

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Computational techniques are required for narrowing down the vast space of possibilities to plausible prebiotic scenarios, since precise information on the molecular composition, the dominant reaction chemistry, and the conditions for that era are scarce. The exploration of large chemical reaction networks is a central aspect in this endeavour. While quantum chemical methods can accurately predict the structures and reactivities of small molecules, they are not efficient enough to cope with large-scale reaction systems. The formalization of chemical reactions as graph grammars provides a generative system, well grounded in category theory, at the right level of abstraction for the analysis of large and complex reaction networks. An extension of the basic formalism into the realm of integer hyperflows allows for the identification of complex reaction patterns, such as auto-catalysis, in large reaction networks using optimization techniques.

Submitted 31 January, 2017; originally announced January 2017.

arXiv:1612.06079 [pdf, other]

doi 10.1016/j.joi.2017.02.009

An empirical and theoretical critique of the Euclidean index

Authors: Jens Peter Andersen

Abstract: The recently proposed Euclidean index offers a novel approach to measure the citation impact of academic authors, in particular as an alternative to the h-index. We test whether the index provides new, robust information not covered by existing bibliometric indicators, and discuss the measurement scale and the degree of distinction between analytical units that the index offers. We find that the Euclidean index does not outperform existing indicators on these topics and that the main application of the index would be solely for ranking, which is not seen as a recommended practice.

Submitted 7 March, 2017; v1 submitted 19 December, 2016; originally announced December 2016.

Comments: Accepted for publication in Journal of Informetrics

arXiv:1603.02481 [pdf, other]

A Software Package for Chemically Inspired Graph Transformation

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Chemical reaction networks can be automatically generated from graph grammar descriptions, where rewrite rules model reaction patterns. Because a molecule graph is connected and reactions in general involve multiple molecules, the rewriting must be performed on multisets of graphs. We present a general software package for this type of graph rewriting system, which can be used for modelling chemical systems. The package contains a C++ library with algorithms for working with transformation rules in the Double Pushout formalism, e.g., composition of rules and a domain specific language for programming graph language generation. A Python interface makes these features easily accessible. The package also has extensive procedures for automatically visualising not only graphs and rewrite rules, but also Double Pushout diagrams and graph languages in the form of directed hypergraphs. The software is available as an open source package, and interactive examples can be found on the accompanying webpage.

Submitted 21 April, 2016; v1 submitted 8 March, 2016; originally announced March 2016.

arXiv:1510.08943 [pdf, other]

MessageGuard: A Browser-based Platform for Usable, Content-Based Encryption Research

Authors: Scott Ruoti, Jeff Andersen, Tyler Monson, Daniel Zappala, Kent Seamons

Abstract: This paper describes MessageGuard, a browser-based platform for research into usable content-based encryption. MessageGuard is designed to enable collaboration between security and usability researchers on long-standing research questions in this area. It significantly simplifies the effort required to work in this space and provides a place for research results to be shared, replicated, and compared with minimal confounding factors. MessageGuard provides ubiquitous encryption and secure cryptographic operations, enabling research on any existing web application, with realistic usability studies on a secure platform. We validate MessageGuard's compatibility and performance, and we illustrate its utility with case studies for Gmail and Facebook Chat.

Submitted 16 May, 2016; v1 submitted 29 October, 2015; originally announced October 2015.

arXiv:1510.08555 [pdf, other]

Why Johnny Still, Still Can't Encrypt: Evaluating the Usability of a Modern PGP Client

Authors: Scott Ruoti, Jeff Andersen, Daniel Zappala, Kent Seamons

Abstract: This paper presents the results of a laboratory study involving Mailvelope, a modern PGP client that integrates tightly with existing webmail providers. In our study, we brought in pairs of participants and had them attempt to use Mailvelope to communicate with each other. Our results show that more than a decade and a half after "Why Johnny Can't Encrypt", modern PGP tools are still unusable for the masses. We finish with a discussion of pain points encountered using Mailvelope, and discuss what might be done to address them in future PGP systems.

Submitted 13 January, 2016; v1 submitted 28 October, 2015; originally announced October 2015.

Comments: This is the Mailvelope study discussed in the CHI 2016 paper arXiv:1510.08554, "We're on the Same Page": A Usability Study of Secure Email Using Pairs of Novice Users

arXiv:1510.08554 [pdf, other]

doi 10.1145/2858036.2858400

"We're on the Same Page": A Usability Study of Secure Email Using Pairs of Novice Users

Authors: Scott Ruoti, Jeff Andersen, Scott Heidbrink, Mark O'Neil, Elham Vaziripour, Justin Wu, Daniel Zappala, Kent Seamons

Abstract: Secure email is increasingly being touted as usable by novice users, with a push for adoption based on recent concerns about government surveillance. To determine whether secure email is ready for grassroots adoption, we employ a laboratory user study that recruits pairs of novice users to install and use several of the latest systems to exchange secure messages. We present quantitative and qualitative results from 25 pairs of novice users as they use Pwm, Tutanota, and Virtru. Participants report being more at ease with this type of study and better able to cope with mistakes since both participants are "on the same page". We find that users prefer integrated solutions over depot-based solutions, and that tutorials are important in helping first-time users. Hiding the details of how a secure email system provides security can lead to a lack of trust in the system. Participants expressed a desire to use secure email, but few wanted to use it regularly and most were unsure of when they might use it.

Submitted 11 January, 2016; v1 submitted 28 October, 2015; originally announced October 2015.

Comments: 34th Annual ACM Conference on Human Factors in Computing Systems (CHI 2016)

ACM Class: H.1.2; H.5.2

arXiv:1510.08435 [pdf, other]

doi 10.1145/2984511.2984580

Private Webmail 2.0: Simple and Easy-to-Use Secure Email

Authors: Scott Ruoti, Jeff Andersen, Travis Hendershot, Daniel Zappala, Kent Seamons

Abstract: Private Webmail 2.0 (Pwm 2.0) improves upon the current state of the art by increasing the usability and practical security of secure email for ordinary users. More users are able to send and receive encrypted emails without mistakenly revealing sensitive information. In this paper we describe user interface traits that positively affect the usability and security of Pwm 2.0: (1) an artificial delay to encryption that enhances user confidence in Pwm 2.0 while simultaneously instructing users on who can read their encrypted messages; (2) a modified composition interface that helps protect users from mistakenly sending sensitive information in the clear; (3) an annotated secure email composition interface that instructs users on how to correctly use secure email; and (4) inline, context-sensitive tutorials, which improved view rates for tutorials from less than 10% in earlier systems to over 90% for Pwm 2.0. In a user study involving 51 participants we validate these interface modifications, and also show that the use of manual encryption has no effect on usability or security.

Submitted 8 August, 2016; v1 submitted 28 October, 2015; originally announced October 2015.

Comments: 29th ACM Conference on User Interface Software and Technology (UIST '16)

ACM Class: H.5.2; H.1.2

arXiv:1507.00154 [pdf]

Influence of study type on Twitter activity for medical research papers

Authors: Jens Peter Andersen, Stefanie Haustein

Abstract: Twitter has been identified as one of the most popular and promising altmetrics data sources, as it possibly reflects a broader use of research articles by the general public. Several factors, such as document age, scientific discipline, number of authors and document type, have been shown to affect the number of tweets received by scientific documents. The particular meaning of tweets mentioning scholarly papers is, however, not entirely understood and their validity as impact indicators is debatable. This study contributes to the understanding of factors influencing Twitter popularity of medical papers by investigating differences between medical study types. A total of 162,830 documents indexed in Embase to a medical study type have been analysed for the study type specific tweet frequency. Meta-analyses, systematic reviews and clinical trials were found to be tweeted substantially more frequently than other study types, while all basic research received less attention than the average. The findings correspond well with clinical evidence hierarchies. It is suggested that interest from laymen and patients may be a factor in the observed effects.

Submitted 1 July, 2015; originally announced July 2015.

Comments: Presented at the 15th International Society on Scientometrics & Informetrics (ISSI) Conference, 01 Jul 2015, Istanbul, Turkey

arXiv:1502.07555 [pdf, other]

Support for Eschenmoser's Glyoxylate Scenario

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: A core topic of research in prebiotic chemistry is the search for plausible synthetic routes that connect the building blocks of modern life such as sugars, nucleotides, amino acids, and lipids to "molecular food sources" that have likely been abundant on Early Earth. In a recent contribution, Albert Eschenmoser emphasised the importance of catalytic and autocatalytic cycles in establishing such abiotic synthesis pathways. The accumulation of intermediate products furthermore provides additional catalysts that allow pathways to change over time. We show here that generative models of chemical spaces based on graph grammars make it possible to study such phenomena in a systematic manner. In addition to reproducing the key steps of Eschenmoser's hypothesis paper, we discovered previously unexplored potentially autocatalytic pathways from HCN to glyoxylate. A cascading of autocatalytic cycles could efficiently re-route matter, distributed over the combinatorial complex network of HCN hydrolysation chemistry, towards a potential primordial metabolism. The generative approach also has its intrinsic limitations: the unsupervised expansion of the chemical space remains infeasible due to the exponential growth of possible molecules and reactions between them. Here in particular the combinatorial complexity of the HCN polymerisation and hydrolysation networks forms the computational bottleneck. As a consequence, guidance of the computational exploration by chemical experience is indispensable.

Submitted 26 February, 2015; originally announced February 2015.

arXiv:1302.4006 [pdf, other]

Generic Strategies for Chemical Space Exploration

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Computational approaches to exploring "chemical universes", i.e., very large, potentially infinite sets of compounds that can be constructed by a prescribed collection of reaction mechanisms, in practice suffer from a combinatorial explosion. It quickly becomes impossible to test, for all pairs of compounds in a rapidly growing network, whether they can react with each other. More sophisticated and efficient strategies are therefore required to construct very large chemical reaction networks. Undirected labeled graphs and graph rewriting are natural models of chemical compounds and chemical reactions. Borrowing the idea of partial evaluation from functional programming, we introduce partial applications of rewrite rules. Binding substrate to rules increases the number of rules but drastically prunes the substrate sets to which they might match, resulting in dramatically reduced resource requirements. At the same time, exploration strategies can be guided, e.g. based on restrictions on the product molecules to avoid the explicit enumeration of very unlikely compounds. To this end we introduce here a generic framework for the specification of exploration strategies in graph-rewriting systems. Using key examples of complex chemical networks from sugar chemistry and the realm of metabolic networks we demonstrate the feasibility of a high-level strategy framework. The ideas presented here can not only be used for a strategy-based chemical space exploration that has close correspondence to experimental results, but are much more general. In particular, the framework can be used to emulate higher-level transformation models, as illustrated in a small puzzle game.

Submitted 15 April, 2014; v1 submitted 16 February, 2013; originally announced February 2013.

arXiv:1301.5782 [pdf]

Association between quality of clinical practice guidelines and citations given to their references

Authors: Jens Peter Andersen

Abstract: It has been suggested that bibliometric analysis of different document types may reveal new aspects of research performance. In medical research a number of study types play different roles in the research process and it has been shown that the evidence-level of study types is associated with varying citation rates. This study focuses on clinical practice guidelines, which are supposed to gather the highest evidence on a given topic to give the best possible recommendation for practitioners. The quality of clinical practice guidelines, measured using the AGREE score, is compared to the citations given to the references used in these guidelines, as it is hypothesised that better guidelines are based on more highly cited references. AGREE scores are gathered from reviews of clinical practice guidelines on a number of diseases and treatments. Their references are collected from Web of Science and citation counts are normalised using the item-oriented z-score and the PPtop-10% indicators. A positive correlation between both citation indicators and the AGREE score of clinical practice guidelines is found. Some potential confounding factors are identified. While confounding cannot be excluded, results indicate low likelihood for the identified confounders. The results provide a new perspective on and application of citation analysis.

Submitted 24 January, 2013; originally announced January 2013.

Comments: Paper submitted to 14th International Society of Scientometrics and Informetrics Conference

arXiv:1208.3153 [pdf, other]

Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Modeling molecules as undirected graphs and chemical reactions as graph rewriting operations is a natural and convenient approach to modeling chemistry. Graph grammar rules are most naturally employed to model elementary reactions like merging, splitting, and isomerisation of molecules. It is often convenient, in particular in the analysis of larger systems, to summarize several subsequent reactions into a single composite chemical reaction. We use a generic approach for composing graph grammar rules to define chemically useful rule compositions. We iteratively apply these rule compositions to elementary transformations in order to automatically infer complex transformation patterns. This is useful, for instance, to understand the net effect of complex catalytic cycles such as the Formose reaction. The automatically inferred graph grammar rule is a generic representative that also covers the overall reaction pattern of the Formose cycle, namely two carbonyl groups that can react with a bound glycolaldehyde to form a second glycolaldehyde. Rule composition can also be used to study polymerization reactions as well as more complicated iterative reaction schemes. Terpenes and the polyketides, for instance, form two naturally occurring classes of compounds of utmost pharmaceutical interest that can be understood as "generalized polymers" consisting of five-carbon (isoprene) and two-carbon units, respectively.

Submitted 16 August, 2012; v1 submitted 15 August, 2012; originally announced August 2012.

arXiv:1110.6051 [pdf, other]

Maximizing Output and Recognizing Autocatalysis in Chemical Reaction Networks is NP-Complete

Authors: Jakob L. Andersen, Christoph Flamm, Daniel Merkle, Peter F. Stadler

Abstract: Background: A classical problem in metabolic design is to maximize the production of a desired compound in a given chemical reaction network by appropriately directing the mass flow through the network. Computationally, this problem is addressed as a linear optimization problem over the "flux cone". The prior construction of the flux cone is computationally expensive and no polynomial-time algorithms are known. Results: Here we show that the output maximization problem in chemical reaction networks is NP-complete. This statement remains true even if all reactions are monomolecular or bimolecular and if only a single molecular species is used as influx. As a corollary we show, furthermore, that the detection of autocatalytic species, i.e., types that can only be produced from the influx material when they are present in the initial reaction mixture, is an NP-complete computational problem. Conclusions: Hardness results on combinatorial problems and optimization problems are important to guide the development of computational tools for the analysis of metabolic networks in particular and chemical reaction networks in general. Our results indicate that efficient heuristics and approximate algorithms need to be employed for the analysis of large chemical networks since even conceptually simple flow problems are provably intractable.

Submitted 27 October, 2011; originally announced October 2011.

Showing 1–36 of 36 results for author: Andersen, J