-
A computational framework for weighted simplicial homology
Authors:
Andrei C. Bura,
Neelav S. Dutta,
Thomas J. X. Li,
Christian M. Reidys
Abstract:
We provide a bottom-up construction of torsion generators for weighted homology of a weighted complex over a discrete valuation ring $R=\mathbb{F}[[π]]$. This is achieved by starting from a basis for classical homology of the $n$-th skeleton of the underlying complex with coefficients in the residue field $\mathbb{F}$ and then lifting it to a basis for the weighted homology with coefficients in the ring $R$. Using the latter, a bijection is established between $(n+1)$- and $n$-dimensional simplices whose weight ratios provide the exponents of the $π$-monomials that generate each torsion summand in the structure theorem of the weighted homology modules over $R$. We present algorithms that subsume the torsion computation by reducing it to normalization over the residue field of $R$, and describe a Python package we implemented that takes advantage of this reduction and performs the computation efficiently.
Submitted 9 June, 2022;
originally announced June 2022.
-
On Weighted Simplicial Homology
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
We develop a framework for computing the homology of weighted simplicial complexes with coefficients in a discrete valuation ring. A weighted simplicial complex, $(X,v)$, introduced by Dawson [Cah. Topol. Géom. Différ. Catég. 31 (1990), pp. 229--243], is a simplicial complex, $X$, together with an integer-valued function, $v$, assigning weights to simplices, such that the weights are monotone with respect to the face relation. In addition, weighted homology, $H_n^v(X)$, features a new boundary operator, $\partial_n^v$. In contrast to Dawson, our approach is centered on a natural homomorphism $θ$ of weighted chain complexes. The key object is $H^v_{n}(X/θ)$, the weighted homology of a quotient of chain complexes induced by $θ$, appearing in a long exact sequence linking weighted homologies with different weights. We construct bases for the kernel and image of the weighted boundary map, identifying $n$-simplices as either $κ_n$- or $μ_n$-vertices. Long exact sequences of weighted homology groups and the bases allow us to prove a structure theorem for the weighted simplicial homology with coefficients in a ring of formal power series $R=\mathbb{F}[[π]]$, where $\mathbb{F}$ is a field. Relative to simplicial homology, new torsion arises, and we show that the torsion modules are connected to a pairing between distinguished $κ_n$ and $μ_{n+1}$ simplices.
Submitted 6 May, 2022;
originally announced May 2022.
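A minimal sketch of a weighted boundary operator for the entry above. It assumes, and the abstract does not state this explicitly, that over $R=\mathbb{F}[[π]]$ the weighted boundary of a simplex $σ$ has the form $\sum_i (-1)^i π^{v(σ)-v(d_iσ)} d_iσ$, which needs $v(d_iσ)\le v(σ)$. The function names and the toy weights are hypothetical, and coefficients are stored as `{exponent: integer}` dictionaries standing in for polynomials in $π$.

```python
from collections import defaultdict


def weighted_boundary(simplex, v):
    """Faces of `simplex` with (sign, exponent of pi), under the assumed formula."""
    if len(simplex) == 1:  # vertices have empty boundary
        return {}
    faces = {}
    for i in range(len(simplex)):
        face = simplex[:i] + simplex[i + 1:]
        # exponent is the weight difference; monotonicity keeps it non-negative
        faces[face] = ((-1) ** i, v[simplex] - v[face])
    return faces


def boundary_of_chain(chain, v):
    """Apply the weighted boundary to a chain {simplex: {exponent: coeff}}."""
    out = defaultdict(lambda: defaultdict(int))
    for simplex, poly in chain.items():
        for face, (sign, shift) in weighted_boundary(simplex, v).items():
            for exponent, coeff in poly.items():
                out[face][exponent + shift] += sign * coeff
    # drop zero coefficients and faces whose coefficient vanished entirely
    cleaned = {}
    for face, poly in out.items():
        nonzero = {e: c for e, c in poly.items() if c != 0}
        if nonzero:
            cleaned[face] = nonzero
    return cleaned


# Hypothetical weights on a single triangle; every face weighs at most its cofaces.
v = {
    (0,): 0, (1,): 1, (2,): 0,
    (0, 1): 1, (0, 2): 2, (1, 2): 1,
    (0, 1, 2): 3,
}
d_triangle = boundary_of_chain({(0, 1, 2): {0: 1}}, v)
dd_triangle = boundary_of_chain(d_triangle, v)  # should vanish: boundary squared is zero
```

Under this assumed formula the exponents telescope, so applying the operator twice cancels exactly as in the unweighted case.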
-
The energy-spectrum of bicompatible sequences
Authors:
Fenix W. Huang,
Christopher L. Barrett,
Christian M. Reidys
Abstract:
Background: Genotype-phenotype maps provide a meaningful filtration of sequence space, and RNA secondary structures are particular such phenotypes. Compatible sequences, i.e., sequences that satisfy the base pairing constraints of a given RNA structure, play an important role in the context of neutral networks and inverse folding. Sequences satisfying the constraints of two structures simultaneously are called bicompatible, and phenotypic change, induced by erroneously replicating populations of RNA sequences, is closely connected to bicompatibility. Furthermore, bicompatible sequences are relevant for riboswitch sequences, beacons of evolution, realizing two distinct phenotypes.
Results: We present a full loop energy model Boltzmann sampler of bicompatible sequences for pairs of structures. The novel dynamic programming algorithm is based on a topological framework encapsulating the relations between loops. We utilize our sequence sampler to study the energy spectra and density of bicompatible sequences, the rankings of the structures and key properties for evolutionary transitions.
Conclusion: Our analysis of riboswitch sequences shows that key properties of bicompatible sequences depend on the particular pair of structures. While there always exist bicompatible sequences for random structure pairs, they are less suited to facilitate transitions. We show that native riboswitch sequences exhibit a distinct signature with regard to the ranking of their two phenotypes relative to the minimum free energy, suggesting a new criterion for identifying native sequences and sequences subjected to evolutionary pressure.
Submitted 30 September, 2019;
originally announced October 2019.
-
On an enhancement of RNA probing data using Information Theory
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures via chemical probing data. We employ an information-theoretic approach to solve the problem by considering a variant of the Rényi-Ulam game. Our framework is centered around the ensemble tree, a hierarchical bi-partition of the input ensemble that is constructed by recursively querying whether or not a base pair of maximum information entropy is contained in the target. These queries are answered by relating local to global probing data, employing the modularity in RNA secondary structures. We show that the leaves of the tree comprise sub-samples exhibiting a distinguished structure with high probability. In particular, for a Boltzmann ensemble incorporating probing data, which is well established in the literature, the probability of our framework correctly identifying the target in the leaf is greater than $90\%$.
Submitted 12 September, 2019;
originally announced September 2019.
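One splitting step of the ensemble tree described in the entry above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: structures are modelled as sets of base pairs, every sampled structure is weighted equally, and the names `binary_entropy`, `max_entropy_pair` and `split` are hypothetical. The probing-data step that answers each query is not modelled.

```python
from collections import Counter
from math import log2


def binary_entropy(p):
    """Entropy (in bits) of a yes/no question answered 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def max_entropy_pair(ensemble):
    """Base pair whose presence/absence is most uncertain in the sample."""
    counts = Counter(bp for structure in ensemble for bp in structure)
    n = len(ensemble)
    # sorted() makes the tie-break deterministic (smallest pair wins)
    return max(sorted(counts), key=lambda bp: binary_entropy(counts[bp] / n))


def split(ensemble, bp):
    """Bi-partition the sample by whether each structure contains `bp`."""
    with_bp = [s for s in ensemble if bp in s]
    without_bp = [s for s in ensemble if bp not in s]
    return with_bp, without_bp


# Toy sample of four structures over hypothetical positions 1..10.
ensemble = [
    frozenset({(1, 10), (2, 9)}),
    frozenset({(1, 10), (3, 8)}),
    frozenset({(1, 10), (2, 9)}),
    frozenset({(4, 7)}),
]
query = max_entropy_pair(ensemble)
left, right = split(ensemble, query)
```

Applying the same step recursively to `left` and `right` would yield a tree of the kind the abstract describes.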
-
Loop Homology of Bi-secondary Structures II
Authors:
Andrei C. Bura,
Qijun He,
Christian M. Reidys
Abstract:
In this paper we further describe the features of the topological space $K(R)$ obtained from the loop nerve of $R$, for $R=(S,T)$ a bi-secondary structure. We will first identify certain distinct combinatorial structures in the arc diagram of $R$ which we will call crossing components. The main theorem of this paper shows that the total number of these crossing components equals the rank of $H_2(R)$, the second homology group of the loop nerve.
Submitted 3 September, 2019;
originally announced September 2019.
-
Loop Homology of Bi-secondary Structures
Authors:
Andrei C. Bura,
Qijun He,
Christian M. Reidys
Abstract:
In this paper we compute the loop homology of bi-secondary structures. Bi-secondary structures were introduced by Haslinger and Stadler and are pairs of RNA secondary structures, i.e. diagrams having non-crossing arcs in the upper half-plane. A bi-secondary structure is represented by drawing its respective secondary structures in the upper and lower half-plane. An RNA secondary structure has a loop decomposition, where a loop corresponds to a boundary component, regarding the secondary structure as an orientable fatgraph. The loop-decomposition of secondary structures facilitates the computation of its free energy and any two loops intersect either trivially or in exactly two vertices. In bi-secondary structures the intersection of loops is more complex and is of importance in current algorithmic work in bio-informatics and evolutionary optimization. We shall construct a simplicial complex capturing the intersections of loops and compute its homology. We prove that only the zeroth and second homology groups are nontrivial and furthermore show that the second homology group is free. Finally, we provide evidence that the generators of the second homology group have a bio-physical interpretation: they correspond to pairs of mutually exclusive substructures.
Submitted 3 April, 2019;
originally announced April 2019.
-
D-chain tomography of networks: a new structure spectrum and an application to the SIR process
Authors:
Ricky X. F. Chen,
Christian M. Reidys,
Andrei C. Bura
Abstract:
The analysis of the dynamics on complex networks is closely connected to structural features of the networks. Features such as graph-cores and node degrees have been studied ubiquitously. Here we introduce the D-spectrum of a network, a novel framework that is based on a collection of nested chains of subgraphs within the network. Graph-cores and node degrees arise merely from two particular such chains of the D-spectrum. Each chain gives rise to a ranking of nodes and, for a fixed node, the collection of these ranks provides us with the D-spectrum of the node. Besides a node deletion algorithm, we discover a connection between the D-spectrum of a network and some fixed points of certain graph dynamical systems (MC systems) on the network. As a quick application, we use the D-spectrum to identify nodes of similar spreading power in the susceptible-infectious-recovered (SIR) model on a collection of real-world networks. We then discuss our results and conclude that D-spectra represent a meaningful augmentation of graph-cores and node degrees.
Submitted 27 April, 2019; v1 submitted 12 October, 2018;
originally announced October 2018.
-
The block spectrum of RNA pseudoknot structures
Authors:
Thomas J. X. Li,
Christina S. Burris,
Christian M. Reidys
Abstract:
In this paper we analyze the length-spectrum of blocks in $γ$-structures. $γ$-structures are a class of RNA pseudoknot structures that plays a key role in the context of polynomial time RNA folding. A $γ$-structure is constructed by nesting and concatenating specific building components having topological genus at most $γ$. A block is a substructure enclosed by crossing maximal arcs with respect to the partial order induced by nesting. We show that, in uniformly generated $γ$-structures, there is a significant gap in this length-spectrum, i.e., there asymptotically almost surely exists a unique longest block of length at least $n-O(n^{1/2})$ and that with high probability any other block has finite length. For fixed $γ$, we prove that the length of the longest block converges to a discrete limit law, and that the distribution of short blocks of given length tends to a negative binomial distribution in the limit of long sequences. We refine this analysis to the length spectrum of blocks of specific pseudoknot types, such as H-type and kissing hairpins. Our results generalize the rainbow spectrum on secondary structures by the first and third authors and are being put into context with the structural prediction of long non-coding RNAs.
Submitted 12 June, 2018;
originally announced June 2018.
-
The rainbow-spectrum of RNA secondary structures
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
In this paper we analyze the length-spectrum of rainbows in RNA secondary structures. A rainbow in a secondary structure is a maximal arc with respect to the partial order induced by nesting. We show that there is a significant gap in this length-spectrum. We prove that there asymptotically almost surely exists a unique longest rainbow of length at least $n-O(n^{1/2})$ and that with high probability any other rainbow has finite length. We show that the distribution of the length of the longest rainbow converges to a discrete limit law and that, for finite $k$, the distribution of rainbows of length $k$ becomes, for large $n$, a negative binomial distribution. We then put the results of this paper into context, comparing the analytical results with those observed in RNA minimum free energy structures and biological RNA structures, and relate our findings to the sparsification of folding algorithms.
Submitted 8 June, 2018;
originally announced June 2018.
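The notion of a rainbow in the entry above (an arc that is maximal under nesting) can be illustrated with a short sketch. It assumes the structure is given in dot-bracket notation and takes the length of an arc $(i,j)$ to be $j-i$; both the input format and this length convention are assumptions, and the function name `rainbows` is hypothetical.

```python
def rainbows(dot_bracket):
    """Return the maximal arcs (i, j) of a dot-bracket secondary structure.

    An arc is maximal under nesting exactly when no other arc is still open
    at the moment it closes, i.e. when the stack becomes empty.
    """
    stack, arcs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            i = stack.pop()
            if not stack:  # nothing encloses (i, j), so it is a rainbow
                arcs.append((i, j))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return arcs


example = "((..))(.)."
arcs = rainbows(example)
lengths = [j - i for i, j in arcs]  # assumed length convention: j - i
```

For the toy structure `((..))(.).` the inner arc `(1, 4)` is nested inside `(0, 5)` and is therefore not reported.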
-
From unicellular fatgraphs to trees
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
In this paper we study the minimum number of reversals needed to transform a unicellular fatgraph into a tree. We consider reversals acting on boundary components, having the natural interpretation as gluing, slicing or half-flipping of vertices. Our main result is an expression for the minimum number of reversals needed to transform a unicellular fatgraph to a plane tree. The expression involves the Euler genus of the fatgraph and an additional parameter, which counts the number of certain orientable blocks in the decomposition of the fatgraph. In the process we derive a constructive proof of how to decompose non-orientable, irreducible, unicellular fatgraphs into smaller fatgraphs of the same type or trivial fatgraphs, consisting of a single ribbon. We furthermore provide a detailed analysis of how reversals affect the component-structure of the underlying fatgraphs. Our results generalize the Hannenhalli-Pevzner formula for the reversal distance of signed permutations.
Submitted 8 June, 2018;
originally announced June 2018.
-
Garden-of-Eden states and fixed points of monotone dynamical systems
Authors:
Ricky X. F. Chen,
Henning S. Mortveit,
Christian M. Reidys
Abstract:
In this paper we analyze Garden-of-Eden (GoE) states and fixed points of monotone, sequential dynamical systems (SDS). For any monotone SDS and fixed update schedule, we identify a particular set of states, each of which is either a GoE state or reaches a fixed point, even though both determining whether a state is a GoE state and finding all fixed points are hard in general. As a result, we show that the maximum size of their limit cycles is strictly less than ${n\choose \lfloor n/2 \rfloor}$. We connect these results to the Knaster-Tarski theorem and the LYM inequality. Finally, we establish that there exist monotone, parallel dynamical systems (PDS) that cannot be expressed as monotone SDS, despite the fact that the converse is always true.
Submitted 8 May, 2018;
originally announced May 2018.
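To make the objects in the entry above concrete, here is a brute-force sketch for a tiny Boolean SDS. The choices are illustrative assumptions: the graph is a 4-cycle, every local function is OR over the closed neighbourhood (a monotone function), and the update schedule is `0, 1, 2, 3`. The names `sds_map` and `analyze` are hypothetical, and exhaustive enumeration is only feasible for very small $n$; it is not the method of the paper.

```python
from itertools import product
from math import comb


def sds_map(adj, order, local):
    """Build the SDS map: update vertices one at a time in the given order."""
    def F(state):
        x = list(state)
        for v in order:
            # local function sees the current value of v and of its neighbours
            x[v] = local([x[v]] + [x[u] for u in adj[v]])
        return tuple(x)
    return F


def analyze(adj, order, local):
    """Return (GoE states, fixed points, longest limit cycle) by enumeration."""
    F = sds_map(adj, order, local)
    states = list(product((0, 1), repeat=len(adj)))
    image = {F(x) for x in states}
    goe = [x for x in states if x not in image]      # states without preimage
    fixed = [x for x in states if F(x) == x]
    longest = 0
    for x in states:
        y = x
        for _ in range(len(states)):                 # long enough to reach a cycle
            y = F(y)
        length, z = 1, F(y)
        while z != y:                                # measure the cycle through y
            z = F(z)
            length += 1
        longest = max(longest, length)
    return goe, fixed, longest


cycle4 = [[1, 3], [0, 2], [1, 3], [0, 2]]            # adjacency lists of a 4-cycle
goe, fixed, longest = analyze(cycle4, [0, 1, 2, 3], max)  # max on 0/1 values is OR
bound = comb(4, 4 // 2)                              # the bound from the abstract, n = 4
```

In this example values can only switch from 0 to 1, so every orbit ends in a fixed point and the longest limit cycle has length 1, well below the bound.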
-
Genetic robustness of let-7 miRNA sequence-structure pairs
Authors:
Qijun He,
Fenix W. Huang,
Christopher Barrett,
Christian M. Reidys
Abstract:
Genetic robustness, the preservation of evolved phenotypes against genotypic mutations, is one of the central concepts in evolution. In recent years a large body of work has focused on the origins, mechanisms, and consequences of robustness in a wide range of biological systems. In particular, research on ncRNAs studied the ability of sequences to maintain folded structures against single-point mutations. In these studies, the structure is merely a reference. However, recent work revealed evidence that structure itself contributes to the genetic robustness of ncRNAs. We follow this line of thought and consider sequence-structure pairs as the unit of evolution and introduce the spectrum of inverse folding rates (IFR-spectrum) as a measurement of genetic robustness. Our analysis of the miRNA let-7 family captures key features of structure-modulated evolution and facilitates the study of robustness against multiple-point mutations.
Submitted 11 January, 2018;
originally announced January 2018.
-
An efficient dual sampling algorithm with Hamming distance filtration
Authors:
Fenix W. Huang,
Qijun He,
Christopher Barrett,
Christian M. Reidys
Abstract:
Recently, a framework considering RNA sequences and their RNA secondary structures as pairs led to some information-theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. In this context, the pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence in the pairing produces the RNA energy landscape, whose partition function was discovered by McCaskill. Dually, fixing the structure induces the energy landscape of sequences. The latter has been considered for designing more efficient inverse folding algorithms.
We present here the Hamming distance filtered, dual partition function, together with a Boltzmann sampler using novel dynamic programming routines for the loop-based energy model. The time complexity of the algorithm is $O(h^2n)$, where $h$ and $n$ are the Hamming distance and the sequence length, respectively, reducing the time complexity of samplers reported in the literature by $O(n^2)$. We then present two applications, the first being in the context of the evolution of natural sequence-structure pairs of microRNAs and the second constructing neutral paths. The former studies the inverse fold rate (IFR) of sequence-structure pairs, filtered by Hamming distance, observing that such pairs evolve towards higher levels of robustness, i.e., increasing IFR. The latter is an algorithm that constructs neutral paths: given two sequences in a neutral network, we employ the sampler in order to construct short paths connecting them, consisting of sequences all contained in the neutral network.
Submitted 31 October, 2017;
originally announced November 2017.
-
The boundary length and point spectrum enumeration of partial chord diagrams using cut and join recursion
Authors:
Jørgen Ellegaard Andersen,
Hiroyuki Fuji,
Robert C. Penner,
Christian M. Reidys
Abstract:
We introduce the boundary length and point spectrum, as a joint generalization of the boundary length spectrum and boundary point spectrum in arXiv:1307.0967. We establish by cut-and-join methods that the number of partial chord diagrams filtered by the boundary length and point spectrum satisfies a recursion relation, which combined with an initial condition determines these numbers uniquely. This recursion relation is equivalent to a second order, non-linear, algebraic partial differential equation for the generating function of the numbers of partial chord diagrams filtered by the boundary length and point spectrum.
Submitted 1 April, 2017; v1 submitted 19 December, 2016;
originally announced December 2016.
-
Statistics of topological RNA structures
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
In this paper we study properties of topological RNA structures, i.e., RNA contact structures with cross-serial interactions that are filtered by their topological genus. RNA secondary structures within this framework are topological structures having genus zero. We derive a new bivariate generating function whose singular expansion allows us to analyze the distributions of arcs, stacks, hairpin-, interior- and multi-loops. We then extend this analysis to H-type pseudoknots, kissing hairpins as well as $3$-knots and compute their respective expectation values. Finally we discuss our results and put them into context with data obtained by uniformly sampling structures of fixed genus.
Submitted 22 June, 2016;
originally announced June 2016.
-
Topological language for RNA
Authors:
Fenix W. D. Huang,
Christian M. Reidys
Abstract:
In this paper we introduce a novel, context-free grammar, {\it RNAFeatures$^*$}, capable of generating any RNA structure including pseudoknot structures (pk-structure). We represent pk-structures as orientable fatgraphs, which naturally leads to a filtration by their topological genus. Within this framework, RNA secondary structures correspond to pk-structures of genus zero. {\it RNAFeatures$^*$} acts on formal, arc-labeled RNA secondary structures, called $λ$-structures. $λ$-structures correspond one-to-one to pk-structures together with some additional information. This information consists of the specific rearrangement of the backbone, by which a pk-structure can be made cross-free. {\it RNAFeatures$^*$} is an extension of the grammar for secondary structures and employs an enhancement by labelings of the symbols as well as the production rules. We discuss how to use {\it RNAFeatures$^*$} to obtain a stochastic context-free grammar for pk-structures, using data of RNA sequences and structures. The induced grammar facilitates fast Boltzmann sampling and statistical analysis. As a first application, we present an $O(n \log n)$ runtime algorithm which samples pk-structures based on ninety tRNA sequences and structures from the Nucleic Acid Database (NDB).
Submitted 9 May, 2016;
originally announced May 2016.
-
RNA secondary structures having a compatible sequence of certain nucleotide ratios
Authors:
Christopher L. Barrett,
Thomas J. X. Li,
Christian M. Reidys
Abstract:
Given a random RNA secondary structure, $S$, we study RNA sequences having fixed ratios of nucleotides that are compatible with $S$. We perform this analysis for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. Our main result reads as follows: in the simplex of the nucleotide ratios there exists a convex region in which, in the limit of long sequences, a random structure a.a.s. has a compatible sequence with these ratios and outside of which a random structure a.a.s. has no such compatible sequence. We localize this region for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. In particular, for {\bf GC}-sequences having a ratio of {\bf G} nucleotides smaller than $1/3$, a random RNA secondary structure without any minimum arc- and stack-length restrictions has a.a.s. no such compatible sequence. For sequences having a ratio of {\bf G} nucleotides larger than $1/3$, a random RNA secondary structure has a.a.s. such compatible sequences. We discuss our results in the context of various families of RNA structures.
Submitted 11 March, 2016;
originally announced March 2016.
-
On a lower bound for sorting signed permutations by reversals
Authors:
Andrei C. Bura,
Ricky X. F. Chen,
Christian M. Reidys
Abstract:
Computing the reversal distances of signed permutations is an important topic in Bioinformatics. Recently, a new lower bound for the reversal distance was obtained via the plane permutation framework. This lower bound appears different from the existing lower bound obtained by Bafna and Pevzner through breakpoint graphs. In this paper, we prove that the two lower bounds are equal. Moreover, we confirm a related conjecture on skew-symmetric plane permutations, which can be restated as follows: let $p=(0,-1,-2,\ldots -n,n,n-1,\ldots 1)$ and let
$$
\tilde{s}=(0,a_1,a_2,\ldots a_n,-a_n,-a_{n-1},\ldots -a_1)
$$ be any long cycle on the set $\{-n,-n+1,\ldots 0,1,\ldots n\}$. Then, $n$ and $a_n$ are always in the same cycle of the product $p\tilde{s}$. Furthermore, we show the new lower bound via plane permutations can be interpreted as the topological genera of orientable surfaces associated to signed permutations.
Submitted 22 June, 2017; v1 submitted 1 February, 2016;
originally announced February 2016.
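The statement about $p$ and $\tilde{s}$ in the entry above can be checked exhaustively for small $n$. The sketch below makes two assumptions that the abstract leaves open: cycle notation $(c_0,c_1,\ldots)$ means $c_0\mapsto c_1\mapsto\cdots$, and the product $p\tilde{s}$ applies $\tilde{s}$ first and then $p$. The function names are hypothetical.

```python
from itertools import permutations, product


def cycle_to_map(cycle):
    """Turn a single cycle (c0, c1, ...) into a dict sending c_i to c_{i+1}."""
    return {x: cycle[(i + 1) % len(cycle)] for i, x in enumerate(cycle)}


def same_cycle(perm, x, y):
    """True if x and y lie on the same cycle of the permutation `perm`."""
    z = x
    while True:
        if z == y:
            return True
        z = perm[z]
        if z == x:
            return False


def statement_holds(n):
    """Check that n and a_n share a cycle of p*s for every admissible s."""
    p = cycle_to_map([0] + [-i for i in range(1, n + 1)] + list(range(n, 0, -1)))
    for order in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            a = [sign * value for sign, value in zip(signs, order)]
            s = cycle_to_map([0] + a + [-value for value in reversed(a)])
            # assumed convention: apply s first, then p
            ps = {x: p[s[x]] for x in s}
            if not same_cycle(ps, n, a[-1]):
                return False
    return True


small_cases = {n: statement_holds(n) for n in (1, 2)}
```

With the opposite product order the choice $n=2$, $(a_1,a_2)=(-2,-1)$ separates $n$ and $a_n$, which is why the convention is spelled out in the code.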
-
On the local genus distribution of graph embeddings
Authors:
Ricky X. F. Chen,
Christian M. Reidys
Abstract:
The $2$-cell embeddings of graphs on closed surfaces have been widely studied. It is well known that ($2$-cell) embedding a given graph $G$ on a closed orientable surface is equivalent to cyclically ordering the edges incident to each vertex of $G$. In this paper, we study the following problem: given a genus $g$ embedding $ε$ of the graph $G$ and a vertex of $G$, in how many different ways can the vertex be reembedded such that the resulting embedding $ε'$ is of genus $g+Δg$? We give formulas to compute this quantity and the local minimal genus achieved by reembedding. In the process we obtain miscellaneous results. In particular, if there exists a one-face embedding of $G$, then the probability that a random embedding of $G$ is one-face is at least $\prod_{ν\in V(G)}\frac{2}{deg(ν)+2}$, where $deg(ν)$ denotes the vertex degree of $ν$. Furthermore we obtain an easy-to-check necessary condition for a given embedding of $G$ to be an embedding of minimum genus.
Submitted 11 January, 2016;
originally announced January 2016.
-
Sequence-structure relations of biopolymers
Authors:
Christopher Barrett,
Fenix W. Huang,
Christian M. Reidys
Abstract:
Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded "patterns" in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence-structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for the three PDB-structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold into the same structure and derive a criterion to identify native structures. We illustrate that there are multiple sequences in the partition function of a fixed structure, each having nearly the same mutual information, that are nevertheless poorly aligned. This indicates the possibility of the existence of relevant patterns embedded in the sequences that are not discoverable using alignments.
Submitted 22 August, 2016; v1 submitted 10 November, 2015;
originally announced November 2015.
-
New formulas counting one-face maps and Chapuy's recursion
Authors:
Ricky X. F. Chen,
Christian M. Reidys
Abstract:
In this paper, we begin with the Lehman-Walsh formula counting one-face maps and construct two involutions on pairs of permutations to obtain a new formula for the number $A(n,g)$ of one-face maps of genus $g$. Our new formula is in the form of a convolution of the Stirling numbers of the first kind which immediately implies a formula for the generating function $A_n(x)=\sum_{g\geq 0}A(n,g)x^{n+1-2g}$ other than the well-known Harer-Zagier formula. By reformulating our expression for $A_n(x)$ in terms of the backward shift operator $E: f(x)\rightarrow f(x-1)$ and proving a property satisfied by polynomials of the form $p(E)f(x)$, we easily establish the recursion obtained by Chapuy for $A(n,g)$. Moreover, we give a simple combinatorial interpretation for the Harer-Zagier recurrence.
Submitted 21 April, 2017; v1 submitted 16 October, 2015;
originally announced October 2015.
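As a small illustration of the quantity $A(n,g)$ in the abstract above, the following sketch counts one-face maps with $n$ edges by brute force and compares the counts with the classical Harer-Zagier recurrence $(n+1)A(n,g) = 2(2n-1)A(n-1,g) + (2n-1)(n-1)(2n-3)A(n-2,g-1)$. The recurrence is quoted from the standard literature rather than from the abstract, and the encoding of a map as a pairing of the $2n$ sides of a polygon, as well as all function names, are assumptions made for this example. It is not the authors' method.

```python
from functools import lru_cache


def cycle_count(perm):
    """Number of cycles of a permutation given as a list on 0..len-1."""
    seen = [False] * len(perm)
    count = 0
    for start in range(len(perm)):
        if not seen[start]:
            count += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return count


def perfect_matchings(points):
    """Yield every pairing of the given list of points."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def one_face_maps_by_genus(n):
    """Brute-force count of one-face maps with n edges, keyed by genus."""
    size = 2 * n
    rotation = [(i + 1) % size for i in range(size)]  # the single face
    counts = {}
    for matching in perfect_matchings(list(range(size))):
        alpha = [0] * size  # fixed-point-free involution pairing the sides
        for a, b in matching:
            alpha[a], alpha[b] = b, a
        # Vertices are the cycles of rotation∘alpha; Euler: V = n + 1 - 2g.
        vertices = cycle_count([rotation[alpha[i]] for i in range(size)])
        genus = (n + 1 - vertices) // 2
        counts[genus] = counts.get(genus, 0) + 1
    return counts


@lru_cache(maxsize=None)
def harer_zagier(n, g):
    """A(n, g) computed from the Harer-Zagier recurrence."""
    if n < 0 or g < 0:
        return 0
    if n == 0:
        return 1 if g == 0 else 0
    total = (2 * (2 * n - 1) * harer_zagier(n - 1, g)
             + (2 * n - 1) * (n - 1) * (2 * n - 3) * harer_zagier(n - 2, g - 1))
    return total // (n + 1)
```

The exponent $n+1-2g$ in $A_n(x)$ is the number of vertices of the map, which is what `one_face_maps_by_genus` reads off from the cycle count. The brute force enumerates $(2n-1)!!$ pairings, so it is only practical for small $n$.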
-
Linear sequential dynamical systems, incidence algebras, and Möbius functions
Authors:
Ricky X. F. Chen,
Christian M. Reidys
Abstract:
A sequential dynamical system (SDS) consists of a graph, a set of local functions and an update schedule. A linear sequential dynamical system is an SDS whose local functions are linear. In this paper, we derive an explicit closed formula for any linear SDS as a synchronous dynamical system. We also show constructively, that any synchronous linear system can be expressed as a linear SDS, i.e. it can be written as a product of linear local functions. Furthermore, we study the connection between linear SDS and the incidence algebras of partially ordered sets (posets). Specifically, we show that the Möbius function of any poset can be computed via an SDS, whose graph is induced by the Hasse diagram of the poset. Finally, we prove a cut theorem for the Möbius functions of posets with respect to certain chain decompositions.
Submitted 3 May, 2018; v1 submitted 16 October, 2015;
originally announced October 2015.
-
Variation of the local topological structure of graph embeddings
Authors:
Ricky X. F. Chen,
Christian M. Reidys
Abstract:
The $2$-cell embeddings of graphs on closed surfaces have been widely studied. It is well known that ($2$-cell) embedding a given graph $G$ on a closed orientable surface is equivalent to cyclically ordering the edges incident to each vertex of $G$. In this paper, we study the following problem: given a genus $g$ embedding $\mathbb{E}$ of the graph $G$, if we randomly rearrange the edges around a vertex, i.e., re-embedding, what is the probability of the resulting embedding $\mathbb{E}'$ having genus $g+Δg$? We give a formula to compute this probability. Meanwhile, some other known and unknown results are also obtained. For example, we show that the probability of preserving the genus is at least $\frac{2}{deg(v)+2}$ for re-embedding any vertex $v$ of degree $deg(v)$ in a one-face embedding; and we obtain a necessary condition for a given embedding of $G$ to be an embedding with the minimum genus.
Submitted 4 March, 2015;
originally announced March 2015.
-
A simple framework on sorting permutations
Authors:
Ricky X. F. Chen,
Christian M. Reidys
Abstract:
In this paper we present a simple framework to study various distance problems of permutations, including the transposition and block-interchange distance of permutations as well as the reversal distance of signed permutations. These problems are very important in the study of the evolution of genomes. We give a general formulation for lower bounds of the transposition and block-interchange distance from which the existing lower bounds obtained by Bafna and Pevzner, and Christie can be easily derived. As to the reversal distance of signed permutations, we translate it into a block-interchange distance problem of permutations so that we obtain a new lower bound. Furthermore, studying distance problems via our framework motivates several interesting combinatorial problems related to product of permutations, some of which are studied in this paper as well.
Submitted 16 March, 2015; v1 submitted 27 February, 2015;
originally announced February 2015.
-
Plane permutations and applications to a result of Zagier-Stanley and distances of permutations
Authors:
Ricky X. F. Chen,
Christian M. Reidys
Abstract:
In this paper, we introduce plane permutations, i.e. pairs $\mathfrak{p}=(s,π)$ where $s$ is an $n$-cycle and $π$ is an arbitrary permutation, represented as a two-row array. Accordingly a plane permutation gives rise to three distinct permutations: the permutation induced by the upper horizontal ($s$), the vertical ($π$) and the diagonal ($D_{\mathfrak{p}}$) of the array. The latter can also be viewed as the three permutations of a hypermap. In particular, a map corresponds to a plane permutation, in which the diagonal is a fixed point-free involution. We study the transposition action on plane permutations obtained by permuting their diagonal-blocks. We establish basic properties of plane permutations and study transpositions and exceedances and derive various enumerative results. In particular, we prove a recurrence for the number of plane permutations having a fixed diagonal and $k$ cycles in the vertical, generalizing Chapuy's recursion for maps filtered by the genus. As applications of this framework, we present a combinatorial proof of a result of Zagier and Stanley, on the number of $n$-cycles $ω$, for which the product $ω(1~2~\cdots ~n)$ has exactly $k$ cycles. Furthermore, we integrate studies on the transposition and block-interchange distance of permutations as well as the reversal distance of signed permutations. Plane permutations allow us to generalize and recover various lower bounds for transposition and block-interchange distances and to connect reversals with block-interchanges.
Submitted 23 June, 2016; v1 submitted 26 February, 2015;
originally announced February 2015.
-
On plane permutations
Authors:
Ricky X. F. Chen,
Christian M. Reidys
Abstract:
In this paper we generalize permutations to plane permutations. We employ this framework to derive a combinatorial proof of a result of Zagier and Stanley, that enumerates the number of $n$-cycles $ω$, for which $ω(12\cdots n)$ has exactly $k$ cycles. This quantity is $0$, if $n-k$ is odd and $\frac{2C(n+1,k)}{n(n+1)}$, otherwise, where $C(n,k)$ is the unsigned Stirling number of the first kind. The proof is facilitated by a natural transposition action on plane permutations which gives rise to various recurrences. Furthermore we study several distance problems of permutations. It turns out that plane permutations allow us to study the transposition and block-interchange distance of permutations as well as the reversal distance of signed permutations. Novel connections between these different distance problems are established via plane permutations.
Submitted 16 March, 2015; v1 submitted 20 November, 2014;
originally announced November 2014.
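The Zagier-Stanley count stated in the abstract above can be checked numerically for small $n$. The sketch below enumerates all $n$-cycles $ω$ by brute force, counts the cycles of the product with the long cycle $(1~2~\cdots~n)$, and compares the result with $\frac{2C(n+1,k)}{n(n+1)}$. It is an illustrative check only, not the combinatorial proof of the paper; the function names and the 0-based labelling are assumptions of this example.

```python
from itertools import permutations


def cycle_count(perm):
    """Number of cycles of a permutation given as a sequence on 0..len-1."""
    seen = [False] * len(perm)
    count = 0
    for start in range(len(perm)):
        if not seen[start]:
            count += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return count


def stirling_first_unsigned(n, k):
    """Unsigned Stirling number of the first kind, C(n, k), by recurrence."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = table[i - 1][j - 1] + (i - 1) * table[i - 1][j]
    return table[n][k]


def brute_force_counts(n):
    """counts[k] = number of n-cycles w whose product with (0 1 ... n-1) has k cycles."""
    long_cycle = [(i + 1) % n for i in range(n)]
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        if cycle_count(perm) != 1:  # keep only n-cycles
            continue
        product = [perm[long_cycle[i]] for i in range(n)]
        counts[cycle_count(product)] += 1
    return counts


def zagier_stanley(n, k):
    """Closed formula from the abstract: 0 if n-k is odd, else 2C(n+1,k)/(n(n+1))."""
    if (n - k) % 2 == 1:
        return 0
    return 2 * stirling_first_unsigned(n + 1, k) // (n * (n + 1))
```

The order of multiplication does not matter for this check, since the two possible products are conjugate and therefore have the same number of cycles. The enumeration runs over all $n!$ permutations, so it is meant for small $n$ only.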
-
Narayana polynomials and some generalizations
Authors:
Ricky X. F. Chen,
Christian M. Reidys
Abstract:
In this note, by counting some colored plane trees we obtain several binomial identities. These identities can be viewed as specific evaluations of certain generalizations of the Narayana polynomials. As consequences, they provide, in a unified way, combinatorial proofs for a bijective problem in Stanley's collection "Bijective Proof Problems", a new formula for the Narayana polynomials, and a new expression for the Harer-Zagier formula enumerating unicellular maps. Furthermore, we identify a class of plane trees, whose enumeration is closely connected to the Schröder numbers. Many other binomial identities are presented as well.
Submitted 12 December, 2015; v1 submitted 10 November, 2014;
originally announced November 2014.
-
A topological framework for signed permutations
Authors:
Fenix W. D. Huang,
Christian M. Reidys
Abstract:
In this paper we present a topological framework for studying signed permutations and their reversal distance. As a result we can give an alternative approach and interpretation of the Hannenhalli-Pevzner formula for the reversal distance of signed permutations. Our approach utilizes the Poincaré dual, upon which reversals act in a particular way and obsoletes the notion of "padding" of the signed permutations. To this end we construct a bijection between signed permutations and an equivalence class of particular fatgraphs, called $π$-maps, and analyze the action of reversals on the latter. We show that reversals act via either slicing, gluing or half-flipping of external vertices, which implies that any reversal changes the topological genus by at most one. Finally we revisit the Hannenhalli-Pevzner formula employing orientable and non-orientable, irreducible, $π$-maps.
Submitted 17 October, 2014;
originally announced October 2014.
-
A combinatorial interpretation of the $κ^{\star}_{g}(n)$ coefficients
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
Studying the virtual Euler characteristic of the moduli space of curves, Harer and Zagier compute the generating function $C_g(z)$ of unicellular maps of genus $g$. They furthermore identify coefficients, $κ^{\star}_{g}(n)$, which fully determine the series $C_g(z)$. The main result of this paper is a combinatorial interpretation of $κ^{\star}_{g}(n)$. We show that these enumerate a class of unicellular maps, which correspond $1$-to-$2^{2g}$ to a specific type of trees, referred to as O-trees. O-trees are a variant of the C-decorated trees introduced by Chapuy, Féray and Fusy. We exhaustively enumerate the number $s_{g}(n)$ of shapes of genus $g$ with $n$ edges, which is a specific class of unicellular maps with vertex degree at least three. Furthermore we give combinatorial proofs for expressing the generating functions $C_g(z)$ and $S_g(z)$ for unicellular maps and shapes in terms of $κ^{\star}_{g}(n)$, respectively. We then prove a two term recursion for $κ^{\star}_{g}(n)$ and that for any fixed $g$, the sequence $\{κ_{g,t}\}_{t=0}^g$ is log-concave, where $κ^{\star}_{g}(n)= κ_{g,t}$, for $n=2g+t-1$.
Submitted 23 June, 2014; v1 submitted 12 June, 2014;
originally announced June 2014.
-
Shapes of interacting RNA complexes
Authors:
Benjamin MingMing Fu,
Christian M. Reidys
Abstract:
Shapes of interacting RNA complexes are studied using a filtration via their topological genus. A shape of an RNA complex is obtained by (iteratively) collapsing stacks and eliminating hairpin loops. This shape-projection preserves the topological core of the RNA complex and for fixed topological genus there are only finitely many such shapes. Our main result is a new bijection that relates the shapes of RNA complexes with shapes of RNA structures. This allows us to compute the shape polynomial of RNA complexes via the shape polynomial of RNA structures. We furthermore present a linear time uniform sampling algorithm for shapes of RNA complexes of fixed topological genus.
Submitted 20 May, 2014; v1 submitted 14 May, 2014;
originally announced May 2014.
-
Shapes of topological RNA structures
Authors:
Fenix W. D. Huang,
Christian M. Reidys
Abstract:
A topological RNA structure is derived from a diagram and its shape is obtained by collapsing the stacks of the structure into single arcs and by removing any arcs of length one. Shapes contain key topological information, and for fixed topological genus there exist only finitely many such shapes. We shall express topological RNA structures as unicellular maps, i.e. graphs together with a cyclic ordering of their half-edges. In this paper we prove a bijection of shapes of topological RNA structures. We furthermore derive a linear time algorithm generating shapes of fixed topological genus. We derive explicit expressions for the coefficients of the generating polynomial of these shapes and the generating function of RNA structures of genus $g$. Furthermore we outline how shapes can be used in order to extract essential information of RNA structure databases.
Submitted 11 March, 2014;
originally announced March 2014.
-
Uniform generation of RNA-RNA interaction structures of fixed topological genus
Authors:
Benjamin Mingming Fu,
Hillary Siwei Han,
Christian M. Reidys
Abstract:
Interacting RNA complexes are studied via bicellular maps using a filtration via their topological genus. Our main result is a new bijection for RNA-RNA interaction structures and a linear time uniform sampling algorithm for RNA complexes of fixed topological genus. The bijection allows one to either reduce the topological genus of a bicellular map directly, or to lose connectivity by decomposing the complex into a pair of single stranded RNA structures. Our main result is proved bijectively. It provides an explicit algorithm of how to rewire the corresponding complexes and an unambiguous decomposition grammar. Using the concept of genus induction, we construct bicellular maps of fixed topological genus $g$ uniformly in linear time. We present various statistics on these topological RNA complexes and compare our findings with biological complexes. Furthermore we show how to construct loop-energy based complexes using our decomposition grammar.
Submitted 12 April, 2014; v1 submitted 4 November, 2013;
originally announced November 2013.
-
On the genus filtration of diagrams over two backbones
Authors:
Benjamin Mingming Fu,
Christian M. Reidys
Abstract:
In this paper we compute the bivariate generating function of $γ$-matchings over two backbones, filtered by the number of arcs and the topological genus. $γ$-matchings over two backbones are chord-diagrams, obtained via concatenation and nesting of irreducible shapes of topological genus $\le γ$. We show that the key information is contained in the polynomials counting these shapes and provide recursions that allow to compute the latter. In particular we give a bijection between such irreducible shapes over one and two backbones. We present two applications of our results. The first is concerned with RNA-RNA interaction structures, obtained from the $γ$-matchings via symbolic methods. We secondly show that, using analytic-combinatorial methods, the topological genus satisfies a central limit theorem.
Submitted 4 November, 2013;
originally announced November 2013.
-
A bijection for tri-cellular maps
Authors:
Hillary S. W. Han,
Christian M. Reidys
Abstract:
In this paper we give a bijective proof for a relation between uni-, bi- and tricellular maps of certain topological genus. While this relation can formally be obtained using matrix theory as a result of the Schwinger-Dyson equation, we here present a bijection for the corresponding coefficient equation. Our construction is facilitated by repeated application of a certain cutting, the contraction of edges, incident to two vertices and the deletion of certain edges.
Submitted 15 May, 2013;
originally announced May 2013.
-
Uniform generation of RNA pseudoknot structures with genus filtration
Authors:
Fenix W. D. Huang,
Markus E. Nebel,
Christian M. Reidys
Abstract:
In this paper we present a sampling framework for RNA structures of fixed topological genus. We introduce a novel, linear time, uniform sampling algorithm for RNA structures of fixed topological genus $g$, for arbitrary $g>0$. Furthermore we develop a linear time sampling algorithm for RNA structures of fixed topological genus $g$ that are weighted by a simplified, loop-based energy functional. For this process the partition function of the energy functional has to be computed once, which has $O(n^2)$ time complexity.
Submitted 27 April, 2013;
originally announced April 2013.
-
Enumeration of RNA complexes via random matrix theory
Authors:
Jørgen E. Andersen,
Leonid O. Chekhov,
R. C. Penner,
Christian M. Reidys,
Piotr Sułkowski
Abstract:
We review a derivation of the numbers of RNA complexes of an arbitrary topology. These numbers are encoded in the free energy of the hermitian matrix model with potential V(x)=x^2/2-stx/(1-tx), where s and t are respective generating parameters for the number of RNA molecules and hydrogen bonds in a given complex. The free energies of this matrix model are computed using the so-called topological recursion, which is a powerful new formalism arising from random matrix theory. These numbers of RNA complexes also have profound meaning in mathematics: they provide the number of chord diagrams of fixed genus with specified numbers of backbones and chords as well as the number of cells in Riemann's moduli spaces for bordered surfaces of fixed topological type.
Submitted 6 March, 2013;
originally announced March 2013.
-
A bijection between unicellular and bicellular maps
Authors:
Hillary S. W. Han,
Christian M. Reidys
Abstract:
In this paper we present a combinatorial proof of a relation between the generating functions of unicellular and bicellular maps. This relation is a consequence of the Schwinger-Dyson equation of matrix theory. Alternatively it can be proved using representation theory of the symmetric group. Here we give a bijective proof by rewiring unicellular maps of topological genus $(g+1)$ into bicellular maps of genus $g$ and pairs of unicellular maps of lower topological genera. Our result has immediate consequences for the folding of RNA interaction structures, since the time complexity of folding the transformed structure is $O((n+m)^5)$, where $n,m$ are the lengths of the respective backbones, while the folding of the original structure has $O(n^6)$ time complexity, where $n$ is the length of the longer sequence.
Submitted 30 January, 2013;
originally announced January 2013.
-
A phase transition in energy-filtered RNA secondary structures
Authors:
Hillary S. W. Han,
Christian M. Reidys
Abstract:
In this paper we study the effect of energy parameters on minimum free energy (mfe) RNA secondary structures. Employing a simplified combinatorial energy model, that is only dependent on the diagram representation and that is not sequence specific, we prove the following dichotomy result. Mfe structures derived via the Turner energy parameters contain only finitely many complex irreducible substructures and just minor parameter changes produce a class of mfe-structures that contain a large number of small irreducibles. We localize the exact point where the distribution of irreducibles experiences this phase transition from a discrete limit to a central limit distribution and subsequently put our result into the context of quantifying the effect of sparsification of the folding of these respective mfe-structures. We show that the sparsification of realistic mfe-structures leads to a constant time and space reduction and that the sparsification of the folding of structures with modified parameters leads to a linear time and space reduction. We furthermore identify the limit distribution at the phase transition as a Rayleigh distribution.
Submitted 16 May, 2012;
originally announced May 2012.
-
Topological recursion for chord diagrams, RNA complexes, and cells in moduli spaces
Authors:
Jørgen E. Andersen,
Leonid O. Chekhov,
R. C. Penner,
Christian M. Reidys,
Piotr Sułkowski
Abstract:
We introduce and study the Hermitian matrix model with potential V(x)=x^2/2-stx/(1-tx), which enumerates the number of linear chord diagrams of fixed genus with specified numbers of backbones generated by s and chords generated by t. For the one-cut solution, the partition function, correlators and free energies are convergent for small t and all s as a perturbation of the Gaussian potential, which arises for st=0. This perturbation is computed using the formalism of the topological recursion. The corresponding enumeration of chord diagrams gives at once the number of RNA complexes of a given topology as well as the number of cells in Riemann's moduli spaces for bordered surfaces. The free energies are computed here in principle for all genera and explicitly for genera less than four.
Submitted 3 May, 2012;
originally announced May 2012.
-
The topological filtration of $γ$-structures
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
In this paper we study $γ$-structures filtered by topological genus. $γ$-structures are a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A $γ$-structure is composed by specific building blocks, that have topological genus less than or equal to $γ$, where composition means concatenation and nesting of such blocks. Our main results are the derivation of a new bivariate generating function for $γ$-structures via symbolic methods, the singularity analysis of the solutions and a central limit theorem for the distribution of topological genus in $γ$-structures of given length. In our derivation specific bivariate polynomials play a central role. Their coefficients count particular motifs of fixed topological genus and they are of relevance in the context of genus recursion and novel folding algorithms.
Submitted 6 February, 2012;
originally announced February 2012.
-
On the combinatorics of sparsification
Authors:
Fenix W. D. Huang,
Christian M. Reidys
Abstract:
Background: We study the sparsification of dynamic programming folding algorithms of RNA structures. Sparsification applies to the mfe-folding of RNA structures and can lead to a significant reduction of time complexity. Results: We analyze the sparsification of a particular decomposition rule, $Λ^*$, that splits an interval for RNA secondary and pseudoknot structures of fixed topological genus. Essential for quantifying the sparsification is the size of its so called candidate set. We present a combinatorial framework which allows by means of probabilities of irreducible substructures to obtain the expected size of the set of $Λ^*$-candidates. We compute these expectations for arc-based energy models via energy-filtered generating functions (GF) for RNA secondary structures as well as RNA pseudoknot structures. For RNA secondary structures we also consider a simplified loop-energy model. This combinatorial analysis is then compared to the expected number of $Λ^*$-candidates obtained from folding mfe-structures. In case of the mfe-folding of RNA secondary structures with a simplified loop energy model our results imply that sparsification provides a reduction of time complexity by a constant factor of 91% (theory) versus a 96% reduction (experiment). For the "full" loop-energy model there is a reduction of 98% (experiment).
Submitted 6 February, 2012; v1 submitted 31 December, 2011;
originally announced January 2012.
-
Topology of RNA-RNA interaction structures
Authors:
Jørgen E. Andersen,
Fenix W. D. Huang,
Robert C. Penner,
Christian M. Reidys
Abstract:
The topological filtration of interacting RNA complexes is studied and the role is analyzed of certain diagrams called irreducible shadows, which form suitable building blocks for more general structures. We prove that for two interacting RNAs, called interaction structures, there exist for fixed genus only finitely many irreducible shadows. This implies that for fixed genus there are only finitely many classes of interaction structures. In particular the simplest case of genus zero already provides the formalism for certain types of structures that occur in nature and are not covered by other filtrations. This case of genus zero interaction structures is already of practical interest, is studied here in detail and found to be expressed by a multiple context-free grammar extending the usual one for RNA secondary structures. We show that in $O(n^6)$ time and $O(n^4)$ space complexity, this grammar for genus zero interaction structures provides not only minimum free energy solutions but also the complete partition function and base pairing probabilities.
Submitted 28 December, 2011;
originally announced December 2011.
-
Combinatorics of $γ$-structures
Authors:
Hillary S. W. Han,
Thomas J. X. Li,
Christian M. Reidys
Abstract:
In this paper we study canonical $γ$-structures, a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A $γ$-structure is composed of specific building blocks that have topological genus less than or equal to $γ$, where composition means concatenation and nesting of such blocks. Our main result is the derivation of the generating function of $γ$-structures via symbolic enumeration using so-called irreducible shadows. We furthermore recursively compute the generating polynomials of irreducible shadows of genus $\le γ$. $γ$-structures are constructed via $γ$-matchings. For $1\le γ\le 10$, we compute Puiseux-expansions at the unique, dominant singularities, allowing us to derive simple asymptotic formulas for the number of $γ$-structures.
Submitted 4 September, 2013; v1 submitted 18 December, 2011;
originally announced December 2011.
-
The 5'-3' distance of RNA secondary structures
Authors:
Hillary S. W. Han,
Christian M. Reidys
Abstract:
Recently Yoffe et al. observed that the average distances between $5'-3'$ ends of RNA molecules are very small and largely independent of sequence length. This observation is based on numerical computations as well as theoretical arguments maximizing certain entropy functionals. In this paper we compute the exact distribution of $5'-3'$ distances of RNA secondary structures for any finite $n$. We furthermore compute the limit distribution and show that already for $n=30$ the exact distribution and the limit distribution are very close. Our results show that the distances of random RNA secondary structures are distinctively lower than those of minimum free energy structures of random RNA sequences.
Submitted 30 December, 2011; v1 submitted 17 April, 2011;
originally announced April 2011.
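To make the quantity in the abstract above concrete, here is a minimal sketch that computes a $5'-3'$ distance for a single secondary structure. It assumes, and this is an assumption not stated in the abstract, that the distance is the length of a shortest path between the first and last nucleotide in the graph whose edges are backbone bonds and base pairs, each counted with length 1. The dot-bracket input format and the function names `parse_dot_bracket` and `five_three_distance` are illustrative choices only.

```python
from collections import deque


def parse_dot_bracket(structure):
    """Return a dict mapping each paired position to its partner (0-based)."""
    stack, pairs = [], {}
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            partner = stack.pop()
            pairs[pos] = partner
            pairs[partner] = pos
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def five_three_distance(structure):
    """Shortest path length from the 5' end to the 3' end.

    Assumption: backbone bonds and base pairs both count as one step.
    """
    n = len(structure)
    if n == 0:
        raise ValueError("empty structure")
    pairs = parse_dot_bracket(structure)
    dist = [None] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        v = queue.popleft()
        if v == n - 1:
            return dist[v]
        neighbours = [v - 1, v + 1]      # backbone bonds
        if v in pairs:
            neighbours.append(pairs[v])  # base pair
        for w in neighbours:
            if 0 <= w < n and dist[w] is None:
                dist[w] = dist[v] + 1
                queue.append(w)
    # Unreachable: the backbone always connects both ends.
    raise RuntimeError("3' end not reached")
```

Under this assumed definition, an unpaired chain of five nucleotides has distance 4, while a hairpin closing the two ends, such as `(...)`, has distance 1.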
-
Linear chord diagrams on two intervals
Authors:
Jørgen E. Andersen,
Robert C. Penner,
Christian M. Reidys,
Rita R. Wang
Abstract:
Consider all possible ways of attaching disjoint chords to two ordered and oriented disjoint intervals so as to produce a connected graph. Taking the intervals to lie in the real axis with the induced orientation and the chords to lie in the upper half plane canonically determines a corresponding fatgraph which has some associated genus $g\geq 0$, and we consider the natural generating function ${\bf C}_g^{[2]}(z)=\sum_{n\geq 0} {\bf c}^{[2]}_g(n)z^n$ for the number ${\bf c}^{[2]}_g(n)$ of distinct such chord diagrams of fixed genus $g\geq 0$ with a given number $n\geq 0$ of chords. We prove here the surprising fact that ${\bf C}^{[2]}_g(z)=z^{2g+1} R_g^{[2]}(z)/(1-4z)^{3g+2} $ is a rational function, for $g\geq 0$, where the polynomial $R^{[2]}_g(z)$ with degree at most $g$ has integer coefficients and satisfies $R_g^{[2]}({1\over 4})\neq 0$. Earlier work had already determined that the analogous generating function ${\bf C}_g(z)=z^{2g}R_g(z)/(1-4z)^{3g-{1\over 2}}$ for chords attached to a single interval is algebraic, for $g\geq 1$, where the polynomial $R_g(z)$ with degree at most $g-1$ has integer coefficients and satisfies $R_g(1/4)\neq 0$ in analogy to the generating function ${\bf C}_0(z)$ for the Catalan numbers. The new results here on ${\bf C}_g^{[2]}(z)$ rely on this earlier work, and indeed, we find that $R_g^{[2]}(z)=R_{g+1}(z) -z\sum_{g_1=1}^g R_{g_1}(z) R_{g+1-g_1}(z)$, for $g\geq 1$.
Submitted 28 October, 2010;
originally announced October 2010.
-
Enumeration of linear chord diagrams
Authors:
J. E. Andersen,
R. C. Penner,
C. M. Reidys,
M. S. Waterman
Abstract:
A linear chord diagram canonically determines a fatgraph and hence has an associated genus $g$. We compute the natural generating function ${\bf C}_g(z)=\sum_{n\geq 0} {\bf c}_g(n)z^n$ for the number ${\bf c}_g(n)$ of linear chord diagrams of fixed genus $g\geq 1$ with a given number $n\geq 0$ of chords and find the remarkably simple formula ${\bf C}_g(z)=z^{2g}R_g(z) (1-4z)^{{1\over 2}-3g}$, where $R_g(z)$ is a polynomial of degree at most $g-1$ with integral coefficients satisfying $R_g({1\over 4})\neq 0$ and $R_g(0) = {\bf c}_g(2g)\neq 0.$ In particular, ${\bf C}_g(z)$ is algebraic over $\mathbb C(z)$, which generalizes the corresponding classical fact for the generating function ${\bf C}_0(z)$ of the Catalan numbers. As a corollary, we also calculate a related generating function germane to the enumeration of knotted RNA secondary structures, which is again found to be algebraic.
Submitted 27 October, 2010;
originally announced October 2010.
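The numbers ${\bf c}_g(n)$ in the abstract above can be checked by brute force for small $n$. The following is a minimal sketch, not the method of the paper: it enumerates all perfect matchings on $2n$ points, closes the backbone into a single vertex (an assumed but standard way to read off the genus), counts boundary components $r$ by tracing faces, and uses Euler's relation $2-2g = 1 - n + r$. The function names are hypothetical, and the enumeration is only practical for small $n \geq 1$.

```python
from collections import Counter


def perfect_matchings(points):
    """Yield every perfect matching of the list `points` as a list of pairs."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def genus(matching):
    """Genus of the chord diagram given by `matching` on points 0..2n-1."""
    n = len(matching)
    size = 2 * n
    partner = {}
    for a, b in matching:
        partner[a] = b
        partner[b] = a
    seen = set()
    faces = 0
    for start in range(size):
        if start in seen:
            continue
        faces += 1
        x = start
        # Follow a boundary: cross the chord, then step along the backbone.
        while x not in seen:
            seen.add(x)
            x = (partner[x] + 1) % size
    # One vertex, n edges, `faces` boundary components: 2 - 2g = 1 - n + faces.
    return (n + 1 - faces) // 2


def genus_distribution(n):
    """Map genus g to the number of linear chord diagrams with n chords."""
    return Counter(genus(m) for m in perfect_matchings(list(range(2 * n))))
```

For $n=4$ this yields 14 diagrams of genus 0 (the Catalan number), 70 of genus 1, and 21 of genus 2. The genus 1 counts 1, 10, 70 for $n=2,3,4$ agree with the first coefficients of $z^{2}(1-4z)^{-5/2}$, which is the stated formula for $g=1$ with $R_1(z)=1$.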
-
Combinatorial analysis of interacting RNA molecules
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
Recently several minimum free energy (MFE) folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Their folding targets are interaction structures, which can be represented as diagrams with two backbones drawn horizontally on top of each other such that (1) intramolecular and intermolecular bonds are noncrossing and (2) there is no "zig-zag" configuration. This paper studies joint structures with arc-length at least four in which both interior and exterior stack-lengths are at least two (no isolated arcs). The key idea in this paper is to consider a new type of shape, based on which joint structures can be derived via symbolic enumeration. Our results imply simple asymptotic formulas for the number of joint structures with surprisingly small exponential growth rates. They are of interest in the context of designing prediction algorithms for RNA-RNA interactions.
Submitted 21 June, 2010;
originally announced June 2010.
-
Combinatorics of RNA-RNA interaction
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
RNA-RNA binding is an important phenomenon observed for many classes of non-coding RNAs and plays a crucial role in a number of regulatory processes. Recently several MFE folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Here, joint structure means that in a diagram representation the intramolecular bonds of each partner are pseudoknot-free, that the intermolecular binding pairs are noncrossing, and that there is no so-called "zig-zag" configuration. This paper presents the combinatorics of RNA interaction structures including their generating function, singularity analysis as well as explicit recurrence relations. In particular, our results imply simple asymptotic formulas for the number of joint structures.
Submitted 15 June, 2010;
originally announced June 2010.
-
On the uniform generation of modular diagrams
Authors:
Fenix W. D. Huang,
Christian M. Reidys
Abstract:
In this paper we present an algorithm that generates $k$-noncrossing, $σ$-modular diagrams with uniform probability. A diagram is a labeled graph of degree $\le 1$ over $n$ vertices drawn in a horizontal line with arcs $(i,j)$ in the upper half-plane. A $k$-crossing in a diagram is a set of $k$ distinct arcs $(i_1, j_1), (i_2, j_2),\ldots,(i_k, j_k)$ with the property $i_1 < i_2 < \ldots < i_k < j_1 < j_2 < \ldots< j_k$. A diagram without any $k$-crossings is called a $k$-noncrossing diagram and a stack of length $σ$ is a maximal sequence $((i,j),(i+1,j-1),\dots,(i+(σ-1),j-(σ-1)))$. A diagram is $σ$-modular if any arc is contained in a stack of length at least $σ$. After $O(n^k)$ preprocessing time, our algorithm generates $k$-noncrossing, $σ$-modular diagrams in $O(n)$ time and space complexity.
Submitted 14 June, 2010;
originally announced June 2010.
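The definitions in the abstract above translate directly into a small checker. This is a minimal sketch that only tests whether a given diagram is $k$-noncrossing and $σ$-modular; it is not the uniform generation algorithm of the paper. The representation of a diagram as a list of arcs `(i, j)` with `i < j` and pairwise distinct endpoints is an assumption, the function names are hypothetical, and the crossing test is a brute-force search over $k$-subsets of arcs.

```python
from itertools import combinations


def has_k_crossing(arcs, k):
    """True if some k arcs satisfy i_1 < ... < i_k < j_1 < ... < j_k."""
    ordered = sorted(arcs)  # left endpoints increase, since they are distinct
    for combo in combinations(ordered, k):
        ends = [j for _, j in combo]
        last_start = combo[-1][0]
        if last_start < ends[0] and all(a < b for a, b in zip(ends, ends[1:])):
            return True
    return False


def stack_lengths(arcs):
    """Sorted lengths of all maximal stacks ((i,j),(i+1,j-1),...)."""
    arc_set = set(arcs)
    lengths = []
    for i, j in arc_set:
        if (i - 1, j + 1) in arc_set:
            continue  # not the outermost arc of its stack
        length = 1
        while (i + length, j - length) in arc_set:
            length += 1
        lengths.append(length)
    return sorted(lengths)


def is_sigma_modular(arcs, sigma):
    """True if every arc lies in a stack of length at least sigma."""
    return all(length >= sigma for length in stack_lengths(arcs))


def is_k_noncrossing_sigma_modular(arcs, k, sigma):
    return not has_k_crossing(arcs, k) and is_sigma_modular(arcs, sigma)
```

For example, the diagram with arcs `(1, 6), (2, 5), (3, 8), (4, 7)` consists of two stacks of length 2 that cross each other, so it contains a 2-crossing but no 3-crossing, and it is 2-modular but not 3-modular.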
-
RNA-RNA interaction prediction based on multiple sequence alignments
Authors:
Andrew X. Li,
Manja Marz,
Jing Qin,
Christian M. Reidys
Abstract:
Many computerized methods for RNA-RNA interaction structure prediction have been developed. Recently, $O(N^6)$ time and $O(N^4)$ space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate knowledge of related sequences, so relevant evolutionary information is often neglected in structure determination. Therefore, it is of considerable practical interest to introduce a method that takes into consideration both thermodynamic stability and sequence covariation. We present the \emph{a priori} folding algorithm \texttt{ripalign}, whose input consists of two (given) multiple sequence alignments (MSA). \texttt{ripalign} outputs (1) the partition function, (2) base-pairing probabilities, (3) hybrid probabilities and (4) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible with the alignments. Compared to the single sequence-pair folding algorithm \texttt{rip}, \texttt{ripalign} requires negligible additional memory resources. Furthermore, we incorporate possible structure constraints as input parameters into our algorithm. The algorithm described here is implemented in C as part of the \texttt{rip} package. The supplemental material, source code and input/output files can freely be downloaded from \url{http://www.combinatorics.cn/cbpc/ripalign.html}. Contact: Christian Reidys.
Submitted 14 July, 2010; v1 submitted 21 March, 2010;
originally announced March 2010.