-
Syntactic Language Change in English and German: Metrics, Parsers, and Convergences
Authors:
Yanran Chen,
Wei Zhao,
Anne Breitbarth,
Manuel Stoeckel,
Alexander Mehler,
Steffen Eger
Abstract:
Many studies have shown that human languages tend to optimize for lower complexity and increased communication efficiency. Syntactic dependency distance, which measures the linear distance between dependent words, is often considered a key indicator of language processing difficulty and working memory load. The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years. We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as four newer alternatives. Our analysis of syntactic language change goes beyond linear dependency distance and explores 15 metrics relevant to dependency distance minimization (DDM) and/or based on tree graph properties, such as the tree height and degree variance. Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historic data, we find that results of syntactic language change are sensitive to the parsers involved, which is a caution against using a single parser for evaluating syntactic language change as done in previous work. We also show that syntactic language change over the time period investigated is largely similar between English and German for the different metrics explored: only 4% of cases we examine yield opposite conclusions regarding upward and downward trends of syntactic metrics across German and English. We also show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions. To the best of our knowledge, ours is the most comprehensive analysis of syntactic language change using modern NLP technology in recent corpora of English and German.
Submitted 28 March, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
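The dependency distance and tree height metrics named in the abstract above can be illustrated with a minimal sketch. It assumes a sentence is given as a list of 1-based head indices with 0 marking the root (a CoNLL-U style encoding chosen here for illustration); the function names are hypothetical and the paper's exact metric definitions may differ.

```python
from statistics import mean

def dependency_distances(heads):
    """Linear distance between each token and its head.

    heads[i] is the 1-based position of the head of token i+1;
    0 marks the root, which has no incoming dependency (assumed encoding).
    """
    return [abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0]

def mean_dependency_distance(heads):
    """Average dependency distance of one sentence (0.0 if it has no edges)."""
    distances = dependency_distances(heads)
    return mean(distances) if distances else 0.0

def tree_height(heads):
    """Number of edges on the longest root-to-token path."""
    def depth(position):
        steps = 0
        while heads[position - 1] != 0:
            position = heads[position - 1]
            steps += 1
        return steps
    return max(depth(p) for p in range(1, len(heads) + 1))

# "She reads long books": She -> reads, reads = root, long -> books, books -> reads
example_heads = [2, 0, 4, 2]
print(dependency_distances(example_heads))
print(tree_height(example_heads))
```

Averaging such per-sentence values over the sentences of each time slice would give one diachronic series per metric and parser.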
-
I still have Time(s): Extending HeidelTime for German Texts
Authors:
Andy Lücking,
Manuel Stoeckel,
Giuseppe Abrami,
Alexander Mehler
Abstract:
HeidelTime is one of the most widespread and successful tools for detecting temporal expressions in texts. Since HeidelTime's pattern matching system is based on regular expressions, it can be extended in a convenient way. We present such an extension for the German resources of HeidelTime: HeidelTime-EXT. The extension has been brought about by means of observing false negatives within real world texts and various time banks. The gain in coverage is 2.7% or 8.5%, depending on the admitted degree of potential overgeneralization. We describe the development of HeidelTime-EXT, its evaluation on text samples from various genres, and share some linguistic observations. HeidelTime-EXT can be obtained from https://github.com/texttechnologylab/heideltime.
Submitted 19 April, 2022;
originally announced April 2022.
-
When Specialization Helps: Using Pooled Contextualized Embeddings to Detect Chemical and Biomedical Entities in Spanish
Authors:
Manuel Stoeckel,
Wahed Hemati,
Alexander Mehler
Abstract:
The recognition of pharmacological substances, compounds and proteins is essential preliminary work for the recognition of relations between chemicals and other biomedically relevant units. In this paper, we describe an approach to Task 1 of the PharmaCoNER Challenge, which involves the recognition of mentions of chemicals and drugs in Spanish medical texts. We train a state-of-the-art BiLSTM-CRF sequence tagger with stacked Pooled Contextualized Embeddings, word and sub-word embeddings using the open-source framework FLAIR. We present a new corpus composed of articles and papers from Spanish health science journals, termed the Spanish Health Corpus, and use it to train domain-specific embeddings which we incorporate in our model training. We achieve a result of 89.76% F1-score using pre-trained embeddings and are able to improve these results to 90.52% F1-score using specialized embeddings.
Submitted 8 October, 2019;
originally announced October 2019.
-
SenseFitting: Sense Level Semantic Specialization of Word Embeddings for Word Sense Disambiguation
Authors:
Manuel Stoeckel,
Sajawel Ahmed,
Alexander Mehler
Abstract:
We introduce a neural network-based system of Word Sense Disambiguation (WSD) for German that is based on SenseFitting, a novel method for optimizing WSD. We outperform knowledge-based WSD methods by up to 25% F1-score and produce a new state-of-the-art on the German sense-annotated dataset WebCAGe. Our method uses three feature vectors consisting of a) sense, b) gloss, and c) relational vectors to represent target senses and to compare them with the vector centroids of sample contexts. Utilizing widely available word embeddings and lexical resources, we are able to compensate for the lower resource availability of German. SenseFitting builds upon the recently introduced semantic specialization procedure Attract-Repel, and leverages sense level semantic constraints from lexical-semantic networks (e.g. GermaNet) or online social dictionaries (e.g. Wiktionary) to produce high-quality sense embeddings from pre-trained word embeddings. We evaluate our sense embeddings with a new SimLex-999 based similarity dataset, called SimSense, that we developed for this work. We achieve results that outperform current lemma-based specialization methods for German, making them comparable to results achieved for English.
Submitted 30 July, 2019;
originally announced July 2019.
-
Constructing Light Spanners Deterministically in Near-Linear Time
Authors:
Stephen Alstrup,
Søren Dahlgaard,
Arnold Filtser,
Morten Stöckel,
Christian Wulff-Nilsen
Abstract:
Graph spanners are well-studied and widely used both in theory and practice. In a recent breakthrough, Chechik and Wulff-Nilsen [CW18] improved the state-of-the-art for light spanners by constructing a $(2k-1)(1+ε)$-spanner with $O(n^{1+1/k})$ edges and $O_ε(n^{1/k})$ lightness. Soon after, Filtser and Solomon [FS19] showed that the classic greedy spanner construction achieves the same bounds. The major drawback of the greedy spanner is its running time of $O(mn^{1+1/k})$ (which is faster than [CW18]). This makes the construction impractical even for graphs of moderate size. Much faster spanner constructions do exist but they only achieve lightness $Ω_ε(kn^{1/k})$, even when randomization is used. The contribution of this paper is deterministic spanner constructions that are fast, and achieve similar bounds as the state-of-the-art slower constructions. Our first result is an $O_ε(n^{2+1/k+ε'})$ time spanner construction which achieves the state-of-the-art bounds. Our second result is an $O_ε(m + n\log n)$ time construction of a spanner with $(2k-1)(1+ε)$ stretch, $O(\log k\cdot n^{1+1/k})$ edges and $O_ε(\log k\cdot n^{1/k})$ lightness. This is an exponential improvement in the dependence on $k$ compared to the previous result with such running time. Finally, for the important special case where $k=\log n$, for every constant $ε>0$, we provide an $O(m+n^{1+ε})$ time construction that produces an $O(\log n)$-spanner with $O(n)$ edges and $O(1)$ lightness which is asymptotically optimal. This is the first known sub-quadratic construction of such a spanner for any $k = ω(1)$. To achieve our constructions, we show a novel deterministic incremental approximate distance oracle, which may be of independent interest.
Submitted 19 January, 2022; v1 submitted 6 September, 2017;
originally announced September 2017.
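For orientation, the classic greedy spanner that the abstract above uses as its slow baseline can be sketched as follows. This is a naive textbook version, not the paper's fast deterministic construction; the function names and the edge-list input format are assumptions made for illustration, and the stretch parameter `t` plays the role of $2k-1$.

```python
import heapq

def spanner_distance(adj, src, dst, limit):
    """Dijkstra in the current spanner, ignoring paths longer than limit."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd <= limit and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def greedy_spanner(edges, t):
    """Keep an edge only if the spanner built so far stretches it by more than t.

    edges: iterable of (u, v, weight) for an undirected graph (assumed format).
    """
    adj = {}
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if spanner_distance(adj, u, v, t * w) > t * w:
            kept.append((u, v, w))
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return kept

# Unit-weight triangle with stretch 3: the third edge is already 2-approximated.
print(greedy_spanner([(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)], 3))
```

Running one shortest-path query per edge is what makes this baseline slow, which is the drawback the abstract sets out to address.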
-
New Subquadratic Approximation Algorithms for the Girth
Authors:
Søren Dahlgaard,
Mathias Bæk Tejs Knudsen,
Morten Stöckel
Abstract:
We consider the problem of approximating the girth, $g$, of an unweighted and undirected graph $G=(V,E)$ with $n$ nodes and $m$ edges. A seminal result of Itai and Rodeh [SICOMP'78] gave an additive $1$-approximation in $O(n^2)$ time, and the main open question is thus how well we can do in subquadratic time.
In this paper we present two main results. The first is a $(1+\varepsilon,O(1))$-approximation in truly subquadratic time. Specifically, for any $k\ge 2$ our algorithm returns a cycle of length $2\lceil g/2\rceil+2\left\lceil\frac{g}{2(k-1)}\right\rceil$ in $\tilde{O}(n^{2-1/k})$ time. This generalizes the results of Lingas and Lundell [IPL'09] who showed it for the special case of $k=2$ and Roditty and Vassilevska Williams [SODA'12] who showed it for $k=3$. Our second result is to present an $O(1)$-approximation running in $O(n^{1+\varepsilon})$ time for any $\varepsilon > 0$. Prior to this work the fastest constant-factor approximation was the $\tilde{O}(n^{3/2})$ time $8/3$-approximation of Lingas and Lundell [IPL'09] using the algorithm corresponding to the special case $k=2$ of our first result.
Submitted 7 April, 2017;
originally announced April 2017.
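As a point of reference for the girth problem described in the abstract above, the following is a minimal exact baseline that runs one breadth-first search per vertex, which takes $O(nm)$ time. It is not one of the subquadratic approximation algorithms from the paper; the function name and the edge-list input are assumptions, and the graph is assumed to be simple (no self-loops or parallel edges).

```python
from collections import deque

def girth(n, edges):
    """Length of a shortest cycle in a simple undirected graph, or None if acyclic."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    best = None
    for source in range(n):
        dist = [-1] * n
        parent = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] == -1:
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    queue.append(v)
                elif v != parent[u]:
                    # Non-tree edge closes a walk through the source.
                    candidate = dist[u] + dist[v] + 1
                    if best is None or candidate < best:
                        best = candidate
    return best

five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(girth(5, five_cycle))
```

The minimum over all sources is exact because a search started on a shortest cycle finds that cycle's length.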
-
Finding Even Cycles Faster via Capped k-Walks
Authors:
Søren Dahlgaard,
Mathias Bæk Tejs Knudsen,
Morten Stöckel
Abstract:
In this paper, we consider the problem of finding a cycle of length $2k$ (a $C_{2k}$) in an undirected graph $G$ with $n$ nodes and $m$ edges for constant $k\ge2$. A classic result by Bondy and Simonovits [J.Comb.Th.'74] implies that if $m \ge100k n^{1+1/k}$, then $G$ contains a $C_{2k}$, further implying that one needs to consider only graphs with $m = O(n^{1+1/k})$.
Previously the best known algorithms were an $O(n^2)$ algorithm due to Yuster and Zwick [J.Disc.Math'97] as well as a $O(m^{2-(1+\lceil k/2\rceil^{-1})/(k+1)})$ algorithm by Alon et al. [Algorithmica'97].
We present an algorithm that uses $O(m^{2k/(k+1)})$ time and finds a $C_{2k}$ if one exists. This bound is $O(n^2)$ exactly when $m=Θ(n^{1+1/k})$. For $4$-cycles our new bound coincides with Alon et al., while for every $k>2$ our bound yields a polynomial improvement in $m$.
Yuster and Zwick noted that it is "plausible to conjecture that $O(n^2)$ is the best possible bound in terms of $n$". We show "conditional optimality": if this hypothesis holds then our $O(m^{2k/(k+1)})$ algorithm is tight as well. Furthermore, a folklore reduction implies that no combinatorial algorithm can determine if a graph contains a $6$-cycle in time $O(m^{3/2-ε})$ for any $ε>0$ under the widely believed combinatorial BMM conjecture. Coupled with our main result, this gives tight bounds for finding $6$-cycles combinatorially and also separates the complexity of finding $4$- and $6$-cycles giving evidence that the exponent of $m$ in the running time should indeed increase with $k$.
The key ingredient in our algorithm is a new notion of capped $k$-walks, which are walks of length $k$ that visit only nodes according to a fixed ordering. Our main technical contribution is an involved analysis proving several properties of such walks which may be of independent interest.
Submitted 30 March, 2017;
originally announced March 2017.
-
Near-Optimal Induced Universal Graphs for Bounded Degree Graphs
Authors:
Mikkel Abrahamsen,
Stephen Alstrup,
Jacob Holm,
Mathias Bæk Tejs Knudsen,
Morten Stöckel
Abstract:
A graph $U$ is an induced universal graph for a family $F$ of graphs if every graph in $F$ is a vertex-induced subgraph of $U$. For the family of all undirected graphs on $n$ vertices Alstrup, Kaplan, Thorup, and Zwick [STOC 2015] give an induced universal graph with $O\!\left(2^{n/2}\right)$ vertices, matching a lower bound by Moon [Proc. Glasgow Math. Assoc. 1965].
Let $k= \lceil D/2 \rceil$. Improving asymptotically on previous results by Butler [Graphs and Combinatorics 2009] and Esperet, Labourel and Ochem [IPL 2008], we give an induced universal graph with $O\!\left(\frac{k2^k}{k!}n^k \right)$ vertices for the family of graphs with $n$ vertices of maximum degree $D$. For constant $D$, Butler gives a lower bound of $Ω\!\left(n^{D/2}\right)$. For an odd constant $D\geq 3$, Esperet et al. and Alon and Capalbo [SODA 2008] give a graph with $O\!\left(n^{k-\frac{1}{D}}\right)$ vertices. Using their techniques for any (including constant) even values of $D$ gives asymptotically worse bounds than we present.
For large $D$, i.e. when $D = Ω\left(\log^3 n\right)$, the previous best upper bound was ${n\choose\lceil D/2\rceil} n^{O(1)}$ due to Adjiashvili and Rotbart [ICALP 2014]. We give upper and lower bounds showing that the size is ${\lfloor n/2\rfloor\choose\lfloor D/2 \rfloor}2^{\pm\tilde{O}\left(\sqrt{D}\right)}$. Hence the optimal size is $2^{\tilde{O}(D)}$ and our construction is within a factor of $2^{\tilde{O}\left(\sqrt{D}\right)}$ from this. The previous results were larger by at least a factor of $2^{Ω(D)}$.
As a part of the above, proving a conjecture by Esperet et al., we construct an induced universal graph with $2n-1$ vertices for the family of graphs with max degree $2$. In addition, we give results for acyclic graphs with max degree $2$ and cycle graphs. Our results imply the first labeling schemes that for any $D$ are at most $o(n)$ bits from optimal.
Submitted 21 July, 2016; v1 submitted 17 July, 2016;
originally announced July 2016.
-
I/O-Efficient Similarity Join
Authors:
Rasmus Pagh,
Ninh Pham,
Francesco Silvestri,
Morten Stöckel
Abstract:
We present an I/O-efficient algorithm for computing similarity joins based on locality-sensitive hashing (LSH). In contrast to the filtering methods commonly suggested, our method has provable sub-quadratic dependency on the data size. Further, in contrast to straightforward implementations of known LSH-based algorithms on external memory, our approach is able to take significant advantage of the available internal memory: whereas the time complexity of classical algorithms includes a factor of $N^ρ$, where $ρ$ is a parameter of the LSH used, the I/O complexity of our algorithm merely includes a factor $(N/M)^ρ$, where $N$ is the data size and $M$ is the size of internal memory. Our algorithm is randomized and outputs the correct result with high probability. It is a simple, recursive, cache-oblivious procedure, and we believe that it will be useful also in other computational settings such as parallel computation.
Submitted 28 March, 2017; v1 submitted 2 July, 2015;
originally announced July 2015.
-
Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence
Authors:
Mathias Bæk Tejs Knudsen,
Morten Stöckel
Abstract:
Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how "random" a hash function or a random number generator is, is its independence: a sequence of random variables is said to be $k$-independent if every variable is uniform and every size $k$ subset is independent. In this paper we consider three classic algorithms under limited independence. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows.
- Randomized quicksort. When pivot elements are computed using a $5$-independent hash function, Karloff and Raghavan, J.ACM'93 showed $O ( n \log n)$ expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only $4$-independence.
- Min-wise hashing. For a set $A$, consider the probability of a particular element being mapped to the smallest hash value. It is known that $5$-independence implies the optimal probability $O (1 /n)$. Broder et al., STOC'98 showed that $2$-independence implies it is $O(1 / \sqrt{|A|})$. We show a matching lower bound as well as new tight bounds for $3$- and $4$-independent hash functions.
- Largest bucket. We consider the case where $n$ balls are distributed to $n$ buckets using a $k$-independent hash function and analyze the largest bucket size. Alon et al., STOC'97 showed that there exists a $2$-independent hash function implying a bucket of size $Ω( n^{1/2})$. We generalize the bound, providing a $k$-independent family of functions that imply size $Ω( n^{1/k})$.
Submitted 19 February, 2015;
originally announced February 2015.
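To make the notion of limited independence in the abstract above concrete, here is a minimal sketch of the standard polynomial construction: a random polynomial of degree $k-1$ over a prime field gives a $k$-independent family on that field. This is a textbook construction and not the specific families analyzed in the paper; the helper names are hypothetical, and the final reduction modulo the number of buckets makes the output only approximately uniform.

```python
import random
from collections import Counter

MERSENNE_PRIME = (1 << 61) - 1  # field size; keys are assumed to be smaller than this

def make_k_independent_hash(k, num_buckets, rng):
    """Draw a random degree-(k-1) polynomial and return it as a hash function."""
    coefficients = [rng.randrange(MERSENNE_PRIME) for _ in range(k)]

    def hash_value(x):
        acc = 0
        for a in reversed(coefficients):  # Horner evaluation modulo the prime
            acc = (acc * x + a) % MERSENNE_PRIME
        return acc % num_buckets

    return hash_value

def largest_bucket(keys, hash_value):
    """Size of the fullest bucket when keys are distributed by hash_value."""
    return max(Counter(hash_value(x) for x in keys).values())

rng = random.Random(0)  # fixed seed so the experiment is repeatable
n = 1024
for k in (2, 4):
    h = make_k_independent_hash(k, n, rng)
    print(k, largest_bucket(range(n), h))
```

With `k = 1` the polynomial is a constant, so every key lands in the same bucket, which shows how the independence level bounds what can be guaranteed.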
-
Association Rule Mining using Maximum Entropy
Authors:
Rasmus Pagh,
Morten Stöckel
Abstract:
Recommendations based on behavioral data may be faced with ambiguous statistical evidence. We consider the case of association rules, relevant e.g. for query and product recommendations. For example: Suppose that a customer belongs to categories A and B, each of which is known to have positive correlation with buying product C, how do we estimate the probability that she will buy product C?
For rare terms or products there may not be enough data to directly produce such an estimate; perhaps we never directly observed a connection between A, B, and C. What can we do when there is no support for estimating the probability by simply computing the observed frequency? In particular, what is the right thing to do when A and B give rise to very different estimates of the probability of C?
We consider the use of maximum entropy probability estimates, which give a principled way of extrapolating probabilities of events that do not even occur in the data set! Focusing on the basic case of three variables, our main technical contributions are that (under mild assumptions): 1) There exists a simple, explicit formula that gives a good approximation of maximum entropy estimates, and 2) Maximum entropy estimates based on a small number of samples are provably tightly concentrated around the true maximum entropy frequency that arises if we let the number of samples go to infinity.
Our empirical work demonstrates the surprising precision of maximum entropy estimates, across a range of real-life transaction data sets. In particular we observe the average absolute error on maximum entropy estimates is a factor $3$--$14$ less compared to using independence or extrapolation estimates, when the data used to make the estimates has low support. We believe that the same principle can be used to synthesize probability estimates in many settings.
Submitted 9 January, 2015;
originally announced January 2015.
-
The Input/Output Complexity of Sparse Matrix Multiplication
Authors:
Rasmus Pagh,
Morten Stöckel
Abstract:
We consider the problem of multiplying sparse matrices (over a semiring) where the number of non-zero entries is larger than main memory. In the classical paper of Hong and Kung (STOC '81) it was shown that to compute a product of dense $U \times U$ matrices, $Θ\left(U^3 / (B \sqrt{M}) \right)$ I/Os are necessary and sufficient in the I/O model with internal memory size $M$ and memory block size $B$.
In this paper we generalize the upper and lower bounds of Hong and Kung to the sparse case. Our bounds depend on the number $N = \mathtt{nnz}(A)+\mathtt{nnz}(C)$ of nonzero entries in $A$ and $C$, as well as the number $Z = \mathtt{nnz}(AC)$ of nonzero entries in $AC$.
We show that $AC$ can be computed using $\tilde{O} \left(\tfrac{N}{B} \min\left(\sqrt{\tfrac{Z}{M}},\tfrac{N}{M}\right) \right)$ I/Os, with high probability. This is tight (up to polylogarithmic factors) when only semiring operations are allowed, even for dense rectangular matrices: We show a lower bound of $Ω\left(\tfrac{N}{B} \min\left(\sqrt{\tfrac{Z}{M}},\tfrac{N}{M}\right) \right)$ I/Os.
While our lower bound uses fairly standard techniques, the upper bound makes use of "compressed matrix multiplication" sketches, which is new in the context of I/O-efficient algorithms, and a new matrix product size estimation technique that avoids the "no cancellation" assumption.
Submitted 14 March, 2014;
originally announced March 2014.
-
The Hardness of the Functional Orientation 2-Color Problem
Authors:
Søren Bøg,
Morten Stöckel,
Hjalte Wedel Vildhøj
Abstract:
We consider the Functional Orientation 2-Color problem, which was introduced by Valiant in his seminal paper on holographic algorithms [SIAM J. Comput., 37(5), 2008]. For this decision problem, Valiant gave a polynomial time holographic algorithm for planar graphs of maximum degree 3, and showed that the problem is NP-complete for planar graphs of maximum degree 10. A recent result on defective graph coloring by Corrêa et al. [Australas. J. Combin., 43, 2009] implies that the problem is already hard for planar graphs of maximum degree 8. Together, these results leave open the hardness question for graphs of maximum degree between 4 and 7. We close this gap by showing that the answer is always yes for arbitrary graphs of maximum degree 5, and that the problem is NP-complete for planar graphs of maximum degree 6. Moreover, for graphs of maximum degree 5, we note that a linear time algorithm for finding a solution exists.
Submitted 23 April, 2013; v1 submitted 9 October, 2012;
originally announced October 2012.