-
A minimal base or a direct base? That is the question!
Authors:
Jaume Baixeries,
Amedeo Napoli
Abstract:
In this paper we revisit the problem of computing the closure of a set of attributes, given a set of Armstrong dependencies. This problem is of major interest in logic, in the relational database model, in lattice theory and in Formal Concept Analysis. We consider three main closure algorithms, namely Closure, LinClosure and Wild's Closure, combined with implication bases that have different characteristics, among them being "minimal", e.g., the Duquenne-Guigues basis, and being "direct", e.g., the Canonical Basis and the D-basis. The impact of minimality and directness on the closure algorithms is then studied in depth, including experimentally. The results are analyzed extensively and offer a fresh look at computing the closure of a set of attributes.
This paper has been submitted to the International Journal of Approximate Reasoning.
Submitted 18 April, 2024;
originally announced April 2024.
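For readers unfamiliar with the closure problem named in the abstract above, the following is a minimal sketch of the naive fixpoint approach: keep applying implications whose premise is already contained in the set until nothing changes. It is only an illustration under assumptions. The function name `closure`, the pair-of-frozensets representation and the toy implications are hypothetical and not taken from the paper, and the sketch does not reproduce LinClosure or Wild's Closure.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: an implication is a (premise, conclusion) pair.
Implication = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attributes: Iterable[str], implications: List[Implication]) -> Set[str]:
    """Return the closure of `attributes` under `implications` (naive fixpoint)."""
    closed = set(attributes)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            # Fire the implication if its premise holds and it adds something new.
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


# Toy implications, invented for illustration: a -> b and bc -> d.
implications = [
    (frozenset({"a"}), frozenset({"b"})),
    (frozenset({"b", "c"}), frozenset({"d"})),
]
print(sorted(closure({"a", "c"}, implications)))  # ['a', 'b', 'c', 'd']
```

Each pass scans the whole implication list, so the cost depends on how many passes are needed; the abstract's contrast between "minimal" and "direct" bases concerns exactly this kind of trade-off between the size of the base and the work done per closure.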
-
Database Dependencies and Formal Concept Analysis
Authors:
Jaume Baixeries
Abstract:
This is an account of the characterization of database dependencies with Formal Concept Analysis.
Submitted 20 March, 2024;
originally announced March 2024.
-
Zipf's laws of meaning in Catalan
Authors:
Neus Català,
Jaume Baixeries,
Ramon Ferrer-i-Cancho,
Lluís Padró,
Antoni Hernández-Fernández
Abstract:
In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word and its number of meanings: the law of meaning distribution, relating the number of meanings of a word and its frequency rank, and the meaning-frequency law, relating the frequency of a word to its number of meanings. Although these laws were formulated more than half a century ago, they have been investigated in only a few languages. Here we present the first study of these laws in Catalan.
We verify these laws in Catalan via the relationship among their exponents and that of the rank-frequency law. We present a new protocol for the analysis of these Zipfian laws that can be extended to other languages. We report the first evidence of two marked regimes for these laws in written language and speech, paralleling the two regimes of Zipf's rank-frequency law in large multi-author corpora discovered in the early 2000s. Finally, we discuss the implications of these two regimes.
Submitted 30 June, 2021;
originally announced July 2021.
-
Polysemy and brevity versus frequency in language
Authors:
Bernardino Casas,
Antoni Hernández-Fernández,
Neus Català,
Ramon Ferrer-i-Cancho,
Jaume Baixeries
Abstract:
The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In a previous work, we tested the robustness of these Zipfian laws for English, roughly measuring word length in number of characters and distinguishing adult from child speech. In the present article, we extend our study to other languages (Dutch and Spanish) and introduce two additional measures of length: syllabic length and phonemic length. Our correlation analysis indicates that both the meaning-frequency law and the law of abbreviation hold overall in all the analyzed languages.
Submitted 27 March, 2019;
originally announced April 2019.
-
The polysemy of the words that children learn over time
Authors:
Bernardino Casas,
Neus Català,
Ramon Ferrer-i-Cancho,
Antoni Hernández-Fernández,
Jaume Baixeries
Abstract:
Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such a preference could be a side-effect of another bias: the preference of children for nouns in combination with the lower polysemy of nouns with respect to other part-of-speech categories. Our results show that mean polysemy in children increases over time in two phases, i.e. a fast growth until the 31st month followed by a slower tendency towards adult speech. In contrast, this evolution is not found in adults interacting with children. This suggests that children have a preference for non-polysemous words in their early stages of vocabulary acquisition. Interestingly, the evolutionary pattern described above weakens when controlling for syntactic category (noun, verb, adjective or adverb) but it does not disappear completely, suggesting that it could result from a combination of a standalone bias for low polysemy and a preference for nouns.
Submitted 26 March, 2019; v1 submitted 27 November, 2016;
originally announced November 2016.
-
When is Menzerath-Altmann law mathematically trivial? A new approach
Authors:
Ramon Ferrer-i-Cancho,
Antoni Hernández-Fernández,
Jaume Baixeries,
Łukasz Dȩbowski,
Ján Mačutek
Abstract:
Menzerath's law, the tendency of Z, the mean size of the parts, to decrease as X, the number of parts, increases, is found in language, music and genomes. Recently, it has been argued that the presence of the law in genomes is an inevitable consequence of the fact that Z = Y/X, which would imply that Z scales with X as Z ~ 1/X. That scaling is a very particular case of the Menzerath-Altmann law that has been rejected by means of a correlation test between X and Y in genomes, where X is the number of chromosomes of a species, Y its genome size in bases and Z the mean chromosome size. Here we review the statistical foundations of that test and consider three non-parametric tests based upon different correlation metrics and one parametric test to evaluate if Z ~ 1/X in genomes. The most powerful test is a new non-parametric test based upon the correlation ratio, which is able to reject Z ~ 1/X in nine out of eleven taxonomic groups and to detect a borderline group. Rather than a fact, Z ~ 1/X is a baseline that real genomes do not meet. The view of the Menzerath-Altmann law as inevitable is seriously flawed.
Submitted 25 April, 2014; v1 submitted 24 October, 2012;
originally announced October 2012.
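The reasoning in the abstract above is that Z = Y/X only yields Z ~ 1/X if Y does not depend on X, so evidence of a dependency between X and Y speaks against the trivial scaling. A minimal sketch of that idea follows, assuming a Pearson-based permutation test as a stand-in. This is not the correlation-ratio test proposed in the paper; the function names are hypothetical and the data are synthetic, invented purely for illustration.

```python
import random
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def permutation_p_value(xs: Sequence[float], ys: Sequence[float],
                        trials: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for 'X and Y are uncorrelated'."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled: List[float] = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any pairing between X and Y
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Synthetic example (not real genome data): Y grows with X,
# so Z = Y / X = 100 / X + 50 does not scale as 1 / X.
X = list(range(1, 21))            # number of parts
Y = [100 + 50 * x for x in X]     # size of the whole
p = permutation_p_value(X, Y)
print(p < 0.05)
```

A small p-value rejects the hypothesis that X and Y are uncorrelated, and with it the premise behind Z ~ 1/X; a Pearson-based test only detects linear dependency, which is one reason a correlation-ratio statistic can be more powerful.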
-
The challenges of statistical patterns of language: the case of Menzerath's law in genomes
Authors:
Ramon Ferrer-i-Cancho,
Núria Forns,
Antoni Hernández-Fernández,
Gemma Bel-Enguix,
Jaume Baixeries
Abstract:
The importance of statistical patterns of language has been debated over decades. Although Zipf's law is perhaps the most popular case, Menzerath's law has recently entered the debate. Menzerath's law manifests in language, music and genomes as a tendency of the mean size of the parts to decrease as the number of parts increases. This statistical regularity emerges also in the context of genomes, for instance, as a tendency of species with more chromosomes to have a smaller mean chromosome size. It has been argued that the instantiation of this law in genomes is not indicative of any parallel between language and genomes because (a) the law is inevitable and (b) non-coding DNA dominates genomes. Here mathematical, statistical and conceptual challenges of these criticisms are discussed. Two major conclusions are drawn: the law is not inevitable and languages also have a correlate of non-coding DNA. However, the wide range of manifestations of the law in and outside genomes suggests that the striking similarities between non-coding DNA and certain linguistic units could be anecdotal for understanding the recurrence of that statistical law.
Submitted 29 September, 2012; v1 submitted 3 July, 2012;
originally announced July 2012.
-
The parameters of Menzerath-Altmann law in genomes
Authors:
Jaume Baixeries,
Antoni Hernández-Fernández,
Núria Forns,
Ramon Ferrer-i-Cancho
Abstract:
The relationship between the size of the whole and the size of the parts in language and music is known to follow the Menzerath-Altmann law at many levels of description (morphemes, words, sentences...). Qualitatively, the law states that the larger the whole, the smaller its parts, e.g., the longer a word (in syllables) the shorter its syllables (in letters or phonemes). This patterning has also been found in genomes: the longer a genome (in chromosomes), the shorter its chromosomes (in base pairs). However, it has been argued recently that mean chromosome length is trivially a pure power function of chromosome number with an exponent of -1. The functional dependency between mean chromosome size and chromosome number in groups of organisms from three different kingdoms is studied. The fit of a pure power function yields exponents between -1.6 and 0.1. It is shown that an exponent of -1 is unlikely for fungi, gymnosperm plants, insects, reptiles, ray-finned fishes and amphibians. Even when the exponent is very close to -1, adding an exponential component yields a better fit than a pure power law in plants, mammals, ray-finned fishes and amphibians. The parameters of the Menzerath-Altmann law in genomes deviate significantly from a power law with a -1 exponent, with the exception of birds and cartilaginous fishes.
Submitted 14 June, 2012; v1 submitted 9 January, 2012;
originally announced January 2012.
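To make "the fit of a pure power function" in the abstract above concrete, here is a minimal sketch that estimates a and b in Z = a * X^b by ordinary least squares on log-transformed values. This is an assumption about one common way to fit a power function, not the fitting procedure used in the paper; the function name `fit_power_law` is hypothetical and the data are synthetic.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], zs: Sequence[float]) -> Tuple[float, float]:
    """Fit Z = a * X**b by linear least squares on (log X, log Z)."""
    lx = [math.log(x) for x in xs]
    lz = [math.log(z) for z in zs]
    n = len(lx)
    mx, mz = sum(lx) / n, sum(lz) / n
    # Slope of the regression line in log-log space is the exponent b.
    b = (sum((u - mx) * (v - mz) for u, v in zip(lx, lz))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(mz - b * mx)
    return a, b


# Synthetic data built to follow Z = 5 / X exactly (not real genome data).
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Z = [5.0 / x for x in X]
a, b = fit_power_law(X, Z)
print(round(a, 6), round(b, 6))
```

On data generated exactly as Z = 5 / X the estimated exponent is -1 up to floating-point error; the abstract's point is that real genomes mostly yield exponents away from that baseline.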