Search | arXiv e-print repository

A minimal base or a direct base? That is the question!

Abstract: In this paper we revisit the problem of computing the closure of a set of attributes, given a set of Armstrong dependencies. This problem is of central interest in logic, the relational database model, lattice theory, and Formal Concept Analysis. We consider three main closure algorithms, namely Closure, LinClosure and Wild's Closure, combined with implication bases that may have different characteristics, such as being "minimal" (e.g., the Duquenne-Guigues basis) or being "direct" (e.g., the Canonical Basis and the D-basis). The impacts of minimality and directness on the closure algorithms are then studied in depth, including experimentally. The results are analyzed extensively and offer a fresh look at computing the closure of a set of attributes. This paper has been submitted to the International Journal of Approximate Reasoning.

Submitted 18 April, 2024; originally announced April 2024.

MSC Class: 68 ACM Class: H.0; H.1.0
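
For readers unfamiliar with the closure problem named in the abstract above, the following is a minimal sketch of a naive fixpoint closure computation in Python. It is an illustration only and is not the paper's Closure, LinClosure or Wild's Closure implementation; the function name `naive_closure`, the representation of an implication as a (premise, conclusion) pair of frozensets, and the example implications are all assumptions made for this sketch.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: an implication "premise -> conclusion"
# is a pair of frozensets of attribute names.
Implication = Tuple[FrozenSet[str], FrozenSet[str]]


def naive_closure(attributes: Iterable[str],
                  implications: List[Implication]) -> Set[str]:
    """Return the closure of `attributes` under `implications`.

    Repeatedly scans all implications and fires every one whose premise
    is already contained in the current set, until nothing changes.
    """
    closed = set(attributes)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            # Fire the implication only if it adds something new.
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


# Illustrative implications (assumed, not from the paper):
# a -> b, b -> c, {c, d} -> e
rules: List[Implication] = [
    (frozenset({"a"}), frozenset({"b"})),
    (frozenset({"b"}), frozenset({"c"})),
    (frozenset({"c", "d"}), frozenset({"e"})),
]

closure_a = naive_closure({"a"}, rules)        # {"a", "b", "c"}
closure_ad = naive_closure({"a", "d"}, rules)  # {"a", "b", "c", "d", "e"}
```

In this sketch every pass rescans the whole implication set, so the cost depends on how many passes the chosen base forces, which is the kind of trade-off between minimal and direct bases that the abstract refers to.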

arXiv:2403.13914 [pdf, ps, other]

Database Dependencies and Formal Concept Analysis

Authors: Jaume Baixeries

Abstract: This is an account of the characterization of database dependencies with Formal Concept Analysis.

Submitted 20 March, 2024; originally announced March 2024.

MSC Class: 68 ACM Class: H.0; H.1.0

arXiv:2107.00042 [pdf, other]

doi 10.1371/journal.pone.0260849

Zipf's laws of meaning in Catalan

Authors: Neus Català, Jaume Baixeries, Ramon Ferrer-Cancho, Lluís Padró, Antoni Hernández-Fernández

Abstract: In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word and its number of meanings: the law of meaning distribution, relating the frequency of a word and its frequency rank, and the meaning-frequency law, relating the frequency of a word with its number of meanings. Although these laws were formulated more than half a century ago, they have only been investigated in a few languages. Here we present the first study of these laws in Catalan. We verify these laws in Catalan via the relationship among their exponents and that of the rank-frequency law. We present a new protocol for the analysis of these Zipfian laws that can be extended to other languages. We report the first evidence of two marked regimes for these laws in written language and speech, paralleling the two regimes in Zipf's rank-frequency law in large multi-author corpora discovered in the early 2000s. Finally, the implications of these two regimes are discussed.

Submitted 30 June, 2021; originally announced July 2021.

Comments: 21 pages, 11 figures

arXiv:1904.00812 [pdf, other]

doi 10.1016/j.csl.2019.03.007

Polysemy and brevity versus frequency in language

Authors: Bernardino Casas, Antoni Hernández-Fernández, Neus Català, Ramon Ferrer-i-Cancho, Jaume Baixeries

Abstract: The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In a previous work, we tested the robustness of these Zipfian laws for English, roughly measuring word length in number of characters and distinguishing adult from child speech. In the present article, we extend our study to other languages (Dutch and Spanish) and introduce two additional measures of length: syllabic length and phonemic length. Our correlation analysis indicates that both the meaning-frequency law and the law of abbreviation hold overall in all the analyzed languages.

Submitted 27 March, 2019; originally announced April 2019.

Journal ref: Computer Speech and Language 58, 19-50 (2019)

arXiv:1611.08807 [pdf, other]

doi 10.1075/is.16036.cas

The polysemy of the words that children learn over time

Authors: Bernardino Casas, Neus Català, Ramon Ferrer-i-Cancho, Antoni Hernández-Fernández, Jaume Baixeries

Abstract: Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such preference could be a side-effect of another bias: the preference of children for nouns in combination with the lower polysemy of nouns with respect to other part-of-speech categories. Our results show that mean polysemy in children increases over time in two phases, i.e. a fast growth till the 31st month followed by a slower tendency towards adult speech. In contrast, this evolution is not found in adults interacting with children. This suggests that children have a preference for non-polysemous words in their early stages of vocabulary acquisition. Interestingly, the evolutionary pattern described above weakens when controlling for syntactic category (noun, verb, adjective or adverb) but it does not disappear completely, suggesting that it could result from a combination of a standalone bias for low polysemy and a preference for nouns.

Submitted 26 March, 2019; v1 submitted 27 November, 2016; originally announced November 2016.

Comments: Substantially revised version based on referee comments from Interaction Studies

Journal ref: Interaction Studies 19 (3), 389-426 (2018)

arXiv:1210.6599 [pdf, other]

doi 10.1515/sagmb-2013-0034

When is Menzerath-Altmann law mathematically trivial? A new approach

Authors: Ramon Ferrer-i-Cancho, Antoni Hernández-Fernández, Jaume Baixeries, Łukasz Dȩbowski, Ján Mačutek

Abstract: Menzerath's law, the tendency of Z, the mean size of the parts, to decrease as X, the number of parts, increases, is found in language, music and genomes. Recently, it has been argued that the presence of the law in genomes is an inevitable consequence of the fact that Z = Y/X, which would imply that Z scales with X as Z ~ 1/X. That scaling is a very particular case of Menzerath-Altmann law that has been rejected by means of a correlation test between X and Y in genomes, where X is the number of chromosomes of a species, Y its genome size in bases and Z the mean chromosome size. Here we review the statistical foundations of that test and consider three non-parametric tests based upon different correlation metrics and one parametric test to evaluate if Z ~ 1/X in genomes. The most powerful test is a new non-parametric test based upon the correlation ratio, which is able to reject Z ~ 1/X in nine out of eleven taxonomic groups and detect a borderline group. Rather than a fact, Z ~ 1/X is a baseline that real genomes do not meet. The view of Menzerath-Altmann law as inevitable is seriously flawed.

Submitted 25 April, 2014; v1 submitted 24 October, 2012; originally announced October 2012.

Comments: version improved with a new table, new histograms and a more accurate statistical analysis; a new interpretation of the results is offered; notation has undergone minor corrections

arXiv:1207.0689 [pdf]

doi 10.1002/cplx.21429

The challenges of statistical patterns of language: the case of Menzerath's law in genomes

Authors: Ramon Ferrer-i-Cancho, Núria Forns, Antoni Hernández-Fernández, Gemma Bel-Enguix, Jaume Baixeries

Abstract: The importance of statistical patterns of language has been debated for decades. Although Zipf's law is perhaps the most popular case, recently, Menzerath's law has begun to be involved. Menzerath's law manifests in language, music and genomes as a tendency of the mean size of the parts to decrease as the number of parts increases in many situations. This statistical regularity emerges also in the context of genomes, for instance, as a tendency of species with more chromosomes to have a smaller mean chromosome size. It has been argued that the instantiation of this law in genomes is not indicative of any parallel between language and genomes because (a) the law is inevitable and (b) non-coding DNA dominates genomes. Here mathematical, statistical and conceptual challenges of these criticisms are discussed. Two major conclusions are drawn: the law is not inevitable and languages also have a correlate of non-coding DNA. However, the wide range of manifestations of the law in and outside genomes suggests that the striking similarities between non-coding DNA and certain linguistic units could be anecdotal for understanding the recurrence of that statistical law.

Submitted 29 September, 2012; v1 submitted 3 July, 2012; originally announced July 2012.

Comments: Title changed, abstract and introduction improved, and minor corrections to the statistical arguments

arXiv:1201.1746 [pdf]

doi 10.1080/09296174.2013.773141

The parameters of Menzerath-Altmann law in genomes

Authors: Jaume Baixeries, Antoni Hernandez-Fernandez, Nuria Forns, Ramon Ferrer-i-Cancho

Abstract: The relationship between the size of the whole and the size of the parts in language and music is known to follow Menzerath-Altmann law at many levels of description (morphemes, words, sentences...). Qualitatively, the law states that the larger the whole, the smaller its parts, e.g., the longer a word (in syllables) the shorter its syllables (in letters or phonemes). This patterning has also been found in genomes: the longer a genome (in chromosomes), the shorter its chromosomes (in base pairs). However, it has been argued recently that mean chromosome length is trivially a pure power function of chromosome number with an exponent of -1. The functional dependency between mean chromosome size and chromosome number in groups of organisms from three different kingdoms is studied. The fit of a pure power function yields exponents between -1.6 and 0.1. It is shown that an exponent of -1 is unlikely for fungi, gymnosperm plants, insects, reptiles, ray-finned fishes and amphibians. Even when the exponent is very close to -1, adding an exponential component is able to yield a better fit with regard to a pure power-law in plants, mammals, ray-finned fishes and amphibians. The parameters of Menzerath-Altmann law in genomes deviate significantly from a power law with a -1 exponent with the exception of birds and cartilaginous fishes.

Submitted 14 June, 2012; v1 submitted 9 January, 2012; originally announced January 2012.

Comments: Typos and minor inaccuracies corrected. Title and references updated (the previous update failed)
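
As a reading aid for the abstract above, the functional forms it compares can be sketched as follows. The abstracts in this listing give only Z = Y/X and the exponent values; the parameter names a, b and c below are assumptions for illustration, and the sign convention of the exponential term varies between authors.

```latex
% Z: mean chromosome size, X: chromosome number, Y: genome size
Z = \frac{Y}{X}
% Pure power function; the "trivial" case argued in the literature is b = -1
Z = a X^{b}
% Power function with an added exponential component
% (the commonly cited full form of the Menzerath-Altmann law)
Z = a X^{b} e^{cX}
```

With c = 0 the last form reduces to the pure power function, so a significantly better fit with c different from 0 is evidence against a pure power law.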

Showing 1–8 of 8 results for author: Baixeries, J