-
Zipf's laws of meaning in Catalan
Authors:
Neus Català,
Jaume Baixeries,
Ramon Ferrer-Cancho,
Lluís Padró,
Antoni Hernández-Fernández
Abstract:
In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word with its number of meanings: the law of meaning distribution, relating the frequency of a word and its frequency rank, and the meaning-frequency law, relating the frequency of a word with its number of meanings. Although these laws were formulated more than half a cent…
▽ More
In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word with its number of meanings: the law of meaning distribution, relating the frequency of a word and its frequency rank, and the meaning-frequency law, relating the frequency of a word with its number of meanings. Although these laws were formulated more than half a century ago, they have been only investigated in a few languages. Here we present the first study of these laws in Catalan.
We verify these laws in Catalan via the relationship among their exponents and that of the rank-frequency law. We present a new protocol for the analysis of these Zipfian laws that can be extended to other languages. We report the first evidence of two marked regimes for these laws in written language and speech, paralleling the two regimes in Zipf's rank-frequency law in large multi-author corpora discovered in early 2000s. Finally, the implications of these two regimes will be discussed.
△ Less
Submitted 30 June, 2021;
originally announced July 2021.
-
Polysemy and brevity versus frequency in language
Authors:
Bernardino Casas,
Antoni Hernández-Fernández,
Neus Català,
Ramon Ferrer-i-Cancho,
Jaume Baixeries
Abstract:
The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e.…
▽ More
The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In a previous work, we tested the robustness of these Zipfian laws for English, roughly measuring word length in number of characters and distinguishing adult from child speech. In the present article, we extend our study to other languages (Dutch and Spanish) and introduce two additional measures of length: syllabic length and phonemic length. Our correlation analysis indicates that both the meaning-frequency law and the law of abbreviation hold overall in all the analyzed languages.
△ Less
Submitted 27 March, 2019;
originally announced April 2019.
-
The polysemy of the words that children learn over time
Authors:
Bernardino Casas,
Neus Català,
Ramon Ferrer-i-Cancho,
Antoni Hernández-Fernández,
Jaume Baixeries
Abstract:
Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such preference could be a side-effect of another bias: the preference of children for nouns in combination with the lower polysemy of nouns with respect to other part-of-speech categories. Our results s…
▽ More
Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such preference could be a side-effect of another bias: the preference of children for nouns in combination with the lower polysemy of nouns with respect to other part-of-speech categories. Our results show that mean polysemy in children increases over time in two phases, i.e. a fast growth till the 31st month followed by a slower tendency towards adult speech. In contrast, this evolution is not found in adults interacting with children. This suggests that children have a preference for non-polysemous words in their early stages of vocabulary acquisition. Interestingly, the evolutionary pattern described above weakens when controlling for syntactic category (noun, verb, adjective or adverb) but it does not disappear completely, suggesting that it could result from acombination of a standalone bias for low polysemy and a preference for nouns.
△ Less
Submitted 26 March, 2019; v1 submitted 27 November, 2016;
originally announced November 2016.
-
The infochemical core
Authors:
Antoni Hernández-Fernández,
Ramon Ferrer-i-Cancho
Abstract:
Vocalizations and less often gestures have been the object of linguistic research over decades. However, the development of a general theory of communication with human language as a particular case requires a clear understanding of the organization of communication through other means. Infochemicals are chemical compounds that carry information and are employed by small organisms that cannot emit…
▽ More
Vocalizations and less often gestures have been the object of linguistic research over decades. However, the development of a general theory of communication with human language as a particular case requires a clear understanding of the organization of communication through other means. Infochemicals are chemical compounds that carry information and are employed by small organisms that cannot emit acoustic signals of optimal frequency to achieve successful communication. Here the distribution of infochemicals across species is investigated when they are ranked by their degree or the number of species with which it is associated (because they produce or they are sensitive to it). The quality of the fit of different functions to the dependency between degree and rank is evaluated with a penalty for the number of parameters of the function. Surprisingly, a double Zipf (a Zipf distribution with two regimes with a different exponent each) is the model yielding the best fit although it is the function with the largest number of parameters. This suggests that the world wide repertoire of infochemicals contains a chemical nucleus shared by many species and reminiscent of the core vocabularies found for human language in dictionaries or large corpora.
△ Less
Submitted 24 October, 2016; v1 submitted 18 October, 2016;
originally announced October 2016.
-
Emergence of linguistic laws in human voice
Authors:
Ivan Gonzalez Torre,
Bartolo Luque,
Lucas Lacasa,
Jordi Luque,
Antoni Hernandez-Fernandez
Abstract:
Linguistic laws constitute one of the quantitative cornerstones of modern cognitive sciences and have been routinely investigated in written corpora, or in the equivalent transcription of oral corpora. This means that inferences of statistical patterns of language in acoustics are biased by the arbitrary, language-dependent segmentation of the signal, and virtually precludes the possibility of mak…
▽ More
Linguistic laws constitute one of the quantitative cornerstones of modern cognitive sciences and have been routinely investigated in written corpora, or in the equivalent transcription of oral corpora. This means that inferences of statistical patterns of language in acoustics are biased by the arbitrary, language-dependent segmentation of the signal, and virtually precludes the possibility of making comparative studies between human voice and other animal communication systems. Here we bridge this gap by proposing a method that allows to measure such patterns in acoustic signals of arbitrary origin, without needs to have access to the language corpus underneath. The method has been applied to six different human languages, recovering successfully some well-known laws of human communication at timescales even below the phoneme and finding yet another link between complexity and criticality in a biological system. These methods further pave the way for new comparative studies in animal communication or the analysis of signals of unknown code.
△ Less
Submitted 9 October, 2016;
originally announced October 2016.
-
Compression as a universal principle of animal behavior
Authors:
R. Ferrer-i-Cancho,
A. Hernández-Fernández,
D. Lusseau,
G. Agoramoorthy,
M. J. Hsu,
S. Semple
Abstract:
A key aim in biology and psychology is to identify fundamental principles underpinning the behavior of animals, including humans. Analyses of human language and the behavior of a range of non-human animal species have provided evidence for a common pattern underlying diverse behavioral phenomena: words follow Zipf's law of brevity (the tendency of more frequently used words to be shorter), and con…
▽ More
A key aim in biology and psychology is to identify fundamental principles underpinning the behavior of animals, including humans. Analyses of human language and the behavior of a range of non-human animal species have provided evidence for a common pattern underlying diverse behavioral phenomena: words follow Zipf's law of brevity (the tendency of more frequently used words to be shorter), and conformity to this general pattern has been seen in the behavior of a number of other animals. It has been argued that the presence of this law is a sign of efficient coding in the information theoretic sense. However, no strong direct connection has been demonstrated between the law and compression, the information theoretic principle of minimizing the expected length of a code. Here we show that minimizing the expected code length implies that the length of a word cannot increase as its frequency increases. Furthermore, we show that the mean code length or duration is significantly small in human language, and also in the behavior of other species in all cases where agreement with the law of brevity has been found. We argue that compression is a general principle of animal behavior, that reflects selection for efficiency of coding.
△ Less
Submitted 25 March, 2013;
originally announced March 2013.
-
When is Menzerath-Altmann law mathematically trivial? A new approach
Authors:
Ramon Ferrer-i-Cancho,
Antoni Hernández-Fernández,
Jaume Baixeries,
Łukasz Dȩbowski,
Ján Mačutek
Abstract:
Menzerath's law, the tendency of Z, the mean size of the parts, to decrease as X, the number of parts, increases is found in language, music and genomes. Recently, it has been argued that the presence of the law in genomes is an inevitable consequence of the fact that Z = Y/X, which would imply that Z scales with X as Z ~ 1/X. That scaling is a very particular case of Menzerath-Altmann law that ha…
▽ More
Menzerath's law, the tendency of Z, the mean size of the parts, to decrease as X, the number of parts, increases is found in language, music and genomes. Recently, it has been argued that the presence of the law in genomes is an inevitable consequence of the fact that Z = Y/X, which would imply that Z scales with X as Z ~ 1/X. That scaling is a very particular case of Menzerath-Altmann law that has been rejected by means of a correlation test between X and Y in genomes, being X the number of chromosomes of a species, Y its genome size in bases and Z the mean chromosome size. Here we review the statistical foundations of that test and consider three non-parametric tests based upon different correlation metrics and one parametric test to evaluate if Z ~ 1/X in genomes. The most powerful test is a new non-parametric based upon the correlation ratio, which is able to reject Z ~ 1/X in nine out of eleven taxonomic groups and detect a borderline group. Rather than a fact, Z ~ 1/X is a baseline that real genomes do not meet. The view of Menzerath-Altmann law as inevitable is seriously flawed.
△ Less
Submitted 25 April, 2014; v1 submitted 24 October, 2012;
originally announced October 2012.
-
The challenges of statistical patterns of language: the case of Menzerath's law in genomes
Authors:
Ramon Ferrer-i-Cancho,
Núria Forns,
Antoni Hernández-Fernández,
Gemma Bel-Enguix,
Jaume Baixeries
Abstract:
The importance of statistical patterns of language has been debated over decades. Although Zipf's law is perhaps the most popular case, recently, Menzerath's law has begun to be involved. Menzerath's law manifests in language, music and genomes as a tendency of the mean size of the parts to decrease as the number of parts increases in many situations. This statistical regularity emerges also in th…
▽ More
The importance of statistical patterns of language has been debated over decades. Although Zipf's law is perhaps the most popular case, recently, Menzerath's law has begun to be involved. Menzerath's law manifests in language, music and genomes as a tendency of the mean size of the parts to decrease as the number of parts increases in many situations. This statistical regularity emerges also in the context of genomes, for instance, as a tendency of species with more chromosomes to have a smaller mean chromosome size. It has been argued that the instantiation of this law in genomes is not indicative of any parallel between language and genomes because (a) the law is inevitable and (b) non-coding DNA dominates genomes. Here mathematical, statistical and conceptual challenges of these criticisms are discussed. Two major conclusions are drawn: the law is not inevitable and languages also have a correlate of non-coding DNA. However, the wide range of manifestations of the law in and outside genomes suggests that the striking similarities between non-coding DNA and certain linguistics units could be anecdotal for understanding the recurrence of that statistical law.
△ Less
Submitted 29 September, 2012; v1 submitted 3 July, 2012;
originally announced July 2012.
-
The failure of the law of brevity in two New World primates. Statistical caveats
Authors:
Ramon Ferrer-i-Cancho,
Antoni Hernández-Fernández
Abstract:
Parallels of Zipf's law of brevity, the tendency of more frequent words to be shorter, have been found in bottlenose dolphins and Formosan macaques. Although these findings suggest that behavioral repertoires are shaped by a general principle of compression, common marmosets and golden-backed uakaris do not exhibit the law. However, we argue that the law may be impossible or difficult to detect st…
▽ More
Parallels of Zipf's law of brevity, the tendency of more frequent words to be shorter, have been found in bottlenose dolphins and Formosan macaques. Although these findings suggest that behavioral repertoires are shaped by a general principle of compression, common marmosets and golden-backed uakaris do not exhibit the law. However, we argue that the law may be impossible or difficult to detect statistically in a given species if the repertoire is too small, a problem that could be affecting golden backed uakaris, and show that the law is present in a subset of the repertoire of common marmosets. We suggest that the visibility of the law will depend on the subset of the repertoire under consideration or the repertoire size.
△ Less
Submitted 28 September, 2012; v1 submitted 14 April, 2012;
originally announced April 2012.
-
The parameters of Menzerath-Altmann law in genomes
Authors:
Jaume Baixeries,
Antoni Hernandez-Fernandez,
Nuria Forns,
Ramon Ferrer-i-Cancho
Abstract:
The relationship between the size of the whole and the size of the parts in language and music is known to follow Menzerath-Altmann law at many levels of description (morphemes, words, sentences...). Qualitatively, the law states that larger the whole, the smaller its parts, e.g., the longer a word (in syllables) the shorter its syllables (in letters or phonemes). This patterning has also been fou…
▽ More
The relationship between the size of the whole and the size of the parts in language and music is known to follow Menzerath-Altmann law at many levels of description (morphemes, words, sentences...). Qualitatively, the law states that larger the whole, the smaller its parts, e.g., the longer a word (in syllables) the shorter its syllables (in letters or phonemes). This patterning has also been found in genomes: the longer a genome (in chromosomes), the shorter its chromosomes (in base pairs). However, it has been argued recently that mean chromosome length is trivially a pure power function of chromosome number with an exponent of -1. The functional dependency between mean chromosome size and chromosome number in groups of organisms from three different kingdoms is studied. The fit of a pure power function yields exponents between -1.6 and 0.1. It is shown that an exponent of -1 is unlikely for fungi, gymnosperm plants, insects, reptiles, ray-finned fishes and amphibians. Even when the exponent is very close to -1, adding an exponential component is able to yield a better fit with regard to a pure power-law in plants, mammals, ray-finned fishes and amphibians. The parameters of Menzerath-Altmann law in genomes deviate significantly from a power law with a -1 exponent with the exception of birds and cartilaginous fishes.
△ Less
Submitted 14 June, 2012; v1 submitted 9 January, 2012;
originally announced January 2012.