-
Zipf's laws of meaning in Catalan
Authors:
Neus Català,
Jaume Baixeries,
Ramon Ferrer-Cancho,
Lluís Padró,
Antoni Hernández-Fernández
Abstract:
In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word with its number of meanings: the law of meaning distribution, relating the frequency of a word and its frequency rank, and the meaning-frequency law, relating the frequency of a word with its number of meanings. Although these laws were formulated more than half a cent…
▽ More
In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word with its number of meanings: the law of meaning distribution, relating the frequency of a word and its frequency rank, and the meaning-frequency law, relating the frequency of a word with its number of meanings. Although these laws were formulated more than half a century ago, they have been only investigated in a few languages. Here we present the first study of these laws in Catalan.
We verify these laws in Catalan via the relationship among their exponents and that of the rank-frequency law. We present a new protocol for the analysis of these Zipfian laws that can be extended to other languages. We report the first evidence of two marked regimes for these laws in written language and speech, paralleling the two regimes in Zipf's rank-frequency law in large multi-author corpora discovered in early 2000s. Finally, the implications of these two regimes will be discussed.
△ Less
Submitted 30 June, 2021;
originally announced July 2021.
-
Coin_flipper at eHealth-KD Challenge 2019: Voting LSTMs for Key Phrases and Semantic Relation Identification Applied to Spanish eHealth Texts
Authors:
Neus Català,
Mario Martin
Abstract:
This paper describes our approach presented for the eHealth-KD 2019 challenge. Our participation was aimed at testing how far we could go using generic tools for Text-Processing but, at the same time, using common optimization techniques in the field of Data Mining. The architecture proposed for both tasks of the challenge is a standard stacked 2-layer bi-LSTM. The main particularities of our appr…
▽ More
This paper describes our approach presented for the eHealth-KD 2019 challenge. Our participation was aimed at testing how far we could go using generic tools for Text-Processing but, at the same time, using common optimization techniques in the field of Data Mining. The architecture proposed for both tasks of the challenge is a standard stacked 2-layer bi-LSTM. The main particularities of our approach are: (a) The use of a surrogate function of F1 as loss function to close the gap between the minimization function and the evaluation metric, and (b) The generation of an ensemble of models for generating predictions by majority vote. Our system ranked second with an F1 score of 62.18% in the main task by a narrow margin with the winner that scored 63.94%.
△ Less
Submitted 26 September, 2019;
originally announced September 2019.
-
Polysemy and brevity versus frequency in language
Authors:
Bernardino Casas,
Antoni Hernández-Fernández,
Neus Català,
Ramon Ferrer-i-Cancho,
Jaume Baixeries
Abstract:
The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e.…
▽ More
The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In a previous work, we tested the robustness of these Zipfian laws for English, roughly measuring word length in number of characters and distinguishing adult from child speech. In the present article, we extend our study to other languages (Dutch and Spanish) and introduce two additional measures of length: syllabic length and phonemic length. Our correlation analysis indicates that both the meaning-frequency law and the law of abbreviation hold overall in all the analyzed languages.
△ Less
Submitted 27 March, 2019;
originally announced April 2019.
-
The polysemy of the words that children learn over time
Authors:
Bernardino Casas,
Neus Català,
Ramon Ferrer-i-Cancho,
Antoni Hernández-Fernández,
Jaume Baixeries
Abstract:
Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such preference could be a side-effect of another bias: the preference of children for nouns in combination with the lower polysemy of nouns with respect to other part-of-speech categories. Our results s…
▽ More
Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such preference could be a side-effect of another bias: the preference of children for nouns in combination with the lower polysemy of nouns with respect to other part-of-speech categories. Our results show that mean polysemy in children increases over time in two phases, i.e. a fast growth till the 31st month followed by a slower tendency towards adult speech. In contrast, this evolution is not found in adults interacting with children. This suggests that children have a preference for non-polysemous words in their early stages of vocabulary acquisition. Interestingly, the evolutionary pattern described above weakens when controlling for syntactic category (noun, verb, adjective or adverb) but it does not disappear completely, suggesting that it could result from acombination of a standalone bias for low polysemy and a preference for nouns.
△ Less
Submitted 26 March, 2019; v1 submitted 27 November, 2016;
originally announced November 2016.