Search | arXiv e-print repository

doi 10.1371/journal.pone.0260849

Zipf's laws of meaning in Catalan

Authors: Neus Català, Jaume Baixeries, Ramon Ferrer-Cancho, Lluís Padró, Antoni Hernández-Fernández

Abstract: In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word with its number of meanings: the law of meaning distribution, relating the frequency of a word and its frequency rank, and the meaning-frequency law, relating the frequency of a word with its number of meanings. Although these laws were formulated more than half a cent… ▽ More In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word with its number of meanings: the law of meaning distribution, relating the frequency of a word and its frequency rank, and the meaning-frequency law, relating the frequency of a word with its number of meanings. Although these laws were formulated more than half a century ago, they have been only investigated in a few languages. Here we present the first study of these laws in Catalan. We verify these laws in Catalan via the relationship among their exponents and that of the rank-frequency law. We present a new protocol for the analysis of these Zipfian laws that can be extended to other languages. We report the first evidence of two marked regimes for these laws in written language and speech, paralleling the two regimes in Zipf's rank-frequency law in large multi-author corpora discovered in early 2000s. Finally, the implications of these two regimes will be discussed. △ Less

Submitted 30 June, 2021; originally announced July 2021.

Comments: 21 pages, 11 figures

arXiv:1909.12339 [pdf, other]

Coin_flipper at eHealth-KD Challenge 2019: Voting LSTMs for Key Phrases and Semantic Relation Identification Applied to Spanish eHealth Texts

Authors: Neus Català, Mario Martin

Abstract: This paper describes our approach presented for the eHealth-KD 2019 challenge. Our participation was aimed at testing how far we could go using generic tools for Text-Processing but, at the same time, using common optimization techniques in the field of Data Mining. The architecture proposed for both tasks of the challenge is a standard stacked 2-layer bi-LSTM. The main particularities of our appr… ▽ More This paper describes our approach presented for the eHealth-KD 2019 challenge. Our participation was aimed at testing how far we could go using generic tools for Text-Processing but, at the same time, using common optimization techniques in the field of Data Mining. The architecture proposed for both tasks of the challenge is a standard stacked 2-layer bi-LSTM. The main particularities of our approach are: (a) The use of a surrogate function of F1 as loss function to close the gap between the minimization function and the evaluation metric, and (b) The generation of an ensemble of models for generating predictions by majority vote. Our system ranked second with an F1 score of 62.18% in the main task by a narrow margin with the winner that scored 63.94%. △ Less

Submitted 26 September, 2019; originally announced September 2019.

Comments: 9 pages, 2 figures, 1 table

Journal ref: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019) co-located with 35th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019) http://ceur-ws.org/Vol-2421/

arXiv:1904.00812 [pdf, other]

doi 10.1016/j.csl.2019.03.007

Polysemy and brevity versus frequency in language

Authors: Bernardino Casas, Antoni Hernández-Fernández, Neus Català, Ramon Ferrer-i-Cancho, Jaume Baixeries

Abstract: The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e.… ▽ More The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In a previous work, we tested the robustness of these Zipfian laws for English, roughly measuring word length in number of characters and distinguishing adult from child speech. In the present article, we extend our study to other languages (Dutch and Spanish) and introduce two additional measures of length: syllabic length and phonemic length. Our correlation analysis indicates that both the meaning-frequency law and the law of abbreviation hold overall in all the analyzed languages. △ Less

Submitted 27 March, 2019; originally announced April 2019.

Journal ref: Computer Speech and Language 58, 19-50 (2019)

arXiv:1611.08807 [pdf, other]

doi 10.1075/is.16036.cas

The polysemy of the words that children learn over time

Authors: Bernardino Casas, Neus Català, Ramon Ferrer-i-Cancho, Antoni Hernández-Fernández, Jaume Baixeries

Abstract: Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such preference could be a side-effect of another bias: the preference of children for nouns in combination with the lower polysemy of nouns with respect to other part-of-speech categories. Our results s… ▽ More Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such preference could be a side-effect of another bias: the preference of children for nouns in combination with the lower polysemy of nouns with respect to other part-of-speech categories. Our results show that mean polysemy in children increases over time in two phases, i.e. a fast growth till the 31st month followed by a slower tendency towards adult speech. In contrast, this evolution is not found in adults interacting with children. This suggests that children have a preference for non-polysemous words in their early stages of vocabulary acquisition. Interestingly, the evolutionary pattern described above weakens when controlling for syntactic category (noun, verb, adjective or adverb) but it does not disappear completely, suggesting that it could result from acombination of a standalone bias for low polysemy and a preference for nouns. △ Less

Submitted 26 March, 2019; v1 submitted 27 November, 2016; originally announced November 2016.

Comments: Substantially revised version based on referee comments from Interaction Studies

Journal ref: Interaction Studies 19 (3), 389-426 (2018)

Showing 1–4 of 4 results for author: Català, N