Showing 1–5 of 5 results for author: Pineda, C

  1. arXiv:2301.06985  [pdf, other]

    cs.CL physics.soc-ph

    Statistical analysis of word flow among five Indo-European languages

    Authors: Josué Ely Molina, Jorge Flores, Carlos Gershenson, Carlos Pineda

    Abstract: A recent increase in data availability has allowed the possibility to perform different statistical linguistic studies. Here we use the Google Books Ngram dataset to analyze word flow among English, French, German, Italian, and Spanish. We study what we define as "migrant words", a type of loanwords that do not change their spelling. We quantify migrant words from one language to another for dif…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: 13 pages

  2. arXiv:2207.00709  [pdf, other]

    cs.CL physics.soc-ph

    Language statistics at different spatial, temporal, and grammatical scales

    Authors: Fernanda Sánchez-Puig, Rogelio Lozano-Aranda, Dante Pérez-Méndez, Ewan Colman, Alfredo J. Morales-Guzmán, Carlos Pineda, Pedro Juan Rivera Torres, Carlos Gershenson

    Abstract: Statistical linguistics has advanced considerably in recent decades as data has become available. This has allowed researchers to study how statistical properties of languages change over time. In this work, we use data from Twitter to explore English and Spanish considering the rank diversity at different scales: temporal (from 3 to 96 hour intervals), spatial (from 3km to 3000+km radii), and gra…

    Submitted 26 July, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

  3. Statistical Properties of Rankings in Sports and Games

    Authors: José Antonio Morales, Jorge Flores, Carlos Gershenson, Carlos Pineda

    Abstract: Any collection can be ranked. Sports and games are common examples of ranked systems: players and teams are constantly ranked using different methods. The statistical properties of rankings have been studied for almost a century in a variety of fields. More recently, data availability has allowed us to study rank dynamics: how elements of a ranking change in time. Here, we study the rank distribut…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: 15 pages

    Journal ref: Advances in Complex Systems, 0:2150007, 2021

  4. Identifying tax evasion in Mexico with tools from network science and machine learning

    Authors: Martin Zumaya, Rita Guerrero, Eduardo Islas, Omar Pineda, Carlos Gershenson, Gerardo Iñiguez, Carlos Pineda

    Abstract: Mexico has kept electronic records of all taxable transactions since 2014. Anonymized data collected by the Mexican federal government comprises more than 80 million contributors (individuals and companies) and almost 7 billion monthly-aggregations of invoices among contributors between January 2015 and December 2018. This data includes a list of almost ten thousand contributors already identified…

    Submitted 24 April, 2021; originally announced April 2021.

    Comments: 26 pages, 12 figures

    Journal ref: Granados, O.M., Nicolás-Carlock, J.R. (eds) Corruption Networks. Understanding Complex Systems (2021, Springer, Cham)

  5. Rank diversity of languages: Generic behavior in computational linguistics

    Authors: Germinal Cocho, Jorge Flores, Carlos Gershenson, Carlos Pineda, Sergio Sánchez

    Abstract: Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation a…

    Submitted 14 May, 2015; originally announced May 2015.

    Journal ref: PLoS ONE 10(4): e0121898 (2015)