-
What is a good doge? Analyzing the patrician social network of the Republic of Venice
Authors:
J. J. Merelo-Guervós
Abstract:
The Venetian republic was one of the most successful trans-modern states, surviving for a millennium through innovation, commercial cunning, exploitation of colonies and legal stability. Part of the success might be due to its government structure, a republic ruled by a doge chosen among a relatively limited set of Venetian patrician families. In this paper we analyze the structure of the social network they formed through marriage, and how government was monopolized by a relatively small set of families, the ones that became patrician first.
Submitted 20 February, 2023; v1 submitted 12 September, 2022;
originally announced September 2022.
-
Agile (data) science: a (draft) manifesto
Authors:
Juan Julián Merelo-Guervós,
Mario García-Valdez
Abstract:
Science has a data management problem, as well as a project management problem. While industrial-grade data science teams have embraced the agile mindset, and adopted or created all kinds of tools to create reproducible workflows, academia-based science is still (mostly) mired in a mindset that is focused on a single final product (a paper), without focusing on incremental improvement or on any specific problem or customer, and without paying any attention to reproducibility. In this report we argue for the adoption of the agile mindset and agile data science tools in academia, to make science more responsible and, above all, reproducible.
Submitted 4 July, 2022; v1 submitted 9 April, 2021;
originally announced April 2021.
-
Tropes in films: an initial analysis
Authors:
Rubén Héctor García-Ortega,
Pablo García Sánchez,
Juan J. Merelo-Guervós
Abstract:
TVTropes is a wiki that describes tropes and which ones are used in which artistic work. We are mostly interested in films, so after releasing the TropeScraper Python module that extracts data from this site, in this report we use scraped information to describe statistically how tropes and films are related to each other and how these relations evolve over time. In order to do so, we generated a dataset through the tool TropeScraper in April 2020. We have compared it to the latest snapshot of DB Tropes, a dataset covering the same site and published in July 2016, providing a descriptive analysis, studying the fundamental differences and addressing the evolution of the wiki in terms of the number of tropes, the number of films and the connections between them. The results show that the numbers of tropes and films have doubled, while the number of relations between them has quadrupled, and films are, by and large, better described in terms of tropes. However, while the types of films with the most tropes have not changed significantly in years, the list of most popular tropes has. This outcome can help shed some light on how popular tropes evolve, which ones become more popular or fade away, and in general how a set of tropes represents a film and might be a key to its success. The dataset generated, the information extracted, and the summaries provided are useful resources for any research involving films and tropes. They can provide proper context and explanations about the behaviour of models built on top of the dataset, including the generation of new content or its use in machine learning.
Submitted 12 April, 2021; v1 submitted 7 June, 2020;
originally announced June 2020.
-
Power laws in code repositories: A skeptical approach
Authors:
Bartolomé Ortiz,
J. J. Merelo-Guervós
Abstract:
Software development, as done using modern methodologies and source control management systems, has often been presented as an example of self-organization, with code growing and evolving organically through activities that do not stem from centralized power, leaders or directives. The main challenge in proving these claims is that self-organization cannot be detected through direct observation, only through measurements on the system, looking for hints such as the existence of power laws over some features, for instance the size of changes over time. The problem we intend to tackle in this paper is to establish a methodology for checking, for a chosen set of repositories we had already measured in the past, whether the claims about power laws actually hold from a precise mathematical point of view since, although shown as pervasive in the software engineering literature (and elsewhere), power laws are more elusive than they might seem at first sight. For that reason, in this paper we present a statistically accurate set of tests that will help us decide, from the way repositories are changing, whether the size of changes is really distributed according to a power law, which could indicate the existence of a state reached via self-organization, or, more precisely, how accurately a power law fits the observed distribution of the size of changes of commits in the git repositories of 16 open source projects. We revisit one of the most representative papers reporting these observations to reevaluate its results and compare them with the current status of the repositories analyzed in it, trying to elucidate whether there has been any change in the presence or absence of a power law.
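The abstract does not list the specific tests. A common starting point for this kind of check (an assumption here, not necessarily the paper's exact procedure) is the continuous maximum-likelihood estimate of the exponent, alpha = 1 + n / sum(ln(x_i / xmin)), followed by the Kolmogorov-Smirnov distance between the empirical and fitted distributions, as popularized by Clauset, Shalizi and Newman. The minimal sketch below treats `xmin` as given; the helper names `sample_power_law`, `fit_alpha` and `ks_distance` are hypothetical.

```python
import math
import random


def sample_power_law(n, alpha, xmin, seed=0):
    """Draw n synthetic values from a continuous power law p(x) ~ x^(-alpha),
    x >= xmin, by inverse-transform sampling (used here only for testing)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so every sample is >= xmin
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(data, xmin):
    """Continuous maximum-likelihood estimate of the power-law exponent
    for the tail of the data at or above xmin."""
    tail = [x for x in data if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all tail values equal xmin; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def ks_distance(data, xmin, alpha):
    """Kolmogorov-Smirnov distance between the empirical CDF of the tail and
    the fitted power-law CDF P(X <= x) = 1 - (x / xmin)^(1 - alpha)."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # the empirical CDF jumps from i/n to (i+1)/n at x; check both sides
        d = max(d, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
    return d


if __name__ == "__main__":
    data = sample_power_law(20000, alpha=2.5, xmin=1.0, seed=42)
    alpha_hat = fit_alpha(data, 1.0)
    print(alpha_hat, ks_distance(data, 1.0, alpha_hat))
```

A small KS distance alone does not establish a power law: a fuller analysis would also choose `xmin` by minimizing that distance, estimate a goodness-of-fit p-value by bootstrapping, and compare against alternatives such as the log-normal. Commit sizes are discrete, so the continuous formula is only an approximation for them.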
Submitted 27 May, 2019;
originally announced May 2019.
-
Overview of PicTropes, a film trope dataset
Authors:
Rubén H. García-Ortega,
Juan J. Merelo-Guervós,
Pablo García Sánchez,
Gad Pitaru
Abstract:
From the database DBTropes.org, we have created a dataset of films and the tropes that they use, which we have called PicTropes. In this report we provide a descriptive analysis and further discussion of the PicTropes dataset: the extracted features will help us decide the best values for a future recommendation system and content generator, whereas the analysis of the best-fitting distribution functions will help us interpret the relation between the films and the tropes found in them. Additionally, we provide rankings of the top-25 tropes and films, which will help us discuss and formulate questions to guide future extensions of the PicTropes dataset.
Submitted 26 October, 2018; v1 submitted 28 September, 2018;
originally announced September 2018.
-
RedDwarfData: a simplified dataset of StarCraft matches
Authors:
Juan J. Merelo-Guervós,
Antonio Fernández-Ares,
Antonio Álvarez Caballero,
Pablo García-Sánchez,
Victor Rivas
Abstract:
The game StarCraft is one of the most interesting arenas for testing new machine learning and computational intelligence techniques; however, StarCraft matches take a long time, and creating a good dataset for training can be hard. Besides, match logs can be analyzed to extract their main characteristics in so many different ways that extracting and processing the data itself can take an inordinate amount of time and, depending on the choices made, can bias learning algorithms. In this paper we present a simplified dataset extracted from the set of matches published by Robinson and Watson, which we have called RedDwarfData, containing several thousand matches processed into frames, so that temporal studies can also be undertaken. This dataset is available from GitHub under a free license. An initial analysis and appraisal of these matches is also made.
Submitted 29 December, 2017;
originally announced December 2017.
-
Modeling browser-based distributed evolutionary computation systems
Authors:
Juan Julián Merelo-Guervós,
Pablo García-Sánchez
Abstract:
From the era of big science we are back to "do it yourself", where you do not have any money to buy clusters or subscribe to grids but still have algorithms that crave many computing nodes and need them to measure scalability. Fortunately, this coincides with the era of big data, cloud computing, and browsers that include JavaScript virtual machines. For these reasons, this paper will focus on two different aspects of volunteer or freeriding computing: first, the pragmatic one: where to find those resources, which ones can be used, and what kind of support you have to give them; and then the theoretical one: how evolutionary algorithms can be adapted to an environment in which nodes come and go, have different computing capabilities and operate completely asynchronously from each other. We will examine the setup needed to create a very simple distributed evolutionary algorithm using JavaScript and then find a model of how users react to it by collecting data from several experiments featuring different classical benchmark functions.
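The abstract describes the setting only at a high level. The minimal, single-process sketch below assumes a pool-based design (an assumption, not necessarily the paper's architecture): a server keeps a population, and each volunteer (a browser client, in the real system) fetches one individual, evolves it locally for a few generations and sends the result back. It is written in Python rather than JavaScript. The `Pool` class, the OneMax benchmark and all parameter values are hypothetical, and the loop only mimics volunteers coming and going with uneven amounts of work; it does not model real network asynchrony.

```python
import random


def onemax(chromosome):
    """Hypothetical benchmark: number of 1 bits in the chromosome."""
    return sum(chromosome)


class Pool:
    """Hypothetical server-side pool that volunteers read from and write to."""

    def __init__(self, size, length, rng):
        self.rng = rng
        self.individuals = [[rng.randint(0, 1) for _ in range(length)]
                            for _ in range(size)]

    def get_random(self):
        # hand out a copy so a volunteer cannot modify the pool directly
        return list(self.rng.choice(self.individuals))

    def put(self, chromosome):
        # replace the worst individual if the returned one is at least as good
        worst = min(range(len(self.individuals)),
                    key=lambda i: onemax(self.individuals[i]))
        if onemax(chromosome) >= onemax(self.individuals[worst]):
            self.individuals[worst] = list(chromosome)

    def best(self):
        return max(self.individuals, key=onemax)


def volunteer_visit(pool, rng, local_gens):
    """One volunteer session: seed from the pool, run a (1+1)-style local
    search with bit-flip mutation, then return the result to the pool."""
    parent = pool.get_random()
    rate = 1.0 / len(parent)
    for _ in range(local_gens):
        child = [1 - b if rng.random() < rate else b for b in parent]
        if onemax(child) >= onemax(parent):
            parent = child
    pool.put(parent)


def simulate(ticks, length=32, pool_size=16, seed=1):
    rng = random.Random(seed)
    pool = Pool(pool_size, length, rng)
    for _ in range(ticks):
        # volunteers come and go: between 0 and 4 are active at each tick,
        # and each one contributes a different amount of local work
        for _ in range(rng.randint(0, 4)):
            volunteer_visit(pool, rng, local_gens=rng.randint(1, 10))
    return pool.best()


if __name__ == "__main__":
    best = simulate(2000)
    print(onemax(best), len(best))
```

Because the pool never waits for any particular volunteer, no generational synchronization is needed: a client that disappears simply never calls `put`, and slow or fast clients only change how much local work reaches the pool.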
Submitted 22 March, 2015;
originally announced March 2015.
-
Adapting Heuristic Mastermind Strategies to Evolutionary Algorithms
Authors:
Tomas Philip Runarsson,
Juan J. Merelo-Guervos
Abstract:
The art of solving the Mastermind puzzle was initiated by Donald Knuth and is already more than 30 years old; despite that, it still receives much attention in operational research and computer games journals, not to mention the nature-inspired stochastic algorithm literature. In this paper we try to suggest a strategy that will allow nature-inspired algorithms to obtain results as good as those based on exhaustive search strategies; in order to do that, we first review, compare and improve current approaches to solving the puzzle; then we test one of these strategies with an estimation of distribution algorithm. Finally, we try to find a strategy that falls short of being exhaustive, and is therefore amenable to inclusion in nature-inspired algorithms (such as evolutionary or particle swarm algorithms). This paper proves that, by incorporating local entropy into its fitness function, the evolutionary algorithm becomes a better player than a random one, and gives a rule of thumb on how to incorporate the best heuristic strategies into evolutionary algorithms without incurring excessive computational cost.
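The abstract does not define local entropy. One common entropy criterion for Mastermind, used here as an assumption and not necessarily the paper's definition, scores a candidate guess by the Shannon entropy of the responses it would receive against the combinations still consistent with earlier answers. Higher entropy means the guess splits the remaining candidates more evenly. The function names below are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def score(secret, guess):
    """Return (black, white) pegs: black counts right color in the right
    position, white counts right color in the wrong position."""
    black = sum(s == g for s, g in zip(secret, guess))
    # multiset intersection gives the colors shared regardless of position
    common = sum((Counter(secret) & Counter(guess)).values())
    return black, common - black


def partition_entropy(guess, consistent):
    """Shannon entropy (bits) of the response distribution that `guess`
    induces over the still-consistent combinations."""
    counts = Counter(score(code, guess) for code in consistent)
    total = len(consistent)
    return -sum((k / total) * math.log2(k / total) for k in counts.values())


def all_codes(pegs, colors):
    """Every combination for a game with `pegs` positions and `colors` colors."""
    return list(product(range(colors), repeat=pegs))


def most_informative(candidates, consistent):
    """Pick the candidate guess with the highest partition entropy."""
    return max(candidates, key=lambda g: partition_entropy(g, consistent))


if __name__ == "__main__":
    codes = all_codes(4, 6)  # classic 4-peg, 6-color game: 1296 combinations
    print(partition_entropy((0, 0, 0, 0), codes))  # about 1.50 bits
    print(partition_entropy((0, 0, 1, 1), codes))  # higher: splits more evenly
```

Scoring every consistent combination against every candidate is what makes exhaustive strategies expensive. In an evolutionary algorithm, `partition_entropy` could instead be computed only for the consistent individuals found in the current population, which is one way to keep the cost bounded.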
Submitted 12 December, 2009;
originally announced December 2009.