Showing 1–11 of 11 results for author: Karjus, A

  1. arXiv:2406.01278  [pdf, other]

    cs.CV cs.AI cs.CC cs.LG

    fruit-SALAD: A Style Aligned Artwork Dataset to reveal similarity perception in image embeddings

    Authors: Tillmann Ohm, Andres Karjus, Mikhail Tamm, Maximilian Schich

    Abstract: The notion of visual similarity is essential for computer vision, and in applications and studies revolving around vector embeddings of images. However, the scarcity of benchmark datasets poses a significant hurdle in exploring how these models perceive similarity. Here we introduce Style Aligned Artwork Datasets (SALADs), and an example of fruit-SALAD with 10,000 images of fruit depictions. This…

    Submitted 3 June, 2024; originally announced June 2024.

  2. arXiv:2309.14379  [pdf, other]

    cs.CL cs.AI cs.CY

    Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence

    Authors: Andres Karjus

    Abstract: The increasing capacities of large language models (LLMs) present an unprecedented opportunity to scale up data analytics in the humanities and social sciences, augmenting and automating qualitative analytic tasks previously typically allocated to human labor. This contribution proposes a systematic mixed methods framework to harness qualitative analytic expertise, machine scalability, and rigorou…

    Submitted 24 September, 2023; originally announced September 2023.

  3. arXiv:2309.01659  [pdf, other]

    cs.SI cs.CL

    Evolving linguistic divergence on polarizing social media

    Authors: Andres Karjus, Christine Cuskley

    Abstract: Language change is influenced by many factors, but often starts from synchronic variation, where multiple linguistic patterns or forms coexist, or where different speech communities use language in increasingly different ways. Besides regional or economic reasons, communities may form and segregate based on political alignment. The latter, referred to as political polarization, is of growing socie…

    Submitted 4 September, 2023; originally announced September 2023.

  4. arXiv:2305.15914  [pdf, other]

    cs.CL

    Reliable Detection and Quantification of Selective Forces in Language Change

    Authors: Juan Guerrero Montero, Andres Karjus, Kenny Smith, Richard A. Blythe

    Abstract: Language change is a cultural evolutionary process in which variants of linguistic variables change in frequency through processes analogous to mutation, selection and genetic drift. In this work, we apply a recently-introduced method to corpus data to quantify the strength of selection in specific instances of historical language change. We first demonstrate, in the context of English irregular v…

    Submitted 21 August, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

  5. arXiv:2305.13047  [pdf]

    cs.CL

    Automated stance detection in complex topics and small languages: the challenging case of immigration in polarizing news media

    Authors: Mark Mets, Andres Karjus, Indrek Ibrus, Maximilian Schich

    Abstract: Automated stance detection and related machine learning methods can provide useful insights for media monitoring and academic research. Many of these approaches require annotated training datasets, which limits their applicability for languages where these may not be readily available. This paper explores the applicability of large language models for automated stance detection in a challenging sc…

    Submitted 22 May, 2023; originally announced May 2023.

  6. arXiv:2305.06809  [pdf, other]

    cs.CV cs.HC

    Collection Space Navigator: An Interactive Visualization Interface for Multidimensional Datasets

    Authors: Tillmann Ohm, Mar Canet Solà, Andres Karjus, Maximilian Schich

    Abstract: We introduce the Collection Space Navigator (CSN), a browser-based visualization tool to explore, research, and curate large collections of visual digital artifacts that are associated with multidimensional data, such as vector embeddings or tables of metadata. Media objects such as images are often encoded as numerical vectors, for e.g. based on metadata or using machine learning to embed image i…

    Submitted 11 May, 2023; originally announced May 2023.

  7. arXiv:2205.10271  [pdf, other]

    cs.CV

    Compression ensembles quantify aesthetic complexity and the evolution of visual art

    Authors: Andres Karjus, Mar Canet Solà, Tillmann Ohm, Sebastian E. Ahnert, Maximilian Schich

    Abstract: The quantification of visual aesthetics and complexity have a long history, the latter previously operationalized via the application of compression algorithms. Here we generalize and extend the compression approach beyond simple complexity measures to quantify algorithmic distance in historical and contemporary visual media. The proposed "ensemble" approach works by compressing a large number of…

    Submitted 20 May, 2022; originally announced May 2022.

  8. Conceptual similarity and communicative need shape colexification: an experimental study

    Authors: Andres Karjus, Richard A. Blythe, Simon Kirby, Tianyu Wang, Kenny Smith

    Abstract: Colexification refers to the phenomenon of multiple meanings sharing one word in a language. Cross-linguistic lexification patterns have been shown to be largely predictable, as similar concepts are often colexified. We test a recent claim that, beyond this general tendency, communicative needs play an important role in shaping colexification patterns. We approach this question by means of a serie…

    Submitted 1 September, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

    Journal ref: Cognitive Science (2021) 45 e1303

  9. arXiv:2006.09277  [pdf, other]

    cs.CL

    Communicative need modulates competition in language change

    Authors: Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith

    Abstract: All living languages change over time. The causes for this are many, one being the emergence and borrowing of new linguistic elements. Competition between the new elements and older ones with a similar semantic or grammatical function may lead to speakers preferring one of them, and leaving the other to go out of use. We introduce a general method for quantifying competition between linguistic ele…

    Submitted 16 June, 2020; originally announced June 2020.

  10. Challenges in detecting evolutionary forces in language change using diachronic corpora

    Authors: Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith

    Abstract: Newberry et al. (Detecting evolutionary forces in language change, Nature 551, 2017) tackle an important but difficult problem in linguistics, the testing of selective theories of language change against a null model of drift. Having applied a test from population genetics (the Frequency Increment Test) to a number of relevant examples, they suggest stochasticity has a previously under-appreciated…

    Submitted 13 November, 2019; v1 submitted 3 November, 2018; originally announced November 2018.

    Journal ref: Glossa: a journal of general linguistics, 5(1) (2020), p.45

  11. Quantifying the dynamics of topical fluctuations in language

    Authors: Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith

    Abstract: The availability of large diachronic corpora has provided the impetus for a growing body of quantitative research on language evolution and meaning change. The central quantities in this research are token frequencies of linguistic elements in texts, with changes in frequency taken to reflect the popularity or selective fitness of an element. However, corpus frequencies may change for a wide varie…

    Submitted 21 June, 2019; v1 submitted 2 June, 2018; originally announced June 2018.

    Comments: Code to run the analyses described in this paper is now available at https://github.com/andreskarjus/topical_cultural_advection_model. A previous, shorter version of this paper outlining the basic model appeared as an extended abstract in the proceedings of the Society for Computation in Linguistics (Karjus et al. 2018, Topical advection as a baseline model for corpus-based lexical dynamics).

    Journal ref: Language Dynamics and Change, 10(1) 2020, 86-125