Showing 1–10 of 10 results for author: Veselovsky, V

  1. arXiv:2405.02150  [pdf, other]

    cs.CY

    The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates

    Authors: Giuseppe Russo Latona, Manoel Horta Ribeiro, Tim R. Davidson, Veniamin Veselovsky, Robert West

    Abstract: Journals and conferences worry that peer reviews assisted by artificial intelligence (AI), in particular, large language models (LLMs), may negatively influence the validity and fairness of the peer-review system, a cornerstone of modern science. In this work, we address this concern with a quasi-experimental study of the prevalence and impact of AI-assisted peer reviews in the context of the 2024…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Manoel Horta Ribeiro, Tim R. Davidson, and Veniamin Veselovsky contributed equally to this work

  2. arXiv:2402.10588  [pdf, other]

    cs.CL cs.CY

    Do Llamas Work in English? On the Latent Language of Multilingual Transformers

    Authors: Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West

    Abstract: We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language -- a question of key importance for understanding how language models function and the origins of linguistic bias. Focusing on the Llama-2 family of transformer models, our study uses carefully constructed non-English prompts with a unique correct single-token cont…

    Submitted 8 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 12 pages. 28 with appendix

  3. arXiv:2401.04536  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating Language Model Agency through Negotiations

    Authors: Tim R. Davidson, Veniamin Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West

    Abstract: We introduce an approach to evaluate language model (LM) agency using negotiation games. This approach better reflects real-world use cases and addresses some of the shortcomings of alternative LM benchmarks. Negotiation games enable us to study multi-turn and cross-model interactions, modulate complexity, and side-step accidental evaluation data leakage. We use our approach to test six widely us…

    Submitted 16 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024, code and link to project data are made available at https://github.com/epfl-dlab/LAMEN

  4. arXiv:2312.09611  [pdf, other]

    cs.CY cs.SI

    Capturing Dynamics in Online Public Discourse: A Case Study of Universal Basic Income Discussions on Reddit

    Authors: Rachel Kim, Veniamin Veselovsky, Ashton Anderson

    Abstract: Societal change is often driven by shifts in public opinion. As citizens evolve in their norms, beliefs, and values, public policies change too. While traditional opinion polling and surveys can outline the broad strokes of whether public opinion on a particular topic is changing, they usually cannot capture the full multi-dimensional richness and diversity of opinion present in a large heterogene…

    Submitted 15 December, 2023; originally announced December 2023.

  5. arXiv:2310.15683  [pdf, other]

    cs.CL

    Prevalence and prevention of large language model use in crowd work

    Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Philip Cozzolino, Andrew Gordon, David Rothschild, Robert West

    Abstract: We show that the use of large language models (LLMs) is prevalent among crowd workers, and that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use. On a text summarization task where workers were not directed in any way regarding their LLM use, the estimated prevalence of LLM use was around 30%, but was reduced by about half by asking workers to not use LLMs and by…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: VV and MHR equal contribution. 14 pages, 1 figure, 1 table

  6. arXiv:2306.17298  [pdf, other]

    cs.CY

    Tube2Vec: Social and Semantic Embeddings of YouTube Channels

    Authors: Léopaul Boesinger, Manoel Horta Ribeiro, Veniamin Veselovsky, Robert West

    Abstract: Research using YouTube data often explores social and semantic dimensions of channels and videos. Typically, analyses rely on laborious manual annotation of content and content creators, often found by low-recall methods such as keyword search. Here, we explore an alternative approach, using latent representations (embeddings) obtained via machine learning. Using a large dataset of YouTube links s…

    Submitted 29 June, 2023; originally announced June 2023.

  7. arXiv:2306.07899  [pdf, other]

    cs.CL cs.CY

    Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks

    Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Robert West

    Abstract: Large language models (LLMs) are remarkable data annotators. They can be used to generate high-fidelity supervised training data, as well as survey and experimental data. With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results. However, crowdsourcing, an important, inexpensive way to obtain human ann…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: 9 pages, 4 figures

  8. arXiv:2305.15041  [pdf, other]

    cs.CL

    Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science

    Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Akhil Arora, Martin Josifoski, Ashton Anderson, Robert West

    Abstract: Large Language Models (LLMs) have democratized synthetic data generation, which in turn has the potential to simplify and broaden a wide gamut of NLP tasks. Here, we tackle a pervasive problem in synthetic data generation: its generative distribution often differs from the distribution of real-world data researchers care about (in other words, it is unfaithful). In a case study on sarcasm detectio…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: 8 pages

  9. arXiv:2304.10777  [pdf, other]

    cs.CY

    Reddit in the Time of COVID

    Authors: Veniamin Veselovsky, Ashton Anderson

    Abstract: When the COVID-19 pandemic hit, much of life moved online. Platforms of all types reported surges of activity, and people remarked on the various important functions that online platforms suddenly fulfilled. However, researchers lack a rigorous understanding of the pandemic's impacts on social platforms, and whether they were temporary or long-lasting. We present a conceptual framework for studyin…

    Submitted 21 April, 2023; originally announced April 2023.

    Comments: 12 pages, published in ICWSM 2023

  10. arXiv:2302.11225  [pdf, other]

    cs.CY

    The Amplification Paradox in Recommender Systems

    Authors: Manoel Horta Ribeiro, Veniamin Veselovsky, Robert West

    Abstract: Automated audits of recommender systems found that blindly following recommendations leads users to increasingly partisan, conspiratorial, or false content. At the same time, studies using real user traces suggest that recommender systems are not the primary driver of attention toward extreme content; on the contrary, such content is mostly reached through other means, e.g., other websites. In thi…

    Submitted 5 April, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: Accepted at ICWSM'23; please cite accordingly