Showing 1–14 of 14 results for author: Svete, A

  1. arXiv:2406.14197  [pdf, other]

    cs.CL cs.FL

    On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

    Authors: Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell

    Abstract: The performance of modern language models (LMs) has been improved by chain-of-thought (CoT) reasoning, i.e., the process of generating intermediate results that guide the model towards a final answer. A possible explanation for this improvement is that CoT reasoning extends an LM's computational power, as RNNs and transformers with additional scratch space are known to be Turing complete. Comparin…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: To be published at ACL 2024

  2. arXiv:2406.10203  [pdf, other]

    cs.CL

    A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

    Authors: Naaman Tan, Josef Valvoda, Anej Svete, Tianyu Liu, Yanxia Qin, Kan Min-Yen, Ryan Cotterell

    Abstract: The relationship between the quality of a string and its probability $p(\boldsymbol{y})$ under a language model has been influential in the development of techniques to build good text generation systems. For example, several decoding algorithms have been motivated to manipulate $p(\boldsymbol{y})$ to produce higher-quality text. In this work, we examine the probability--quality relationship in la…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2406.04289  [pdf, other]

    cs.CL

    What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages

    Authors: Nadav Borenstein, Anej Svete, Robin Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Ryan Cotterell

    Abstract: What can large language models learn? By definition, language models (LM) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of distributions over strings. While prior work in this direction focused on assessing the theoretical limits, in contrast, we seek to understand the empirical learnability. U…

    Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  4. arXiv:2406.02329  [pdf, other]

    cs.CL cs.LG

    On Affine Homotopy between Language Encoders

    Authors: Robin SM Chan, Reda Boumasmoud, Anej Svete, Yuxin Ren, Qipeng Guo, Zhijing Jin, Shauli Ravfogel, Mrinmaya Sachan, Bernhard Schölkopf, Mennatallah El-Assady, Ryan Cotterell

    Abstract: Pre-trained language encoders -- functions that represent text as vectors -- are an integral component of many NLP tasks. We tackle a natural question in language encoder analysis: What does it mean for two encoders to be similar? We contend that a faithful measure of similarity needs to be \emph{intrinsic}, that is, task-independent, yet still be informative of \emph{extrinsic} similarity -- the…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 10 pages

  5. arXiv:2405.19222  [pdf, other]

    cs.CL

    Lower Bounds on the Expressivity of Recurrent Neural Language Models

    Authors: Anej Svete, Franz Nowak, Anisha Mohamed Sahabdeen, Ryan Cotterell

    Abstract: The recent successes and spread of large neural language models (LMs) call for a thorough understanding of their computational ability. Describing their computational abilities through LMs' \emph{representational capacity} is a lively area of research. However, investigation into the representational capacity of neural LMs has predominantly focused on their ability to \emph{recognize} formal langu…

    Submitted 18 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  6. arXiv:2404.14994  [pdf, other]

    cs.CL cs.AI cs.CC cs.FL cs.LG

    Transformers Can Represent $n$-gram Language Models

    Authors: Anej Svete, Ryan Cotterell

    Abstract: Existing work has analyzed the representational capacity of the transformer architecture by means of formal models of computation. However, the focus so far has been on analyzing the architecture in terms of language \emph{acceptance}. We contend that this is an ill-suited problem in the study of \emph{language models} (LMs), which are definitionally \emph{probability distributions} over strings.…

    Submitted 20 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  7. arXiv:2403.17240  [pdf, other]

    cs.CL

    The Role of $n$-gram Smoothing in the Age of Neural Networks

    Authors: Luca Malagutti, Andrius Buinovskij, Anej Svete, Clara Meister, Afra Amini, Ryan Cotterell

    Abstract: For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task. The key to their success lay in the application of various smoothing techniques that served to combat overfitting. However, when neural language models toppled $n$-gram models as the best performers, $n$-gram smoothing techniques became less relevant. Indeed, it would hardly be an…

    Submitted 30 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: NAACL 2024

  8. arXiv:2402.15814  [pdf, other]

    cs.CL cs.CC cs.LG

    On Efficiently Representing Regular Languages as RNNs

    Authors: Anej Svete, Robin Shing Moon Chan, Ryan Cotterell

    Abstract: Recent work by Hewitt et al. (2020) provides an interpretation of the empirical success of recurrent neural networks (RNNs) as language models (LMs). It shows that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language. This suggests that RNNs' success might be linked to their ability to model hierarchy. However, a closer inspection of Hewitt et al.'s (…

    Submitted 18 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  9. arXiv:2311.04329  [pdf, other]

    cs.CL

    Formal Aspects of Language Modeling

    Authors: Ryan Cotterell, Anej Svete, Clara Meister, Tianyu Liu, Li Du

    Abstract: Large language models have become one of the most commonly deployed NLP inventions. In the past half-decade, their integration into core natural language processing tools has dramatically increased the performance of such tools, and they have entered the public discourse surrounding artificial intelligence. Consequently, it is important for both developers and researchers alike to understand the m…

    Submitted 17 April, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

  10. On the Representational Capacity of Recurrent Neural Language Models

    Authors: Franz Nowak, Anej Svete, Li Du, Ryan Cotterell

    Abstract: This work investigates the computational expressivity of language models (LMs) based on recurrent neural networks (RNNs). Siegelmann and Sontag (1992) famously showed that RNNs with rational weights and hidden states and unbounded computation time are Turing complete. However, LMs define weightings over strings in addition to just (unweighted) language membership and the analysis of the computatio…

    Submitted 30 May, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Added requirement for non-negative probabilities to definitions 2.3 and 3.1, fixed typos

    Journal ref: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7011-7034

  11. arXiv:2310.05161  [pdf, other]

    cs.CL cs.CC cs.LG

    Recurrent Neural Language Models as Probabilistic Finite-state Automata

    Authors: Anej Svete, Ryan Cotterell

    Abstract: Studying language models (LMs) in terms of well-understood formalisms allows us to precisely characterize their abilities and limitations. Previous work has investigated the representational capacity of recurrent neural network (RNN) LMs in terms of their capacity to recognize unweighted formal languages. However, LMs do not describe unweighted formal languages -- rather, they define \emph{probabi…

    Submitted 19 December, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: 9 pages

  12. arXiv:2307.15054  [pdf, other]

    cs.CL

    A Geometric Notion of Causal Probing

    Authors: Clément Guerner, Anej Svete, Tianyu Liu, Alexander Warstadt, Ryan Cotterell

    Abstract: The linear subspace hypothesis (Bolukbasi et al., 2016) states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace. Prior work has relied on auxiliary classification tasks to identify and evaluate candidate subspaces that might give support for this hypothesis. We instead give a set of intrinsic criteria which char…

    Submitted 24 February, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

  13. arXiv:2301.06862  [pdf, other]

    cs.DS cs.CL

    Algorithms for Acyclic Weighted Finite-State Automata with Failure Arcs

    Authors: Anej Svete, Benjamin Dayan, Tim Vieira, Ryan Cotterell, Jason Eisner

    Abstract: Weighted finite-state automata (WFSAs) are commonly used in NLP. Failure transitions are a useful extension for compactly representing backoffs or interpolation in $n$-gram models and CRFs, which are special cases of WFSAs. The pathsum in ordinary acyclic WFSAs is efficiently computed by the backward algorithm in time $O(|E|)$, where $E$ is the set of transitions. However, this does not allow fail…

    Submitted 11 July, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

    Comments: 9 pages, Proceedings of EMNLP 2022

  14. arXiv:2002.06609  [pdf, other]

    cs.SI

    It is not just about the Melody: How Europe Votes for its Favorite Songs

    Authors: Anej Svete, Jakob Hostnik

    Abstract: The Eurovision Song Contest is a popular annual international song competition organized by the European Broadcasting Union. The winner is decided by the audience and expert juries from each participating nation, which is why the analysis of its voting network offers a great insight into what factors, besides the quality of the performances, influence the voting decisions. In this paper, we pres…

    Submitted 16 February, 2020; originally announced February 2020.