
Showing 1–7 of 7 results for author: Tezekbayev, M

  1. arXiv:2310.12660

    cs.LG

    Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic

    Authors: Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhenisbek Assylbekov

    Abstract: Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by the Statistical Query algorithms. Recently this classical fact re-emerged in a theory of gradient-based optimization of neural networks. In the novel framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a…

    Submitted 19 October, 2023; originally announced October 2023.

  2. arXiv:2310.01611

    cs.LG cs.CR

    Intractability of Learning the Discrete Logarithm with Gradient-Based Methods

    Authors: Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhibek Kadyrsizova, Zhenisbek Assylbekov

    Abstract: The discrete logarithm problem is a fundamental challenge in number theory with significant implications for cryptographic protocols. In this paper, we investigate the limitations of gradient-based methods for learning the parity bit of the discrete logarithm in finite cyclic groups of prime order. Our main result, supported by theoretical analysis and empirical verification, reveals the concentra…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: ACML 2023

  3. arXiv:2307.10736

    cs.LG stat.ML

    Long-Tail Theory under Gaussian Mixtures

    Authors: Arman Bolatov, Maxat Tezekbayev, Igor Melnykov, Artur Pak, Vassilina Nikoulina, Zhenisbek Assylbekov

    Abstract: We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered…

    Submitted 24 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: accepted to ECAI 2023

  4. arXiv:2306.14194

    cs.LG

    Autoencoders for a manifold learning problem with a Jacobian rank constraint

    Authors: Rustem Takhanov, Y. Sultan Abylkairov, Maxat Tezekbayev

    Abstract: We formulate the manifold learning problem as the problem of finding an operator that maps any point to a close neighbor that lies on a "hidden" $k$-dimensional manifold. We call this operator the correcting function. Under this formulation, autoencoders can be viewed as a tool to approximate the correcting function. Given an autoencoder whose Jacobian has rank $k$, we deduce from the classical…

    Submitted 25 June, 2023; originally announced June 2023.

  5. arXiv:2111.06832

    cs.CL cs.LG

    Speeding Up Entmax

    Authors: Maxat Tezekbayev, Vassilina Nikoulina, Matthias Gallé, Zhenisbek Assylbekov

    Abstract: Softmax is the de facto standard in modern neural networks for language processing when it comes to normalizing logits. However, by producing a dense probability distribution each token in the vocabulary has a nonzero chance of being selected at each generation step, leading to a variety of reported problems in text generation. $α$-entmax of Peters et al. (2019, arXiv:1905.05702) solves this probl…

    Submitted 19 May, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

    Comments: Findings of NAACL 2022

  6. The Rediscovery Hypothesis: Language Models Need to Meet Linguistics

    Authors: Vassilina Nikoulina, Maxat Tezekbayev, Nuradil Kozhakhmet, Madina Babazhanova, Matthias Gallé, Zhenisbek Assylbekov

    Abstract: There is an ongoing debate in the NLP community whether modern language models contain linguistic knowledge, recovered through so-called probes. In this paper, we study whether linguistic knowledge is a necessary condition for the good performance of modern language models, which we call the rediscovery hypothesis. In the first place, we show that language models that are significantly co…

    Submitted 3 January, 2022; v1 submitted 2 March, 2021; originally announced March 2021.

    Journal ref: Journal of Artificial Intelligence Research Vol. 72 (2021) 1343-1384

  7. arXiv:1912.13413

    cs.CL

    Semantics- and Syntax-related Subvectors in the Skip-gram Embeddings

    Authors: Maxat Tezekbayev, Zhenisbek Assylbekov, Rustem Takhanov

    Abstract: We show that the skip-gram embedding of any word can be decomposed into two subvectors which roughly correspond to semantic and syntactic roles of the word.

    Submitted 23 December, 2019; originally announced December 2019.

    Comments: 2 pages, 1 figure, Student Abstract