Showing 1–15 of 15 results for author: Assylbekov, Z

Searching in archive cs.
  1. arXiv:2310.12660

    cs.LG

    Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic

    Authors: Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhenisbek Assylbekov

    Abstract: Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by Statistical Query algorithms. Recently, this classical fact re-emerged in a theory of gradient-based optimization of neural networks. In the novel framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a…

    Submitted 19 October, 2023; originally announced October 2023.

  2. arXiv:2310.01611

    cs.LG cs.CR

    Intractability of Learning the Discrete Logarithm with Gradient-Based Methods

    Authors: Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhibek Kadyrsizova, Zhenisbek Assylbekov

    Abstract: The discrete logarithm problem is a fundamental challenge in number theory with significant implications for cryptographic protocols. In this paper, we investigate the limitations of gradient-based methods for learning the parity bit of the discrete logarithm in finite cyclic groups of prime order. Our main result, supported by theoretical analysis and empirical verification, reveals the concentra…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: ACML 2023

  3. arXiv:2307.10736

    cs.LG stat.ML

    Long-Tail Theory under Gaussian Mixtures

    Authors: Arman Bolatov, Maxat Tezekbayev, Igor Melnykov, Artur Pak, Vassilina Nikoulina, Zhenisbek Assylbekov

    Abstract: We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered…

    Submitted 24 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: accepted to ECAI 2023

  4. arXiv:2204.12481

    cs.CL

    From Hyperbolic Geometry Back to Word Embeddings

    Authors: Sultan Nurmukhamedov, Thomas Mach, Arsen Sheverdin, Zhenisbek Assylbekov

    Abstract: We choose random points in the hyperbolic disc and claim that these points are already word representations. However, it is yet to be uncovered which point corresponds to which word of the human language of interest. This correspondence can be approximately established using pointwise mutual information between words and recent alignment techniques.

    Submitted 26 April, 2022; originally announced April 2022.

  5. arXiv:2111.06832

    cs.CL cs.LG

    Speeding Up Entmax

    Authors: Maxat Tezekbayev, Vassilina Nikoulina, Matthias Gallé, Zhenisbek Assylbekov

    Abstract: Softmax is the de facto standard in modern neural networks for language processing when it comes to normalizing logits. However, because softmax produces a dense probability distribution, each token in the vocabulary has a nonzero chance of being selected at each generation step, leading to a variety of reported problems in text generation. α-entmax of Peters et al. (2019, arXiv:1905.05702) solves this probl…

    Submitted 19 May, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

    Comments: Findings of NAACL 2022

  6. The Rediscovery Hypothesis: Language Models Need to Meet Linguistics

    Authors: Vassilina Nikoulina, Maxat Tezekbayev, Nuradil Kozhakhmet, Madina Babazhanova, Matthias Gallé, Zhenisbek Assylbekov

    Abstract: There is an ongoing debate in the NLP community over whether modern language models contain linguistic knowledge, recovered through so-called probes. In this paper, we study whether linguistic knowledge is a necessary condition for the good performance of modern language models, which we call the rediscovery hypothesis. First, we show that language models that are significantly co…

    Submitted 3 January, 2022; v1 submitted 2 March, 2021; originally announced March 2021.

    Journal ref: Journal of Artificial Intelligence Research Vol. 72 (2021) 1343-1384

  7. arXiv:2002.12005

    cs.CL stat.ML

    Squashed Shifted PMI Matrix: Bridging Word Embeddings and Hyperbolic Spaces

    Authors: Zhenisbek Assylbekov, Alibi Jangeldin

    Abstract: We show that removing the sigmoid transformation in the skip-gram with negative sampling (SGNS) objective does not harm the quality of word vectors significantly and at the same time is related to factorizing a squashed shifted PMI matrix which, in turn, can be treated as a connection probabilities matrix of a random graph. Empirically, such a graph is a complex network, i.e. it has strong clustering an…

    Submitted 26 September, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

    Comments: AJCAI 2020

  8. arXiv:1912.13413

    cs.CL

    Semantics- and Syntax-related Subvectors in the Skip-gram Embeddings

    Authors: Maxat Tezekbayev, Zhenisbek Assylbekov, Rustem Takhanov

    Abstract: We show that the skip-gram embedding of any word can be decomposed into two subvectors which roughly correspond to semantic and syntactic roles of the word.

    Submitted 23 December, 2019; originally announced December 2019.

    Comments: 2 pages, 1 figure, Student Abstract

  9. arXiv:1909.13494

    cs.CL

    A Critique of the Smooth Inverse Frequency Sentence Embeddings

    Authors: Aidana Karipbayeva, Alena Sorokina, Zhenisbek Assylbekov

    Abstract: We critically review the smooth inverse frequency sentence embedding method of Arora, Liang, and Ma (2017), and show inconsistencies in its setup, derivation, and evaluation.

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: 2 pages, 2 figures, Abstract

  10. arXiv:1909.09855

    cs.CL

    Low-Rank Approximation of Matrices for PMI-based Word Embeddings

    Authors: Alena Sorokina, Aidana Karipbayeva, Zhenisbek Assylbekov

    Abstract: We perform an empirical evaluation of several methods of low-rank approximation in the problem of obtaining PMI-based word embeddings. All word vectors were trained on parts of a large corpus extracted from English Wikipedia (enwik9), which was divided into two equal-sized datasets, from which PMI matrices were obtained. A repeated measures design was used in assigning a method of low-rank approxim…

    Submitted 21 September, 2019; originally announced September 2019.

    Comments: 10 pages, 4 figures, CICLing 2019, Springer "Lecture Notes in Computer Science"

  11. arXiv:1902.09859

    stat.ML cs.CL cs.LG

    Context Vectors are Reflections of Word Vectors in Half the Dimensions

    Authors: Zhenisbek Assylbekov, Rustem Takhanov

    Abstract: This paper takes a step towards theoretical analysis of the relationship between word embeddings and context embeddings in models such as word2vec. We start from basic probabilistic assumptions on the nature of word vectors, context vectors, and text generation. These assumptions are well supported either empirically or theoretically by the existing literature. Next, we show that under these assum…

    Submitted 26 February, 2019; originally announced February 2019.

  12. Fourier Neural Networks: A Comparative Study

    Authors: Abylay Zhumekenov, Malika Uteuliyeva, Olzhas Kabdolov, Rustem Takhanov, Zhenisbek Assylbekov, Alejandro J. Castro

    Abstract: We review neural network architectures which were motivated by Fourier series and integrals and which are referred to as Fourier neural networks. These networks are empirically evaluated in synthetic and real-world tasks. None of them outperforms the standard neural network with sigmoid activation function in the real-world tasks. All neural networks, both Fourier and the standard one, empirica…

    Submitted 8 February, 2019; originally announced February 2019.

    Journal ref: Intell. Data Anal. 24 (2020), 1107-1120

  13. arXiv:1802.08375

    cs.CL cs.NE stat.ML

    Reusing Weights in Subword-aware Neural Language Models

    Authors: Zhenisbek Assylbekov, Rustem Takhanov

    Abstract: We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multi-layer input embedding mode…

    Submitted 25 April, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

    Comments: accepted to NAACL 2018

    MSC Class: 68T50 ACM Class: I.2.7

  14. arXiv:1709.00541

    cs.CL cs.LG

    Patterns versus Characters in Subword-aware Neural Language Modeling

    Authors: Rustem Takhanov, Zhenisbek Assylbekov

    Abstract: Words in some natural languages can have a composite structure. Elements of this structure include the root (which could also be composite), prefixes and suffixes with which various nuances and relations to other words can be expressed. Thus, in order to build a proper word representation, one must take into account its internal structure. From a corpus of texts we extract a set of frequent subwords…

    Submitted 2 September, 2017; originally announced September 2017.

    Comments: 10 pages

  15. arXiv:1707.06480

    cs.CL cs.NE stat.ML

    Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones

    Authors: Zhenisbek Assylbekov, Rustem Takhanov, Bagdat Myrzakhmetov, Jonathan N. Washington

    Abstract: Syllabification does not seem to improve word-level RNN language modeling quality when compared to character-based segmentation. However, our best syllable-aware language model, achieving performance comparable to the competitive character-aware model, has 18%-33% fewer parameters and is trained 1.2-2.2 times faster.

    Submitted 20 July, 2017; originally announced July 2017.

    Comments: EMNLP 2017

    MSC Class: 68T50 ACM Class: I.2.7
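Entry 5 (Speeding Up Entmax) builds on α-entmax (Peters et al., 2019), a sparse alternative to softmax. To make concrete what α-entmax computes, here is a minimal NumPy sketch that finds the threshold τ by plain bisection, assuming the standard form p_i = [(α − 1) z_i − τ]_+^{1/(α − 1)} with τ chosen so that the p_i sum to 1. It is not the faster method proposed in that paper, and the function name `alpha_entmax` is hypothetical (it is not the API of any entmax package).

```python
import numpy as np

def alpha_entmax(z, alpha=1.5, n_iter=60):
    """Sketch of alpha-entmax for a 1-D score vector z (requires alpha > 1)."""
    z = np.asarray(z, dtype=float)
    x = (alpha - 1.0) * z                      # scaled scores
    d = x.size
    # p_i(tau) = max(x_i - tau, 0) ** (1 / (alpha - 1)); the total mass decreases in tau.
    # At tau_lo the top entry alone equals 1 (mass >= 1); at tau_hi every entry is at
    # most 1/d (mass <= 1), so the tau giving mass exactly 1 lies in between.
    tau_lo = x.max() - 1.0
    tau_hi = x.max() - (1.0 / d) ** (alpha - 1.0)
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        p = np.clip(x - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() >= 1.0:
            tau_lo = tau                       # too much mass: raise the threshold
        else:
            tau_hi = tau                       # too little mass: lower the threshold
    p = np.clip(x - tau_lo, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()                         # renormalise away residual bisection error
```

With α = 2 (sparsemax), the scores [1, 0.5, -1] map to approximately [0.75, 0.25, 0]: the lowest-scoring token receives exactly zero probability, which softmax can never produce.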
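Entry 7 (Squashed Shifted PMI Matrix) relates SGNS without the sigmoid to factorizing a squashed shifted PMI matrix. The sketch below assumes that matrix is σ(PMI(w, c) − log k), where k is the number of negative samples and σ is the logistic sigmoid, building on the shifted-PMI interpretation of SGNS by Levy and Goldberg (2014); the paper should be consulted for its exact construction. The helper name `squashed_shifted_pmi` is hypothetical, and every row and column of the count matrix is assumed to contain at least one nonzero count.

```python
import numpy as np

def squashed_shifted_pmi(counts, k=5):
    """Sketch: sigmoid(PMI - log k) from a (word x context) co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore", over="ignore"):
        p_wc = counts / counts.sum()               # joint probabilities
        p_w = p_wc.sum(axis=1, keepdims=True)      # word marginals (rows)
        p_c = p_wc.sum(axis=0, keepdims=True)      # context marginals (columns)
        pmi = np.log(p_wc / (p_w * p_c))           # -inf where a pair never co-occurs
        shifted = pmi - np.log(k)                  # shift by log of negative samples
        return 1.0 / (1.0 + np.exp(-shifted))      # sigmoid squashes into [0, 1]; -inf -> 0
```

Because every entry lies in [0, 1], the result can be read as edge probabilities, as the abstract describes: for example, `np.random.default_rng(0).random(m.shape) < m` samples one random adjacency matrix from it.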
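Entry 9 (A Critique of the Smooth Inverse Frequency Sentence Embeddings) discusses the SIF method of Arora, Liang, and Ma (2017). In outline, SIF weights each word w by a / (a + p(w)), where p(w) is its unigram probability, averages the weighted word vectors of a sentence, and then removes from every sentence vector its projection onto the first principal component of all sentence vectors. The sketch below is one minimal reading of that recipe; the function name, the dictionary inputs, the default a = 1e-3, and skipping mean-centering before the SVD are all assumptions of the sketch.

```python
import numpy as np

def sif_embeddings(sentences, word_vecs, word_prob, a=1e-3):
    """Sketch of SIF: weighted averaging followed by common-component removal."""
    dim = len(next(iter(word_vecs.values())))
    emb = np.zeros((len(sentences), dim))
    for i, sent in enumerate(sentences):
        words = [w for w in sent if w in word_vecs]
        if not words:
            continue                                       # all-unknown sentence stays zero
        weights = np.array([a / (a + word_prob[w]) for w in words])
        vecs = np.array([word_vecs[w] for w in words])
        emb[i] = weights @ vecs / len(words)               # weighted average
    # First right singular vector of the (uncentred) sentence matrix.
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    u = vt[0]
    return emb - np.outer(emb @ u, u)                      # remove projection onto u
```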
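Entry 10 (Low-Rank Approximation of Matrices for PMI-based Word Embeddings) compares several low-rank approximation methods. As an illustration of the simplest baseline only, and not necessarily one of the methods evaluated there, the sketch below takes a rank-d truncated SVD, which by the Eckart–Young theorem gives the best rank-d approximation in Frobenius norm, and forms word vectors as U_d Σ_d^{1/2}. That factor split and the function name `svd_word_vectors` are assumptions; the input must be finite, for example a positive PMI matrix, since plain PMI is −∞ for unseen pairs.

```python
import numpy as np

def svd_word_vectors(pmi, dim):
    """Sketch: rank-`dim` truncated SVD of a finite (word x context) PMI-type matrix."""
    u, s, vt = np.linalg.svd(pmi, full_matrices=False)  # singular values in descending order
    u_d, s_d, vt_d = u[:, :dim], s[:dim], vt[:dim]
    low_rank = (u_d * s_d) @ vt_d          # best rank-dim approximation (Frobenius norm)
    word_vecs = u_d * np.sqrt(s_d)         # one common way to split the factors
    return word_vecs, low_rank
```

The reconstruction error `np.linalg.norm(pmi - low_rank)` equals the square root of the sum of squares of the discarded singular values, so the spectrum indicates how much is lost at a given `dim`.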