Showing 1–15 of 15 results for author: Gromov, A

  1. arXiv:2406.09366  [pdf, other]

    cs.LG cs.CV q-bio.NC

    Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

    Authors: Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

    Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to impro…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2406.03495  [pdf, other]

    cs.LG cond-mat.dis-nn hep-th math.NT stat.ML

    Grokking Modular Polynomials

    Authors: Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov

    Abstract: Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest. This limitation remains unmoved by the choice of architecture and training strategies. On the other hand, an analytical solution for the weights of Multi-layer Perceptron (MLP) networks that generalize on the modular addition task is known in the literature. In this work, we (i) extend…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 7+4 pages, 3 figures, 2 tables

  3. arXiv:2406.02550  [pdf, other]

    cs.LG cond-mat.dis-nn hep-th stat.ML

    Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

    Authors: Tianyu He, Darshil Doshi, Aritra Das, Andrey Gromov

    Abstract: Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks. Specifically, we consider a finite collection of linear modular functions…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 21 pages, 19 figures

  4. arXiv:2404.01413  [pdf, other]

    cs.LG cs.AI cs.CL cs.ET stat.ML

    Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

    Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

    Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration…

    Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  5. arXiv:2403.17887  [pdf, other]

    cs.CL cs.LG stat.ML

    The Unreasonable Ineffectiveness of the Deeper Layers

    Authors: Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

    Abstract: We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed. To prune these models, we identify the optimal block of layers to prune by considering similarity across layers; then, to "heal" the damage…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 + 10 pages, 5 + 4 figures

    Report number: MIT-CTP/5694

  6. arXiv:2402.17392  [pdf, other]

    cs.CL

    Spot the bot: Coarse-Grained Partition of Semantic Paths for Bots and Humans

    Authors: Vasilii A. Gromov, Alexandra S. Kogan

    Abstract: Nowadays, technology is rapidly advancing: bots are writing comments, articles, and reviews. Due to this fact, it is crucial to know if the text was written by a human or by a bot. This paper focuses on comparing structures of the coarse-grained partitions of semantic paths for human-written and bot-generated texts. We compare the clusterizations of datasets of n-grams from literary texts and text…

    Submitted 27 February, 2024; originally announced February 2024.

    Journal ref: Pattern Recognition and Machine Intelligence, 2023. pp. 348--355

  7. arXiv:2402.10202  [pdf, other]

    cs.LG

    Bridging Associative Memory and Probabilistic Modeling

    Authors: Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo

    Abstract: Associative memory and probabilistic modeling are two fundamental topics in artificial intelligence. The first studies recurrent neural networks designed to denoise, complete and retrieve data, whereas the second studies learning and sampling from probability distributions. Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log like…

    Submitted 13 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  8. Formal concept analysis for evaluating intrinsic dimension of a natural language

    Authors: Sergei O. Kuznetsov, Vasilii A. Gromov, Nikita S. Borodin, Andrei M. Divavin

    Abstract: Some results of a computational experiment for determining the intrinsic dimension of linguistic varieties for the Bengali and Russian languages are presented. At the same time, both sets of words and sets of bigrams in these languages were considered separately. The method used to solve this problem was based on formal concept analysis algorithms. It was found that the intrinsic dimensions of the…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: Preprint, 10th International Conference on Pattern Recognition and Machine Intelligence (PReMI 2023)

  9. arXiv:2311.10217  [pdf]

    cs.CL cs.AI math.AT nlin.CD

    A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures

    Authors: Vasilii A. Gromov, Nikita S. Borodin, Asel S. Yerbolova

    Abstract: The present paper introduces a novel object of study - a language fractal structure. We hypothesize that a set of embeddings of all $n$-grams of a natural language constitutes a representative sample of this fractal set. (We use the term Hailonakea to refer to the sum total of all language fractal structures, over all $n$). The paper estimates intrinsic (genuine) dimensions of language fractal str…

    Submitted 20 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Preprint. Under review

  10. arXiv:2310.13061  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

    Authors: Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov

    Abstract: Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large. In general, it is very difficult to know if the network has memorized a particular set of examples or understood the underlying rule (or both). Motivated by this challenge, we study an interpretable model where generalizing representations are understood analytically, an…

    Submitted 4 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 9+20 pages, 7+25 figures, 2 tables

  11. arXiv:2309.16413  [pdf, other]

    cs.NE cs.AI

    Genetic Engineering Algorithm (GEA): An Efficient Metaheuristic Algorithm for Solving Combinatorial Optimization Problems

    Authors: Majid Sohrabi, Amir M. Fathollahi-Fard, Vasilii A. Gromov

    Abstract: Genetic Algorithms (GAs) are known for their efficiency in solving combinatorial optimization problems, thanks to their ability to explore diverse solution spaces, handle various representations, exploit parallelism, preserve good solutions, adapt to changing dynamics, handle combinatorial diversity, and provide heuristic search. However, limitations such as premature convergence, lack of problem-…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted in Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2023)

  12. arXiv:2309.14399  [pdf, other]

    physics.med-ph cs.AI q-bio.QM

    Date-Driven Approach for Identifying State of Hemodialysis Fistulas: Entropy-Complexity and Formal Concept Analysis

    Authors: Vasilii A. Gromov, E. I. Zvorykina, Yurii N. Beschastnov, Majid Sohrabi

    Abstract: The paper explores mathematical methods that differentiate regular and chaotic time series, specifically for identifying pathological fistulas. It proposes a noise-resistant method for classifying responding rows of normally and pathologically functioning fistulas. This approach is grounded in the hypothesis that laminar blood flow signifies normal function, while turbulent flow indicates patholog…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted in AIST-2023 conference. Yerevan, Armenia

  13. arXiv:2301.02679  [pdf, other]

    cs.LG cond-mat.dis-nn

    Grokking modular arithmetic

    Authors: Andrey Gromov

    Abstract: We present a simple neural network that can learn modular arithmetic tasks and exhibits a sudden jump in generalization known as "grokking". Concretely, we present (i) fully-connected two-layer networks that exhibit grokking on various modular arithmetic tasks under vanilla gradient descent with the MSE loss function in the absence of any regularization; (ii) evidence that grokking modular arith…

    Submitted 6 January, 2023; originally announced January 2023.

    Comments: 11+5 pages, 10 figures

  14. arXiv:2206.13568  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    AutoInit: Automatic Initialization via Jacobian Tuning

    Authors: Tianyu He, Darshil Doshi, Andrey Gromov

    Abstract: Good initialization is essential for training Deep Neural Networks (DNNs). Oftentimes such initialization is found through a trial and error approach, which has to be applied anew every time an architecture is substantially modified, or inherited from smaller size networks leading to sub-optimal initialization. In this work we introduce a new and cheap algorithm, that allows one to find a good ini…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 22 pages, 5 figures

  15. arXiv:2111.12143  [pdf, other]

    cs.LG cond-mat.dis-nn hep-th stat.ML

    Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications

    Authors: Darshil Doshi, Tianyu He, Andrey Gromov

    Abstract: Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity, the network function is a Gaussian process (GP) and quantitatively predictive description is possible. Gaussian approximation allows one to formulate criteria for selecting hyperparameters, such as variances of weights and biases, as well as the learning rat…

    Submitted 5 October, 2023; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: Accepted (spotlight) at NeurIPS2023. Additional ResNet results. 42 pages, 12 figures