
Showing 1–8 of 8 results for author: Doshi, D

Searching in archive cs.
  1. arXiv:2406.03495

    cs.LG cond-mat.dis-nn hep-th math.NT stat.ML

    Grokking Modular Polynomials

    Authors: Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov

    Abstract: Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest. This limitation remains unmoved by the choice of architecture and training strategies. On the other hand, an analytical solution for the weights of Multi-layer Perceptron (MLP) networks that generalize on the modular addition task is known in the literature. In this work, we (i) extend…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 7+4 pages, 3 figures, 2 tables

  2. arXiv:2406.02550

    cs.LG cond-mat.dis-nn hep-th stat.ML

    Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

    Authors: Tianyu He, Darshil Doshi, Aritra Das, Andrey Gromov

    Abstract: Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks. Specifically, we consider a finite collection of linear modular functions…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 21 pages, 19 figures

  3. arXiv:2404.09886

    cs.LG cs.CV

    ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation

    Authors: Divyang Doshi, Jung-Eun Kim

    Abstract: In this research, we propose an innovative method to boost Knowledge Distillation efficiency without the need for resource-heavy teacher models. Knowledge Distillation trains a smaller "student" model with guidance from a larger "teacher" model, which is computationally costly. However, the main benefit comes from the soft labels provided by the teacher, helping the student grasp nuanced class…

    Submitted 15 April, 2024; originally announced April 2024.

  4. arXiv:2310.13061

    cs.LG cond-mat.dis-nn stat.ML

    To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

    Authors: Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov

    Abstract: Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large. In general, it is very difficult to know if the network has memorized a particular set of examples or understood the underlying rule (or both). Motivated by this challenge, we study an interpretable model where generalizing representations are understood analytically, an…

    Submitted 4 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 9+20 pages, 7+25 figures, 2 tables

  5. Parking Spot Classification based on surround view camera system

    Authors: Andy Xiao, Deep Doshi, Lihao Wang, Harsha Gorantla, Thomas Heitzmann, Peter Groth

    Abstract: Surround-view fisheye cameras are commonly used for near-field sensing in automated driving scenarios, including urban driving and auto valet parking. Four fisheye cameras, one on each side, are sufficient to cover 360° around the vehicle capturing the entire near-field region. Based on surround view cameras, there has been much research on parking slot detection with main focus on the occupancy s…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: SPIE Optical Engineering + Applications, 2023, San Diego, California, United States. Proc. SPIE 12675, Applications of Machine Learning 2023

  6. arXiv:2206.13568

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    AutoInit: Automatic Initialization via Jacobian Tuning

    Authors: Tianyu He, Darshil Doshi, Andrey Gromov

    Abstract: Good initialization is essential for training Deep Neural Networks (DNNs). Oftentimes such initialization is found through a trial and error approach, which has to be applied anew every time an architecture is substantially modified, or inherited from smaller size networks leading to sub-optimal initialization. In this work we introduce a new and cheap algorithm, that allows one to find a good ini…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 22 pages, 5 figures

  7. arXiv:2111.12143

    cs.LG cond-mat.dis-nn hep-th stat.ML

    Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications

    Authors: Darshil Doshi, Tianyu He, Andrey Gromov

    Abstract: Deep neural networks are notorious for defying theoretical treatment. However, when the number of parameters in each layer tends to infinity, the network function is a Gaussian process (GP) and quantitatively predictive description is possible. Gaussian approximation allows one to formulate criteria for selecting hyperparameters, such as variances of weights and biases, as well as the learning rat…

    Submitted 5 October, 2023; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: Accepted (spotlight) at NeurIPS 2023. Additional ResNet results. 42 pages, 12 figures

  8. arXiv:2007.10571

    cs.DC cs.PF

    AI Tax: The Hidden Cost of AI Data Center Applications

    Authors: Daniel Richins, Dharmisha Doshi, Matthew Blackmore, Aswathy Thulaseedharan Nair, Neha Pathapati, Ankit Patel, Brainard Daguman, Daniel Dobrijalowski, Ramesh Illikkal, Kevin Long, David Zimmerman, Vijay Janapa Reddi

    Abstract: Artificial intelligence and machine learning are experiencing widespread adoption in industry and academia. This has been driven by rapid advances in the applications and accuracy of AI through increasingly complex algorithms and models; this, in turn, has spurred research into specialized hardware AI accelerators. Given the rapid pace of advances, it is easy to forget that they are often develope…

    Submitted 20 July, 2020; originally announced July 2020.

    Comments: 32 pages. 16 figures. Submitted to ACM "Transactions on Computer Systems."

    ACM Class: I.2; C.4