Showing 1–4 of 4 results for author: Wadekar, S N

Searching in archive cs.
  1. arXiv:2405.17927

    cs.AI cs.CL cs.CV cs.LG eess.AS

    The Evolution of Multimodal Model Architectures

    Authors: Shakti N. Wadekar, Abhishek Chaurasia, Aman Chadha, Eugenio Culurciello

    Abstract: This work uniquely identifies and characterizes four prevalent multimodal model architectural patterns in the contemporary multimodal landscape. Systematically categorizing models by architecture type facilitates monitoring of developments in the multimodal domain. Distinct from recent survey papers that present general information on multimodal architectures, this research conducts a comprehensiv…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 30 pages, 6 tables, 7 figures

  2. arXiv:2305.07167

    cs.CV cs.CL cs.LG eess.IV

    OneCAD: One Classifier for All image Datasets using multimodal learning

    Authors: Shakti N. Wadekar, Eugenio Culurciello

    Abstract: Vision-Transformers (ViTs) and Convolutional neural networks (CNNs) are widely used Deep Neural Networks (DNNs) for classification tasks. These model architectures depend on the number of classes in the dataset they were trained on. Any change in the number of classes leads to a change (partial or full) in the model's architecture. This work addresses the question: Is it possible to create a number-o…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: 8 pages, 6 figures

  3. arXiv:2209.15159

    cs.CV cs.AI cs.LG

    MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features

    Authors: Shakti N. Wadekar, Abhishek Chaurasia

    Abstract: MobileViT (MobileViTv1) combines convolutional neural networks (CNNs) and vision transformers (ViTs) to create light-weight models for mobile vision tasks. Though the main MobileViTv1-block helps to achieve competitive state-of-the-art results, the fusion block inside the MobileViTv1-block creates scaling challenges and has a complex learning task. We propose changes to the fusion block that are simp…

    Submitted 6 October, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: 20 pages, 7 figures

  4. arXiv:2105.01799

    cs.RO cs.AI

    Towards End-to-End Deep Learning for Autonomous Racing: On Data Collection and a Unified Architecture for Steering and Throttle Prediction

    Authors: Shakti N. Wadekar, Benjamin J. Schwartz, Shyam S. Kannan, Manuel Mar, Rohan Kumar Manna, Vishnu Chellapandi, Daniel J. Gonzalez, Aly El Gamal

    Abstract: Deep Neural Networks (DNNs) which are trained end-to-end have been successfully applied to solve complex problems that we have not been able to solve in past decades. Autonomous driving is one of the most complex problems which is yet to be completely solved and autonomous racing adds more complexity and exciting challenges to this problem. Towards the challenge of applying end-to-end learning to…

    Submitted 4 May, 2021; originally announced May 2021.

    Comments: 6 pages, 10 figures, 3 tables