Showing 1–22 of 22 results for author: Chavan, A

  1. arXiv:2405.13039  [pdf, other]

    cs.CL cs.AI

    Surgical Feature-Space Decomposition of LLMs: Why, When and How?

    Authors: Arnav Chavan, Nahush Lele, Deepak Gupta

    Abstract: Low-rank approximations of the weight and feature space can enhance the performance of deep learning models, whether in terms of improving generalization or reducing the latency of inference. However, there is no clear consensus yet on \emph{how}, \emph{when} and \emph{why} these approximations are helpful for large language models (LLMs). In this work, we empirically study the efficacy of weight…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted at ACL 2024

  2. arXiv:2402.12418  [pdf, other]

    cs.LG cs.AI cs.NE

    Beyond Uniform Scaling: Exploring Depth Heterogeneity in Neural Architectures

    Authors: Akash Guna R. T, Arnav Chavan, Deepak Gupta

    Abstract: Conventional scaling of neural networks typically involves designing a base network and growing different dimensions like width, depth, etc. of the same by some predefined scaling factors. We introduce an automated scaling approach leveraging second-order loss landscape information. Our method is flexible towards skip connections, a mainstay in modern vision transformers. Our training-aware method…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted at ICLR 2024 (Tiny Paper Track)

  3. arXiv:2402.01799  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward

    Authors: Arnav Chavan, Raghav Magazine, Shubham Kushwaha, Mérouane Debbah, Deepak Gupta

    Abstract: Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference. Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference. This survey offers an overview of these methods, emphasizing recent developments. Through experiments on LLaMA(/2)-7B, we evaluat…

    Submitted 24 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted at IJCAI '24 (Survey Track), Updated TGI results

  4. arXiv:2312.07046  [pdf, ps, other]

    cs.LG cs.CL

    Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models

    Authors: Arnav Chavan, Nahush Lele, Deepak Gupta

    Abstract: Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical. The computational demands associated with even minimal gradient updates present challenges, particularly on consumer-grade hardware. This paper introduces an innovative approach for the parametric and practical compression of LLMs based on reduced order…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Brief technical report; Code will be made available at https://github.com/transmuteAI/trailmet/tree/main/trailmet/algorithms/llm-rom

  5. arXiv:2306.07967  [pdf, other]

    cs.LG cs.AI cs.CV

    One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning

    Authors: Arnav Chavan, Zhuang Liu, Deepak Gupta, Eric Xing, Zhiqiang Shen

    Abstract: We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, providing more flexibility and capability across diverse tasks and datasets. Moreover, GLoRA facilitates efficient parameter adaptatio…

    Submitted 16 October, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Technical report. v2: Add LLaMA-1&2 results. Code and models at https://github.com/Arnav0400/ViT-Slim/tree/master/GLoRA

  6. arXiv:2304.01074  [pdf, other]

    cs.RO

    FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation

    Authors: Sudarshan S Harithas, Gurkirat Singh, Aneesh Chavan, Sarthak Sharma, Suraj Patni, Chetan Arora, K. Madhava Krishna

    Abstract: We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, absence of strong priors in an unstructured point cloud leads…

    Submitted 3 April, 2023; originally announced April 2023.

  7. arXiv:2301.13817  [pdf, other]

    cs.CV

    Patch Gradient Descent: Training Neural Networks on Very Large Images

    Authors: Deepak K. Gupta, Gowreesh Mago, Arnav Chavan, Dilip K. Prasad

    Abstract: Traditional CNN models are trained and tested on relatively low resolution images (<300 px), and cannot be directly operated on large-scale images due to compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an effective learning strategy that allows training existing CNN architectures on large-scale images in an end-to-end manner. PatchGD is based on the hypothesis that…

    Submitted 31 January, 2023; originally announced January 2023.

  8. arXiv:2211.13769  [pdf, other]

    cs.CV cs.AI cs.LG

    On Designing Light-Weight Object Trackers through Network Pruning: Use CNNs or Transformers?

    Authors: Saksham Aggarwal, Taneesh Gupta, Pawan Kumar Sahu, Arnav Chavan, Rishabh Tiwari, Dilip K. Prasad, Deepak K. Gupta

    Abstract: Object trackers deployed on low-power devices need to be light-weight; however, most of the current state-of-the-art (SOTA) methods rely on using compute-heavy backbones built using CNNs or transformers. Large sizes of such models do not allow their deployment in low-power conditions and designing compressed variants of large tracking models is of great importance. This paper demonstrates how high…

    Submitted 26 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: Accepted at IEEE ICASSP 2023

  9. arXiv:2209.03578  [pdf]

    cs.CV

    Sign Language Detection

    Authors: Shubham Deshmukh, Favin Fernandes, Amey Chavan

    Abstract: With the advancements in computer vision techniques, classifying images based on their features has become both a major task and a necessity. In this project we propose two models: the first performs feature extraction and classification using ORB and SVM, and the second uses a CNN architecture. The end result of the project is to understand the concept behind feature extraction and image classification. The trai…

    Submitted 8 September, 2022; originally announced September 2022.

    Comments: 8 pages, 10 figures

  10. arXiv:2209.03576  [pdf]

    cs.CV

    Suspicious and Anomaly Detection

    Authors: Shubham Deshmukh, Favin Fernandes, Monali Ahire, Devarshi Borse, Amey Chavan

    Abstract: In this project we propose a CNN architecture to detect anomalous and suspicious activities; the activities chosen for the project are running, jumping and kicking in public places and carrying a gun, bat or knife in public places. We compare the trained model with pre-existing models such as YOLO, VGG16 and VGG19. The trained model is then implemented for real-time detection and also used the…

    Submitted 8 September, 2022; originally announced September 2022.

    Comments: 7 pages, 10 figures

  11. arXiv:2209.03570  [pdf]

    cs.CV

    SANIP: Shopping Assistant and Navigation for the visually impaired

    Authors: Shubham Deshmukh, Favin Fernandes, Amey Chavan, Monali Ahire, Devashri Borse, Jyoti Madake

    Abstract: The proposed shopping assistant model SANIP is intended to help blind persons detect hand-held objects and get a video feedback of the information retrieved from the detected and recognized objects. The proposed model consists of three Python models, i.e. Custom Object Detection, Text Detection and Barcode Detection. For object detection of the hand-held object, we have created our own cust…

    Submitted 8 September, 2022; originally announced September 2022.

    Comments: 6 pages, 8 figures. arXiv admin note: text overlap with arXiv:2011.04244 by other authors

  12. arXiv:2206.01690  [pdf, other]

    cs.LG cs.CV

    Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning

    Authors: Arnav Chavan, Rishabh Tiwari, Udbhav Bamba, Deepak K. Gupta

    Abstract: Gradient based meta-learning methods are prone to overfit on the meta-training set, and this behaviour is more prominent with large and complex networks. Moreover, large networks restrict the application of meta-learning models on low-power edge devices. While choosing smaller networks avoids these issues to a certain extent, it affects the overall generalization, leading to reduced performance. Cle…

    Submitted 3 June, 2022; originally announced June 2022.

    Comments: Published at CVPR 2022

  13. arXiv:2201.00814  [pdf, other]

    cs.CV cs.AI cs.LG

    Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space

    Authors: Arnav Chavan, Zhiqiang Shen, Zhuang Liu, Zechun Liu, Kwang-Ting Cheng, Eric Xing

    Abstract: This paper explores the feasibility of finding an optimal sub-model from a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework. It can search a sub-structure from the original model end-to-end across multiple dimensions, including the input tokens, MHSA and MLP modules with state-of-the-art performance. Our method is based on a learnable and unified $\ell_1$ s…

    Submitted 24 April, 2022; v1 submitted 3 January, 2022; originally announced January 2022.

    Comments: CVPR 2022. Code is available at https://github.com/Arnav0400/ViT-Slim

  14. Transfer Learning Gaussian Anomaly Detection by Fine-tuning Representations

    Authors: Oliver Rippel, Arnav Chavan, Chucai Lei, Dorit Merhof

    Abstract: Current state-of-the-art anomaly detection (AD) methods exploit the powerful representations yielded by large-scale ImageNet training. However, catastrophic forgetting prevents the successful fine-tuning of pre-trained representations on new datasets in the semi-supervised setting, and representations are therefore commonly fixed. In our work, we propose a new method to overcome catastrophic forge…

    Submitted 13 June, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

    Comments: Camera ready for IMPROVE22 + additional typo fixes

  15. arXiv:2102.07156  [pdf, other]

    cs.CV

    ChipNet: Budget-Aware Pruning with Heaviside Continuous Approximations

    Authors: Rishabh Tiwari, Udbhav Bamba, Arnav Chavan, Deepak K. Gupta

    Abstract: Structured pruning methods are among the effective strategies for extracting small resource-efficient convolutional neural networks from their dense counterparts with minimal loss in accuracy. However, most existing methods still suffer from one or more limitations, which include 1) the need for training the dense model from scratch with pruning-related parameters embedded in the architecture, 2) r…

    Submitted 14 February, 2021; originally announced February 2021.

    Comments: Accepted at ICLR 2021 Conference

  16. arXiv:2101.05650  [pdf, other]

    cs.CV

    Rescaling CNN through Learnable Repetition of Network Parameters

    Authors: Arnav Chavan, Udbhav Bamba, Rishabh Tiwari, Deepak Gupta

    Abstract: Deeper and wider CNNs are known to provide improved performance for deep learning tasks. However, most such networks have poor performance gain per parameter increase. In this paper, we investigate whether the gain observed in deeper models is purely due to the addition of more optimization parameters or whether the physical size of the network also plays a role. Further, we present a novel res…

    Submitted 19 August, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

  17. Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy

    Authors: Sharib Ali, Mariia Dmitrieva, Noha Ghatwary, Sophia Bano, Gorkem Polat, Alptekin Temizel, Adrian Krenzer, Amar Hekalo, Yun Bo Guo, Bogdan Matuszewski, Mourad Gridach, Irina Voiculescu, Vishnusai Yoganand, Arnav Chavan, Aryan Raj, Nhan T. Nguyen, Dat Q. Tran, Le Duy Huynh, Nicolas Boutry, Shahadate Rezvy, Haijian Chen, Yoon Ho Choi, Anand Subramanian, Velmurugan Balasubramanian, Xiaohong W. Gao, et al. (12 additional authors not shown)

    Abstract: The Endoscopy Computer Vision Challenge (EndoCV) is a crowd-sourcing initiative to address eminent problems in developing reliable computer aided detection and diagnosis endoscopy systems and suggest a pathway for clinical translation of technologies. Whilst endoscopy is a widely used diagnostic and treatment tool for hollow-organs, there are several core challenges often faced by endoscopists, ma…

    Submitted 17 February, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: 32 pages

  18. arXiv:2003.10129  [pdf, ps, other]

    cs.CV eess.IV

    Multi-Plateau Ensemble for Endoscopic Artefact Segmentation and Detection

    Authors: Suyog Jadhav, Udbhav Bamba, Arnav Chavan, Rishabh Tiwari, Aryan Raj

    Abstract: Endoscopic artefact detection challenge consists of 1) Artefact detection, 2) Semantic segmentation, and 3) Out-of-sample generalisation. For Semantic segmentation task, we propose a multi-plateau ensemble of FPN (Feature Pyramid Network) with EfficientNet as feature extractor/encoder. For Object detection task, we used a three model ensemble of RetinaNet with Resnet50 Backbone and FasterRCNN (FPN…

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: EndoCV2020 workshop ISBI 2020 camera ready

    Journal ref: http://ceur-ws.org/Vol-2595/endoCV2020_paper_id_20.pdf

  19. arXiv:1610.04963  [pdf, other]

    cs.DB

    ProvDB: A System for Lifecycle Management of Collaborative Analysis Workflows

    Authors: Hui Miao, Amit Chavan, Amol Deshpande

    Abstract: As data-driven methods are becoming pervasive in a wide variety of disciplines, there is an urgent need to develop scalable and sustainable tools to simplify the process of data science, to make it easier to keep track of the analyses being performed and datasets being generated, and to enable introspection of the workflows. In this paper, we describe our vision of a unified provenance and metadat…

    Submitted 16 October, 2016; originally announced October 2016.

  20. arXiv:1506.04815  [pdf, other]

    cs.DB

    Towards a unified query language for provenance and versioning

    Authors: Amit Chavan, Silu Huang, Amol Deshpande, Aaron Elmore, Samuel Madden, Aditya Parameswaran

    Abstract: Organizations and teams collect and acquire data from various sources, such as social interactions, financial transactions, sensor data, and genome sequencers. Different teams in an organization as well as different data scientists within a team are interested in extracting a variety of insights which require combining and collaboratively analyzing datasets in diverse ways. DataHub is a system tha…

    Submitted 15 June, 2015; originally announced June 2015.

    Comments: Theory and Practice of Provenance, 2015

  21. arXiv:1505.05211  [pdf, other]

    cs.DB

    Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff

    Authors: Souvik Bhattacherjee, Amit Chavan, Silu Huang, Amol Deshpande, Aditya Parameswaran

    Abstract: The relative ease of collaborative data science and analysis has led to a proliferation of many thousands or millions of $versions$ of the same datasets in many scientific and commercial domains, acquired or constructed at various stages of data analysis across many users, and often over long periods of time. Managing, storing, and recreating these dataset versions is a non-trivial task. The funda…

    Submitted 19 May, 2015; originally announced May 2015.

  22. arXiv:1409.0798  [pdf, other]

    cs.DB

    DataHub: Collaborative Data Science & Dataset Version Management at Scale

    Authors: Anant Bhardwaj, Souvik Bhattacherjee, Amit Chavan, Amol Deshpande, Aaron J. Elmore, Samuel Madden, Aditya G. Parameswaran

    Abstract: Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ab…

    Submitted 2 September, 2014; originally announced September 2014.

    Comments: 7 pages