Search | arXiv e-print repository

Rich Feature Distillation with Feature Affinity Module for Efficient Image Dehazing

Authors: Sai Mitheran, Anushri Suresh, Nisha J. S., Varun P. Gopi

Abstract: Single-image haze removal is a long-standing hurdle for computer vision applications. Several works have been focused on transferring advances from image classification, detection, and segmentation to the niche of image dehazing, primarily focusing on contrastive learning and knowledge distillation. However, these approaches prove computationally expensive, raising concern regarding their applicab… ▽ More Single-image haze removal is a long-standing hurdle for computer vision applications. Several works have been focused on transferring advances from image classification, detection, and segmentation to the niche of image dehazing, primarily focusing on contrastive learning and knowledge distillation. However, these approaches prove computationally expensive, raising concern regarding their applicability to on-the-edge use-cases. This work introduces a simple, lightweight, and efficient framework for single-image haze removal, exploiting rich "dark-knowledge" information from a lightweight pre-trained super-resolution model via the notion of heterogeneous knowledge distillation. We designed a feature affinity module to maximize the flow of rich feature semantics from the super-resolution teacher to the student dehazing network. In order to evaluate the efficacy of our proposed framework, its performance as a plug-and-play setup to a baseline model is examined. Our experiments are carried out on the RESIDE-Standard dataset to demonstrate the robustness of our framework to the synthetic and real-world domains. The extensive qualitative and quantitative results provided establish the effectiveness of the framework, achieving gains of upto 15\% (PSNR) while reducing the model size by $\sim$20 times. △ Less

Submitted 13 July, 2022; originally announced July 2022.

Comments: Preprint version. Accepted at Optik

arXiv:2206.08175 [pdf, other]

Not All Lotteries Are Made Equal

Authors: Surya Kant Sahu, Sai Mitheran, Somya Suhans Mahapatra

Abstract: The Lottery Ticket Hypothesis (LTH) states that for a reasonably sized neural network, a sub-network within the same network yields no less performance than the dense counterpart when trained from the same initialization. This work investigates the relation between model size and the ease of finding these sparse sub-networks. We show through experiments that, surprisingly, under a finite budget, s… ▽ More The Lottery Ticket Hypothesis (LTH) states that for a reasonably sized neural network, a sub-network within the same network yields no less performance than the dense counterpart when trained from the same initialization. This work investigates the relation between model size and the ease of finding these sparse sub-networks. We show through experiments that, surprisingly, under a finite budget, smaller models benefit more from Ticket Search (TS). △ Less

Submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted at ICML 2022 HAET Workshop

arXiv:2201.11957 [pdf, other]

doi 10.1109/LRA.2022.3146544

Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding

Authors: Lalithkumar Seenivasan, Sai Mitheran, Mobarakol Islam, Hongliang Ren

Abstract: Global and local relational reasoning enable scene understanding models to perform human-like scene analysis and understanding. Scene understanding enables better semantic segmentation and object-to-object interaction detection. In the medical domain, a robust surgical scene understanding model allows the automation of surgical skill evaluation, real-time monitoring of surgeon's performance and po… ▽ More Global and local relational reasoning enable scene understanding models to perform human-like scene analysis and understanding. Scene understanding enables better semantic segmentation and object-to-object interaction detection. In the medical domain, a robust surgical scene understanding model allows the automation of surgical skill evaluation, real-time monitoring of surgeon's performance and post-surgical analysis. This paper introduces a globally-reasoned multi-task surgical scene understanding model capable of performing instrument segmentation and tool-tissue interaction detection. Here, we incorporate global relational reasoning in the latent interaction space and introduce multi-scale local (neighborhood) reasoning in the coordinate space to improve segmentation. Utilizing the multi-task model setup, the performance of the visual-semantic graph attention network in interaction detection is further enhanced through global reasoning. The global interaction space features from the segmentation module are introduced into the graph network, allowing it to detect interactions based on both node-to-node and global interaction reasoning. Our model reduces the computation cost compared to running two independent single-task models by sharing common modules, which is indispensable for practical applications. Using a sequential optimization technique, the proposed multi-task model outperforms other state-of-the-art single-task models on the MICCAI endoscopic vision challenge 2018 dataset. Additionally, we also observe the performance of the multi-task model when trained using the knowledge distillation technique. The official code implementation is made available in GitHub. △ Less

Submitted 28 January, 2022; originally announced January 2022.

Comments: Code available at: https://github.com/lalithjets/Global-reasoned-multi-task-model

arXiv:2109.10252

Audiomer: A Convolutional Transformer For Keyword Spotting

Authors: Surya Kant Sahu, Sai Mitheran, Juhi Kamdar, Meet Gandhi

Abstract: Transformers have seen an unprecedented rise in Natural Language Processing and Computer Vision tasks. However, in audio tasks, they are either infeasible to train due to extremely large sequence length of audio waveforms or incur a performance penalty when trained on Fourier-based features. In this work, we introduce an architecture, Audiomer, where we combine 1D Residual Networks with Performer… ▽ More Transformers have seen an unprecedented rise in Natural Language Processing and Computer Vision tasks. However, in audio tasks, they are either infeasible to train due to extremely large sequence length of audio waveforms or incur a performance penalty when trained on Fourier-based features. In this work, we introduce an architecture, Audiomer, where we combine 1D Residual Networks with Performer Attention to achieve state-of-the-art performance in keyword spotting with raw audio waveforms, outperforming all previous methods while being computationally cheaper and parameter-efficient. Additionally, our model has practical advantages for speech processing, such as inference on arbitrarily long audio clips owing to the absence of positional encoding. The code is available at https://github.com/The-Learning-Machines/Audiomer-PyTorch. △ Less

Submitted 1 February, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

Comments: The results and claims made are incorrect due to data leakage and an erroneous split of datasets

arXiv:2107.06212 [pdf, other]

doi 10.1016/j.cag.2021.07.001

'CADSketchNet' -- An Annotated Sketch dataset for 3D CAD Model Retrieval with Deep Neural Networks

Authors: Bharadwaj Manda, Shubham Dhayarkar, Sai Mitheran, V. K. Viekash, Ramanathan Muthuganapathy

Abstract: Ongoing advancements in the fields of 3D modelling and digital archiving have led to an outburst in the amount of data stored digitally. Consequently, several retrieval systems have been developed depending on the type of data stored in these databases. However, unlike text data or images, performing a search for 3D models is non-trivial. Among 3D models, retrieving 3D Engineering/CAD models or me… ▽ More Ongoing advancements in the fields of 3D modelling and digital archiving have led to an outburst in the amount of data stored digitally. Consequently, several retrieval systems have been developed depending on the type of data stored in these databases. However, unlike text data or images, performing a search for 3D models is non-trivial. Among 3D models, retrieving 3D Engineering/CAD models or mechanical components is even more challenging due to the presence of holes, volumetric features, presence of sharp edges etc., which make CAD a domain unto itself. The research work presented in this paper aims at develo** a dataset suitable for building a retrieval system for 3D CAD models based on deep learning. 3D CAD models from the available CAD databases are collected, and a dataset of computer-generated sketch data, termed 'CADSketchNet', has been prepared. Additionally, hand-drawn sketches of the components are also added to CADSketchNet. Using the sketch images from this dataset, the paper also aims at evaluating the performance of various retrieval system or a search engine for 3D CAD models that accepts a sketch image as the input query. Many experimental models are constructed and tested on CADSketchNet. These experiments, along with the model architecture, choice of similarity metrics are reported along with the search results. △ Less

Submitted 20 July, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: Computers & Graphics Journal, Special Section on 3DOR 2021

Journal ref: Computers & Graphics, Volume 99, 2021, Pages 100-113, ISSN 0097-8493

arXiv:2107.01516 [pdf, other]

Introducing Self-Attention to Target Attentive Graph Neural Networks

Authors: Sai Mitheran, Abhinav Java, Surya Kant Sahu, Arshad Shaikh

Abstract: Session-based recommendation systems suggest relevant items to users by modeling user behavior and preferences using short-term anonymous sessions. Existing methods leverage Graph Neural Networks (GNNs) that propagate and aggregate information from neighboring nodes i.e., local message passing. Such graph-based architectures have representational limits, as a single sub-graph is susceptible to ove… ▽ More Session-based recommendation systems suggest relevant items to users by modeling user behavior and preferences using short-term anonymous sessions. Existing methods leverage Graph Neural Networks (GNNs) that propagate and aggregate information from neighboring nodes i.e., local message passing. Such graph-based architectures have representational limits, as a single sub-graph is susceptible to overfit the sequential dependencies instead of accounting for complex transitions between items in different sessions. We propose a new technique that leverages a Transformer in combination with a target attentive GNN. This allows richer representations to be learnt, which translates to empirical performance gains in comparison to a vanilla target attentive GNN. Our experimental results and ablation show that our proposed method is competitive with the existing methods on real-world benchmark datasets, improving on graph-based hypotheses. Code is available at https://github.com/The-Learning-Machines/SBR △ Less

Submitted 7 January, 2022; v1 submitted 3 July, 2021; originally announced July 2021.

Comments: Accepted at AISP 2022

ACM Class: H.3.3; I.2.1

Showing 1–6 of 6 results for author: Mitheran, S