Showing 1–12 of 12 results for author: Sagar, A

Searching in archive eess.
  1. arXiv:2406.07823  [pdf, other]

    cs.CL cs.SD eess.AS

    PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

    Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

    Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a no…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2405.17184  [pdf, other]

    eess.SY eess.SP

    A Pioneering Roadmap for ML-Driven Algorithmic Advancements in Electrical Networks

    Authors: Jochen L. Cremer, Adrian Kelly, Ricardo J. Bessa, Milos Subasic, Panagiotis N. Papadopoulos, Samuel Young, Amar Sagar, Antoine Marot

    Abstract: To advance control, operation and planning tools of electrical networks with ML is not straightforward. 110 experts were surveyed showing where and how ML algorithms could advance. This paper assesses this survey and research environment. Then it develops an innovation roadmap that helps align our research community towards a goal-oriented realisation of the opportunities that AI upholds. This pa…

    Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 5 pages

  3. arXiv:2207.10643  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    STOP: A dataset for Spoken Task Oriented Semantic Parsing

    Authors: Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Ahn Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed

    Abstract: End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assi…

    Submitted 18 October, 2022; v1 submitted 28 June, 2022; originally announced July 2022.

  4. arXiv:2205.12705  [pdf]

    eess.IV cs.CV

    COVID-19 Severity Classification on Chest X-ray Images

    Authors: Aditi Sagar, Aman Swaraj, Karan Verma

    Abstract: Biomedical imaging analysis combined with artificial intelligence (AI) methods has proven to be quite valuable in order to diagnose COVID-19. So far, various classification models have been used for diagnosing COVID-19. However, classification of patients based on their severity level is not yet analyzed. In this work, we classify covid images based on the severity of the infection. First, we pre-…

    Submitted 25 May, 2022; originally announced May 2022.

  5. arXiv:2201.05975  [pdf]

    cs.HC eess.SY

    IRHA: An Intelligent RSSI based Home automation System

    Authors: Samsil Arefin Mozumder, A S M Sharifuzzaman Sagar

    Abstract: Human existence is getting more sophisticated and better in many areas due to remarkable advances in the fields of automation. Automated systems are favored over manual ones in the current environment. Home Automation is becoming more popular in this scenario, as people are drawn to the concept of a home environment that can automatically satisfy users' requirements. The key challenges in an intel…

    Submitted 16 January, 2022; originally announced January 2022.

    Comments: This article is submitted to the 2nd International Conference on Ubiquitous Computing and Intelligent Information Systems for possible presentation

  6. arXiv:2201.05920  [pdf, other]

    eess.IV cs.CV cs.LG

    ViTBIS: Vision Transformer for Biomedical Image Segmentation

    Authors: Abhinav Sagar

    Abstract: In this paper, we propose a novel network named Vision Transformer for Biomedical Image Segmentation (ViTBIS). Our network splits the input feature maps into three parts with $1\times 1$, $3\times 3$ and $5\times 5$ convolutions in both encoder and decoder. Concat operator is used to merge the features before being fed to three consecutive transformer blocks with attention mechanism embedded insid…

    Submitted 15 January, 2022; originally announced January 2022.

    Comments: Published at Clinical Image-Based Procedures, Distributed and Collaborative Learning, Artificial Intelligence for Combating COVID-19 and Secure and Privacy-Preserving Machine Learning workshop at MICCAI 2021

    Journal ref: Springer, Cham 2021

  7. arXiv:2108.04349   

    cs.CV cs.LG eess.IV

    AASeg: Attention Aware Network for Real Time Semantic Segmentation

    Authors: Abhinav Sagar

    Abstract: In this paper, we present a new network named Attention Aware Network (AASeg) for real time semantic image segmentation. Our network incorporates spatial and channel information using Spatial Attention (SA) and Channel Attention (CA) modules respectively. It also uses dense local multi-scale context information using Multi Scale Context (MSC) module. The feature maps are concatenated individually…

    Submitted 14 May, 2022; v1 submitted 27 July, 2021; originally announced August 2021.

    Comments: This work makes assumptions which were found wrong later by the author

  8. arXiv:2008.10399   

    eess.IV cs.CV cs.LG

    Generate High Resolution Images With Generative Variational Autoencoder

    Authors: Abhinav Sagar

    Abstract: In this work, we present a novel neural network to generate high resolution images. We replace the decoder of VAE with a discriminator while using the encoder as it is. The encoder is fed data from a normal distribution while the generator is fed from a gaussian distribution. The combination from both is given to a discriminator which tells whether the generated image is correct or not. We evaluat…

    Submitted 21 June, 2021; v1 submitted 12 August, 2020; originally announced August 2020.

    Comments: The network architecture used in this paper while training the model is not correct

  9. arXiv:2008.09646   

    eess.IV cs.CV cs.LG

    HRVGAN: High Resolution Video Generation using Spatio-Temporal GAN

    Authors: Abhinav Sagar

    Abstract: In this paper, we present a novel network for high resolution video generation. Our network uses ideas from Wasserstein GANs by enforcing k-Lipschitz constraint on the loss term and Conditional GANs using class labels for training and testing. We present Generator and Discriminator network layerwise details along with the combined network architecture, optimization details and algorithm used in th…

    Submitted 12 July, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: The design of the neural network was based on assumptions which were found to be wrong

  10. arXiv:2008.07588  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Uncertainty Quantification using Variational Inference for Biomedical Image Segmentation

    Authors: Abhinav Sagar

    Abstract: Deep learning motivated by convolutional neural networks has been highly successful in a range of medical imaging problems like image classification, image segmentation, image synthesis etc. However for validation and interpretability, not only do we need the predictions made by the model but also how confident it is while making those predictions. This is important in safety critical applications…

    Submitted 10 August, 2021; v1 submitted 12 August, 2020; originally announced August 2020.

    Comments: 11 pages, 4 figures

  11. arXiv:2006.01250   

    cs.CV cs.LG eess.IV

    RUHSNet: 3D Object Detection Using Lidar Data in Real Time

    Authors: Abhinav Sagar

    Abstract: In this work, we address the problem of 3D object detection from point cloud data in real time. For autonomous vehicles to work, it is very important for the perception component to detect the real world objects with both high accuracy and fast inference. We propose a novel neural network architecture along with the training and optimization details for detecting 3D objects in point cloud data. We…

    Submitted 21 June, 2021; v1 submitted 9 May, 2020; originally announced June 2020.

    Comments: The results in this paper are not correct, as assumptions used while designing the network were found to be wrong

  12. arXiv:2001.10822  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Lattice-based Improvements for Voice Triggering Using Graph Neural Networks

    Authors: Pranay Dighe, Saurabh Adya, Nuoyu Li, Srikanth Vishnubhotla, Devang Naik, Adithya Sagar, Ying Ma, Stephen Pulman, Jason Williams

    Abstract: Voice-triggered smart assistants often rely on detection of a trigger-phrase before they start listening for the user request. Mitigation of false triggers is an important aspect of building a privacy-centric non-intrusive smart assistant. In this paper, we address the task of false trigger mitigation (FTM) using a novel approach based on analyzing automatic speech recognition (ASR) lattices using…

    Submitted 24 January, 2020; originally announced January 2020.