Showing 1–50 of 86 results for author: Singh, P

Searching in archive eess.
  1. arXiv:2406.18628  [pdf, other]

    cs.CV eess.IV

    IDA-UIE: An Iterative Framework for Deep Network-based Degradation Aware Underwater Image Enhancement

    Authors: Pranjali Singh, Prithwijit Guha

    Abstract: Underwater image quality is affected by fluorescence, low illumination, absorption, and scattering. Recent works in underwater image enhancement have proposed different deep network architectures to handle these problems. Most of these works have proposed a single network to handle all the challenges. We believe that deep networks trained for specific conditions deliver better performance than a s…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.09999  [pdf, other]

    eess.AS

    ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR

    Authors: Vishwanath Pratap Singh, Federico Malato, Ville Hautamaki, Md. Sahidullah, Tomi Kinnunen

    Abstract: While automatic speech recognition (ASR) greatly benefits from data augmentation, the augmentation recipes themselves tend to be heuristic. In this paper, we address one of the heuristic approaches associated with balancing the right amount of augmented data in ASR training by introducing a reinforcement learning (RL) based dynamic adjustment of original-to-augmented data ratio (OAR). Unlike the fix…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted: Interspeech 2024

    Journal ref: Interspeech 2024

  3. arXiv:2406.09494  [pdf, other]

    eess.AS cs.LG

    The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments

    Authors: Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S. R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy

    Abstract: The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multilingual conversational speech dataset. In the DISPLACE 2024 challenge, we also introduced the task of automatic speech recognition (ASR) on this datas…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, Interspeech 2024

  4. arXiv:2406.02443  [pdf, other]

    eess.AS cs.AI

    Explainable Deep Learning Analysis for Raga Identification in Indian Art Music

    Authors: Parampreet Singh, Vipul Arora

    Abstract: The task of Raga Identification is a very popular research problem in Music Information Retrieval. The few studies that have explored this task employed various approaches, such as signal processing, Machine Learning (ML) methods, and more recently Deep Learning (DL) based methods. However, a key question remains unanswered in all of these works: do these ML/DL methods learn and interpret Ragas in a m…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.07256  [pdf, other]

    eess.IV cs.CV

    Leveraging Fixed and Dynamic Pseudo-labels for Semi-supervised Medical Image Segmentation

    Authors: Suruchi Kumari, Pravendra Singh

    Abstract: Semi-supervised medical image segmentation has gained growing interest due to its ability to utilize unannotated data. The current state-of-the-art methods mostly rely on pseudo-labeling within a co-training framework. These methods depend on a single pseudo-label for training, but these labels are not as accurate as the ground truth of labeled data. Relying solely on one pseudo-label often result…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Under Review

  6. arXiv:2402.15566  [pdf]

    eess.IV cs.CV cs.LG

    Closing the AI generalization gap by adjusting for dermatology condition distribution differences across clinical settings

    Authors: Rajeev V. Rikhye, Aaron Loh, Grace Eunhae Hong, Preeti Singh, Margaret Ann Smith, Vijaytha Muralidharan, Doris Wong, Rory Sayres, Michelle Phung, Nicolas Betancourt, Bradley Fong, Rachna Sahasrabudhe, Khoban Nasim, Alec Eschholz, Basil Mustafa, Jan Freyberg, Terry Spitz, Yossi Matias, Greg S. Corrado, Katherine Chou, Dale R. Webster, Peggy Bui, Yuan Liu, Yun Liu, Justin Ko, et al. (1 additional author not shown)

    Abstract: Recently, there has been great progress in the ability of artificial intelligence (AI) algorithms to classify dermatological conditions from clinical photographs. However, little is known about the robustness of these algorithms in real-world settings where several factors can lead to a loss of generalizability. Understanding and overcoming these limitations will permit the development of generali…

    Submitted 23 February, 2024; originally announced February 2024.

  7. arXiv:2402.15214  [pdf, other]

    eess.AS cs.SD

    ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification

    Authors: Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen

    Abstract: The accuracy of modern automatic speaker verification (ASV) systems, when trained exclusively on adult data, drops substantially when applied to children's speech. The scarcity of children's speech corpora hinders fine-tuning ASV systems for children's speech. Hence, there is a timely need to explore more effective ways of reusing adults' speech data. One promising approach is to align vocal-tract…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: The following article has been accepted by The Journal of the Acoustical Society of America (JASA). After it is published, it will be found at https://pubs.aip.org/asa/jasa

  8. arXiv:2401.12850  [pdf, other]

    eess.AS cs.AI cs.SD

    Overlap-aware End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization

    Authors: Prachi Singh, Sriram Ganapathy

    Abstract: Speaker diarization, the task of segmenting an audio recording based on speaker identity, constitutes an important speech pre-processing step for several downstream applications. The conventional approach to diarization involves multiple steps of embedding extraction and clustering, which are often optimized in an isolated fashion. While end-to-end diarization systems attempt to learn a single mod…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 10 pages

  9. arXiv:2311.15752  [pdf, other]

    eess.SP

    Insights into Age-Related Functional Brain Changes during Audiovisual Integration Tasks: A Comprehensive EEG Source-Based Analysis

    Authors: Prerna Singh, Ayush Tripathi, Lalan Kumar, Tapan Kumar Gandhi

    Abstract: The seamless integration of visual and auditory information is a fundamental aspect of human cognition. Although age-related functional changes in Audio-Visual Integration (AVI) have been extensively explored in the past, thorough studies across various age groups remain insufficient. Previous studies have provided valuable insights into age-related AVI using EEG-based sensor data. However, these s…

    Submitted 27 November, 2023; originally announced November 2023.

  10. arXiv:2311.12564  [pdf]

    eess.AS cs.LG eess.SP

    Summary of the DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments

    Authors: Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy

    Abstract: In multi-lingual societies, where multiple languages are spoken in a small geographic vicinity, informal conversations often involve a mix of languages. Existing speech technologies may be inefficient in extracting information from such conversations, where the speech data is rich in diversity with multiple languages and speakers. The DISPLACE (DIarization of SPeaker and LAnguage in Conversational E…

    Submitted 3 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  11. arXiv:2311.08085  [pdf, other]

    eess.SY

    Optimizing Electric Vehicle Efficiency with Real-Time Telemetry using Machine Learning

    Authors: Aryaman Rao, Harshit Gupta, Parth Singh, Shivam Mittal, Utkrash Singh, Dinesh Kumar Vishwakarma

    Abstract: In the contemporary world with degrading natural resources, energy efficiency has become imperative for conservation and environmental safeguarding. Therefore, it's crucial to look for advanced technology to minimize energy consumption. This research focuses on the optimization of battery-electric city-style vehicles through the use of a real-time in-car telemetry system that…

    Submitted 14 November, 2023; originally announced November 2023.

  12. arXiv:2310.19673  [pdf, other]

    eess.SY

    A Novel Non-Pyrotechnic Radial Deployment Mechanism for Payloads in Sounding Rockets

    Authors: Thakur Pranav G. Singh, Utkarsh Anand, Tanvi Agrawal, Srinivas G

    Abstract: This research paper introduces an innovative payload deployment mechanism tailored for sounding rockets, addressing a crucial challenge in the field. The problem statement revolves around the need to efficiently and compactly deploy multiple payloads during a single rocket launch. This mechanism, designed to be exceptionally suitable for sounding rockets, features a cylindrical carrier structure e…

    Submitted 29 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: The results in this paper had to be verified again and hence a new paper has been written with new detailed mechanical simulations

  13. End-to-end Evaluation of Practical Video Analytics Systems for Face Detection and Recognition

    Authors: Praneet Singh, Edward J. Delp, Amy R. Reibman

    Abstract: Practical video analytics systems that are deployed in bandwidth constrained environments like autonomous vehicles perform computer vision tasks such as face detection and recognition. In an end-to-end face analytics system, inputs are first compressed using popular video codecs like HEVC and then passed onto modules that perform face detection, alignment, and recognition sequentially. Typically,…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted to Autonomous Vehicles and Machines 2023 Conference, IS&T Electronic Imaging (EI) Symposium

    Journal ref: Electronic Imaging, 2023, pp 111-1 - 111-6

  14. arXiv:2310.06557  [pdf, other]

    eess.IV cs.CV cs.LG

    Data efficient deep learning for medical image analysis: A survey

    Authors: Suruchi Kumari, Pravendra Singh

    Abstract: The rapid evolution of deep learning has significantly advanced the field of medical image analysis. However, despite these achievements, the further enhancement of deep learning models for medical image analysis faces a significant challenge due to the scarcity of large, well-annotated datasets. To address this issue, recent years have witnessed a growing emphasis on the development of data-effic…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Under Review

  15. arXiv:2310.00602  [pdf, ps, other]

    eess.AS cs.CL

    Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification

    Authors: Spandan Dey, Premjeet Singh, Goutam Saha

    Abstract: Commonly used features in spoken language identification (LID), such as mel-spectrogram or MFCC, lose high-frequency information due to windowing. The loss further increases for longer temporal contexts. To improve generalization of the low-resourced LID systems, we investigate an alternate feature representation, wavelet scattering transform (WST), that compensates for the shortcomings. To our kn…

    Submitted 3 October, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: Accepted and presented in INTERSPEECH 2023

  16. arXiv:2308.10488  [pdf, other]

    eess.IV cs.CV

    Enhancing Medical Image Segmentation: Optimizing Cross-Entropy Weights and Post-Processing with Autoencoders

    Authors: Pranav Singh, Luoyao Chen, Mei Chen, **qian Pan, Raviteja Chukkapalli, Shravan Chaudhari, Jacopo Cirrone

    Abstract: The task of medical image segmentation presents unique challenges, necessitating both localized and holistic semantic understanding to accurately delineate areas of interest, such as critical tissues or aberrant features. This complexity is heightened in medical image segmentation due to the high degree of inter-class similarities, intra-class variations, and possible image obfuscation. The segmen…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV CVAMD 2023

  17. arXiv:2308.09106  [pdf]

    eess.SY

    Optimal Closed Loop Control of G2V/V2G Action Using Model Predictive Controller

    Authors: Satya Vikram Pratap Singh, Siddharth Kamila, Prashanth Agnihotri

    Abstract: This paper has developed a closed-loop control algorithm to operate the G2V/V2G action, tested under varying battery voltage conditions and load and source power differences. Under V2G action, to maintain total harmonic distortion under minimum level and grid frequency under the standard limit, a Model predictive controller (MPC) has been used to control the gate driver circuit of the inverter. Th…

    Submitted 11 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  18. arXiv:2308.09046  [pdf]

    eess.SY

    Fault Detection and Classification using Wavelet and ANN in DFIG and TCSC Connected Transmission Line

    Authors: Satya Vikram Pratap Singh, Tanu Prasad, Siddharth Kamila, Prashant Agnihotri

    Abstract: This paper presents fault detection and classification using Wavelet and ANN based methods in a DFIG-based series compensated system. The state-of-the-art methods include Wavelet transform, Fourier transform, and Wavelet-neuro fuzzy methods-based system for fault detection and classification. However, the accuracy of these state-of-the-art methods diminishes during variable conditions such as chan…

    Submitted 17 August, 2023; originally announced August 2023.

  19. arXiv:2308.08302  [pdf, ps, other]

    cs.IT eess.SP

    PSA Based Power Control for Cell-Free Massive MIMO under LoS/NLoS Channels

    Authors: Ashish Pratap Singh, Ribhu Chopra

    Abstract: A primary design goal of the cell-free (CF) massive MIMO architecture is to provide uniformly good coverage to all the user equipments (UEs) connected to the network. However, it has been found that this requirement may not be satisfied in case the channels between the access points (APs) and the UEs are mixed LoS/NLoS. In this paper, we try to address this issue via the use of appropriate power c…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 10 pages, 10 figures

  20. arXiv:2308.01317  [pdf]

    cs.CV eess.IV

    ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

    Authors: Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Atilla Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, et al. (3 additional authors not shown)

    Abstract: In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR ach…

    Submitted 7 September, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  21. arXiv:2308.01265  [pdf, other]

    eess.IV cs.CV cs.LG

    Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives

    Authors: Suruchi Kumari, Pravendra Singh

    Abstract: Deep learning has demonstrated remarkable performance across various tasks in medical imaging. However, these approaches primarily focus on supervised learning, assuming that the training and testing data are drawn from the same distribution. Unfortunately, this assumption may not always hold true in practice. To address these issues, unsupervised domain adaptation (UDA) techniques have been devel…

    Submitted 18 July, 2023; originally announced August 2023.

    Comments: Under Review

  22. arXiv:2307.02814  [pdf, other]

    cs.CV eess.IV

    Single Image LDR to HDR Conversion using Conditional Diffusion

    Authors: Dwip Dalal, Gautam Vashishtha, Prajwal Singh, Shanmuganathan Raman

    Abstract: Digital imaging aims to replicate realistic scenes, but Low Dynamic Range (LDR) cameras cannot represent the wide dynamic range of real scenes, resulting in under-/overexposed images. This paper presents a deep learning-based approach for recovering intricate details from shadows and highlights while reconstructing High Dynamic Range (HDR) images. We formulate the problem as an image-to-image (I2I…

    Submitted 6 July, 2023; originally announced July 2023.

    Journal ref: IEEE International Conference on Image Processing 2023

  23. arXiv:2306.07501  [pdf, other]

    eess.AS cs.SD

    Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech

    Authors: Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen

    Abstract: In this paper, we study the impact of ageing on modern deep speaker embedding based automatic speaker verification (ASV) systems. We have selected two different datasets to examine ageing on the state-of-the-art ECAPA-TDNN system. The first dataset, used for addressing short-term ageing (up to 10 years' time difference between enrollment and test) under uncontrolled conditions, is VoxCeleb. The…

    Submitted 12 June, 2023; originally announced June 2023.

    Journal ref: Interspeech 2023

  24. arXiv:2305.19097  [pdf, other]

    eess.IV cs.CV cs.LG

    A generalized framework to predict continuous scores from medical ordinal labels

    Authors: Katharina V. Hoebel, Andreanne Lemay, John Peter Campbell, Susan Ostmo, Michael F. Chiang, Christopher P. Bridge, Matthew D. Li, Praveer Singh, Aaron S. Coyner, Jayashree Kalpathy-Cramer

    Abstract: Many variables of interest in clinical medicine, like disease severity, are recorded using discrete ordinal categories such as normal/mild/moderate/severe. These labels are used to train and evaluate disease severity prediction models. However, ordinal categories represent a simplification of an underlying continuous severity spectrum. Using continuous scores instead of ordinal categories is more…

    Submitted 30 May, 2023; originally announced May 2023.

  25. arXiv:2304.06315  [pdf, other]

    eess.SP cs.SD eess.AS q-bio.NC

    Brain Connectivity Features-based Age Group Classification using Temporal Asynchrony Audio-Visual Integration Task

    Authors: Prerna Singh, Ayush Tripathi, Lalan Kumar, Tapan Kumar Gandhi

    Abstract: The process of integration of inputs from several sensory modalities in the human brain is referred to as multisensory integration. Age-related cognitive decline leads to a loss in the ability of the brain to conceive multisensory inputs. There has been considerable work done in the study of such cognitive changes for the old age groups. However, in the case of middle age groups, such analysis is…

    Submitted 1 May, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

  26. arXiv:2303.00830  [pdf, other]

    eess.AS cs.SD eess.SP

    DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments

    Authors: Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy

    Abstract: In multilingual societies, social conversations often involve code-mixed speech. The current speech technology may not be well equipped to extract information from multi-lingual multi-speaker conversations. The DISPLACE challenge entails a first-of-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed s…

    Submitted 5 June, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  27. arXiv:2302.14757  [pdf, other]

    cs.MM cs.IR cs.SD eess.AS

    Audio Retrieval for Multimodal Design Documents: A New Dataset and Algorithms

    Authors: Prachi Singh, Srikrishna Karanam, Sumit Shekhar

    Abstract: We consider and propose a new problem of retrieving audio files relevant to multimodal design document inputs comprising both textual elements and visual imagery, e.g., birthday/greeting cards. In addition to enhancing user experience, integrating audio that matches the theme/style of these inputs also helps improve the accessibility of these documents (e.g., visually impaired people can listen to…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: 5 pages including references

  28. arXiv:2302.12716  [pdf, other]

    cs.SD cs.LG eess.AS

    Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization

    Authors: Prachi Singh, Amrit Kaul, Sriram Ganapathy

    Abstract: Conventional methods for speaker diarization involve windowing an audio file into short segments to extract speaker embeddings, followed by an unsupervised clustering of the embeddings. This multi-step approach generates speaker assignments for each segment. In this paper, we propose a novel Supervised HierArchical gRaph Clustering algorithm (SHARC) for speaker diarization where we introduce a hie…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: 5 pages including references. Accepted in ICASSP 2023

  29. Modulation spectral features for speech emotion recognition using deep neural networks

    Authors: Premjeet Singh, Md Sahidullah, Goutam Saha

    Abstract: This work explores the use of constant-Q transform based modulation spectral features (CQT-MSF) for speech emotion recognition (SER). The human perception and analysis of sound comprises two important cognitive parts: early auditory analysis and cortex-based processing. The early auditory analysis considers spectrogram-based representation whereas cortex-based analysis includes extraction of tem…

    Submitted 14 January, 2023; originally announced January 2023.

    Comments: Accepted for publication in Elsevier's Speech Communication Journal

    Journal ref: Volume 146, January 2023, Pages 53-69

  30. Analysis of constant-Q filterbank based representations for speech emotion recognition

    Authors: Premjeet Singh, Shefali Waldekar, Md Sahidullah, Goutam Saha

    Abstract: This work analyzes the constant-Q filterbank-based time-frequency representations for speech emotion recognition (SER). Constant-Q filterbank provides non-linear spectro-temporal representation with higher frequency resolution at low frequencies. Our investigation reveals how the increased low-frequency resolution benefits SER. The time-domain comparative analysis between short-term mel-frequency…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted for publication in Elsevier's Digital Signal Processing Journal

    Journal ref: Volume 130, October 2022, 103712

  31. arXiv:2211.09098  [pdf, other]

    cs.CV cs.LG eess.SY

    ATEAM: Knowledge Integration from Federated Datasets for Vehicle Feature Extraction using Annotation Team of Experts

    Authors: Abhijit Suprem, Purva Singh, Suma Cherkadi, Sanjyot Vaidya, Joao Eduardo Ferreira, Calton Pu

    Abstract: The vehicle recognition area, including vehicle make-model recognition (VMMR), re-id, tracking, and parts-detection, has made significant progress in recent years, driven by several large-scale datasets for each task. These datasets are often non-overlapping, with different label schemas for each task: VMMR focuses on make and model, while re-id focuses on vehicle ID. It is promising to combine th…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: ATEAM for Vehicle Classification and Re-ID

  32. arXiv:2207.08998  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    Discovering novel systemic biomarkers in photos of the external eye

    Authors: Boris Babenko, Ilana Traynis, Christina Chen, Preeti Singh, Akib Uddin, Jorge Cuadros, Lauren P. Daskivich, April Y. Maa, Ramasamy Kim, Eugene Yu-Chuan Kang, Yossi Matias, Greg S. Corrado, Lily Peng, Dale R. Webster, Christopher Semturs, Jonathan Krause, Avinash V. Varadarajan, Naama Hammel, Yun Liu

    Abstract: External eye photos were recently shown to reveal signs of diabetic retinal disease and elevated HbA1c. In this paper, we evaluate if external eye photos contain information about additional systemic medical conditions. We developed a deep learning system (DLS) that takes external eye photos as input and predicts multiple systemic parameters, such as those related to the liver (albumin, AST); kidn…

    Submitted 18 July, 2022; originally announced July 2022.

  33. arXiv:2207.06489  [pdf, other]

    eess.IV cs.CV cs.LG

    A Data-Efficient Deep Learning Framework for Segmentation and Classification of Histopathology Images

    Authors: Pranav Singh, Jacopo Cirrone

    Abstract: The current study of cell architecture of inflammation in histopathology images commonly performed for diagnosis and research purposes excludes a lot of information available on the biopsy slide. In autoimmune diseases, major outstanding research questions remain regarding which cell types participate in inflammation at the tissue level, and how they interact with each other. While these questions…

    Submitted 22 October, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: Originally published at the ECCV 2022 Medical Computer Vision Workshop (ECCV-MCV 2022)

  34. arXiv:2206.12008  [pdf, other]

    eess.IV cs.LG

    Three Applications of Conformal Prediction for Rating Breast Density in Mammography

    Authors: Charles Lu, Ken Chang, Praveer Singh, Jayashree Kalpathy-Cramer

    Abstract: Breast cancer is one of the most common cancers, and early detection from mammography screening is crucial in improving patient outcomes. Assessing mammographic breast density is clinically important as denser breasts have higher risk and are more likely to occlude tumors. Manual assessment by experts is both time-consuming and subject to inter-rater variability. As such, there has been increased inte…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Accepted to Workshop on Distribution-Free Uncertainty Quantification at ICML 2022

  35. arXiv:2206.08197  [pdf, other]

    eess.SP

    Reorganization of resting state brain network functional connectivity across human brain developmental stages

    Authors: Prerna Singh, Tapan Kumar Gandhi, Lalan Kumar

    Abstract: The human brain is liable to undergo substantial alterations, anatomically and functionally with aging. Cognitive brain aging can either be healthy or degenerative in nature. Such degeneration of cognitive ability can lead to disorders such as Alzheimer's disease, dementia, schizophrenia, and multiple sclerosis. Furthermore, the brain network goes through various changes during healthy aging, and…

    Submitted 16 June, 2022; originally announced June 2022.

  36. arXiv:2206.04170  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    CASS: Cross Architectural Self-Supervision for Medical Image Analysis

    Authors: Pranav Singh, Elena Sizikova, Jacopo Cirrone

    Abstract: Recent advances in deep learning and computer vision have reduced many barriers to automated medical image analysis, allowing algorithms to process label-free images and improve performance. However, existing techniques have extreme computational requirements and drop a lot of performance with a reduction in batch size or training epochs. This paper presents Cross Architectural - Self Supervision…

    Submitted 19 November, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: (27 pages, 14 figures), Accepted at NeurIPS 2022 Workshop: Self-Supervised Learning - Theory and Practice

  37. arXiv:2203.06600  [pdf, other]

    eess.AS eess.SP

    Spectral Modification Based Data Augmentation For Improving End-to-End ASR For Children's Speech

    Authors: Vishwanath Pratap Singh, Hardik Sailor, Supratik Bhattacharya, Abhishek Pandey

    Abstract: Training a robust Automatic Speech Recognition (ASR) system for children's speech recognition is a challenging task due to inherent differences in acoustic attributes of adult and child speech and scarcity of publicly available children's speech datasets. In this paper, a novel segmental spectrum warping and perturbations in formant energy are introduced, to generate a children-like speech spectrum…

    Submitted 13 March, 2022; originally announced March 2022.

  38. arXiv:2203.03706  [pdf, other]

    cs.SD cs.LG eess.AS

    Detection of AI Synthesized Hindi Speech

    Authors: Karan Bhatia, Ansh Agrawal, Priyanka Singh, Arun Kumar Singh

    Abstract: The recent advancements in generative artificial speech models have made possible the generation of highly realistic speech signals. At first, it seems exciting to obtain these artificially synthesized signals such as speech clones or deep fakes but if left unchecked, it may lead us to a digital dystopia. One of the primary focuses in audio forensics is validating the authenticity of a speech. Though…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 5 Pages, 6 Figures, 4 Tables

  39. arXiv:2202.07208  [pdf, other]

    eess.SY

    Time Domain Simulation of DFIG-Based Wind Power System using Differential Transform Method

    Authors: Pradeep Singh, Upasana Buragohain, Nilanjan Senroy

    Abstract: This paper proposes a new non-iterative time-domain simulation approach using Differential Transform Method (DTM) to solve the set of non-linear Differential-Algebraic Equations (DAEs) involved in a DFIG-based wind power system. The DTM is an analytical as well as numerical approach applied to solve high dimensional non-linear dynamical systems and the solution can be expressed in the form of a se…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: 10 pages, 11 figures

  40. arXiv:2202.05177  [pdf, other]

    eess.SP cs.LG eess.IV

    Automated Atrial Fibrillation Classification Based on Denoising Stacked Autoencoder and Optimized Deep Network

    Authors: Prateek Singh, Ambalika Sharma, Shreesha Maiya

    Abstract: The incidences of atrial fibrillation (AFib) are increasing at a daunting rate worldwide. For the early detection of the risk of AFib, we have developed an automatic detection system based on deep neural networks. For achieving better classification, it is mandatory to have good pre-processing of physiological signals. Keeping this in mind, we have proposed a two-fold study. First, an end-to-end m… ▽ More

    Submitted 26 January, 2022; originally announced February 2022.

  41. arXiv:2201.09952  [pdf

    eess.IV cs.CV cs.LG

    A Deep Learning Approach for the Detection of COVID-19 from Chest X-Ray Images using Convolutional Neural Networks

    Authors: Aditya Saxena, Shamsheer Pal Singh

    Abstract: The COVID-19 (coronavirus) is an ongoing pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus was first identified in mid-December 2019 in the Hubei province of Wuhan, China and by now has spread throughout the planet with more than 75.5 million confirmed cases and more than 1.67 million deaths. With limited number of COVID-19 test kits available in medical fa… ▽ More

    Submitted 24 January, 2022; originally announced January 2022.

  42. arXiv:2112.10074  [pdf, other

    eess.IV cs.CV cs.LG

    QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results

    Authors: Raghav Mehta, Angelos Filos, Ujjwal Baid, Chiharu Sako, Richard McKinley, Michael Rebsamen, Katrin Datwyler, Raphael Meier, Piotr Radojewski, Gowtham Krishnan Murugesan, Sahil Nalawade, Chandan Ganesh, Ben Wagner, Fang F. Yu, Baowei Fei, Ananth J. Madhuranthakam, Joseph A. Maldjian, Laura Daza, Catalina Gomez, Pablo Arbelaez, Chengliang Dai, Shuo Wang, Hadrien Reynaud, Yuan-han Mo, Elsa Angelini , et al. (67 additional authors not shown)

    Abstract: Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying… ▽ More

    Submitted 23 August, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA): https://www.melba-journal.org/papers/2022:026.html

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 1 (2022)

  43. arXiv:2112.01025  [pdf, other

    eess.AS cs.CL cs.SD

    A Mixture of Expert Based Deep Neural Network for Improved ASR

    Authors: Vishwanath Pratap Singh, Shakti P. Rath, Abhishek Pandey

    Abstract: This paper presents a novel deep learning architecture for acoustic model in the context of Automatic Speech Recognition (ASR), termed as MixNet. Besides the conventional layers, such as fully connected layers in DNN-HMM and memory cells in LSTM-HMM, the model uses two additional layers based on Mixture of Experts (MoE). The first MoE layer operating at the input is based on pre-defined broad phon… ▽ More

    Submitted 2 December, 2021; originally announced December 2021.

  44. arXiv:2112.01023  [pdf, other

    eess.AS cs.SD

    A higher order Minkowski loss for improved prediction ability of acoustic model in ASR

    Authors: Vishwanath Pratap Singh, Shakti P. Rath, Abhishek Pandey

    Abstract: Conventional automatic speech recognition (ASR) system uses second-order minkowski loss during inference time which is suboptimal as it incorporates only first order statistics in posterior estimation [2]. In this paper we have proposed higher order minkowski loss (4th Order and 6th Order) during inference time, without any changes during training time. The main contribution of the paper is to sho… ▽ More

    Submitted 2 December, 2021; originally announced December 2021.

  45. arXiv:2111.04212  [pdf, other

    eess.IV cs.AI cs.CV cs.GR

    Dense Representative Tooth Landmark/axis Detection Network on 3D Model

    Authors: Guangshun Wei, Zhiming Cui, Jie Zhu, Lei Yang, Yuanfeng Zhou, Pradeep Singh, Min Gu, Wenping Wang

    Abstract: Artificial intelligence (AI) technology is increasingly used for digital orthodontics, but one of the challenges is to automatically and accurately detect tooth landmarks and axes. This is partly because of sophisticated geometric definitions of them, and partly due to large variations among individual tooth and across different types of tooth. As such, we propose a deep learning approach with a l… ▽ More

    Submitted 8 November, 2021; v1 submitted 7 November, 2021; originally announced November 2021.

    Comments: 11 pages, 27 figures

  46. arXiv:2109.06824  [pdf, other

    eess.AS cs.SD

    Self-Supervised Metric Learning With Graph Clustering For Speaker Diarization

    Authors: Prachi Singh, Sriram Ganapathy

    Abstract: In this paper, we propose a novel algorithm for speaker diarization using metric learning for graph based clustering. The graph clustering algorithms use an adjacency matrix consisting of similarity scores. These scores are computed between speaker embeddings extracted from pairs of audio segments within the given recording. In this paper, we propose an approach that jointly learns the speaker emb… ▽ More

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: 8 pages, Accepted in ASRU 2021

  47. arXiv:2107.11412  [pdf, ps, other

    cs.LG cs.MM cs.SD eess.AS

    Using Deep Learning Techniques and Inferential Speech Statistics for AI Synthesised Speech Recognition

    Authors: Arun Kumar Singh, Priyanka Singh, Karan Nathwani

    Abstract: The recent developments in technology have rewarded us with amazing audio synthesis models like TACOTRON and WAVENETS. On the other side, it poses greater threats such as speech clones and deep fakes, that may go undetected. To tackle these alarming situations, there is an urgent need to propose models that can help discriminate a synthesized speech from an actual human speech and also identify t… ▽ More

    Submitted 23 July, 2021; originally announced July 2021.

    Comments: 13 Pages, 13 Figures, 6 Tables. arXiv admin note: substantial text overlap with arXiv:2009.01934

  48. arXiv:2107.02375  [pdf, other

    cs.LG eess.IV

    SplitAVG: A heterogeneity-aware federated deep learning method for medical imaging

    Authors: Miao Zhang, Liangqiong Qu, Praveer Singh, Jayashree Kalpathy-Cramer, Daniel L. Rubin

    Abstract: Federated learning is an emerging research paradigm for enabling collaboratively training deep learning models without sharing patient data. However, the data from different institutions are usually heterogeneous across institutions, which may reduce the performance of models trained using federated learning. In this study, we propose a novel heterogeneity-aware federated learning method, SplitAVG… ▽ More

    Submitted 10 April, 2022; v1 submitted 5 July, 2021; originally announced July 2021.

  49. arXiv:2106.07972  [pdf

    eess.AS cs.SD

    SRIB Submission to Interspeech 2021 DiCOVA Challenge

    Authors: Vishwanath Pratap Singh, Shashi Kumar, Ravi Shekhar Jha, Abhishek Pandey

    Abstract: The COVID-19 pandemic has resulted in more than 125 million infections and more than 2.7 million casualties. In this paper, we attempt to classify covid vs non-covid cough sounds using signal processing and deep learning methods. Air turbulence, the vibration of tissues, movement of fluid through airways, opening, and closure of glottis are some of the causes for the production of the acoustic sou… ▽ More

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: 5 pages, 5 figures

  50. arXiv:2105.04806  [pdf, ps, other

    eess.AS cs.LG cs.SD eess.SP

    Deep scattering network for speech emotion recognition

    Authors: Premjeet Singh, Goutam Saha, Md Sahidullah

    Abstract: This paper introduces scattering transform for speech emotion recognition (SER). Scattering transform generates feature representations which remain stable to deformations and shifting in time and frequency without much loss of information. In speech, the emotion cues are spread across time and localised in frequency. The time and frequency invariance characteristic of scattering coefficients prov… ▽ More

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: 5 pages, 4 figures, Accepted for publication in 2021 European Signal Processing Conference (EUSIPCO 2021)