Showing 1–50 of 95 results for author: Singh, R

Searching in archive eess.
  1. arXiv:2405.13370  [pdf, other]

    eess.IV cs.CV cs.LG

    Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning

    Authors: Yasmeena Akhter, Rishabh Ranjan, Richa Singh, Mayank Vatsa

    Abstract: This research addresses the challenges of diagnosing chest X-rays (CXRs) at low resolutions, a common limitation in resource-constrained healthcare settings. High-resolution CXR imaging is crucial for identifying small but critical anomalies, such as nodules or opacities. However, when images are downsized for processing in Computer-Aided Diagnosis (CAD) systems, vital spatial details and receptiv…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: IEEE ISBI 2024

  2. arXiv:2405.09101  [pdf, other]

    cs.RO eess.SY

    Adaptive Koopman Embedding for Robust Control of Complex Nonlinear Dynamical Systems

    Authors: Rajpal Singh, Chandan Kumar Sah, Jishnu Keshavan

    Abstract: The discovery of linear embedding is the key to the synthesis of linear control techniques for nonlinear systems. In recent years, while Koopman operator theory has become a prominent approach for learning these linear embeddings through data-driven methods, these algorithms often exhibit limitations in generalizability beyond the distribution captured by training data and are not robust to change…

    Submitted 20 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Corrected the title

  3. arXiv:2405.05937  [pdf, other]

    eess.SP eess.SY

    Dynamics of a Towed Cable with Sensor-Array for Underwater Target Motion Analysis

    Authors: Rohit Kumar Singh, Subrata Kumar, Shovan Bhaumik

    Abstract: During a war situation, many times an underwater target motion analysis (TMA) is performed using bearing-only measurements, obtained from a sensor array, which is towed by an own-ship with the help of a connected cable. It is well known that the own-ship is required to perform a manoeuvre in order to make the system observable and localise the target successfully. During the maneuver, it is import…

    Submitted 9 May, 2024; originally announced May 2024.

  4. arXiv:2405.05676  [pdf, other]

    eess.SP

    Maximum Correntropy Polynomial Chaos Kalman Filter for Underwater Navigation

    Authors: Rohit Kumar Singh, Joydeb Saha, Shovan Bhaumik

    Abstract: This paper develops an underwater navigation solution that utilizes a strapdown inertial navigation system (SINS) and fuses a set of auxiliary sensors such as an acoustic positioning system, Doppler velocity log, depth meter, attitude meter, and magnetometer to accurately estimate an underwater vessel's position and orientation. The conventional integrated navigation system assumes Gaussian measur…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2403.15248  [pdf, other]

    cs.CV cs.AI eess.IV

    Self-Supervised Backbone Framework for Diverse Agricultural Vision Tasks

    Authors: Sudhir Sornapudi, Rajhans Singh

    Abstract: Computer vision in agriculture is game-changing with its ability to transform farming into a data-driven, precise, and sustainable industry. Deep learning has empowered agriculture vision to analyze vast, complex visual data, but heavily rely on the availability of large annotated datasets. This remains a bottleneck as manual labeling is error-prone, time-consuming, and expensive. The lack of effi…

    Submitted 22 March, 2024; originally announced March 2024.

  6. arXiv:2402.15707  [pdf, other]

    eess.SP quant-ph

    A Quick Guide to Quantum Communication

    Authors: Rohit Singh, Roshan M. Bodile

    Abstract: This article provides a quick overview of quantum communication, bringing together several innovative aspects of quantum enabled transmission. We first take a neutral look at the role of quantum communication, presenting its importance for the forthcoming wireless. Then, we summarise the principles and basic mechanisms involved in quantum communication, including quantum entanglement, quantum supe…

    Submitted 23 February, 2024; originally announced February 2024.

  7. arXiv:2402.09585  [pdf, other]

    cs.SD eess.AS

    Domain Adaptation for Contrastive Audio-Language Models

    Authors: Soham Deshmukh, Rita Singh, Bhiksha Raj

    Abstract: Audio-Language Models (ALM) aim to be general-purpose audio models by providing zero-shot capabilities at test time. The zero-shot performance of ALM improves by using suitable text prompts for each domain. The text prompts are usually hand-crafted through an ad-hoc process and lead to a drop in ALM generalization and out-of-distribution performance. Existing approaches to improve domain performan…

    Submitted 14 February, 2024; originally announced February 2024.

  8. arXiv:2402.09244  [pdf, other]

    eess.SP

    Zero-energy Devices for 6G: Technical Enablers at a Glance

    Authors: Onel López, Ritesh Kumar Singh, Dinh-Thuy Phan-Huy, Efstathios Katranaras, Nafiseh Mazloum, Riku Jäntti, Hamza Khan, Osmel Rosabal, Pavlos Alexias, Prasoon Raghuwanshi, David Ruiz-Guirola, Bikramjit Singh, Andreas Höglund, Dung Pham Van, Amirhossein Azarbahram, Jeroen Famaey

    Abstract: Low-cost, resource-constrained, maintenance-free, and energy-harvesting (EH) Internet of Things (IoT) devices, referred to as zero-energy devices (ZEDs), are rapidly attracting attention from industry and academia due to their myriad of applications. To date, such devices remain primarily unsupported by modern IoT connectivity solutions due to their intrinsic fabrication, hardware, deployment, and…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 8 pages, 4 Figures

  9. arXiv:2402.00282  [pdf, other]

    eess.AS cs.SD

    PAM: Prompting Audio-Language Models for Audio Quality Assessment

    Authors: Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

    Abstract: While audio quality is a key performance metric for various audio processing tasks, including generative modeling, its objective measurement remains a challenge. Audio-Language Models (ALMs) are pre-trained on audio-text pairs that may contain information about audio quality, the presence of artifacts, or noise. Given an audio input and a text prompt related to quality, an ALM can be used to calcu…

    Submitted 31 January, 2024; originally announced February 2024.

  10. arXiv:2401.12803  [pdf, other]

    cs.IT cs.AI cs.LG eess.SP

    Enhancements for 5G NR PRACH Reception: An AI/ML Approach

    Authors: Rohit Singh, Anil Kumar Yerrapragada, Jeeva Keshav S, Radha Krishna Ganti

    Abstract: Random Access is an important step in enabling the initial attachment of a User Equipment (UE) to a Base Station (gNB). The UE identifies itself by embedding a Preamble Index (RAPID) in the phase rotation of a known base sequence, which it transmits on the Physical Random Access Channel (PRACH). The signal on the PRACH also enables the estimation of propagation delay, often known as Timing Advance…

    Submitted 12 January, 2024; originally announced January 2024.

  11. arXiv:2310.13817  [pdf, other]

    eess.SY

    Deep Learning Based Forecasting-Aided State Estimation in Active Distribution Networks

    Authors: Malek Alduhaymi, Ravindra Singh, Firdous Ul Nazir, Bikash C. Pal

    Abstract: Operating an active distribution network (ADN) in the absence of enough measurements, the presence of distributed energy resources, and poor knowledge of responsive demand behaviour is a huge challenge. This paper introduces systematic modelling of demand response behaviour which is then included in Forecasting Aided State Estimation (FASE) for better control of the network. There are several inno…

    Submitted 20 October, 2023; originally announced October 2023.

  12. arXiv:2310.02298  [pdf, other]

    cs.SD cs.AI eess.AS

    Prompting Audios Using Acoustic Properties For Emotion Representation

    Authors: Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

    Abstract: Emotions lie on a continuum, but current models treat emotions as a finite valued discrete variable. This representation does not capture the diversity in the expression of emotion. To better represent emotions we propose the use of natural language descriptions (or prompts). In this work, we address the challenge of automatically generating these prompts and training a model to better learn emoti…

    Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.07737

  13. arXiv:2310.00706  [pdf, other]

    cs.CL cs.SD eess.AS

    Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

    Authors: Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

    Abstract: Modern speech synthesis systems have improved significantly, with synthetic speech being indistinguishable from real speech. However, efficient and holistic evaluation of synthetic speech still remains a significant challenge. Human evaluation using Mean Opinion Score (MOS) is ideal, but inefficient due to high costs. Therefore, researchers have developed auxiliary automatic metrics like Word Erro…

    Submitted 1 October, 2023; originally announced October 2023.

  14. arXiv:2309.13544  [pdf]

    cs.IR cs.AI cs.LG cs.SD eess.AS

    Related Rhythms: Recommendation System To Discover Music You May Like

    Authors: Rahul Singh, Pranav Kanuparthi

    Abstract: Machine Learning models are being utilized extensively to drive recommender systems, which is a widely explored topic today. This is especially true of the music industry, where we are witnessing a surge in growth. Besides a large chunk of active users, these systems are fueled by massive amounts of data. These large-scale systems yield applications that aim to provide a better user experience and…

    Submitted 24 September, 2023; originally announced September 2023.

    ACM Class: I.2.6; H.3.3

  15. arXiv:2309.13227  [pdf, other]

    cs.LG cs.SD eess.AS

    Importance of negative sampling in weak label learning

    Authors: Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj

    Abstract: Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known. The pool of negative instances is usually larger than positive instances, thus making selecting the most informative negative instance critical for performance. Such a selection strategy for negative instances from each bag is an open prob…

    Submitted 22 September, 2023; originally announced September 2023.

  16. arXiv:2309.07372  [pdf, other]

    eess.AS cs.SD

    Training Audio Captioning Models without Audio

    Authors: Soham Deshmukh, Benjamin Elizalde, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang

    Abstract: Automated Audio Captioning (AAC) is the task of generating natural language descriptions given an audio stream. A typical AAC system requires manually curated training data of audio segments and corresponding text caption annotations. The creation of these audio-caption pairs is costly, resulting in general data scarcity for the task. In this work, we address this major limitation and propose an a…

    Submitted 13 September, 2023; originally announced September 2023.

  17. arXiv:2308.14190  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Score-Based Generative Models for PET Image Reconstruction

    Authors: Imraj RD Singh, Alexander Denker, Riccardo Barbano, Željko Kereta, Bangti Jin, Kris Thielemans, Peter Maass, Simon Arridge

    Abstract: Score-based generative models have demonstrated highly promising results for medical image reconstruction tasks in magnetic resonance imaging or computed tomography. However, their application to Positron Emission Tomography (PET) is still largely unexplored. PET image reconstruction involves a variety of challenges, including Poisson noise with high variance and a wide dynamic range. To address t…

    Submitted 23 January, 2024; v1 submitted 27 August, 2023; originally announced August 2023.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:001

    MSC Class: 15A29; 45Q05 ACM Class: I.4.9; J.2; I.2.1

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

  18. arXiv:2307.13953  [pdf, other]

    cs.CV cs.SD eess.AS

    The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

    Authors: Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj

    Abstract: This work unveils the enigmatic link between phonemes and facial features. Traditional studies on voice-face correlations typically involve using a long period of voice input, including generating face images from voices and reconstructing 3D face meshes from voices. However, in situations like voice-based crimes, the available voice evidence may be short and limited. Additionally, from a physiolo…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Interspeech 2023

  19. arXiv:2307.13948  [pdf, other]

    cs.CV cs.SD eess.AS

    Rethinking Voice-Face Correlation: A Geometry View

    Authors: Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj

    Abstract: Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion. In this paper, we aim to investigate the capability of reconstructing the 3D facial shape from voice from a geometry perspective without any semantic information. We propose a voice-anthropometric mea…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: ACM Multimedia 2023

  20. arXiv:2307.08217  [pdf, other]

    cs.CL cs.SD eess.AS

    BASS: Block-wise Adaptation for Speech Summarization

    Authors: Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj

    Abstract: End-to-end speech summarization has been shown to improve performance over cascade baselines. However, such models are difficult to train on very large inputs (dozens of minutes or hours) owing to compute restrictions and are hence trained with truncated model inputs. Truncation leads to poorer models, and a solution to this problem rests in block-wise modeling, i.e., processing a portion of the i…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: Accepted at Interspeech 2023

  21. arXiv:2307.06669  [pdf, other]

    cs.SD cs.CR eess.AS

    Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects

    Authors: Rishabh Ranjan, Mayank Vatsa, Richa Singh

    Abstract: Audio has become an increasingly crucial biometric modality due to its ability to provide an intuitive way for humans to interact with machines. It is currently being used for a range of applications, including person authentication to banking to virtual assistants. Research has shown that these systems are also susceptible to spoofing and attacks. Therefore, protecting audio processing systems ag…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: Accepted in IJCAI 2023

  22. arXiv:2306.05329  [pdf]

    cs.RO eess.SY math.NA math.OC

    Movement Optimization of Robotic Arms for Energy and Time Reduction using Evolutionary Algorithms

    Authors: Abolfazl Akbari, Saeed Mozaffari, Rajmeet Singh, Majid Ahmadi, Shahpour Alirezaee

    Abstract: Trajectory optimization of a robot manipulator consists of both optimization of the robot movement as well as optimization of the robot end-effector path. This paper aims to find optimum movement parameters including movement type, speed, and acceleration to minimize robot energy. Trajectory optimization by minimizing the energy would increase the longevity of robotic manipulators. We utilized the…

    Submitted 8 June, 2023; originally announced June 2023.

  23. arXiv:2305.16974  [pdf, other]

    eess.SY cs.LG

    Finite Time Regret Bounds for Minimum Variance Control of Autoregressive Systems with Exogenous Inputs

    Authors: Rahul Singh, Akshay Mete, Avik Kar, P. R. Kumar

    Abstract: Minimum variance controllers have been employed in a wide-range of industrial applications. A key challenge experienced by many adaptive controllers is their poor empirical performance in the initial stages of learning. In this paper, we address the problem of initializing them so that they provide acceptable transients, and also provide an accompanying finite-time regret analysis, for adaptive mi…

    Submitted 26 May, 2023; originally announced May 2023.

  24. arXiv:2305.11834  [pdf, other]

    eess.AS cs.SD

    Pengi: An Audio Language Model for Audio Tasks

    Authors: Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang

    Abstract: In the domain of audio processing, Transfer Learning has facilitated the rise of Self-Supervised Learning and Zero-Shot Learning techniques. These approaches have led to the development of versatile models capable of tackling a wide array of tasks, while delivering state-of-the-art performance. However, current models inherently lack the capacity to produce the requisite language for open-ended ta…

    Submitted 18 January, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023. The manuscript is updated with additional experiments suggested by reviewers

  25. arXiv:2303.17660  [pdf, other]

    physics.optics eess.IV

    Randomness assisted in-line holography with deep learning

    Authors: Manisha, Aditya Chandra Mandal, Mohit Rathor, Zeev Zalevsky, Rakesh Kumar Singh

    Abstract: We propose and demonstrate a holographic imaging scheme exploiting random illuminations for recording hologram and then applying numerical reconstruction and twin removal. We use an in-line holographic geometry to record the hologram in terms of the second-order correlation and apply the numerical approach to reconstruct the recorded hologram. The twin image issue of the in-line holographic scheme…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: 10 pages, 7 figures

  26. arXiv:2302.07476  [pdf, other]

    cs.IT eess.SP

    Indexed Multiple Access with Reconfigurable Intelligent Surfaces: The Reflection Tuning Potential

    Authors: Rohit Singh, Aryan Kaushik, Wonjae Shin, George C. Alexandropoulos, Mesut Toka, Marco Di Renzo

    Abstract: Indexed modulation (IM) is an evolving technique that has become popular due to its ability of parallel data communication over distinct combinations of transmission entities. In this article, we first provide a comprehensive survey of IM-enabled multiple access (MA) techniques, emphasizing the shortcomings of existing non-indexed MA schemes. Theoretical comparisons are presented to show how the n…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: 7 pages, 5 figures, 1 table

  27. arXiv:2302.07375  [pdf]

    eess.SP

    The Role of Physical Layer Security in Satellite-Based Networks

    Authors: R. Singh, I. Ahmad, J. Huusko

    Abstract: In the coming years, 6G will revolutionize the world with a large amount of bandwidth, high data rates, and extensive coverage in remote and rural areas. These goals can only be achieved by integrating terrestrial networks with non-terrestrial networks. On the other hand, these advancements are raising more concerns than other wireless links about malicious attacks on satellite-terrestrial links d…

    Submitted 14 February, 2023; originally announced February 2023.

  28. arXiv:2301.07853  [pdf]

    cs.RO cs.HC eess.SY

    DECISIVE Benchmarking Data Report: sUAS Performance Results from Phase I

    Authors: Adam Norton, Reza Ahmadzadeh, Kshitij Jerath, Paul Robinette, Jay Weitzen, Thanuka Wickramarathne, Holly Yanco, Minseop Choi, Ryan Donald, Brendan Donoghue, Christian Dumas, Peter Gavriel, Alden Giedraitis, Brendan Hertel, Jack Houle, Nathan Letteri, Edwin Meriaux, Zahra Rezaei Khavas, Rakshith Singh, Gregg Willcox, Naye Yoni

    Abstract: This report reviews all results derived from performance benchmarking conducted during Phase I of the Development and Execution of Comprehensive and Integrated Subterranean Intelligent Vehicle Evaluations (DECISIVE) project by the University of Massachusetts Lowell, using the test methods specified in the DECISIVE Test Methods Handbook v1.1 for evaluating small unmanned aerial systems (sUAS) perfo…

    Submitted 20 January, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: Approved for public release: PAO #PR2023_74172; arXiv admin note: substantial text overlap with arXiv:2211.01801

  29. arXiv:2211.08367  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    FlowGrad: Using Motion for Visual Sound Source Localization

    Authors: Rajsuryan Singh, Pablo Zinemanas, Xavier Serra, Juan Pablo Bello, Magdalena Fuentes

    Abstract: Most recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner, and by design excludes temporal information present in videos. While it proves to be effective for widely used benchmark datasets, the method falls short for challenging scenarios like urban traffic. This work introduces temporal context into the state-of-the-ar…

    Submitted 14 April, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted in ICASSP 2023

  30. arXiv:2211.07737  [pdf, other]

    cs.SD cs.LG eess.AS

    Describing emotions with acoustic property prompts for speech emotion recognition

    Authors: Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

    Abstract: Emotions lie on a broad continuum and treating emotions as a discrete number of classes limits the ability of a model to capture the nuances in the continuum. The challenge is how to describe the nuances of emotions and how to enable a model to learn the descriptions. In this work, we devise a method to automatically create a description (or prompt) for a given audio by computing acoustic properti…

    Submitted 14 November, 2022; originally announced November 2022.

  31. arXiv:2211.02005  [pdf, other]

    eess.SP cs.IT

    Robust Dependence Measure using RKHS based Uncertainty Moments and Optimal Transport

    Authors: Rishabh Singh, Jose C. Principe

    Abstract: Reliable measurement of dependence between variables is essential in many applications of statistics and machine learning. Current approaches for dependence estimation, especially density-based approaches, lack in precision, robustness and/or interpretability (in terms of the type of dependence being estimated). We propose a two-step approach for dependence quantification between random variables:…

    Submitted 3 November, 2022; originally announced November 2022.

  32. arXiv:2211.01999  [pdf, other]

    cs.CV cs.IT cs.LG eess.IV

    Quantifying Model Uncertainty for Semantic Segmentation using Operators in the RKHS

    Authors: Rishabh Singh, Jose C. Principe

    Abstract: Deep learning models for semantic segmentation are prone to poor performance in real-world applications due to the highly challenging nature of the task. Model uncertainty quantification (UQ) is one way to address this issue of lack of model trustworthiness by enabling the practitioner to know how much to trust a segmentation output. Current UQ methods in this application domain are mainly restric…

    Submitted 3 November, 2022; originally announced November 2022.

  33. arXiv:2211.01801  [pdf]

    cs.RO cs.HC eess.SY

    DECISIVE Test Methods Handbook: Test Methods for Evaluating sUAS in Subterranean and Constrained Indoor Environments, Version 1.1

    Authors: Adam Norton, Reza Ahmadzadeh, Kshitij Jerath, Paul Robinette, Jay Weitzen, Thanuka Wickramarathne, Holly Yanco, Minseop Choi, Ryan Donald, Brendan Donoghue, Christian Dumas, Peter Gavriel, Alden Giedraitis, Brendan Hertel, Jack Houle, Nathan Letteri, Edwin Meriaux, Zahra Rezaei Khavas, Rakshith Singh, Gregg Willcox, Naye Yoni

    Abstract: This handbook outlines all test methods developed under the Development and Execution of Comprehensive and Integrated Subterranean Intelligent Vehicle Evaluations (DECISIVE) project by the University of Massachusetts Lowell for evaluating small unmanned aerial systems (sUAS) performance in subterranean and constrained indoor environments, spanning communications, field readiness, interface, obstac…

    Submitted 20 January, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Approved for public release: PAO #PR2022_47058

  34. arXiv:2210.16642  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition

    Authors: Roshan Sharma, Hira Dhamyal, Bhiksha Raj, Rita Singh

    Abstract: Traditionally, in paralinguistic analysis for emotion detection from speech, emotions have been identified with discrete or dimensional (continuous-valued) labels. Accordingly, models that have been proposed for emotion detection use one or the other of these label types. However, psychologists like Russell and Plutchik have proposed theories and models that unite these views, maintaining that the…

    Submitted 29 October, 2022; originally announced October 2022.

    Comments: Under Review at ICASSP 2023

  35. arXiv:2206.12568  [pdf, other]

    cs.SD cs.AI eess.AS

    Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

    Authors: Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

    Abstract: This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track. The method of choice utilized a combination of spectro-temporal modulation and self-supervised features, followed by an encoder-decoder network organized in a multitask paradigm. We evaluate…

    Submitted 25 June, 2022; originally announced June 2022.

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  36. arXiv:2206.08826  [pdf, other]

    cs.LG cs.CV eess.IV

    Multimodal Attention-based Deep Learning for Alzheimer's Disease Diagnosis

    Authors: Michal Golovanevsky, Carsten Eickhoff, Ritambhara Singh

    Abstract: Alzheimer's Disease (AD) is the most common neurodegenerative disorder with one of the most complex pathogeneses, making effective and clinically actionable decision support difficult. The objective of this study was to develop a novel multimodal deep learning framework to aid medical professionals in AD diagnosis. We present a Multimodal Alzheimer's Disease Diagnosis framework (MADDi) to accurate…

    Submitted 23 September, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: 11 pages, 5 figures

    Journal ref: Journal of the American Medical Informatics Association, 2022; ocac168

  37. arXiv:2205.09677  [pdf, other]

    physics.optics eess.IV

    Reconstructing complex field through opaque scattering layer with structured light illumination

    Authors: Aditya Chandra Mandal, Manisha, Abhijeet Phatak, Zeev Zalevsky, Rakesh Kumar Singh

    Abstract: The wavefront is scrambled when coherent light propagates through a random scattering medium and which makes direct use of the conventional optical methods ineffective. In this paper, we propose and demonstrate a structured light illumination for imaging through an opaque scattering layer. Proposed technique is reference free and capable to recover the complex field from intensities of the speckle…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: 23 pages, 7 figures

  38. arXiv:2204.04802  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice

    Authors: Ankit Shah, Hira Dhamyal, Yang Gao, Daniel Arancibia, Mario Arancibia, Bhiksha Raj, Rita Singh

    Abstract: Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice. Different researchers use different kinds of information from the voice signal to achieve this. Various types of phonated sounds and the sound of cough and breath have all been used with varying degree of success in automated voice-based COVID-19 detection apps. In this paper, we show that detecting C…

    Submitted 25 October, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: Submitted to ICASSP 2022

  39. arXiv:2203.11725  [pdf, other]

    eess.IV cs.CV

    Unsupervised Anomaly Detection in Medical Images with a Memory-augmented Multi-level Cross-attentional Masked Autoencoder

    Authors: Yu Tian, Guansong Pang, Yuyuan Liu, Chong Wang, Yuanhong Chen, Fengbei Liu, Rajvinder Singh, Johan W Verjans, Mengyu Wang, Gustavo Carneiro

    Abstract: Unsupervised anomaly detection (UAD) aims to find anomalous images by optimising a detector using a training set that contains only normal images. UAD approaches can be based on reconstruction methods, self-supervised approaches, and Imagenet pre-trained models. Reconstruction methods, which detect anomalies from image reconstruction errors, are advantageous because they do not rely on the design…

    Submitted 21 August, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: Accepted to MICCAI MLMI2023

  40. arXiv:2201.10542  [pdf, other]

    math.OC cs.LG eess.SY

    Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems

    Authors: Akshay Mete, Rahul Singh, P. R. Kumar

    Abstract: We consider the problem of controlling an unknown stochastic linear system with quadratic costs - called the adaptive LQ control problem. We re-examine an approach called "Reward Biased Maximum Likelihood Estimate" (RBMLE) that was proposed more than forty years ago, and which predates the "Upper Confidence Bound" (UCB) method as well as the definition of "regret" for bandit problems. It sim…

    Submitted 24 March, 2023; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022). https://openreview.net/forum?id=7pNV4PCjbQy

  41. arXiv:2112.07102  [pdf, other]

    eess.IV cs.CV cs.LG

    COVID-19 Pneumonia and Influenza Pneumonia Detection Using Convolutional Neural Networks

    Authors: Julianna Antonchuk, Benjamin Prescott, Philip Melanchthon, Robin Singh

    Abstract: In the research, we developed a computer vision solution to support diagnostic radiology in differentiating between COVID-19 pneumonia, influenza virus pneumonia, and normal biomarkers. The chest radiograph appearance of COVID-19 pneumonia is thought to be nonspecific, having presented a challenge to identify an optimal architecture of a convolutional neural network (CNN) that would classify with…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: for associated Azure ML notebook code, see https://github.com/bcprescott/MSDS/tree/main/Capstone_COVID19/code/AML

  42. arXiv:2110.08820  [pdf, other]

    cs.LG eess.SY

    On-board Fault Diagnosis of a Laboratory Mini SR-30 Gas Turbine Engine

    Authors: Richa Singh

    Abstract: Inspired by recent progress in machine learning, a data-driven fault diagnosis and isolation (FDI) scheme is explicitly developed for failure in the fuel supply system and sensor measurements of the laboratory gas turbine system. A passive approach of fault diagnosis is implemented where a model is trained using machine learning classifiers to detect a given set of fault scenarios in real-time on…

    Submitted 19 October, 2021; v1 submitted 17 October, 2021; originally announced October 2021.

  43. arXiv:2110.04800  [pdf, other]

    cs.CV cs.LG eess.IV

    Self-Supervised 3D Face Reconstruction via Conditional Estimation

    Authors: Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh

    Abstract: We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos. CEST is based on the process of analysis by synthesis, where the 3D facial parameters (shape, reflectance, viewpoint, and illumination) are estimated from the face image, and then recombined to reconstruct the 2D face image. In order to learn semant…

    Submitted 10 October, 2021; originally announced October 2021.

    Comments: ICCV 2021 (15 pages)

  44. arXiv:2110.04678  [pdf, other]

    cs.SD cs.AI eess.AS

    An Overview of Techniques for Biomarker Discovery in Voice Signal

    Authors: Rita Singh, Ankit Shah, Hira Dhamyal

    Abstract: This paper reflects on the effect of several categories of medical conditions on human voice, focusing on those that may be hypothesized to have effects on voice, but for which the changes themselves may be subtle enough to have eluded observation in standard analytical examinations of the voice signal. It presents three categories of techniques that can potentially uncover such elusive biomarkers…

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: Last two authors contributed equally to the paper

  45. arXiv:2109.05580  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.TO

    A Joint Graph and Image Convolution Network for Automatic Brain Tumor Segmentation

    Authors: Camillo Saueressig, Adam Berkley, Reshma Munbodh, Ritambhara Singh

    Abstract: We present a joint graph convolution-image convolution neural network as our submission to the Brain Tumor Segmentation (BraTS) 2021 challenge. We model each brain as a graph composed of distinct image regions, which is initially segmented by a graph neural network (GNN). Subsequently, the tumorous volume identified by the GNN is further refined by a simple (voxel) convolutional neural network (CN…

    Submitted 30 July, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: 9 pages, 3 figures, submitted to BrainLes Workshop (MICCAI 2021) as part of BraTS2021 challenge

  46. arXiv:2109.01303  [pdf, other]

    eess.IV cs.CV

    Self-supervised Pseudo Multi-class Pre-training for Unsupervised Anomaly Detection and Segmentation in Medical Images

    Authors: Yu Tian, Fengbei Liu, Guansong Pang, Yuanhong Chen, Yuyuan Liu, Johan W. Verjans, Rajvinder Singh, Gustavo Carneiro

    Abstract: Unsupervised anomaly detection (UAD) methods are trained with normal (or healthy) images only, but during testing, they are able to classify normal and abnormal (or disease) images. UAD is an important medical image analysis (MIA) method to be applied in disease screening problems because the training sets available for those problems usually contain only normal images. However, the exclusive reli…

    Submitted 14 August, 2023; v1 submitted 3 September, 2021; originally announced September 2021.

    Comments: Accepted to Medical Image Analysis

  47. arXiv:2108.10579  [pdf, other]

    eess.IV cs.AI cs.LG eess.SP

    Lossy Medical Image Compression using Residual Learning-based Dual Autoencoder Model

    Authors: Dipti Mishra, Satish Kumar Singh, Rajat Kumar Singh

    Abstract: In this work, we propose a two-stage autoencoder-based compressor-decompressor framework for compressing malaria RBC cell image patches. We know that the medical images used for disease diagnosis are around multiple gigabytes in size, which is quite large. The proposed residual-based dual autoencoder network is trained to extract the unique features which are then used to reconstruct the original imag…

    Submitted 24 August, 2021; originally announced August 2021.

  48. arXiv:2107.11662  [pdf, other]

    stat.ML cs.LG eess.SP eess.SY

    Inference of collective Gaussian hidden Markov models

    Authors: Rahul Singh, Yongxin Chen

    Abstract: We consider inference problems for a class of continuous state collective hidden Markov models, where the data is recorded in aggregate (collective) form generated by a large population of individuals following the same dynamics. We propose an aggregate inference algorithm called the collective Gaussian forward-backward algorithm, extending the recently proposed Sinkhorn belief propagation algorithm to mo…

    Submitted 24 July, 2021; originally announced July 2021.

  49. arXiv:2107.07988  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS eess.IV

    Controlled AutoEncoders to Generate Faces from Voices

    Authors: Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh

    Abstract: Multiple studies in the past have shown that there is a strong correlation between human vocal characteristics and facial features. However, existing approaches generate faces simply from voice, without exploring the set of features that contribute to these observed correlations. A computational methodology to explore this can be devised by rephrasing the question to: "how much would a target face…

    Submitted 16 July, 2021; originally announced July 2021.

  50. arXiv:2106.06858  [pdf, other]

    eess.AS cs.LG

    Improving weakly supervised sound event detection with self-supervised auxiliary tasks

    Authors: Soham Deshmukh, Bhiksha Raj, Rita Singh

    Abstract: While multitask and transfer learning have been shown to improve the performance of neural networks in limited data settings, they require pretraining of the model on large datasets beforehand. In this paper, we focus on improving the performance of weakly supervised sound event detection in low data and noisy settings simultaneously without requiring any pretraining task. To that end, we propose a s…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: Accepted at INTERSPEECH 21