-
MAGMA: Music Aligned Generative Motion Autodecoder
Authors:
Sohan Anisetty,
Amit Raj,
James Hays
Abstract:
Mapping music to dance is a challenging problem that requires spatial and temporal coherence along with continual synchronization with the music's progression. Taking inspiration from large language models, we introduce a two-step approach for generating dance: a Vector Quantized-Variational Autoencoder (VQ-VAE) distills motion into primitives, and a Transformer decoder learns the correct sequencing of these primitives. We also evaluate the importance of music representations by comparing naive music feature extraction using Librosa to deep audio representations generated by state-of-the-art audio compression algorithms. Additionally, we train variations of the motion generator using relative and absolute positional encodings to determine their effect on generated motion quality when generating arbitrarily long sequences. Our proposed approach achieves state-of-the-art results on music-to-motion generation benchmarks and enables real-time generation of considerably longer motion sequences, seamless chaining of multiple motion sequences, and easy customization of motion sequences to meet style requirements.
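The VQ-VAE step turns continuous motion features into discrete tokens by snapping each encoder output to its nearest codebook vector; the Transformer then models the resulting token sequence. A minimal sketch of that nearest-neighbour lookup, assuming a toy, hand-written 2-D codebook (the actual model learns its codebook and uses much higher-dimensional latents):

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each latent frame to the index of its nearest codebook entry.

    The resulting integer sequence is what a Transformer decoder would
    model autoregressively, much like tokens in a language model.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(tokens: List[int], codebook: List[Vector]) -> List[Vector]:
    """Replace each token with its codebook vector (the VQ-VAE decoder's input)."""
    return [codebook[t] for t in tokens]


# Hypothetical codebook of three "motion primitives" in a 2-D latent space.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
latents = [(0.1, -0.2), (0.9, 0.1), (0.2, 0.8), (1.1, -0.1)]
print(quantize(latents, codebook))
```

Because generation happens over token indices, chaining sequences amounts to continuing a token stream, which is one way to read the abstract's claim about seamless chaining.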
Submitted 3 September, 2023;
originally announced September 2023.
-
A Robust ADMM-Based Optimization Algorithm For Underwater Acoustic Channel Estimation
Authors:
Tian Tian,
Agastya Raj,
Bruno Missi Xavier,
Ying Zhang,
Feiyun Wu,
Kunde Yang
Abstract:
Accurate estimation of the underwater acoustic (UWA) channel is a key part of underwater communications, especially for coherent systems. Severe multipath effects and large delay spreads make the estimation problem large-scale. The non-stationary, non-Gaussian, and impulsive nature of ocean ambient noise poses further obstacles to the design of estimation algorithms. Under the framework of compressed sensing (CS), this work addresses robust channel estimation when measurements are contaminated by impulsive noise. A first-order algorithm based on the alternating direction method of multipliers (ADMM) is proposed. Numerical simulations of time-varying channel estimation show its improved performance in highly impulsive noise environments.
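The abstract does not give the exact cost function. As a rough illustration of why ADMM suits impulsive noise, the sketch below uses the textbook ADMM splitting for least-absolute-deviation regression, min_x ||A x - y||_1, instead of the authors' compressed-sensing formulation. The l1 data term lets a few large noise spikes be absorbed into a sparse residual z rather than biasing the estimate. The two-tap channel, the random regressor matrix, and the spike positions are all hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def solve(m: Matrix, v: List[float]) -> List[float]:
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def soft(v: float, t: float) -> float:
    """Soft-thresholding: the proximal operator of t * |v|."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)


def admm_lad(a: Matrix, y: List[float], rho: float = 1.0, iters: int = 3000) -> List[float]:
    """Minimise ||a x - y||_1 with ADMM using the split z = a x - y."""
    m, n = len(a), len(a[0])
    ata = [[sum(a[i][p] * a[i][q] for i in range(m)) for q in range(n)] for p in range(n)]
    x, z, u = [0.0] * n, [0.0] * m, [0.0] * m
    for _ in range(iters):
        # x-update: least-squares fit to the corrected data y + z - u
        rhs = [sum(a[i][p] * (y[i] + z[i] - u[i]) for i in range(m)) for p in range(n)]
        x = solve(ata, rhs)
        ax = [sum(a[i][p] * x[p] for p in range(n)) for i in range(m)]
        # z-update: shrink the residual; impulsive errors survive here
        z = [soft(ax[i] - y[i] + u[i], 1.0 / rho) for i in range(m)]
        # scaled dual update for the constraint a x - z - y = 0
        u = [u[i] + ax[i] - z[i] - y[i] for i in range(m)]
    return x


rng = random.Random(0)
h_true = [0.8, -0.3]  # hypothetical two-tap channel
a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
y = [row[0] * h_true[0] + row[1] * h_true[1] for row in a]
for i in (3, 17, 29):  # hypothetical impulsive noise spikes
    y[i] += 25.0
h_hat = admm_lad(a, y)
print([round(v, 3) for v in h_hat])
```

A plain least-squares fit on the same data would be pulled noticeably off by the three spikes; the l1 formulation recovers the taps because the spikes are sparse.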
Submitted 24 August, 2023; v1 submitted 23 August, 2023;
originally announced August 2023.
-
A Reinforcement Learning Approach for Performance-aware Reduction in Power Consumption of Data Center Compute Nodes
Authors:
Akhilesh Raj,
Swann Perarnau,
Aniruddha Gokhale
Abstract:
As exascale computing becomes a reality, the energy needs of compute nodes in cloud data centers will continue to grow. A common approach to reducing this energy demand is to limit the power consumption of hardware components when workloads are bottlenecked elsewhere in the system. However, designing a resource controller that detects such situations and limits power consumption on the fly is complex and can also adversely impact application performance. In this paper, we explore the use of Reinforcement Learning (RL) to design a power capping policy for cloud compute nodes using observations of current power consumption and instantaneous application performance (heartbeats). By leveraging the Argo Node Resource Management (NRM) software stack in conjunction with the Intel Running Average Power Limit (RAPL) hardware control mechanism, we design an agent that controls the maximum power supplied to processors without compromising application performance. Employing a Proximal Policy Optimization (PPO) agent to learn an optimal policy on a mathematical model of the compute nodes, we demonstrate and evaluate, using the STREAM benchmark, how a trained agent running on actual hardware can take actions that balance power consumption and application performance.
Submitted 15 August, 2023;
originally announced August 2023.
-
MASR: Multi-label Aware Speech Representation
Authors:
Anjali Raj,
Shikhar Bharadwaj,
Sriram Ganapathy,
Min Ma,
Shikhar Vashishth
Abstract:
In recent years, speech representation learning has been framed primarily as a self-supervised learning (SSL) task that uses the raw audio signal alone, ignoring the side information that is often available for a given speech recording. In this paper, we propose MASR, a Multi-label Aware Speech Representation learning framework, which addresses this limitation. MASR enables the inclusion of multiple external knowledge sources to make better use of metadata. The external knowledge sources are incorporated in the form of sample-level pair-wise similarity matrices that are used in a hard-mining loss. A key advantage of the MASR framework is that it can be combined with any SSL method. Using MASR representations, we perform evaluations on several downstream tasks, such as language identification and speech recognition, as well as non-semantic tasks such as speaker and emotion recognition. In these experiments, MASR shows significant performance improvements over other established benchmarks. We perform a detailed analysis of the language identification task to provide insight into how the proposed loss function enables the representations to separate closely related languages.
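The abstract says only that pair-wise similarity matrices feed a hard-mining loss. One common way to realise that idea, shown here as an assumption rather than MASR's exact loss, is a batch-hard triplet loss: for each anchor, take the least similar positive and the most similar negative according to a binary pair-wise matrix built from metadata (e.g., same language), and apply a margin hinge. The embeddings, labels, and margin below are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def batch_hard_loss(emb: List[Sequence[float]], same: List[List[int]], margin: float = 0.2) -> float:
    """Average hinge over anchors using the hardest positive and hardest negative.

    same[i][j] == 1 marks recordings that share a metadata label; it is a
    hypothetical stand-in for one of the sample-level similarity matrices.
    """
    losses = []
    n = len(emb)
    for i in range(n):
        pos = [cosine(emb[i], emb[j]) for j in range(n) if j != i and same[i][j]]
        neg = [cosine(emb[i], emb[j]) for j in range(n) if not same[i][j]]
        if not pos or not neg:
            continue  # no valid triplet for this anchor in the batch
        hardest_pos = min(pos)  # least similar item that should be close
        hardest_neg = max(neg)  # most similar item that should be far
        losses.append(max(0.0, margin + hardest_neg - hardest_pos))
    return sum(losses) / len(losses) if losses else 0.0


emb = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = [0, 0, 1, 1]  # e.g., hypothetical language IDs
same = [[int(p == q) for q in labels] for p in labels]
print(batch_hard_loss(emb, same))
```

With several metadata sources, one such term per similarity matrix could be summed; how MASR actually combines them is not stated in the abstract.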
Submitted 25 September, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Sensor Data Validation for Garbage Collection Using Machine Learning
Authors:
Kabeer Gulati,
Zuhaib Ahmad,
Abhishek Raj
Abstract:
The successful functioning of any complex dynamic system depends significantly on the accuracy of its sensor data; hence, sensor data validation is crucial. Because sensor data is used for monitoring and oversight, erroneous sensor data results in poor overall process output. In this study, confidence in the sensor data is ascertained using a Mamdani fuzzy inference system, and erroneous data can be corrected with this method. If a sensor outputs faulty values for a prolonged period, it is flagged and a report is generated. The method can be used as a generic module for any system. The fuzzy system is then applied to readings from an ultrasonic sensor as part of a larger, more complex IoT system.
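A Mamdani system fires fuzzy rules with min implication, aggregates rule outputs with max, and defuzzifies with a centroid. The sketch below is a single-input toy, assuming the input is the absolute deviation of a reading from an expected value (for example, the previous reading) and the output is a confidence in [0, 1]. The membership functions, the 0 to 30 input range, and the three-rule base are invented for illustration and are not the paper's.

```python
from typing import Callable, List, Tuple


def tri(a: float, b: float, c: float) -> Callable[[float], float]:
    """Triangular membership peaking at b (a == b or b == c gives a shoulder)."""
    def mu(x: float) -> float:
        if x == b:
            return 1.0
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)
    return mu


def mamdani_confidence(deviation: float, steps: int = 201) -> float:
    """Confidence in a reading given its (hypothetical) deviation from the expected value."""
    d = min(max(deviation, 0.0), 30.0)  # clamp to the assumed input universe
    small, medium, large = tri(0, 0, 10), tri(5, 15, 25), tri(20, 30, 30)
    low, mid, high = tri(0, 0, 0.5), tri(0.25, 0.5, 0.75), tri(0.5, 1, 1)
    # Rule base: small deviation -> high confidence, and so on.
    rules: List[Tuple[float, Callable[[float], float]]] = [
        (small(d), high),
        (medium(d), mid),
        (large(d), low),
    ]
    num = den = 0.0
    for k in range(steps):
        c = k / (steps - 1)
        # min implication per rule, max aggregation across rules
        mu = max(min(strength, out(c)) for strength, out in rules)
        num += c * mu
        den += mu
    return num / den if den > 0 else 0.5  # discrete centroid defuzzification


print(mamdani_confidence(0), mamdani_confidence(15), mamdani_confidence(100))
```

A faulty-sensor flag could then be raised when confidence stays below some threshold for many consecutive readings, in line with the "prolonged period" behaviour described above; the threshold would be a design choice.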
Submitted 16 April, 2023;
originally announced April 2023.
-
OGInfra: Geolocating Oil & Gas Infrastructure using Remote Sensing based Active Fire Data
Authors:
Samyak Prajapati,
Amrit Raj,
Yash Chaudhari,
Akhilesh Nandwal,
Japman Singh Monga
Abstract:
Remote sensing has become a crucial part of our daily lives, whether it be locating us via GPS or providing weather forecasts. It has applications in military, socio-economic, and commercial domains, and even in supporting humanitarian efforts. This work proposes a novel technique for the automated geolocation of Oil & Gas infrastructure using Active Fire Data from the NASA FIRMS data repository together with deep learning techniques, achieving a top accuracy of 90.68% with ResNet101.
Submitted 30 October, 2022;
originally announced October 2022.
-
Deep Learning-Based MR Image Re-parameterization
Authors:
Abhijeet Narang,
Abhigyan Raj,
Mihaela Pop,
Mehran Ebrahimi
Abstract:
Magnetic resonance (MR) image re-parameterization refers to generating, via simulation, an MR image with a new set of MRI scanning parameters. Different parameter values produce distinct contrast between tissues, helping identify pathologic tissue. Typically, more than one scan is required for diagnosis; however, acquiring repeated scans can be costly, time-consuming, and difficult for patients. Using MR image re-parameterization to predict and estimate the contrast of these scans can therefore be an effective alternative. In this work, we propose a novel deep learning (DL) based convolutional model for MRI re-parameterization. Our preliminary results suggest that DL-based techniques have the potential to learn the non-linearities that govern re-parameterization.
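How scanning parameters change tissue contrast can be illustrated with the textbook spin-echo approximation S = PD * (1 - exp(-TR/T1)) * exp(-TE/T2), where PD is proton density and TR/TE are the repetition and echo times. This simplified model is assumed here for illustration; the abstract does not describe the paper's simulation setup, and the tissue values and protocols below are hypothetical round numbers. Switching from a short-TR/short-TE to a long-TR/long-TE protocol flips which tissue appears brighter.

```python
import math


def spin_echo_signal(pd: float, t1: float, t2: float, tr: float, te: float) -> float:
    """Simplified spin-echo intensity; all times in milliseconds."""
    return pd * (1.0 - math.exp(-tr / t1)) * math.exp(-te / t2)


# Hypothetical tissue parameters (proton density, T1, T2).
tissue_a = dict(pd=0.7, t1=600.0, t2=80.0)     # short T1, short T2
tissue_b = dict(pd=1.0, t1=4000.0, t2=2000.0)  # long T1, long T2 (fluid-like)

t1w = dict(tr=500.0, te=15.0)    # T1-weighted protocol
t2w = dict(tr=4000.0, te=100.0)  # T2-weighted protocol

for name, proto in (("T1w", t1w), ("T2w", t2w)):
    a = spin_echo_signal(**tissue_a, **proto)
    b = spin_echo_signal(**tissue_b, **proto)
    print(name, round(a, 3), round(b, 3))
```

Re-parameterization in this framing means predicting the image for a new (TR, TE) pair without acquiring it; a learned model can capture effects that this closed-form approximation ignores.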
Submitted 12 April, 2024; v1 submitted 11 June, 2022;
originally announced June 2022.
-
HWRCNet: Handwritten Word Recognition in JPEG Compressed Domain using CNN-BiLSTM Network
Authors:
Bulla Rajesh,
Abhishek Kumar Gupta,
Ayush Raj,
Mohammed Javed,
Shiv Ram Dubey
Abstract:
Handwritten word recognition from document images using deep learning is an active research area in Document Image Analysis and Recognition. In the present era of Big Data, more and more documents are generated and archived in compressed form for better storage and transmission efficiency, so the problem of recognizing words directly in the compressed domain, without decompression, becomes very challenging. Traditional methods decompress the images and then apply learning algorithms to them; novel algorithms are therefore needed that apply learning techniques directly to the compressed representations. In this direction, this paper proposes HWRCNet, a novel model for handwritten word recognition directly in the compressed domain, specifically focusing on the JPEG format. The proposed model combines a Convolutional Neural Network (CNN) with a Bi-Directional Long Short-Term Memory (BiLSTM) based Recurrent Neural Network (RNN). We train the model on JPEG-compressed word images and observe very appealing performance, with $89.05\%$ word recognition accuracy and a $13.37\%$ character error rate.
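Working "in the compressed domain" for JPEG usually means operating on the 8x8 block DCT coefficients instead of decoded pixels. The sketch below computes the forward DCT that the JPEG standard defines for one level-shifted 8x8 block, in plain Python, to show what those coefficients are. It omits quantization and entropy coding, and it is not taken from the HWRCNet paper, whose exact input representation the abstract does not specify.

```python
import math
from typing import List

Block = List[List[float]]


def jpeg_dct(block: Block) -> Block:
    """Forward 8x8 DCT-II as used by baseline JPEG (after the 128 level shift)."""
    n = 8
    shifted = [[p - 128.0 for p in row] for row in block]  # level shift to [-128, 127]

    def c(k: int) -> float:
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (shifted[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


flat = [[200.0] * 8 for _ in range(8)]  # hypothetical uniform background block
coeffs = jpeg_dct(flat)
print(round(coeffs[0][0], 6))  # only the DC term is non-zero for a flat block
```

Pen strokes show up as energy in the AC coefficients, which is why a network can learn directly from coefficient arrays without running the inverse DCT.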
Submitted 17 February, 2023; v1 submitted 3 January, 2022;
originally announced January 2022.
-
Explanatory Analysis and Rectification of the Pitfalls in COVID-19 Datasets
Authors:
Samyak Prajapati,
Japman Singh Monga,
Shaanya Singh,
Amrit Raj,
Yuvraj Singh Champawat,
Chandra Prakash
Abstract:
Since the onset of the COVID-19 pandemic in 2020, millions of people have succumbed to the virus. Many attempts have been made to devise an automated testing method that could detect it. Researchers around the globe have proposed deep-learning-based methodologies to detect COVID-19 using chest X-rays. However, questions have been raised about bias in the publicly available chest X-ray datasets used by the majority of researchers. In this paper, we propose a two-stage methodology to address this topical issue. In stage 1, two experiments are conducted to exhibit the presence of bias in the datasets. In stage 2, an image segmentation, super-resolution, and CNN-based pipeline, along with different image augmentation techniques, is proposed to reduce the effect of bias. An InceptionResNetV2 model trained on chest X-ray images augmented with Histogram Equalization followed by Gamma Correction, and passed through the stage-2 pipeline, yielded a top accuracy of 90.47% on a 3-class (Normal, Pneumonia, and COVID-19) classification task.
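Histogram equalization followed by gamma correction is a standard two-step intensity transform. A minimal sketch on a flat list of 8-bit grayscale pixels, assuming the common CDF-based global equalization formula and a hypothetical gamma of 1.5 (the abstract does not report the value used):

```python
from typing import List


def equalize(pixels: List[int], levels: int = 256) -> List[int]:
    """Global histogram equalization via the cumulative distribution function."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # single-intensity image: nothing to spread out
        return pixels[:]
    return [round((cdf[p] - cdf_min) / (total - cdf_min) * (levels - 1)) for p in pixels]


def gamma_correct(pixels: List[int], gamma: float, levels: int = 256) -> List[int]:
    """Power-law transform: out = max * (in / max) ** gamma."""
    top = levels - 1
    return [round(top * (p / top) ** gamma) for p in pixels]


xray = [50, 52, 52, 55, 60, 60, 60, 70]  # hypothetical low-contrast pixel row
enhanced = gamma_correct(equalize(xray), gamma=1.5)
print(enhanced)
```

Equalization stretches the narrow intensity range to the full 0 to 255 scale; a gamma above 1 then darkens mid-tones, while a gamma below 1 would brighten them.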
Submitted 10 November, 2021;
originally announced November 2021.
-
Harmony-Search and Otsu based System for Coronavirus Disease (COVID-19) Detection using Lung CT Scan Images
Authors:
V. Rajinikanth,
Nilanjan Dey,
Alex Noel Joseph Raj,
Aboul Ella Hassanien,
K. C. Santosh,
N. Sri Madhava Raja
Abstract:
Pneumonia is one of the foremost lung diseases, and untreated pneumonia poses serious threats to all age groups. The proposed work aims to extract and evaluate COVID-19-induced pneumonia infection in the lung using CT scans. We propose an image-assisted system to extract COVID-19-infected sections from lung CT scans (coronal view). It includes the following steps: (i) a threshold filter to extract the lung region by eliminating possible artifacts; (ii) image enhancement using Harmony-Search-Optimization and Otsu thresholding; (iii) image segmentation to extract the infected region(s); and (iv) region-of-interest (ROI) feature extraction from the binary image to compute the level of severity. The features extracted from the ROI are used to compute the pixel ratio between the lung and infection sections, which indicates the severity of infection. The primary objective of the tool is to assist pulmonologists not only in detection but also in planning treatment. Consequently, in mass screening, it can help reduce the diagnostic burden.
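Step (ii) relies on Otsu thresholding, which picks the gray level that maximizes the between-class variance of the image histogram. The sketch below implements classic Otsu on 8-bit pixels plus the infected-to-lung pixel ratio used as a severity measure. The harmony-search component (which the paper combines with Otsu) is not shown, and the toy pixel values and the "brighter than the threshold means infected" convention are hypothetical.

```python
from typing import List


def otsu_threshold(pixels: List[int], levels: int = 256) -> int:
    """Return t maximising between-class variance for the split p <= t vs p > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # w0 * w1 * (mean0 - mean1)^2 equals the between-class variance times total^2
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def infection_ratio(lung_mask: List[int], infected_mask: List[int]) -> float:
    """Fraction of lung pixels flagged as infected (binary masks, 1 = on)."""
    lung = sum(lung_mask)
    infected = sum(1 for l, i in zip(lung_mask, infected_mask) if l and i)
    return infected / lung if lung else 0.0


pixels = [20] * 50 + [25] * 30 + [200] * 15 + [210] * 5  # hypothetical lung-region pixels
t = otsu_threshold(pixels)
infected = [1 if p > t else 0 for p in pixels]  # hypothetical: brighter = infected
print(t, infection_ratio([1] * len(pixels), infected))
```

In the described system, the threshold search is driven by harmony-search optimization instead of this exhaustive scan, which matters mainly for multi-level thresholding.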
Submitted 6 April, 2020;
originally announced April 2020.
-
Multi-Plateau Ensemble for Endoscopic Artefact Segmentation and Detection
Authors:
Suyog Jadhav,
Udbhav Bamba,
Arnav Chavan,
Rishabh Tiwari,
Aryan Raj
Abstract:
The Endoscopic Artefact Detection challenge consists of 1) artefact detection, 2) semantic segmentation, and 3) out-of-sample generalisation. For the semantic segmentation task, we propose a multi-plateau ensemble of FPNs (Feature Pyramid Networks) with EfficientNet as the feature extractor/encoder. For the object detection task, we used a three-model ensemble of RetinaNet with a ResNet50 backbone and Faster R-CNN (FPN + DC5) with a ResNeXt101 backbone. A PyTorch implementation of our approach is available at https://github.com/ubamba98/EAD2020.
Submitted 23 March, 2020;
originally announced March 2020.
-
Super-Resolution DOA Estimation for Arbitrary Array Geometries Using a Single Noisy Snapshot
Authors:
A. Govinda Raj,
J. H. McClellan
Abstract:
We address the problem of search-free DOA estimation from a single noisy snapshot for sensor arrays of arbitrary geometry by extending a method of gridless super-resolution beamforming to arbitrary arrays with noisy measurements. The primal atomic norm minimization problem is converted to a dual problem in which the periodic dual function is represented with a trigonometric polynomial using a truncated Fourier series. The number of terms required for accurate representation depends linearly on the distance of the farthest sensor from a reference point. The dual problem is then expressed as a semidefinite program and solved in polynomial time. DOA estimates are obtained via polynomial rooting followed by a LASSO-based approach that removes extraneous roots arising from root finding on noisy data, and source amplitudes are then recovered by least squares. Simulations using circular and random planar arrays show high-resolution DOA estimation in white and colored noise scenarios.
Submitted 22 March, 2019;
originally announced March 2019.
-
GAN-based Projector for Faster Recovery with Convergence Guarantees in Linear Inverse Problems
Authors:
Ankit Raj,
Yuqi Li,
Yoram Bresler
Abstract:
A Generative Adversarial Network (GAN) with generator $G$ trained to model the prior of images has been shown to perform better than sparsity-based regularizers in ill-posed inverse problems. Here, we propose a new method of deploying a GAN-based prior to solve linear inverse problems using projected gradient descent (PGD). Our method learns a network-based projector for use in the PGD algorithm, eliminating expensive computation of the Jacobian of $G$. Experiments show that our approach provides a speed-up of $60\text{-}80\times$ over earlier GAN-based recovery methods along with better accuracy. Our main theoretical result is that if the measurement matrix is moderately conditioned on the manifold $\mathrm{range}(G)$ and the projector is $\delta$-approximate, then the algorithm is guaranteed to reach $O(\delta)$ reconstruction error in $O(\log(1/\delta))$ steps in the low noise regime. Additionally, we propose a fast method to design such measurement matrices for a given $G$. Extensive experiments demonstrate the efficacy of this method by requiring $5\text{-}10\times$ fewer measurements than random Gaussian measurement matrices for comparable recovery performance. Because the learning of the GAN and projector is decoupled from the measurement operator, our GAN-based projector and recovery algorithm are applicable without retraining to all linear inverse problems, as confirmed by experiments on compressed sensing, super-resolution, and inpainting.
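The PGD loop alternates a gradient step on the data-fit term (1/2)||y - A x||^2 with a projection onto the signal set: x_{k+1} = P(x_k + eta * A^T (y - A x_k)). In the paper, P is a learned network approximating projection onto range(G). The sketch below swaps in an exact projection onto a one-dimensional subspace so the example stays self-contained; that substitution, the matrix, and the step size are assumptions for illustration only. A has fewer rows than unknowns, so recovery works only because of the prior, and the error contracts by a constant factor per step, the kind of geometric convergence behind a logarithmic iteration count.

```python
from typing import List

Vec = List[float]


def matvec(a: List[Vec], x: Vec) -> Vec:
    """Compute A x."""
    return [sum(r[j] * x[j] for j in range(len(x))) for r in a]


def rmatvec(a: List[Vec], r: Vec) -> Vec:
    """Compute A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]


def project_onto_line(v: Vec, u: Vec) -> Vec:
    """Exact projection onto span{u} for unit-norm u (stand-in for a learned projector)."""
    t = sum(vi * ui for vi, ui in zip(v, u))
    return [t * ui for ui in u]


def pgd(a: List[Vec], y: Vec, u: Vec, eta: float, iters: int) -> Vec:
    """Projected gradient descent on (1/2)||y - A x||^2 over span{u}."""
    x = [0.0] * len(u)
    for _ in range(iters):
        residual = [yi - axi for yi, axi in zip(y, matvec(a, x))]
        step = rmatvec(a, residual)  # negative gradient of the data-fit term
        x = project_onto_line([xi + eta * gi for xi, gi in zip(x, step)], u)
    return x


u = [1 / 3, 2 / 3, 2 / 3]                # unit vector spanning the toy signal set
a = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]  # 2 measurements of a 3-D signal
x_true = [1.0, 2.0, 2.0]                 # equals 3 * u, so it lies in the signal set
y = matvec(a, x_true)
x_hat = pgd(a, y, u, eta=0.5, iters=60)
print([round(v, 6) for v in x_hat])
```

Here A u = (1, 0), so each iteration halves the coefficient error; with an approximate projector, the same contraction argument yields an error floor proportional to the projector's inaccuracy.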
Submitted 23 October, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Single Snapshot Super-Resolution DOA Estimation for Arbitrary Array Geometries
Authors:
A. Govinda Raj,
J. H. McClellan
Abstract:
We address the problem of search-free direction of arrival (DOA) estimation for sensor arrays of arbitrary geometry under the challenging conditions of a single snapshot and coherent sources. We extend a method of search-free super-resolution beamforming, originally applicable only to uniform linear arrays, to arrays of arbitrary geometry. The infinite-dimensional primal atomic norm minimization problem in the continuous angle domain is converted to a dual problem. By exploiting periodicity, the dual function is then represented with a trigonometric polynomial using a truncated Fourier series. A linear rule of thumb is derived for selecting the minimum number of Fourier coefficients required for accurate polynomial representation, based on the distance of the farthest sensor from a reference point. The dual problem is then expressed as a semidefinite program and solved efficiently. Finally, the search-free DOA estimates are obtained through polynomial rooting, and source amplitudes are recovered through least squares. Simulations using circular and random planar arrays show perfect DOA estimation in noise-free cases.
Submitted 15 November, 2018; v1 submitted 28 September, 2018;
originally announced October 2018.
-
Microseismic events enhancement and detection in sensor arrays using autocorrelation based filtering
Authors:
Entao Liu,
Lijun Zhu,
Anupama Govinda Raj,
James H. McClellan,
Abdullatif Al-Shuhail,
SanLinn I. Kaka,
Naveed Iqbal
Abstract:
Passive microseismic data are commonly buried in noise, which presents a significant challenge for signal detection and recovery. For recordings from a surface sensor array where each trace contains a time-delayed arrival from the event, we propose an autocorrelation-based stacking method that designs a denoising filter from all the traces, as well as a multi-channel detection scheme. This approach circumvents the issue of time aligning the traces prior to stacking because every trace's autocorrelation is centered at zero in the lag domain. The effect of white noise is concentrated near zero lag, so the filter design requires a predictable adjustment of the zero-lag value. Truncation of the autocorrelation is employed to smooth the impulse response of the denoising filter. In order to extend the applicability of the algorithm, we also propose a noise prewhitening scheme that addresses cases with colored noise. The simplicity and robustness of this method are validated with synthetic and real seismic traces.
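The key observation is that a trace's autocorrelation does not depend on when the event arrives, so autocorrelations from differently delayed traces can be stacked without alignment. A minimal sketch, assuming zero-padded linear autocorrelation, plain averaging as the stack, and a hypothetical maximum lag for truncation. The zero-lag noise adjustment, the filter design itself, and the prewhitening step from the paper are not shown; the wavelet and delays are made up.

```python
from typing import List


def autocorr(trace: List[float], max_lag: int) -> List[float]:
    """Linear autocorrelation r[k] = sum_n x[n] x[n+k], truncated to lags 0..max_lag."""
    n = len(trace)
    return [sum(trace[i] * trace[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def stack_autocorrs(traces: List[List[float]], max_lag: int) -> List[float]:
    """Average the per-trace autocorrelations; no time alignment is needed."""
    rs = [autocorr(t, max_lag) for t in traces]
    return [sum(r[k] for r in rs) / len(rs) for k in range(max_lag + 1)]


wavelet = [0.0, 1.0, -2.0, 1.0]  # hypothetical event waveform


def delayed(delay: int, length: int = 20) -> List[float]:
    """Place the wavelet at a given arrival time inside an otherwise zero trace."""
    trace = [0.0] * length
    for i, w in enumerate(wavelet):
        trace[delay + i] = w
    return trace


traces = [delayed(2), delayed(7), delayed(12)]  # same event, three arrival times
print(stack_autocorrs(traces, max_lag=3))
```

With real data, each trace adds a noise term; for white noise that term piles up at lag 0, which is why only the zero-lag value of the stacked autocorrelation needs correcting before the filter is designed.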
Submitted 6 December, 2016;
originally announced December 2016.