ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR

Authors: Vishwanath Pratap Singh, Federico Malato, Ville Hautamaki, Md. Sahidullah, Tomi Kinnunen

Abstract: While automatic speech recognition (ASR) greatly benefits from data augmentation, the augmentation recipes themselves tend to be heuristic. In this paper, we address one of the heuristic approaches associated with balancing the right amount of augmented data in ASR training by introducing a reinforcement learning (RL) based dynamic adjustment of original-to-augmented data ratio (OAR). Unlike the fixed OAR approach in conventional data augmentation, our proposed method employs a deep Q-network (DQN) as the RL mechanism to learn the optimal dynamics of OAR throughout the wav2vec2.0 based ASR training. We conduct experiments using the LibriSpeech dataset with varying amounts of training data, specifically, the 10Min, 1H, 10H, and 100H splits to evaluate the efficacy of the proposed method under different data conditions. Our proposed method, on average, achieves a relative improvement of 4.96% over the open-source wav2vec2.0 base model on standard LibriSpeech test sets.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted: Interspeech 2024

Journal ref: Interspeech 2024

arXiv:2405.18297 [pdf, other]

Artificial Intelligence Satellite Telecommunication Testbed using Commercial Off-The-Shelf Chipsets

Authors: Luis M. Garces, Amirhossein Nik, Flor Ortiz, Juan A. Vásquez-Peralvo, Jorge L. Gonzalez, Mouhamad Chehailty, Marcele Kuhfuss, Eva Lagunas, Jan Thoemel, Sumit Kumar, Vishal Singh, Juan C. Duncan, Sahar Malmir, Swetha Varadajulu, Jorge Querol, Symeon Chatzinotas

Abstract: The Artificial Intelligence Satellite Telecommunications Testbed (AISTT), part of the ESA project SPAICE, is focused on the transformation of the satellite payload by using artificial intelligence (AI) and machine learning (ML) methodologies over available commercial off-the-shelf (COTS) AI chips for on-board processing. The objectives include validating artificial intelligence-driven SATCOM scenarios such as interference detection, spectrum sharing, radio resource management, decoding, and beamforming. The study highlights hardware selection and payload architecture. Preliminary results show that ML models significantly improve signal quality, spectral efficiency, and throughput compared to conventional payload. Moreover, the testbed aims to evaluate the performance and application of AI-capable COTS chips in onboard SATCOM contexts.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Submitted to SPAICE Conference 2024: AI in and for Space, 5 pages, 3 figures

Journal ref: SPAICE Conference 2024

arXiv:2402.15214 [pdf, other]

ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification

Authors: Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen

Abstract: The accuracy of modern automatic speaker verification (ASV) systems, when trained exclusively on adult data, drops substantially when applied to children's speech. The scarcity of children's speech corpora hinders fine-tuning ASV systems for children's speech. Hence, there is a timely need to explore more effective ways of reusing adults' speech data. One promising approach is to align vocal-tract parameters between adults and children through children-specific data augmentation, referred to here as ChildAugment. Specifically, we modify the formant frequencies and formant bandwidths of adult speech to emulate children's speech. The modified spectra are used to train an ECAPA-TDNN (emphasized channel attention, propagation, and aggregation in time-delay neural network) recognizer for children. We compare ChildAugment against various state-of-the-art data augmentation techniques for children's ASV. We also extensively compare different scoring methods, including cosine scoring, PLDA (probabilistic linear discriminant analysis), and NPLDA (neural PLDA). We also propose a low-complexity weighted cosine score for extremely low-resource children's ASV. Our findings on the CSLU kids corpus indicate that ChildAugment holds promise as a simple, acoustics-motivated approach for improving state-of-the-art deep learning based ASV for children. We achieve up to 12.45% (boys) and 11.96% (girls) relative improvement over the baseline.

Submitted 23 February, 2024; originally announced February 2024.

Comments: The following article has been accepted by The Journal of the Acoustical Society of America (JASA). After it is published, it will be found at https://pubs.aip.org/asa/jasa

arXiv:2402.06463 [pdf, other]

Cardiac ultrasound simulation for autonomous ultrasound navigation

Authors: Abdoul Aziz Amadou, Laura Peralta, Paul Dryburgh, Paul Klein, Kaloian Petkov, Richard James Housden, Vivek Singh, Rui Liao, Young-Ho Kim, Florin Christian Ghesu, Tommaso Mansi, Ronak Rajani, Alistair Young, Kawal Rhode

Abstract: Ultrasound is well-established as an imaging modality for diagnostic and interventional purposes. However, the image quality varies with operator skills as acquiring and interpreting ultrasound images requires extensive training due to the imaging artefacts, the range of acquisition parameters and the variability of patient anatomies. Automating the image acquisition task could improve acquisition reproducibility and quality but training such an algorithm requires large amounts of navigation data, not saved in routine examinations. Thus, we propose a method to generate large amounts of ultrasound images from other modalities and from arbitrary positions, such that this pipeline can later be used by learning algorithms for navigation. We present a novel simulation pipeline which uses segmentations from other modalities, an optimized volumetric data representation and GPU-accelerated Monte Carlo path tracing to generate view-dependent and patient-specific ultrasound images. We extensively validate the correctness of our pipeline with a phantom experiment, where structures' sizes, contrast and speckle noise properties are assessed. Furthermore, we demonstrate its usability to train neural networks for navigation in an echocardiography view classification experiment by generating synthetic images from more than 1000 patients. Networks pre-trained with our simulations achieve significantly superior performance in settings where large real datasets are not available, especially for under-represented classes. The proposed approach allows for fast and accurate patient-specific ultrasound image generation, and its usability for training networks for navigation-related tasks is demonstrated.

Submitted 9 February, 2024; originally announced February 2024.

Comments: 24 pages, 10 figures, 5 tables

ACM Class: I.6.0; I.5.4; J.3

arXiv:2312.00479 [pdf]

doi 10.1109/INDISCON58499.2023.10270235

EEG-Based Reaction Time Prediction with Fuzzy Common Spatial Patterns and Phase Cohesion using Deep Autoencoder Based Data Fusion

Authors: Vivek Singh, Tharun Kumar Reddy

Abstract: The drowsiness state of a driver is a topic of extensive discussion due to its significant role in causing traffic accidents. This research presents a novel approach that combines Fuzzy Common Spatial Patterns (CSP) optimised Phase Cohesive Sequence (PCS) representations and fuzzy CSP-optimized signal amplitude representations. The research aims to examine alterations in Electroencephalogram (EEG) synchronisation between a state of alertness and drowsiness, forecast drivers' reaction times by analysing EEG data, and subsequently identify the presence of drowsiness. The study's findings indicate that this approach successfully distinguishes between alert and drowsy mental states. By employing a Deep Autoencoder-based data fusion technique and a regression model such as Support Vector Regression (SVR) or Least Absolute Shrinkage and Selection Operator (LASSO), the proposed method outperforms using individual feature sets in combination with a regressor model. This superiority is measured by evaluating the Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and Correlation Coefficient (CC). In other words, the fusion of autoencoder-based amplitude EEG power features and PCS features, when used in regression, outperforms using either of these features alone in a regressor model. Specifically, the proposed data fusion method achieves a 14.36% reduction in RMSE, a 25.12% reduction in MAPE, and a 10.12% increase in CC compared to the baseline model using only individual amplitude EEG power features and regression.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.08689 [pdf, other]

Low Complexity High Speed Deep Neural Network Augmented Wireless Channel Estimation

Authors: Syed Asrar ul haq, Varun Singh, Bhanu Teja Tanaji, Sumit Darak

Abstract: The channel estimation (CE) in wireless receivers is one of the most critical and computationally complex signal processing operations. Recently, various works have shown that the deep learning (DL) based CE outperforms conventional minimum mean square error (MMSE) based CE, and it is hardware-friendly. However, DL-based CE has higher complexity and latency than popularly used least square (LS) based CE. In this work, we propose a novel low complexity high-speed Deep Neural Network-Augmented Least Square (LC-LSDNN) algorithm for IEEE 802.11p wireless physical layer and efficiently implement it on Zynq system on chip (ZSoC). The novelty of the LC-LSDNN is to use different DNNs for real and imaginary values of received complex symbols. This helps reduce the size of DL by 59% and optimize the critical path, allowing it to operate at 60% higher clock frequency. We also explore three different architectures for MMSE-based CE. We show that LC-LSDNN significantly outperforms MMSE and state-of-the-art DL-based CE for a wide range of signal-to-noise ratios (SNR) and different wireless channels. Also, it is computationally efficient, with around 50% lower resources than existing DL-based CE.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2309.15750 [pdf, other]

Automated CT Lung Cancer Screening Workflow using 3D Camera

Authors: Brian Teixeira, Vivek Singh, Birgi Tamersoy, Andreas Prokein, Ankur Kapoor

Abstract: Despite recent developments in CT planning that enabled automation in patient positioning, time-consuming scout scans are still needed to compute dose profile and ensure the patient is properly positioned. In this paper, we present a novel method which eliminates the need for scout scans in CT lung cancer screening by estimating patient scan range, isocenter, and Water Equivalent Diameter (WED) from 3D camera images. We achieve this task by training an implicit generative model on over 60,000 CT scans and introduce a novel approach for updating the prediction using real-time scan data. We demonstrate the effectiveness of our method on a testing set of 110 pairs of depth data and CT scan, resulting in an average error of 5mm in estimating the isocenter, 13mm in determining the scan range, 10mm and 16mm in estimating the AP and lateral WED respectively. The relative WED error of our method is 4%, which is well within the International Electrotechnical Commission (IEC) acceptance criteria of 10%.

Submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted at MICCAI 2023

arXiv:2309.14328 [pdf, other]

doi 10.2312/envirvis.20231100

pyParaOcean: A System for Visual Analysis of Ocean Data

Authors: Toshit Jain, Varun Singh, Vijay Kumar Boda, Upkar Singh, Ingrid Hotz, P. N. Vinayachandran, Vijay Natarajan

Abstract: Visual analysis is well adopted within the field of oceanography for the analysis of model simulations, detection of different phenomena and events, and tracking of dynamic processes. With increasing data sizes and the availability of multivariate dynamic data, there is a growing need for scalable and extensible tools for visualization and interactive exploration. We describe pyParaOcean, a visualization system that supports several tasks routinely used in the visual analysis of ocean data. The system is available as a plugin to Paraview and is hence able to leverage its distributed computing capabilities and its rich set of generic analysis and visualization functionalities. pyParaOcean provides modules to support different visual analysis tasks specific to ocean data, such as eddy identification and salinity movement tracking. These modules are available as Paraview filters and this seamless integration results in a system that is easy to install and use. A case study on the Bay of Bengal illustrates the utility of the system for the study of ocean phenomena and processes.

Submitted 25 September, 2023; originally announced September 2023.

Comments: 8 pages, EnvirVis2023

ACM Class: F.7; I.3.6

Journal ref: envirvis2023

arXiv:2308.09106 [pdf]

Optimal Closed Loop Control of G2V/V2G Action Using Model Predictive Controller

Authors: Satya Vikram Pratap Singh, Siddharth Kamila, Prashanth Agnihotri

Abstract: This paper develops a closed-loop control algorithm to operate the G2V/V2G action, tested under varying battery voltage conditions and load and source power differences. Under V2G action, to keep total harmonic distortion at a minimum level and grid frequency within the standard limit, a model predictive controller (MPC) has been used to control the gate driver circuit of the inverter. The state space model of the plant has been created using the system identification toolbox, and the MPC Controller block has been designed using the Model Predictive Control Toolbox of MATLAB. The proposed methodology is tested using MATLAB/Simulink and OPAL-RT (OP4510) in a real-time environment. This methodology reduces %THD to less than 0.5%, improves waveform quality of grid voltage, inverter output voltage, grid current, and inverter output current to nearly 99%, and maintains the grid frequency within the standard limit while in G2V/V2G action.

Submitted 11 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

arXiv:2308.09046 [pdf]

Fault Detection and Classification using Wavelet and ANN in DFIG and TCSC Connected Transmission Line

Authors: Satya Vikram Pratap Singh, Tanu Prasad, Siddharth Kamila, Prashant Agnihotri

Abstract: This paper presents fault detection and classification using Wavelet and ANN based methods in a DFIG-based series compensated system. The state-of-the-art methods include Wavelet transform, Fourier transform, and Wavelet-neuro fuzzy methods-based systems for fault detection and classification. However, the accuracy of these state-of-the-art methods diminishes during variable conditions such as changes in wind speed, high impedance faults, and changes in the series compensation level. Specifically, in Wavelet transform based methods, the threshold values need to be adapted based on the variable field conditions. To solve this problem, this paper proposes a Wavelet-ANN based fault detection method where Wavelet is used as an identifier and ANN is used as a classifier for detecting various fault cases. This methodology is also effective under SSR condition. The proposed methodology is evaluated on various fault and non-fault cases generated on an IEEE first benchmark model under varying compensation levels from 20% to 55%, impedance faults, and wind velocity from 6m/sec to 10m/sec using MATLAB/Simulink, an OPAL-RT (OP4510) real-time digital simulator environment, and Arduino board I/O ports communicating with an external PC in which the ANN model is dumped, using the Arduino support package of MATLAB. The preliminary results are compared with the state-of-the-art fault detection method, where the proposed method shows robust performance under varying field conditions.

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2306.07501 [pdf, other]

Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech

Authors: Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen

Abstract: In this paper, we study the impact of ageing on modern deep speaker embedding based automatic speaker verification (ASV) systems. We have selected two different datasets to examine ageing on the state-of-the-art ECAPA-TDNN system. The first dataset, used for addressing short-term ageing (up to 10 years' time difference between enrollment and test) under uncontrolled conditions, is VoxCeleb. The second dataset, used for addressing the long-term ageing effect (up to 40 years' difference) of Finnish speakers under a more controlled setup, is the Longitudinal Corpus of Finnish Spoken in Helsinki (LCFSH). Our study provides new insights into the impact of speaker ageing on modern ASV systems. Specifically, we establish a quantitative measure between ageing and ASV scores. Further, our research indicates that ageing affects female English speakers to a greater degree than male English speakers, while in the case of Finnish, it has a greater impact on male speakers than female speakers.

Submitted 12 June, 2023; originally announced June 2023.

Journal ref: Interspeech 2023

arXiv:2305.03546 [pdf, other]

Breast Cancer Immunohistochemical Image Generation: a Benchmark Dataset and Challenge Review

Authors: Chuang Zhu, Shengjie Liu, Zekuan Yu, Feng Xu, Arpit Aggarwal, Germán Corredor, Anant Madabhushi, Qixun Qu, Hongwei Fan, Fangda Li, Yueheng Li, Xianchao Guan, Yongbing Zhang, Vivek Kumar Singh, Farhan Akram, Md. Mostafa Kamal Sarker, Zhongyue Shi, Mulan **

Abstract: For invasive breast cancer, immunohistochemical (IHC) techniques are often used to detect the expression level of human epidermal growth factor receptor-2 (HER2) in breast tissue to formulate a precise treatment plan. From the perspective of saving manpower, material and time costs, directly generating IHC-stained images from Hematoxylin and Eosin (H&E) stained images is a valuable research direction. Therefore, we held the breast cancer immunohistochemical image generation challenge, aiming to explore novel ideas of deep learning technology in pathological image generation and promote research in this field. The challenge provided registered H&E and IHC-stained image pairs, and participants were required to use these images to train a model that can directly generate IHC-stained images from corresponding H&E-stained images. We selected and reviewed the five highest-ranking methods based on their PSNR and SSIM metrics, while also providing overviews of the corresponding pipelines and implementations. In this paper, we further analyze the current limitations in the field of breast cancer immunohistochemical image generation and forecast the future development of this field. We hope that the released dataset and the challenge will inspire more scholars to jointly study higher-quality IHC-stained image generation.

Submitted 22 September, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

Comments: 12 pages, 12 figures, 2 tables

arXiv:2303.15852 [pdf]

doi 10.1007/978-3-031-28183-9_18

Exploring Deep Learning Methods for Classification of SAR Images: Towards NextGen Convolutions via Transformers

Authors: Aakash Singh, Vivek Kumar Singh

Abstract: Images generated by high-resolution SAR have vast areas of application as they can work better in adverse light and weather conditions. One such area of application is in military systems. This study is an attempt to explore the suitability of current state-of-the-art models introduced in the domain of computer vision for SAR target classification (MSTAR). Since the application of any solution produced for military systems would be strategic and real-time, accuracy is often not the only criterion to measure its performance. Other parameters like prediction time and input resiliency are equally important. The paper deals with these issues in the context of SAR images. Experimental results show that deep learning models can be suitably applied in the domain of SAR image classification with the desired performance levels.

Submitted 28 March, 2023; originally announced March 2023.

Comments: 6 pages, 9 figures

Journal ref: In Advanced Network Technologies and Intelligent Computing: Second International Conference, ANTIC 2022, Varanasi, India, December 22-24, 2022, Proceedings, Part II, pp. 249-260. Cham: Springer Nature Switzerland

arXiv:2303.04584 [pdf, other]

Estimating a scalar log-concave random variable, using a silence set based probabilistic sampling

Authors: Maben Rabi, Junfeng Wu, Vyoma Singh, Karl Henrik Johansson

Abstract: We study the probabilistic sampling of a random variable, in which the variable is sampled only if it falls outside a given set, which is called the silence set. This helps us to understand optimal event-based sampling for the special case of IID random processes, and also to understand the design of a sub-optimal scheme for other cases. We consider the design of this probabilistic sampling for a scalar, log-concave random variable, to minimize either the mean square estimation error, or the mean absolute estimation error. We show that the optimal silence interval: (i) is essentially unique, and (ii) is the limit of an iterative procedure of centering. Further we show through numerical experiments that super-level intervals seem to be remarkably near-optimal for mean square estimation. Finally we use the Gauss inequality for scalar unimodal densities, to show that probabilistic sampling gives a mean square distortion that is less than a third of the distortion incurred by periodic sampling, if the average sampling rate is between 0.3 and 0.9 samples per tick.

Submitted 16 March, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: Accepted for publication in the 2023 American Control Conference

arXiv:2303.00888 [pdf]

doi 10.1115/IMECE2022-96616

Modeling and Analysis of Multiple Electrostatic Actuators on the Response of Vibrotactile Haptic Device

Authors: Santosh Mohan Rajkumar, Kumar Vikram Singh, Jeong-Hoi Koo

Abstract: In this research, modeling and analysis of a beam-type touchscreen interface with multiple actuators is considered. A mechanical model of a touch screen system, treated as thin beams, is developed with embedded electrostatic actuators at different spatial locations. This discrete finite element-based model is developed to compute the analytical and numerical vibrotactile response due to multiple actuators excited with varying frequency and amplitude. The model is tested with spring-damper boundary conditions incorporating sinusoidal excitations in the human haptic range. An analytical solution is proposed to obtain the vibrotactile response of the touch surface for different frequencies of excitations, the number of actuators, actuator stiffness, and actuator positions. The effect of the mechanical properties of the touch surface on vibrotactile feedback provided to the user is explored. Investigation of optimal location and number of actuators for a desired localized response, such as the magnitude of acceleration and variation in acceleration response for a desired zone on the interface, is carried out. It has been shown that a wide variety of localizable vibrotactile feedback can be generated on the touch surface using different frequencies of excitations, different actuator stiffness, number of actuators, and actuator positions. Having a mechanical model will facilitate simulation studies capable of incorporating more testing scenarios that may not be feasible to physically test.

Submitted 14 February, 2023; originally announced March 2023.

Journal ref: ASME International Mechanical Engineering Congress and Exposition 2022

arXiv:2209.11675 [pdf]

An analysis of the Internet of Things in wireless sensor network technologies

Authors: Harshit Poddar, Vansh Singh

Abstract: Information may be accessed from a distance thanks to computer networks. Wireless or wired networks are also possible. Due to recent developments in wireless infrastructure, wireless sensor networks (WSNs) were developed. Activities or events occurring in the environment are monitored, recorded, and managed by WSN. Through a variety of routing techniques, data relaying is done in these systems. The fourth industrial revolution, or Industry 4.0, is defined as the integration of complex physical automation systems made up of machinery and devices connected by sensors and managed by software. This is done to boost the efficiency and reliability of operations. Industry 4.0 is viewed as a possibility because of industrial IoT, the concept of leveraging IoT technology in manufacturing, delivering, in an industrial setting, a means of connecting engines, power grids, and sensors to the cloud. In this essay, we'll try to comprehend how the Internet of Things (IoT) works in wireless sensor networks and how it might be used in various situations.

Submitted 23 September, 2022; originally announced September 2022.

Comments: 8 pages, 13 figures, 3 tables, preprint

arXiv:2209.11230 [pdf]

A Trio-Method for Retinal Vessel Segmentation using Image Processing

Authors: Mahendra Kumar Gourisaria, Vinayak Singh, Manoj Sahni

Abstract: Inner retinal neurons are an essential part of the retina and they are supplied with blood via retinal vessels. This paper primarily focuses on the segmentation of retinal vessels using a triple preprocessing approach. The DRIVE database was taken into consideration and preprocessed by Gabor Filtering, Gaussian Blur, and Edge Detection by Sobel and Pruning. Segmentation was carried out by two proposed U-Net architectures. Both architectures were compared in terms of all the standard performance metrics. Preprocessing generated varied, interesting results which impacted the results shown by the U-Net architectures for segmentation. This real-time deployment can help in the efficient pre-processing of images with better segmentation and detection.

Submitted 19 September, 2022; originally announced September 2022.

Comments: Accepted at 26th UK Conference on Medical Image Understanding and Analysis (MIUA-2022) (Abstract short paper)

arXiv:2207.10284 [pdf, other]

Multi Resolution Analysis (MRA) for Approximate Self-Attention

Authors: Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh

Abstract: Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity patterns, low-rank basis expansions and combinations thereof. In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. We show that simple approximations based on empirical feedback and design choices informed by modern hardware and implementation challenges eventually yield an MRA-based approach for self-attention with an excellent performance profile across most criteria of interest. We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. Code is available at https://github.com/mlpen/mra-attention.

Submitted 20 July, 2022; originally announced July 2022.

Comments: ICML2022

arXiv:2206.11520 [pdf, other]

ICOS Protein Expression Segmentation: Can Transformer Networks Give Better Results?

Authors: Vivek Kumar Singh, Paul O Reilly, Jacqueline James, Manuel Salto Tellez, Perry Maxwell

Abstract: Biomarkers identify a patient's response to treatment. Despite the recent advances in artificial intelligence based on Transformer networks, only limited research has been done to measure their performance on challenging histopathology images. In this paper, we investigate the efficacy of numerous state-of-the-art Transformer networks for segmenting cells expressing the immune-checkpoint biomarker Inducible T-cell COStimulator (ICOS) protein in colon cancer from immunohistochemistry (IHC) slides. Extensive and comprehensive experimental results confirm that MiSSFormer achieved the highest Dice score, 74.85%, among the evaluated Transformer and Efficient U-Net methods.

Submitted 23 June, 2022; originally announced June 2022.

Comments: Accepted MIUA conference (Abstract short paper)

arXiv:2205.01640 [pdf]

doi 10.1080/19427867.2022.2050493

Adaptive Traffic Signal Control for Developing Countries Using Fused Parameters Derived from Crowd-Source Data

Authors: Sumit Mishra, Vishal Singh, Ankit Gupta, Devanjan Bhattacharya, Abhisek Mudgal

Abstract: Advancement of mobile technologies has enabled economical collection, storage, processing, and sharing of traffic data. These data are made accessible to intended users through various application program interfaces (API) and can be used to recognize and mitigate congestion in real time. In this paper, quantitative (time of arrival) and qualitative (color-coded congestion levels) data were acquired from the Google traffic APIs. New parameters that reflect heterogeneous traffic conditions were defined and utilized for real-time control of traffic signals while maintaining the green-to-red time ratio. The proposed method utilizes a congestion-avoiding principle commonly used in computer networking. Adaptive congestion levels were observed at three different intersections of Delhi (India) in peak hours. They showed good variation and are hence sensitive enough for the control algorithm to act efficiently. Also, a simulation study establishes that the proposed control algorithm decreases waiting time and congestion. The proposed method provides an inexpensive alternative to traffic sensing and tracking technologies.

Submitted 11 March, 2022; originally announced May 2022.

Comments: 15 pages, 11 figures, 7 tables, Accepted by Transportation Letters: the International Journal of Transportation Research

arXiv:2203.06600 [pdf, other]

Spectral Modification Based Data Augmentation For Improving End-to-End ASR For Children's Speech

Authors: Vishwanath Pratap Singh, Hardik Sailor, Supratik Bhattacharya, Abhishek Pandey

Abstract: Training a robust Automatic Speech Recognition (ASR) system for children's speech recognition is a challenging task due to inherent differences in acoustic attributes of adult and child speech and scarcity of publicly available children's speech datasets. In this paper, a novel segmental spectrum warping and perturbations in formant energy are introduced to generate a children-like speech spectrum from that of an adult's speech spectrum. Then, this modified adult spectrum is used as augmented data to improve end-to-end ASR systems for children's speech recognition. The proposed data augmentation methods give 6.5% and 6.1% relative reduction in WER on children dev and test sets respectively, compared to the vocal tract length perturbation (VTLP) baseline system trained on the Librispeech 100 hours adult speech dataset. When children's speech data is added in training with the Librispeech set, it gives a 3.7% and 5.1% relative reduction in WER, compared to the VTLP baseline system.

Submitted 13 March, 2022; originally announced March 2022.

arXiv:2112.01025 [pdf, other]

A Mixture of Expert Based Deep Neural Network for Improved ASR

Authors: Vishwanath Pratap Singh, Shakti P. Rath, Abhishek Pandey

Abstract: This paper presents a novel deep learning architecture for the acoustic model in the context of Automatic Speech Recognition (ASR), termed MixNet. Besides the conventional layers, such as fully connected layers in DNN-HMM and memory cells in LSTM-HMM, the model uses two additional layers based on Mixture of Experts (MoE). The first MoE layer, operating at the input, is based on pre-defined broad phonetic classes, and the second layer, operating at the penultimate layer, is based on automatically learned acoustic classes. In natural speech, overlap in distribution across different acoustic classes is inevitable, which leads to inter-class mis-classification. The ASR accuracy is expected to improve if the conventional architecture of the acoustic model is modified to make it more suitable to account for such overlaps. MixNet is developed keeping this in mind. Analysis conducted by means of scatter diagrams verifies that MoE indeed improves the separation between classes, which translates to better ASR accuracy. Experiments are conducted on a large vocabulary ASR task, which show that the proposed architecture provides 13.6% and 10.0% relative reduction in word error rates compared to the conventional models, namely DNN and LSTM respectively, trained using sMBR criteria. In comparison to an existing method developed for phone-classification (by Eigen et al), our proposed method yields a significant improvement.

Submitted 2 December, 2021; originally announced December 2021.

arXiv:2112.01023 [pdf, other]

A higher order Minkowski loss for improved prediction ability of acoustic model in ASR

Authors: Vishwanath Pratap Singh, Shakti P. Rath, Abhishek Pandey

Abstract: Conventional automatic speech recognition (ASR) systems use a second-order Minkowski loss during inference time, which is suboptimal as it incorporates only first-order statistics in posterior estimation [2]. In this paper we propose a higher-order Minkowski loss (4th order and 6th order) during inference time, without any changes during training time. The main contribution of the paper is to show that a higher-order loss uses higher-order statistics in posterior estimation, which improves the prediction ability of the acoustic model in an ASR system. We have shown mathematically that the posterior probability obtained from the higher-order loss is a function of the second-order posterior, and thus the method can be incorporated in a standard ASR system in an easy manner. It is to be noted that all changes are proposed at test (inference) time; we do not make any change in any training pipeline. Multiple baseline systems, namely TDNN1, TDNN2, DNN and LSTM, are developed to verify the improvement due to the higher-order Minkowski loss. All experiments are conducted on the LibriSpeech dataset, and the performance metric is word error rate (WER) on the "dev-clean", "test-clean", "dev-other" and "test-other" datasets.

Submitted 2 December, 2021; originally announced December 2021.

arXiv:2106.07972 [pdf]

SRIB Submission to Interspeech 2021 DiCOVA Challenge

Authors: Vishwanath Pratap Singh, Shashi Kumar, Ravi Shekhar Jha, Abhishek Pandey

Abstract: The COVID-19 pandemic has resulted in more than 125 million infections and more than 2.7 million casualties. In this paper, we attempt to classify covid vs non-covid cough sounds using signal processing and deep learning methods. Air turbulence, the vibration of tissues, movement of fluid through airways, and opening and closure of the glottis are some of the causes for the production of the acoustic sound signals during cough. Does COVID-19 alter the acoustic characteristics of breath, cough, and speech sounds produced through the respiratory system? This is an open question waiting for answers. In this paper, we incorporated novel data augmentation methods for cough sound augmentation and multiple deep neural network architectures and methods along with handcrafted features. Our proposed system gives 14% absolute improvement in area under the curve (AUC). The proposed system is developed as part of the Interspeech 2021 special sessions and challenges, viz. diagnosing of COVID-19 using acoustics (DiCOVA). Our proposed method secured the 5th position on the leaderboard among 29 participants.

Submitted 15 June, 2021; originally announced June 2021.

Comments: 5 pages, 5 figures

arXiv:2102.10640 [pdf, other]

Tchebichef Transform Domain-based Deep Learning Architecture for Image Super-resolution

Authors: Ahlad Kumar, Harsh Vardhan Singh

Abstract: The recent outbreak of COVID-19 has motivated researchers to contribute in the area of medical imaging using artificial intelligence and deep learning. Super-resolution (SR), in the past few years, has produced remarkable results using deep learning methods. The ability of deep learning methods to learn the non-linear mapping from low-resolution (LR) images to their corresponding high-resolution (HR) images leads to compelling results for SR in diverse areas of research. In this paper, we propose a deep learning based image super-resolution architecture in the Tchebichef transform domain. This is achieved by integrating a transform layer into the proposed architecture through a customized Tchebichef convolutional layer (TCL). The role of TCL is to convert the LR image from the spatial domain to the orthogonal transform domain using Tchebichef basis functions. The inversion of the aforementioned transformation is achieved using another layer known as the Inverse Tchebichef convolutional Layer (ITCL), which converts back the LR images from the transform domain to the spatial domain. It has been observed that using the Tchebichef transform domain for the task of SR takes advantage of the high and low-frequency representation of images, which simplifies the task of super-resolution. We further introduce a transfer learning approach to enhance the quality of COVID-based medical images. It is shown that our architecture enhances the quality of X-ray and CT images of COVID-19, providing a better image quality that helps in clinical diagnosis. Experimental results obtained using the proposed Tchebichef transform domain super-resolution (TTDSR) architecture are competitive with most of the deep learning methods while using fewer trainable parameters.

Submitted 22 February, 2021; v1 submitted 21 February, 2021; originally announced February 2021.

Comments: 11 pages, 12 figures, 53 references

arXiv:2010.03199 [pdf, other]

WDN: A Wide and Deep Network to Divide-and-Conquer Image Super-resolution

Authors: Vikram Singh, Anurag Mittal

Abstract: Divide and conquer is an established algorithm design paradigm that has proven itself to solve a variety of problems efficiently. However, it is yet to be fully explored in solving problems with a neural network, particularly the problem of image super-resolution. In this work, we propose an approach to divide the problem of image super-resolution into multiple sub-problems and then solve/conquer them with the help of a neural network. Unlike a typical deep neural network, we design an alternate network architecture that is much wider (along with being deeper) than existing networks and is specially designed to implement the divide-and-conquer design paradigm with a neural network. Additionally, a technique to calibrate the intensities of feature map pixels is being introduced. Extensive experimentation on five datasets reveals that our approach towards the problem and the proposed architecture generate better and sharper results than current state-of-the-art methods.

Submitted 7 October, 2020; originally announced October 2020.

MSC Class: 68T07 (Primary) 68T45; 68U10 (Secondary) ACM Class: I.4.3

arXiv:2008.05060 [pdf, other]

doi 10.1109/CVPR.2017.533

Online Graph Completion: Multivariate Signal Recovery in Computer Vision

Authors: Won Hwa Kim, Mona Jalal, Seongjae Hwang, Sterling C. Johnson, Vikas Singh

Abstract: The adoption of "human-in-the-loop" paradigms in computer vision and machine learning is leading to various applications where the actual data acquisition (e.g., human supervision) and the underlying inference algorithms are closely intertwined. While classical work in active learning provides effective solutions when the learning module involves classification and regression tasks, many practical issues such as partially observed measurements, financial constraints and even additional distributional or structural aspects of the data typically fall outside the scope of this treatment. For instance, with sequential acquisition of partial measurements of data that manifest as a matrix (or tensor), novel strategies for completion (or collaborative filtering) of the remaining entries have only been studied recently. Motivated by vision problems where we seek to annotate a large dataset of images via a crowdsourced platform or, alternatively, complement results from a state-of-the-art object detector using human feedback, we study the "completion" problem defined on graphs, where requests for additional measurements must be made sequentially. We design the optimization model in the Fourier domain of the graph, describing how ideas based on adaptive submodularity provide algorithms that work well in practice. On a large set of images collected from Imgur, we see promising results on images that are otherwise difficult to categorize. We also show applications to an experimental design problem in neuroimaging.

Submitted 11 August, 2020; originally announced August 2020.

Comments: 9 pages, 7 figures, CVPR 2017 Conference

arXiv:2006.16848 [pdf]

doi 10.3390/su12104023

Modeling and Uncertainty Analysis of Groundwater Level Using Six Evolutionary Optimization Algorithms Hybridized with ANFIS, SVM, and ANN

Authors: Akram Seifi, Mohammad Ehteram, Vijay P. Singh, Amir Mosavi

Abstract: In the present study, six meta-heuristic schemes are hybridized with artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), and support vector machine (SVM) models to predict monthly groundwater level (GWL) and to carry out uncertainty analysis of the predictions and spatial variation analysis. The six schemes, including grasshopper optimization algorithm (GOA), cat swarm optimization (CSO), weed algorithm (WA), genetic algorithm (GA), krill algorithm (KA), and particle swarm optimization (PSO), were hybridized with the ANN, SVM, and ANFIS models to improve their performance. Groundwater level (GWL) data of Ardebil plain (Iran) for a period of 144 months were selected to evaluate the hybrid models. The pre-processing technique of principal component analysis (PCA) was applied to reduce input combinations from monthly time series up to 12-month prediction intervals. The results showed that the ANFIS-GOA was superior to the other hybrid models for predicting GWL in the first piezometer and third piezometer in the testing stage. The performance of hybrid models with optimization algorithms was far better than that of classical ANN, ANFIS, and SVM models without hybridization. The percentage improvements of ANFIS-GOA over standalone ANFIS in piezometer 10 were 14.4%, 3%, 17.8%, and 181% for RMSE, MAE, NSE, and PBIAS in the training stage and 40.7%, 55%, 25%, and 132% in the testing stage, respectively. The improvements for piezometer 6 in the training step were 15%, 4%, 13%, and 208% and in the testing step were 33%, 44.6%, 16.3%, and 173%, respectively, which clearly confirms the superiority of the developed hybridization schemes in GWL modeling. Uncertainty analysis showed that ANFIS-GOA and SVM had, respectively, the best and worst performances among the models. In general, GOA enhanced the accuracy of the ANFIS, ANN, and SVM models.

Submitted 28 June, 2020; originally announced June 2020.

Comments: 42 pages, 11 figures

MSC Class: 68T07

Journal ref: Sustainability 2020, 12, 4023

arXiv:2005.04258 [pdf, other]

View Invariant Human Body Detection and Pose Estimation from Multiple Depth Sensors

Authors: Walid Bekhtaoui, Ruhan Sa, Brian Teixeira, Vivek Singh, Klaus Kirchberg, Yao-jen Chang, Ankur Kapoor

Abstract: Point cloud based methods have produced promising results in areas such as 3D object detection in autonomous driving. However, most of the recent point cloud work focuses on single depth sensor data, whereas less work has been done on indoor monitoring applications, such as operation room monitoring in hospitals or indoor surveillance. In these scenarios multiple cameras are often used to tackle occlusion problems. We propose an end-to-end multi-person 3D pose estimation network, Point R-CNN, using multiple point cloud sources. We conduct extensive experiments to simulate challenging real world cases, such as individual camera failures, various target appearances, and complex cluttered scenes with the CMU panoptic dataset and the MVOR operation room dataset. Unlike most of the previous methods that attempt to use multiple sensor information by building complex fusion models, which often lead to poor generalization, we take advantage of the efficiency of concatenating point clouds to fuse the information at the input level. In the meantime, we show our end-to-end network greatly outperforms cascaded state-of-the-art models.

Submitted 8 May, 2020; originally announced May 2020.

arXiv:2001.01277 [pdf, other]

Automated Segmentation of Vertebrae on Lateral Chest Radiography Using Deep Learning

Authors: Sanket Badhe, Varun Singh, Joy Li, Paras Lakhani

Abstract: The purpose of this study is to develop an automated algorithm for thoracic vertebral segmentation on chest radiography using deep learning. 124 de-identified lateral chest radiographs on unique patients were obtained. Segmentations of visible vertebrae were manually performed by a medical student and verified by a board-certified radiologist. 74 images were used for training, 10 for validation, and 40 were held out for testing. A U-Net deep convolutional neural network was employed for segmentation, using the sum of dice coefficient and binary cross-entropy as the loss function. On the test set, the algorithm demonstrated an average dice coefficient value of 90.5 and an average intersection-over-union (IoU) of 81.75. Deep learning demonstrates promise in the segmentation of vertebrae on lateral chest radiography.

Submitted 5 January, 2020; originally announced January 2020.

Comments: 10 pages, Accepted Poster presentation at Conference on Machine Intelligence in Medical Imaging 2018

arXiv:1911.08616 [pdf, other]

Attention Guided Anomaly Localization in Images

Authors: Shashanka Venkataramanan, Kuan-Chuan Peng, Rajat Vikram Singh, Abhijit Mahalanobis

Abstract: Anomaly localization is an important problem in computer vision which involves localizing anomalous regions within images with applications in industrial inspection, surveillance, and medical imaging. This task is challenging due to the small sample size and pixel coverage of the anomaly in real-world scenarios. Most prior works need to use anomalous training images to compute a class-specific threshold to localize anomalies. Without the need of anomalous training images, we propose Convolutional Adversarial Variational autoencoder with Guided Attention (CAVGA), which localizes the anomaly with a convolutional latent variable to preserve the spatial information. In the unsupervised setting, we propose an attention expansion loss where we encourage CAVGA to focus on all normal regions in the image. Furthermore, in the weakly-supervised setting we propose a complementary guided attention loss, where we encourage the attention map to focus on all normal regions while minimizing the attention map corresponding to anomalous regions in the image. CAVGA outperforms the state-of-the-art (SOTA) anomaly localization methods on MVTec Anomaly Detection (MVTAD), modified ShanghaiTech Campus (mSTC) and Large-scale Attention based Glaucoma (LAG) datasets in the unsupervised setting and when using only 2% anomalous images in the weakly-supervised setting. CAVGA also outperforms SOTA anomaly detection methods on the MNIST, CIFAR-10, Fashion-MNIST, MVTAD, mSTC and LAG datasets.

Submitted 16 July, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

Comments: Accepted to ECCV 2020

arXiv:1908.08074 [pdf, other]

DUAL-GLOW: Conditional Flow-Based Generative Model for Modality Transfer

Authors: Haoliang Sun, Ronak Mehta, Hao H. Zhou, Zhichun Huang, Sterling C. Johnson, Vivek Prabhakaran, Vikas Singh

Abstract: Positron emission tomography (PET) imaging is an imaging modality for diagnosing a number of neurological diseases. In contrast to Magnetic Resonance Imaging (MRI), PET is costly and involves injecting a radioactive substance into the patient. Motivated by developments in modality transfer in vision, we study the generation of certain types of PET images from MRI data. We derive new flow-based generative models which we show perform well in this small sample size regime (much smaller than dataset sizes available in standard vision tasks). Our formulation, DUAL-GLOW, is based on two invertible networks and a relation network that maps the latent spaces to each other. We discuss how, given the prior distribution, learning the conditional distribution of PET given the MRI image reduces to obtaining the conditional distribution between the two latent codes w.r.t. the two image types. We also extend our framework to leverage 'side' information (or attributes) when available. By controlling the PET generation through 'conditioning' on age, our model is also able to capture brain FDG-PET (hypometabolism) changes as a function of age. We present experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset with 826 subjects, and obtain good performance in PET image synthesis, qualitatively and quantitatively better than recent works.

Submitted 21 August, 2019; originally announced August 2019.

Journal ref: ICCV 2019

arXiv:1907.02742 [pdf, other]

Adversarial Learning with Multiscale Features and Kernel Factorization for Retinal Blood Vessel Segmentation

Authors: Farhan Akram, Vivek Kumar Singh, Hatem A. Rashwan, Mohamed Abdel-Nasser, Md. Mostafa Kamal Sarker, Nidhi Pandey, Domenec Puig

Abstract: In this paper, we propose an efficient blood vessel segmentation method for the eye fundus images using adversarial learning with multiscale features and kernel factorization. In the generator network of the adversarial framework, spatial pyramid pooling, kernel factorization and squeeze excitation block are employed to enhance the feature representation in spatial domain on different scales with reduced computational complexity. In turn, the discriminator network of the adversarial framework is formulated by combining convolutional layers with an additional squeeze excitation block to differentiate the generated segmentation mask from its respective ground truth. Before feeding the images to the network, we pre-processed them by using edge sharpening and Gaussian regularization to reach an optimized solution for vessel segmentation. The output of the trained model is post-processed using morphological operations to remove the small speckles of noise. The proposed method qualitatively and quantitatively outperforms state-of-the-art vessel segmentation methods using DRIVE and STARE datasets.

Submitted 5 July, 2019; originally announced July 2019.

Comments: 9 pages, 4 figures

arXiv:1907.00887 [pdf, other]

An Efficient Solution for Breast Tumor Segmentation and Classification in Ultrasound Images Using Deep Adversarial Learning

Authors: Vivek Kumar Singh, Hatem A. Rashwan, Mohamed Abdel-Nasser, Md. Mostafa Kamal Sarker, Farhan Akram, Nidhi Pandey, Santiago Romani, Domenec Puig

Abstract: This paper proposes an efficient solution for tumor segmentation and classification in breast ultrasound (BUS) images. We propose to add an atrous convolution layer to the conditional generative adversarial network (cGAN) segmentation model to learn tumor features at different resolutions of BUS images. To automatically re-balance the relative impact of each of the highest level encoded features, we also propose to add a channel-wise weighting block in the network. In addition, the SSIM and L1-norm losses are combined with the typical adversarial loss as the loss function to train the model. Our model outperforms the state-of-the-art segmentation models in terms of the Dice and IoU metrics, achieving top scores of 93.76% and 88.82%, respectively. In the classification stage, we show that a few statistical features extracted from the shape of the boundaries of the predicted masks can properly discriminate between benign and malignant tumors with an accuracy of 85%.

Submitted 1 July, 2019; originally announced July 2019.

Comments: 9 pages

arXiv:1907.00856 [pdf, other]

SLSNet: Skin lesion segmentation using a lightweight generative adversarial network

Authors: Md. Mostafa Kamal Sarker, Hatem A. Rashwan, Farhan Akram, Vivek Kumar Singh, Syeda Furruka Banu, Forhad U H Chowdhury, Kabir Ahmed Choudhury, Sylvie Chambon, Petia Radeva, Domenec Puig, Mohamed Abdel-Nasser

Abstract: The determination of precise skin lesion boundaries in dermoscopic images using automated methods faces many challenges, most importantly, the presence of hair, inconspicuous lesion edges and low contrast in dermoscopic images, and variability in the color, texture and shapes of skin lesions. Existing deep learning-based skin lesion segmentation algorithms are expensive in terms of computational time and memory. Consequently, running such segmentation algorithms requires a powerful GPU and high bandwidth memory, which are not available in dermoscopy devices. Thus, this article aims to achieve precise skin lesion segmentation with minimum resources: a lightweight, efficient generative adversarial network (GAN) model called SLSNet, which combines 1-D kernel factorized networks, position and channel attention, and multiscale aggregation mechanisms with a GAN model. The 1-D kernel factorized network reduces the computational cost of 2D filtering. The position and channel attention modules enhance the discriminative ability between the lesion and non-lesion feature representations in spatial and channel dimensions, respectively. A multiscale block is also used to aggregate the coarse-to-fine features of input skin images and reduce the effect of the artifacts. SLSNet is evaluated on two publicly available datasets: ISBI 2017 and ISIC 2018. Although SLSNet has only 2.35 million parameters, the experimental results demonstrate that it achieves segmentation results on a par with the state-of-the-art skin lesion segmentation methods with an accuracy of 97.61%, and Dice and Jaccard similarity coefficients of 90.63% and 81.98%, respectively. SLSNet can run at more than 110 frames per second (FPS) on a single GTX1080Ti GPU, which is faster than well-known deep learning-based image segmentation models, such as FCN. Therefore, SLSNet can be used for practical dermoscopic applications.

Submitted 17 June, 2021; v1 submitted 1 July, 2019; originally announced July 2019.

Comments: Accepted in Expert Systems with Applications

arXiv:1811.03343 [pdf, other]

Repetitive Motion Estimation Network: Recover cardiac and respiratory signal from thoracic imaging

Authors: Xiaoxiao Li, Vivek Singh, Yifan Wu, Klaus Kirchberg, James Duncan, Ankur Kapoor

Abstract: Tracking organ motion is important in image-guided interventions, but motion annotations are not always easily available. Thus, we propose Repetitive Motion Estimation Network (RMEN) to recover cardiac and respiratory signals. It learns the spatio-temporal repetition patterns, embedding high dimensional motion manifolds to 1D vectors with partial motion phase boundary annotations. Compared with the best alternative models, our proposed RMEN significantly decreased the QRS peaks detection offsets by 59.3%. Results showed that RMEN could handle the irregular cardiac and respiratory motion cases. Repetitive motion patterns learned by RMEN were visualized and indicated in the feature maps.

Submitted 8 November, 2018; originally announced November 2018.

Comments: Accepted by NIPS workshop MED-NIPS 2018

arXiv:1306.5412 [pdf]

Electronically Tunable Voltage-Mode Biquad Filter/Oscillator Based On CCCCTAs

Authors: Sajai Vir Singh, Gungan Gupta, Rahul Chhabra, Kanika Nagpal, Devansh

Abstract: In this paper, a circuit employing current controlled current conveyor trans-conductance amplifiers (CCCCTAs) as the active element is proposed which can function both as a biquad filter and as an oscillator. It uses two CCCCTAs and two capacitors. As a biquad filter it can realize all the standard filtering functions (low pass, band pass, high pass, band reject and all pass) in voltage-mode and provides electronic and orthogonal control of pole frequency and quality factor through the biasing current(s) of the CCCCTAs. The proposed circuit can also work as an oscillator without changing the circuit topology. Because it uses only capacitors and no resistors, the proposed circuit is suitable for IC fabrication. The validity of the proposed filter is verified through PSPICE simulations.

Submitted 23 June, 2013; originally announced June 2013.

Comments: 5 pages, 7 figures, 1 table, Authors profile

Showing 1–37 of 37 results for author: Singh, V