Search | arXiv e-print repository

GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech

Authors: Wenbin Wang, Yang Song, Sanjay Jha

Abstract: This paper introduces GLOBE, a high-quality English corpus with worldwide accents, specifically designed to address the limitations of current zero-shot speaker adaptive Text-to-Speech (TTS) systems that exhibit poor generalizability in adapting to speakers with accents. Compared to commonly used English corpora, such as LibriTTS and VCTK, GLOBE is unique in its inclusion of utterances from 23,519… ▽ More This paper introduces GLOBE, a high-quality English corpus with worldwide accents, specifically designed to address the limitations of current zero-shot speaker adaptive Text-to-Speech (TTS) systems that exhibit poor generalizability in adapting to speakers with accents. Compared to commonly used English corpora, such as LibriTTS and VCTK, GLOBE is unique in its inclusion of utterances from 23,519 speakers and covers 164 accents worldwide, along with detailed metadata for these speakers. Compared to its original corpus, i.e., Common Voice, GLOBE significantly improves the quality of the speech data through rigorous filtering and enhancement processes, while also populating all missing speaker metadata. The final curated GLOBE corpus includes 535 hours of speech data at a 24 kHz sampling rate. Our benchmark results indicate that the speaker adaptive TTS model trained on the GLOBE corpus can synthesize speech with better speaker similarity and comparable naturalness than that trained on other popular corpora. We will release GLOBE publicly after acceptance. The GLOBE dataset is available at https://globecorpus.github.io/. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: Interspeech 2024, 4 pages, 3 figures

arXiv:2406.10893 [pdf, other]

Development and Validation of Fully Automatic Deep Learning-Based Algorithms for Immunohistochemistry Reporting of Invasive Breast Ductal Carcinoma

Authors: Sumit Kumar Jha, Purnendu Mishra, Shubham Mathur, Gursewak Singh, Rajiv Kumar, Kiran Aatre, Suraj Rengarajan

Abstract: Immunohistochemistry (IHC) analysis is a well-accepted and widely used method for molecular subty**, a procedure for prognosis and targeted therapy of breast carcinoma, the most common type of tumor affecting women. There are four molecular biomarkers namely progesterone receptor (PR), estrogen receptor (ER), antigen Ki67, and human epidermal growth factor receptor 2 (HER2) whose assessment is n… ▽ More Immunohistochemistry (IHC) analysis is a well-accepted and widely used method for molecular subty**, a procedure for prognosis and targeted therapy of breast carcinoma, the most common type of tumor affecting women. There are four molecular biomarkers namely progesterone receptor (PR), estrogen receptor (ER), antigen Ki67, and human epidermal growth factor receptor 2 (HER2) whose assessment is needed under IHC procedure to decide prognosis as well as predictors of response to therapy. However, IHC scoring is based on subjective microscopic examination of tumor morphology and suffers from poor reproducibility, high subjectivity, and often incorrect scoring in low-score cases. In this paper, we present, a deep learning-based semi-supervised trained, fully automatic, decision support system (DSS) for IHC scoring of invasive ductal carcinoma. Our system automatically detects the tumor region removing artifacts and scores based on Allred standard. The system is developed using 3 million pathologist-annotated image patches from 300 slides, fifty thousand in-house cell annotations, and forty thousand pixels marking of HER2 membrane. We have conducted multicentric trials at four centers with three different types of digital scanners in terms of percentage agreement with doctors. And achieved agreements of 95, 92, 88 and 82 percent for Ki67, HER2, ER, and PR stain categories, respectively. In addition to overall accuracy, we found that there is 5 percent of cases where pathologist have changed their score in favor of algorithm score while reviewing with detailed algorithmic analysis. Our approach could improve the accuracy of IHC scoring and subsequent therapy decisions, particularly where specialist expertise is unavailable. Our system is highly modular. The proposed algorithm modules can be used to develop DSS for other cancer types. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.05828 [pdf, other]

Multi-Stain Multi-Level Convolutional Network for Multi-Tissue Breast Cancer Image Segmentation

Authors: Akash Modi, Sumit Kumar Jha, Purnendu Mishra, Rajiv Kumar, Kiran Aatre, Gursewak Singh, Shubham Mathur

Abstract: Digital pathology and microscopy image analysis are widely employed in the segmentation of digitally scanned IHC slides, primarily to identify cancer and pinpoint regions of interest (ROI) indicative of tumor presence. However, current ROI segmentation models are either stain-specific or suffer from the issues of stain and scanner variance due to different staining protocols or modalities across m… ▽ More Digital pathology and microscopy image analysis are widely employed in the segmentation of digitally scanned IHC slides, primarily to identify cancer and pinpoint regions of interest (ROI) indicative of tumor presence. However, current ROI segmentation models are either stain-specific or suffer from the issues of stain and scanner variance due to different staining protocols or modalities across multiple labs. Also, tissues like Ductal Carcinoma in Situ (DCIS), acini, etc. are often classified as Tumors due to their structural similarities and color compositions. In this paper, we proposed a novel convolutional neural network (CNN) based Multi-class Tissue Segmentation model for histopathology whole-slide Breast slides which classify tumors and segments other tissue regions such as Ducts, acini, DCIS, Squamous epithelium, Blood Vessels, Necrosis, etc. as a separate class. Our unique pixel-aligned non-linear merge across spatial resolutions empowers models with both local and global fields of view for accurate detection of various classes. Our proposed model is also able to separate bad regions such as folds, artifacts, blurry regions, bubbles, etc. from tissue regions using multi-level context from different resolutions of WSI. Multi-phase iterative training with context-aware augmentation and increasing noise was used to efficiently train a multi-stain generic model with partial and noisy annotations from 513 slides. Our training pipeline used 12 million patches generated using context-aware augmentations which made our model stain and scanner invariant across data sources. To extrapolate stain and scanner invariance, our model was evaluated on 23000 patches which were for a completely new stain (Hematoxylin and Eosin) from a completely new scanner (Motic) from a different lab. The mean IOU was 0.72 which is on par with model performance on other data sources and scanners. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2404.18094 [pdf, other]

doi 10.1109/TASLP.2024.3393714

USAT: A Universal Speaker-Adaptive Text-to-Speech Approach

Authors: Wenbin Wang, Yang Song, Sanjay Jha

Abstract: Conventional text-to-speech (TTS) research has predominantly focused on enhancing the quality of synthesized speech for speakers in the training dataset. The challenge of synthesizing lifelike speech for unseen, out-of-dataset speakers, especially those with limited reference data, remains a significant and unresolved problem. While zero-shot or few-shot speaker-adaptive TTS approaches have been e… ▽ More Conventional text-to-speech (TTS) research has predominantly focused on enhancing the quality of synthesized speech for speakers in the training dataset. The challenge of synthesizing lifelike speech for unseen, out-of-dataset speakers, especially those with limited reference data, remains a significant and unresolved problem. While zero-shot or few-shot speaker-adaptive TTS approaches have been explored, they have many limitations. Zero-shot approaches tend to suffer from insufficient generalization performance to reproduce the voice of speakers with heavy accents. While few-shot methods can reproduce highly varying accents, they bring a significant storage burden and the risk of overfitting and catastrophic forgetting. In addition, prior approaches only provide either zero-shot or few-shot adaptation, constraining their utility across varied real-world scenarios with different demands. Besides, most current evaluations of speaker-adaptive TTS are conducted only on datasets of native speakers, inadvertently neglecting a vast portion of non-native speakers with diverse accents. Our proposed framework unifies both zero-shot and few-shot speaker adaptation strategies, which we term as "instant" and "fine-grained" adaptations based on their merits. To alleviate the insufficient generalization performance observed in zero-shot speaker adaptation, we designed two innovative discriminators and introduced a memory mechanism for the speech decoder. To prevent catastrophic forgetting and reduce storage implications for few-shot speaker adaptation, we designed two adapters and a unique adaptation procedure. △ Less

Submitted 28 April, 2024; originally announced April 2024.

Comments: 15 pages, 13 figures. Copyright has been transferred to IEEE

Journal ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2024

arXiv:2404.15293 [pdf, other]

Interactive Manipulation and Visualization of 3D Brain MRI for Surgical Training

Authors: Siddharth Jha, Zichen Gui, Benjamin Delbos, Richard Moreau, Arnaud Leleve, Irene Cheng

Abstract: In modern medical diagnostics, magnetic resonance imaging (MRI) is an important technique that provides detailed insights into anatomical structures. In this paper, we present a comprehensive methodology focusing on streamlining the segmentation, reconstruction, and visualization process of 3D MRI data. Segmentation involves the extraction of anatomical regions with the help of state-of-the-art de… ▽ More In modern medical diagnostics, magnetic resonance imaging (MRI) is an important technique that provides detailed insights into anatomical structures. In this paper, we present a comprehensive methodology focusing on streamlining the segmentation, reconstruction, and visualization process of 3D MRI data. Segmentation involves the extraction of anatomical regions with the help of state-of-the-art deep learning algorithms. Then, 3D reconstruction converts segmented data from the previous step into multiple 3D representations. Finally, the visualization stage provides efficient and interactive presentations of both 2D and 3D MRI data. Integrating these three steps, the proposed system is able to augment the interpretability of the anatomical information from MRI scans according to our interviews with doctors. Even though this system was originally designed and implemented as part of human brain haptic feedback simulation for surgeon training, it can also provide experienced medical practitioners with an effective tool for clinical data analysis, surgical planning and other purposes △ Less

Submitted 24 March, 2024; originally announced April 2024.

arXiv:2311.00429 [pdf, other]

Crop Disease Classification using Support Vector Machines with Green Chromatic Coordinate (GCC) and Attention based feature extraction for IoT based Smart Agricultural Applications

Authors: Shashwat Jha, Vishvaditya Luhach, Gauri Shanker Gupta, Beependra Singh

Abstract: Crops hold paramount significance as they serve as the primary provider of energy, nutrition, and medicinal benefits for the human population. Plant diseases, however, can negatively affect leaves during agricultural cultivation, resulting in significant losses in crop output and economic value. Therefore, it is crucial for farmers to identify crop diseases. However, this method frequently necessi… ▽ More Crops hold paramount significance as they serve as the primary provider of energy, nutrition, and medicinal benefits for the human population. Plant diseases, however, can negatively affect leaves during agricultural cultivation, resulting in significant losses in crop output and economic value. Therefore, it is crucial for farmers to identify crop diseases. However, this method frequently necessitates hard work, a lot of planning, and in-depth familiarity with plant pathogens. Given these numerous obstacles, it is essential to provide solutions that can easily interface with mobile and IoT devices so that our farmers can guarantee the best possible crop development. Various machine learning (ML) as well as deep learning (DL) algorithms have been created & studied for the identification of plant disease detection, yielding substantial and promising results. This article presents a novel classification method that builds on prior work by utilising attention-based feature extraction, RGB channel-based chromatic analysis, Support Vector Machines (SVM) for improved performance, and the ability to integrate with mobile applications and IoT devices after quantization of information. Several disease classification algorithms were compared with the suggested model, and it was discovered that, in terms of accuracy, Vision Transformer-based feature extraction and additional Green Chromatic Coordinate feature with SVM classification achieved an accuracy of (GCCViT-SVM) - 99.69%, whereas after quantization for IoT device integration achieved an accuracy of - 97.41% while almost reducing 4x in size. Our findings have profound implications because they have the potential to transform how farmers identify crop illnesses with precise and fast information, thereby preserving agricultural output and ensuring food security. △ Less

Submitted 6 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2309.05070 [pdf, other]

doi 10.1145/3610419.3610487

Chasing the Intruder: A Reinforcement Learning Approach for Tracking Intruder Drones

Authors: Shivam Kainth, Subham Sahoo, Rajtilak Pal, Shashi Shekhar Jha

Abstract: Drones are becoming versatile in a myriad of applications. This has led to the use of drones for spying and intruding into the restricted or private air spaces. Such foul use of drone technology is dangerous for the safety and security of many critical infrastructures. In addition, due to the varied low-cost design and agility of the drones, it is a challenging task to identify and track them usin… ▽ More Drones are becoming versatile in a myriad of applications. This has led to the use of drones for spying and intruding into the restricted or private air spaces. Such foul use of drone technology is dangerous for the safety and security of many critical infrastructures. In addition, due to the varied low-cost design and agility of the drones, it is a challenging task to identify and track them using the conventional radar systems. In this paper, we propose a reinforcement learning based approach for identifying and tracking any intruder drone using a chaser drone. Our proposed solution uses computer vision techniques interleaved with the policy learning framework of reinforcement learning to learn a control policy for chasing the intruder drone. The whole system has been implemented using ROS and Gazebo along with the Ardupilot based flight controller. The results show that the reinforcement learning based policy converges to identify and track the intruder drone. Further, the learnt policy is robust with respect to the change in speed or orientation of the intruder drone. △ Less

Submitted 10 September, 2023; originally announced September 2023.

arXiv:2308.13007 [pdf, other]

Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations

Authors: Wenbin Wang, Yang Song, Sanjay Jha

Abstract: While most research into speech synthesis has focused on synthesizing high-quality speech for in-dataset speakers, an equally essential yet unsolved problem is synthesizing speech for unseen speakers who are out-of-dataset with limited reference data, i.e., speaker adaptive speech synthesis. Many studies have proposed zero-shot speaker adaptive text-to-speech and voice conversion approaches aimed… ▽ More While most research into speech synthesis has focused on synthesizing high-quality speech for in-dataset speakers, an equally essential yet unsolved problem is synthesizing speech for unseen speakers who are out-of-dataset with limited reference data, i.e., speaker adaptive speech synthesis. Many studies have proposed zero-shot speaker adaptive text-to-speech and voice conversion approaches aimed at this task. However, most current approaches suffer from the degradation of naturalness and speaker similarity when synthesizing speech for unseen speakers (i.e., speakers not in the training dataset) due to the poor generalizability of the model in out-of-distribution data. To address this problem, we propose GZS-TV, a generalizable zero-shot speaker adaptive text-to-speech and voice conversion model. GZS-TV introduces disentangled representation learning for both speaker embedding extraction and timbre transformation to improve model generalization and leverages the representation learning capability of the variational autoencoder to enhance the speaker encoder. Our experiments demonstrate that GZS-TV reduces performance degradation on unseen speakers and outperforms all baseline models in multiple datasets. △ Less

Submitted 24 August, 2023; originally announced August 2023.

Comments: 5 pages, 3 figures. Accepted by Interspeech 2023, Oral

arXiv:2303.01702 [pdf, other]

Hardware Software Co-Design Based Reconfigurable Radar Signal Processing Accelerator for Joint Radar-Communication System

Authors: Shragvi Sidharth Jha, Aakanksha Tewari, Sumit J Darak, Akanksha Sneh, Shobha Sundar Ram

Abstract: Millimeter wave (mmW) codesigned 802.11ad-based joint radar communication (JRC) systems have been identified as a potential solution for realizing high bandwidth connected vehicles for next-generation intelligent transportation systems. The radar functionality within the JRC enables accurate detection and localization of mobile targets, which can significantly speed up the selection of the optimal… ▽ More Millimeter wave (mmW) codesigned 802.11ad-based joint radar communication (JRC) systems have been identified as a potential solution for realizing high bandwidth connected vehicles for next-generation intelligent transportation systems. The radar functionality within the JRC enables accurate detection and localization of mobile targets, which can significantly speed up the selection of the optimal high-directional narrow beam required for mmW communications between the base station and mobile target. To bring JRC to reality, a radar signal processing (RSP) accelerator, co-located with the wireless communication physical layer (PHY), on edge platforms is desired. In this work, we discuss the three-dimensional digital hardware RSP framework for 802.11ad-based JRC to detect the range, azimuth, and Doppler velocity of multiple targets. We present a novel efficient reconfigurable architecture for RSP on multi-processor system-on-chip (MPSoC) via hardware-software co-design, word-length optimization, and serial-parallel configurations. We demonstrate the functional correctness of the proposed fixed-point architecture and significant savings in resource utilization (~40-70), execution time (1.5x improvement), and power consumption (50%) over floating-point architecture. The acceleration on hardware offers a 120-factor improvement in execution time over the benchmark Quad-core processor. The proposed architecture enables on-the-fly reconfigurability to support different azimuth precision and Doppler velocity resolution, offering a real-time trade-off between functional accuracy and detection time. We demonstrate end-to-end RSP on MPSoC with a user-friendly graphical user interface (GUI). △ Less

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2302.05494 [pdf]

Data-Driven Web-Based Patching Management Tool Using Multi-Sensor Pavement Structure Measurements

Authors: Sneha Jha, Yaguang Zhang, Bongsuk Park, Seonghwan Cho, James V. Krogmeier, Tandra Bagchi, John E. Haddock

Abstract: Automating pavement maintenance suggestions is challenging,especially for actionable recommendations such as patching location,depth and priority.It is common practice among State agencies to manually inspect road segments of interest and decide maintenance requirements based on the pavement condition index (PCI).However,standalone PCI only evaluates the pavement surface condition and coupled with… ▽ More Automating pavement maintenance suggestions is challenging,especially for actionable recommendations such as patching location,depth and priority.It is common practice among State agencies to manually inspect road segments of interest and decide maintenance requirements based on the pavement condition index (PCI).However,standalone PCI only evaluates the pavement surface condition and coupled with the variability in human perception of pavement distress,limits the accuracy and quality of current pavement maintenance practices.Here,a need for multi-sensor data integrated with standardized pavement distress condition ratings is required.This study explores the possibility of estimating the appropriate pavement patching strategy (i.e.,patching location,depth,and quantity) by integrating pavement structural and surface condition assessment with pavement specific ratings of distress.Especially,it combines pavement structural condition assessment parameter;falling weight deflectometer deflections along with surface condition assessment parameters;international roughness index,and cracking density for a better representation of overall pavement distress condition.Then,a pavement specific threshold-based patching suggestion algorithm is implemented to evaluate the pavement overall distress condition into a priority-based patching suggestion.The novelty in the use of pavement specific thresholds is placed on its data-driven ability to determine threshold values from current road condition measurements using a reliability concept validated by the theoretical pavement condition rating,pavement structural number.A web-based patching manager tool (PMT) was implemented to automate the patching suggestion procedure and visualize the results.Validated with road surface images obtained from three-dimensional laser sensors,PMT could successfully capture localized distresses in existing pavements. △ Less

Submitted 10 February, 2023; originally announced February 2023.

Comments: Presented at: Transportation Research Board Annual Meeting 2023

Report number: PaperID-TRBAM-23-04483

arXiv:2209.08795 [pdf, other]

AutoLV: Automatic Lecture Video Generator

Authors: Wenbin Wang, Yang Song, Sanjay Jha

Abstract: We propose an end-to-end lecture video generation system that can generate realistic and complete lecture videos directly from annotated slides, instructor's reference voice and instructor's reference portrait video. Our system is primarily composed of a speech synthesis module with few-shot speaker adaptation and an adversarial learning-based talking-head generation module. It is capable of not o… ▽ More We propose an end-to-end lecture video generation system that can generate realistic and complete lecture videos directly from annotated slides, instructor's reference voice and instructor's reference portrait video. Our system is primarily composed of a speech synthesis module with few-shot speaker adaptation and an adversarial learning-based talking-head generation module. It is capable of not only reducing instructors' workload but also changing the language and accent which can help the students follow the lecture more easily and enable a wider dissemination of lecture contents. Our experimental results show that the proposed model outperforms other current approaches in terms of authenticity, naturalness and accuracy. Here is a video demonstration of how our system works, and the outcomes of the evaluation and comparison: https://youtu.be/cY6TYkI0cog. △ Less

Submitted 19 September, 2022; originally announced September 2022.

Comments: 4 pages, 4 figures, ICIP 2022

arXiv:2109.05222 [pdf, ps, other]

Fundamental limits of over-the-air optimization: Are analog schemes optimal?

Authors: Shubham K Jha, Prathamesh Mayekar, Himanshu Tyagi

Abstract: We consider over-the-air convex optimization on a $d-$dimensional space where coded gradients are sent over an additive Gaussian noise channel with variance $σ^2$. The codewords satisfy an average power constraint $P$, resulting in the signal-to-noise ratio (SNR) of $P/σ^2$. We derive bounds for the convergence rates for over-the-air optimization. Our first result is a lower bound for the converge… ▽ More We consider over-the-air convex optimization on a $d-$dimensional space where coded gradients are sent over an additive Gaussian noise channel with variance $σ^2$. The codewords satisfy an average power constraint $P$, resulting in the signal-to-noise ratio (SNR) of $P/σ^2$. We derive bounds for the convergence rates for over-the-air optimization. Our first result is a lower bound for the convergence rate showing that any code must slowdown the convergence rate by a factor of roughly $\sqrt{d/\log(1+\mathtt{SNR})}$. Next, we consider a popular class of schemes called $analog$ $coding$, where a linear function of the gradient is sent. We show that a simple scaled transmission analog coding scheme results in a slowdown in convergence rate by a factor of $\sqrt{d(1+1/\mathtt{SNR})}$. This matches the previous lower bound up to constant factors for low SNR, making the scaled transmission scheme optimal at low SNR. However, we show that this slowdown is necessary for any analog coding scheme. In particular, a slowdown in convergence by a factor of $\sqrt{d}$ for analog coding remains even when SNR tends to infinity. Remarkably, we present a simple quantize-and-modulate scheme that uses $Amplitude$ $Shift$ $Keying$ and almost attains the optimal convergence rate at all SNRs. △ Less

Submitted 15 September, 2021; v1 submitted 11 September, 2021; originally announced September 2021.

Comments: Few typos fixed and one reference added. An abridged version of this paper will appear in the proceedings of IEEE Global Communications Conference (GLOBECOM), Spain, 2021

arXiv:2108.06884 [pdf, other]

Seirios: Leveraging Multiple Channels for LoRaWAN Indoor and Outdoor Localization

Authors: Jun Liu, Jiayao Gao, Sanjay Jha, Wen Hu

Abstract: Localization is important for a large number of Internet of Things (IoT) endpoint devices connected by LoRaWAN. Due to the bandwidth limitations of LoRaWAN, existing localization methods without specialized hardware (e.g., GPS) produce poor performance. To increase the localization accuracy, we propose a super-resolution localization method, called Seirios, which features a novel algorithm to sync… ▽ More Localization is important for a large number of Internet of Things (IoT) endpoint devices connected by LoRaWAN. Due to the bandwidth limitations of LoRaWAN, existing localization methods without specialized hardware (e.g., GPS) produce poor performance. To increase the localization accuracy, we propose a super-resolution localization method, called Seirios, which features a novel algorithm to synchronize multiple non-overlapped communication channels by exploiting the unique features of the radio physical layer to increase the overall bandwidth. By exploiting both the original and the conjugate of the physical layer, Seirios can resolve the direct path from multiple reflectors in both indoor and outdoor environments. We design a Seirios prototype and evaluate its performance in an outdoor area of 100 m $\times$ 60 m, and an indoor area of 25 m $\times$ 15 m, which shows that Seirios can achieve a median error of 4.4 m outdoors (80% samples < 6.4 m), and 2.4 m indoors (80% samples < 6.1 m), respectively. The results show that Seirios produces 42% less localization error than the baseline approach. Our evaluation also shows that, different to previous studies in Wi-Fi localization systems that have wider bandwidth, time-of-fight (ToF) estimation is less effective for LoRaWAN localization systems with narrowband radio signals. △ Less

Submitted 15 August, 2021; originally announced August 2021.

Comments: MOBICOM 2021

arXiv:2108.00874 [pdf, other]

Few-Shot Domain Adaptation For End-to-End Communication

Authors: Jayaram Raghuram, Yi**g Zeng, Dolores García Martí, Rafael Ruiz Ortiz, Somesh Jha, Joerg Widmer, Suman Banerjee

Abstract: The problem of end-to-end learning of a communication system using an autoencoder -- consisting of an encoder, channel, and decoder modeled using neural networks -- has recently been shown to be a promising approach. A challenge faced in the practical adoption of this learning approach is that under changing channel conditions (e.g. a wireless link), it requires frequent retraining of the autoenco… ▽ More The problem of end-to-end learning of a communication system using an autoencoder -- consisting of an encoder, channel, and decoder modeled using neural networks -- has recently been shown to be a promising approach. A challenge faced in the practical adoption of this learning approach is that under changing channel conditions (e.g. a wireless link), it requires frequent retraining of the autoencoder in order to maintain a low decoding error rate. Since retraining is both time consuming and requires a large number of samples, it becomes impractical when the channel distribution is changing quickly. We propose to address this problem using a fast and sample-efficient (few-shot) domain adaptation method that does not change the encoder and decoder networks. Different from conventional training-time unsupervised or semi-supervised domain adaptation, here we have a trained autoencoder from a source distribution, that we want to adapt (at test time) to a target distribution using only a small labeled dataset and no unlabeled data. Our method focuses on a Gaussian mixture density network based channel model, and formulates its adaptation based on class and component-conditional affine transformations. The learned affine transformations are used to design an optimal input transformation at the decoder to compensate for the distribution shift, and effectively present to the decoder inputs close to the source distribution. Experiments on a real mmWave FPGA setup as well as a number of simulated distribution changes common to the wireless setting demonstrate the effectiveness of our method at adaptation using very small number of target domain samples. △ Less

Submitted 25 July, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: 32 pages, 11 figures

arXiv:2108.00640 [pdf, ps, other]

Few-shot calibration of low-cost air pollution (PM2.5) sensors using meta-learning

Authors: Kalpit Yadav, Vipul Arora, Sonu Kumar Jha, Mohit Kumar, Sachchida Nand Tripathi

Abstract: Low-cost particulate matter sensors are transforming air quality monitoring because they have lower costs and greater mobility as compared to reference monitors. Calibration of these low-cost sensors requires training data from co-deployed reference monitors. Machine Learning based calibration gives better performance than conventional techniques, but requires a large amount of training data from… ▽ More Low-cost particulate matter sensors are transforming air quality monitoring because they have lower costs and greater mobility as compared to reference monitors. Calibration of these low-cost sensors requires training data from co-deployed reference monitors. Machine Learning based calibration gives better performance than conventional techniques, but requires a large amount of training data from the sensor, to be calibrated, co-deployed with a reference monitor. In this work, we propose novel transfer learning methods for quick calibration of sensors with minimal co-deployment with reference monitors. Transfer learning utilizes a large amount of data from other sensors along with a limited amount of data from the target sensor. Our extensive experimentation finds the proposed Model-Agnostic- Meta-Learning (MAML) based transfer learning method to be the most effective over other competitive baselines. △ Less

Submitted 2 August, 2021; originally announced August 2021.

Comments: 3+1 pages, submitted to IEEE sensors conference 2021

arXiv:2106.07972 [pdf]

SRIB Submission to Interspeech 2021 DiCOVA Challenge

Authors: Vishwanath Pratap Singh, Shashi Kumar, Ravi Shekhar Jha, Abhishek Pandey

Abstract: The COVID-19 pandemic has resulted in more than 125 million infections and more than 2.7 million casualties. In this paper, we attempt to classify covid vs non-covid cough sounds using signal processing and deep learning methods. Air turbulence, the vibration of tissues, movement of fluid through airways, opening, and closure of glottis are some of the causes for the production of the acoustic sou… ▽ More The COVID-19 pandemic has resulted in more than 125 million infections and more than 2.7 million casualties. In this paper, we attempt to classify covid vs non-covid cough sounds using signal processing and deep learning methods. Air turbulence, the vibration of tissues, movement of fluid through airways, opening, and closure of glottis are some of the causes for the production of the acoustic sound signals during cough. Does the COVID-19 alter the acoustic characteristics of breath, cough, and speech sounds produced through the respiratory system? This is an open question waiting for answers. In this paper, we incorporated novel data augmentation methods for cough sound augmentation and multiple deep neural network architectures and methods along with handcrafted features. Our proposed system gives 14% absolute improvement in area under the curve (AUC). The proposed system is developed as part of Interspeech 2021 special sessions and challenges viz. diagnosing of COVID-19 using acoustics (DiCOVA). Our proposed method secured the 5th position on the leaderboard among 29 participants. △ Less

Submitted 15 June, 2021; originally announced June 2021.

Comments: 5 pages, 5 figures

arXiv:2104.09396 [pdf, other]

doi 10.1016/j.ins.2021.04.062

Continual Learning in Sensor-based Human Activity Recognition: an Empirical Benchmark Analysis

Authors: Saurav Jha, Martin Schiemer, Franco Zambonelli, Juan Ye

Abstract: Sensor-based human activity recognition (HAR), i.e., the ability to discover human daily activity patterns from wearable or embedded sensors, is a key enabler for many real-world applications in smart homes, personal healthcare, and urban planning. However, with an increasing number of applications being deployed, an important question arises: how can a HAR system autonomously learn new activities… ▽ More Sensor-based human activity recognition (HAR), i.e., the ability to discover human daily activity patterns from wearable or embedded sensors, is a key enabler for many real-world applications in smart homes, personal healthcare, and urban planning. However, with an increasing number of applications being deployed, an important question arises: how can a HAR system autonomously learn new activities over a long period of time without being re-engineered from scratch? This problem is known as continual learning and has been particularly popular in the domain of computer vision, where several techniques to attack it have been developed. This paper aims to assess to what extent such continual learning techniques can be applied to the HAR domain. To this end, we propose a general framework to evaluate the performance of such techniques on various types of commonly used HAR datasets. We then present a comprehensive empirical analysis of their computational cost and effectiveness of tackling HAR-specific challenges (i.e., sensor noise and labels' scarcity). The presented results uncover useful insights on their applicability and suggest future research directions for HAR systems. Our code, models and data are available at https://github.com/srvCodes/continual-learning-benchmark. △ Less

Submitted 19 April, 2021; originally announced April 2021.

Comments: Accepted in the Information Sciences journal

arXiv:2011.14674 [pdf, other]

Thermal Analysis of PEM Fuel Cell and Lithium Ion Battery Pack in Confined Space

Authors: Ashita Victor, Abhay Shankar Jha, Janamejaya Channegowda, Sumukh Surya, Kali vara prasad Naraharisetti

Abstract: Hybrid energy storage systems (HESS) have carved a niche in the industry. HESS improve the system efficiency, reduce the overall cost and increase the lifespan of the system. The proton exchange membrane (PEM) fuel cell is hybridized with Li-ion batteries (LIB) for vehicular applications, robotic applications etc. In applications which have geometrical space constraints, the temperature of the ene… ▽ More Hybrid energy storage systems (HESS) have carved a niche in the industry. HESS improve the system efficiency, reduce the overall cost and increase the lifespan of the system. The proton exchange membrane (PEM) fuel cell is hybridized with Li-ion batteries (LIB) for vehicular applications, robotic applications etc. In applications which have geometrical space constraints, the temperature of the energy storage elements is influenced by convective heat transfer. In this paper the thermal analysis of the geometry of PEM-LIB hybrid system is carried out using COMSOL Multiphysics Software package for different discharge rates (C rates) of the LIB and different voltages of the PEM cell. The additional rise in temperature of the LIB pack when placed in close proximity with PEM cell was in the range of 0.03-0.6$^0$C at 4C. The cell temperature of the LIB pack increased with increase in C rate and decrease in PEM cell voltage. △ Less

Submitted 30 November, 2020; originally announced November 2020.

arXiv:2011.12569 [pdf, other]

Learning Certified Control using Contraction Metric

Authors: Dawei Sun, Susmit Jha, Chuchu Fan

Abstract: In this paper, we solve the problem of finding a certified control policy that drives a robot from any given initial state and under any bounded disturbance to the desired reference trajectory, with guarantees on the convergence or bounds on the tracking error. Such a controller is crucial in safe motion planning. We leverage the advanced theory in Control Contraction Metric and design a learning… ▽ More In this paper, we solve the problem of finding a certified control policy that drives a robot from any given initial state and under any bounded disturbance to the desired reference trajectory, with guarantees on the convergence or bounds on the tracking error. Such a controller is crucial in safe motion planning. We leverage the advanced theory in Control Contraction Metric and design a learning framework based on neural networks to co-synthesize the contraction metric and the controller for control-affine systems. We further provide methods to validate the convergence and bounded error guarantees. We demonstrate the performance of our method using a suite of challenging robotic models, including models with learned dynamics as neural networks. We compare our approach with leading methods using sum-of-squares programming, reinforcement learning, and model predictive control. Results show that our methods indeed can handle a broader class of systems with less tracking error and faster execution speed. Code is available at https://github.com/sundw2014/C3M. △ Less

Submitted 25 November, 2020; originally announced November 2020.

Comments: Accepted to Conference on Robot Learning (CoRL) 2020

arXiv:2006.04570 [pdf]

Incorporating Image Gradients as Secondary Input Associated with Input Image to Improve the Performance of the CNN Model

Authors: Vijay Pandey, Shashi Bhushan Jha

Abstract: CNN is very popular neural network architecture in modern days. It is primarily most used tool for vision related task to extract the important features from the given image. Moreover, CNN works as a filter to extract the important features using convolutional operation in distinct layers. In existing CNN architectures, to train the network on given input, only single form of given input is fed to… ▽ More CNN is very popular neural network architecture in modern days. It is primarily most used tool for vision related task to extract the important features from the given image. Moreover, CNN works as a filter to extract the important features using convolutional operation in distinct layers. In existing CNN architectures, to train the network on given input, only single form of given input is fed to the network. In this paper, new architecture has been proposed where given input is passed in more than one form to the network simultaneously by sharing the layers with both forms of input. We incorporate image gradient as second form of the input associated with the original input image and allowing both inputs to flow in the network using same number of parameters to improve the performance of the model for better generalization. The results of the proposed CNN architecture, applying on diverse set of datasets such as MNIST, CIFAR10 and CIFAR100 show superior result compared to the benchmark CNN architecture considering inputs in single form. △ Less

Submitted 5 June, 2020; originally announced June 2020.

arXiv:1911.11378 [pdf, other]

Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions

Authors: Osaid Rehman Nasir, Shailesh Kumar Jha, Manraj Singh Grover, Yi Yu, Ajit Kumar, Rajiv Ratn Shah

Abstract: Powerful generative adversarial networks (GAN) have been developed to automatically synthesize realistic images from text. However, most existing tasks are limited to generating simple images such as flowers from captions. In this work, we extend this problem to the less addressed domain of face generation from fine-grained textual descriptions of face, e.g., "A person has curly hair, oval face, a… ▽ More Powerful generative adversarial networks (GAN) have been developed to automatically synthesize realistic images from text. However, most existing tasks are limited to generating simple images such as flowers from captions. In this work, we extend this problem to the less addressed domain of face generation from fine-grained textual descriptions of face, e.g., "A person has curly hair, oval face, and mustache". We are motivated by the potential of automated face generation to impact and assist critical tasks such as criminal face reconstruction. Since current datasets for the task are either very small or do not contain captions, we generate captions for images in the CelebA dataset by creating an algorithm to automatically convert a list of attributes to a set of captions. We then model the highly multi-modal problem of text to face generation as learning the conditional distribution of faces (conditioned on text) in same latent space. We utilize the current state-of-the-art GAN (DC-GAN with GAN-CLS loss) for learning conditional multi-modality. The presence of more fine-grained details and variable length of the captions makes the problem easier for a user but more difficult to handle compared to the other text-to-image tasks. We flipped the labels for real and fake images and added noise in discriminator. Generated images for diverse textual descriptions show promising results. In the end, we show how the widely used inceptions score is not a good metric to evaluate the performance of generative models used for synthesizing faces from text. △ Less

Submitted 26 November, 2019; originally announced November 2019.

arXiv:1909.03900 [pdf, other]

Measurement, Characterization and Modeling of LoRa Technology in Multi-floor Buildings

Authors: Weitao Xu, Jun Young Kim, Walter Huang, Salil Kanhere, Sanjay Jha, Wen Hu

Abstract: In recent years, we have witnessed the rapid development of LoRa technology, together with extensive studies trying to understand its performance in various application settings. In contrast to measurements performed in large outdoor areas, limited number of attempts have been made to understand the characterization and performance of LoRa technology in indoor environments. In this paper, we prese… ▽ More In recent years, we have witnessed the rapid development of LoRa technology, together with extensive studies trying to understand its performance in various application settings. In contrast to measurements performed in large outdoor areas, limited number of attempts have been made to understand the characterization and performance of LoRa technology in indoor environments. In this paper, we present a comprehensive study of LoRa technology in multi-floor buildings. Specifically, we investigate the large-scale fading characteristic, temporal fading characteristic, coverage and energy consumption of LoRa technology in four different types of buildings. Moreover, we find that the energy consumption using different parameter settings can vary up to 145 times. These results indicate the importance of parameter selection and enabling LoRa adaptive data rate feature in energy-limited applications. We hope the results in this paper can help both academia and industry understand the performance of LoRa technology in multi-floor buildings to facilitate develo** practical indoor applications. △ Less

Submitted 9 September, 2019; originally announced September 2019.

Comments: 10 pages, 9 figures

arXiv:1907.01038 [pdf, other]

doi 10.1109/DSN-W.2018.00027

AVFI: Fault Injection for Autonomous Vehicles

Authors: Saurabh Jha, Subho S. Banerjee, James Cyriac, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer

Abstract: Autonomous vehicle (AV) technology is rapidly becoming a reality on U.S. roads, offering the promise of improvements in traffic management, safety, and the comfort and efficiency of vehicular travel. With this increasing popularity and ubiquitous deployment, resilience has become a critical requirement for public acceptance and adoption. Recent studies into the resilience of AVs have shown that th… ▽ More Autonomous vehicle (AV) technology is rapidly becoming a reality on U.S. roads, offering the promise of improvements in traffic management, safety, and the comfort and efficiency of vehicular travel. With this increasing popularity and ubiquitous deployment, resilience has become a critical requirement for public acceptance and adoption. Recent studies into the resilience of AVs have shown that though the AV systems are improving over time, they have not reached human levels of automation. Prior work in this area has studied the safety and resilience of individual components of the AV system (e.g., testing of neural networks powering the perception function). However, methods for holistic end-to-end resilience assessment of AV systems are still non-existent. △ Less

Submitted 1 July, 2019; originally announced July 2019.

Comments: Published in: 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)

arXiv:1902.11277 [pdf, other]

A Risk-Sensitive Finite-Time Reachability Approach for Safety of Stochastic Dynamic Systems

Authors: Margaret P. Chapman, Jonathan Lacotte, Aviv Tamar, Donggun Lee, Kevin M. Smith, Victoria Cheng, Jaime F. Fisac, Susmit Jha, Marco Pavone, Claire J. Tomlin

Abstract: A classic reachability problem for safety of dynamic systems is to compute the set of initial states from which the state trajectory is guaranteed to stay inside a given constraint set over a given time horizon. In this paper, we leverage existing theory of reachability analysis and risk measures to devise a risk-sensitive reachability approach for safety of stochastic dynamic systems under non-ad… ▽ More A classic reachability problem for safety of dynamic systems is to compute the set of initial states from which the state trajectory is guaranteed to stay inside a given constraint set over a given time horizon. In this paper, we leverage existing theory of reachability analysis and risk measures to devise a risk-sensitive reachability approach for safety of stochastic dynamic systems under non-adversarial disturbances over a finite time horizon. Specifically, we first introduce the notion of a risk-sensitive safe set as a set of initial states from which the risk of large constraint violations can be reduced to a required level via a control policy, where risk is quantified using the Conditional Value-at-Risk (CVaR) measure. Second, we show how the computation of a risk-sensitive safe set can be reduced to the solution to a Markov Decision Process (MDP), where cost is assessed according to CVaR. Third, leveraging this reduction, we devise a tractable algorithm to approximate a risk-sensitive safe set, and provide theoretical arguments about its correctness. Finally, we present a realistic example inspired from stormwater catchment design to demonstrate the utility of risk-sensitive reachability analysis. In particular, our approach allows a practitioner to tune the level of risk sensitivity from worst-case (which is typical for Hamilton-Jacobi reachability analysis) to risk-neutral (which is the case for stochastic reachability analysis). △ Less

Submitted 30 April, 2019; v1 submitted 28 February, 2019; originally announced February 2019.

arXiv:1809.06472 [pdf, other]

Ground vehicle odometry using a non-intrusive inertial speed sensor

Authors: Het Shah, Siddhant Haldar, Rohit Ner, Siddharth Jha, Debashish Chakravarty

Abstract: This paper describes the design and development of a non-intrusive inertial speed sensor that can be reliably used to replace a conventional optical or hall effect-based speedometer on any kind of ground vehicle. The design allows for simple assembly-disassembly from tyre rims. The sensor design and data flow are explained. Algorithms and filters for pre-processing and processing the data are deta… ▽ More This paper describes the design and development of a non-intrusive inertial speed sensor that can be reliably used to replace a conventional optical or hall effect-based speedometer on any kind of ground vehicle. The design allows for simple assembly-disassembly from tyre rims. The sensor design and data flow are explained. Algorithms and filters for pre-processing and processing the data are detailed. Comparison with a real optical encoder proves the accuracy of the proposed sensor. Finally, it is shown that factor graph-based localization is possible with the developed sensor. △ Less

Submitted 17 September, 2018; originally announced September 2018.

arXiv:1612.01476 [pdf, other]

Modeling and Control of an Autonomous Three Wheeled Mobile Robot with Front Steer

Authors: Ayush Pandey, Siddharth Jha, Debashish Chakravarty

Abstract: Modeling and control strategies for a design of an autonomous three wheeled mobile robot with front wheel steer is presented. Although, the three-wheel vehicle design with front wheel steer is common in automotive vehicles used often in public transport, but its advantages in navigation and localization of autonomous vehicles is seldom utilized. We present the system model for such a robotic vehic… ▽ More Modeling and control strategies for a design of an autonomous three wheeled mobile robot with front wheel steer is presented. Although, the three-wheel vehicle design with front wheel steer is common in automotive vehicles used often in public transport, but its advantages in navigation and localization of autonomous vehicles is seldom utilized. We present the system model for such a robotic vehicle. A PID controller for speed control is designed for the model obtained and has been implemented in a digital control framework. The trajectory control framework, which is a challenging task for such a three-wheeled robot has also been presented in the paper. The derived system model has been verified using experimental results obtained for the robot vehicle design. Controller performance and robustness issues have also been discussed briefly. △ Less

Submitted 5 December, 2016; originally announced December 2016.

Comments: IEEE International Conference on Robotic Computing 2017. (under review)

arXiv:1407.4937

doi 10.4204/EPTCS.157

Proceedings 3rd Workshop on Synthesis

Authors: Krishnendu Chatterjee, Rüdiger Ehlers, Susmit Jha

Abstract: The idea of synthesis, i.e., the process of automatically computing implementations from their specifications, has recently gained a lot of momentum in the contexts of software engineering and reactive system design. While it is widely believed that, due to complexity/undecidability issues, synthesis cannot completely replace manual engineering, it can assist the process of designing the intricate… ▽ More The idea of synthesis, i.e., the process of automatically computing implementations from their specifications, has recently gained a lot of momentum in the contexts of software engineering and reactive system design. While it is widely believed that, due to complexity/undecidability issues, synthesis cannot completely replace manual engineering, it can assist the process of designing the intricate pieces of code that most programmers find challenging, or help with orchestrating tasks in reactive environments. The SYNT workshop aims to bring together researchers interested in synthesis to discuss and present ongoing and mature work on all aspects of automated synthesis and its applications. The third iteration of the workshop took place in Vienna, Austria, and was co-located with the 26th International Conference on Computer Aided Verification, held in the context of the Vienna Summer of Logic in July 2014. The workshop included eight contributed talks and four invited talks. In addition, it featured a special session about the Syntax-Guided Synthesis Competition (SyGuS) and the SyntComp Synthesis competition. △ Less

Submitted 18 July, 2014; originally announced July 2014.

ACM Class: B.1.2; D.2.2; F.1.1; F.1.2; I.2.2

Journal ref: EPTCS 157, 2014

arXiv:1302.1920 [pdf, ps, other]

SWATI: Synthesizing Wordlengths Automatically Using Testing and Induction

Authors: Susmit Jha, Sanjit A. Seshia

Abstract: In this paper, we present an automated technique SWATI: Synthesizing Wordlengths Automatically Using Testing and Induction, which uses a combination of Nelder-Mead optimization based testing, and induction from examples to automatically synthesize optimal fixedpoint implementation of numerical routines. The design of numerical software is commonly done using floating-point arithmetic in design-env… ▽ More In this paper, we present an automated technique SWATI: Synthesizing Wordlengths Automatically Using Testing and Induction, which uses a combination of Nelder-Mead optimization based testing, and induction from examples to automatically synthesize optimal fixedpoint implementation of numerical routines. The design of numerical software is commonly done using floating-point arithmetic in design-environments such as Matlab. However, these designs are often implemented using fixed-point arithmetic for speed and efficiency reasons especially in embedded systems. The fixed-point implementation reduces implementation cost, provides better performance, and reduces power consumption. The conversion from floating-point designs to fixed-point code is subject to two opposing constraints: (i) the word-width of fixed-point types must be minimized, and (ii) the outputs of the fixed-point program must be accurate. In this paper, we propose a new solution to this problem. Our technique takes the floating-point program, specified accuracy and an implementation cost model and provides the fixed-point program with specified accuracy and optimal implementation cost. We demonstrate the effectiveness of our approach on a set of examples from the domain of automated control, robotics and digital signal processing. △ Less

Submitted 7 February, 2013; originally announced February 2013.

ACM Class: F.2.1; D.2.4; I.2.2

arXiv:1103.0800 [pdf, ps, other]

Synthesizing Switching Logic to Minimize Long-Run Cost

Authors: Susmit Jha, Sanjit A. Seshia, Ashish Tiwari

Abstract: Given a multi-modal dynamical system, optimal switching logic synthesis involves generating the conditions for switching between the system modes such that the resulting hybrid system satisfies a quantitative specification. We formalize and solve the problem of optimal switching logic synthesis for quantitative specifications over long run behavior. Each trajectory of the system, and each state of… ▽ More Given a multi-modal dynamical system, optimal switching logic synthesis involves generating the conditions for switching between the system modes such that the resulting hybrid system satisfies a quantitative specification. We formalize and solve the problem of optimal switching logic synthesis for quantitative specifications over long run behavior. Each trajectory of the system, and each state of the system, is associated with a cost. Our goal is to synthesize a system that minimizes this cost from each initial state. Our paper generalizes earlier work on synthesis for safety as safety specifications can be encoded as quantitative specifications. We present an approach for specifying quantitative measures using reward and penalty functions, and illustrate its effectiveness using several examples. We present an automated technique to synthesize switching logic for such quantitative measures. Our algorithm is based on reducing the synthesis problem to an unconstrained numerical optimization problem which can be solved by any off-the-shelf numerical optimization engines. We demonstrate the effectiveness of this approach with experimental results. △ Less

Submitted 5 May, 2011; v1 submitted 3 March, 2011; originally announced March 2011.

Comments: UC Berkeley Technical Report

Showing 1–29 of 29 results for author: Jha, S