Search | arXiv e-print repository

Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis

Authors: Vinotha R, Hepsiba D, L. D. Vijay Anand, Deepak John Reji

Abstract: Neural Text-to-speech (TTS) synthesis is a powerful technology that can generate speech using neural networks. One of the most remarkable features of TTS synthesis is its capability to produce speech in the voice of different speakers. This paper introduces voice cloning and speech synthesis (https://pypi.org/project/voice-cloning/), an open-source Python package for helping people with speech disorders communicate more effectively, as well as for professionals seeking to integrate voice cloning or speech synthesis capabilities into their projects. This package aims to generate synthetic speech that sounds like the natural voice of an individual, but it does not replace the natural human voice. The architecture of the system comprises a speaker verification system, a synthesizer, a vocoder, and noise reduction. The speaker verification system is trained on a varied set of speakers to achieve optimal generalization performance without relying on transcriptions. The synthesizer is trained using both audio and transcriptions and generates a Mel spectrogram from text, and the vocoder converts the generated Mel spectrogram into the corresponding audio signal. The audio signal is then processed by a noise reduction algorithm to eliminate unwanted noise and enhance speech clarity. The performance of synthesized speech from seen and unseen speakers is then evaluated using subjective and objective measures such as Mean Opinion Score (MOS), Gross Pitch Error (GPE), and Spectral Distortion (SD). The model can create speech in distinct voices by including speaker characteristics that are chosen randomly.

Submitted 16 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

arXiv:2009.06833 [pdf, ps, other]

doi 10.4204/EPTCS.323.10

Compositional Models for Power Systems

Authors: John S. Nolan, Blake S. Pollard, Spencer Breiner, Dhananjay Anand, Eswaran Subrahmanian

Abstract: The problem of integrating multiple overlapping models and data is pervasive in engineering, though often implicit. We consider this issue of model management in the context of the electrical power grid as it transitions towards a modern 'Smart Grid.' We present a methodology for specifying, managing, and reasoning within multiple models of distributed energy resources (DERs), entities which produce, consume, or store power, using categorical databases and symmetric monoidal categories. Considering the problem of distributing power on the grid in the presence of DERs, we show how to connect a generic problem specification with implementation-specific numerical solvers using the paradigm of categorical databases.

Submitted 14 September, 2020; originally announced September 2020.

Comments: In Proceedings ACT 2019, arXiv:2009.06334

ACM Class: D.2.12; H.2.1; J.2; J.6

Journal ref: EPTCS 323, 2020, pp. 149-160

arXiv:2008.03750 [pdf, other]

Switching Loss for Generalized Nucleus Detection in Histopathology

Authors: Deepak Anand, Gaurav Patel, Yaman Dang, Amit Sethi

Abstract: The accuracy of deep learning methods for two foundational tasks in medical image analysis, detection and segmentation, can suffer from class imbalance. We propose a 'switching loss' function that adaptively shifts the emphasis between foreground and background classes. While the existing loss functions to address this problem were motivated by the classification task, the switching loss is based on Dice loss, which is better suited for segmentation and detection. Furthermore, to get the most out of the training samples, we adapt the loss with each mini-batch, unlike previous proposals that adapt once for the entire training set. A nucleus detector trained using the proposed loss function on a source dataset outperformed those trained using cross-entropy, Dice, or focal losses. Remarkably, without retraining on target datasets, our pre-trained nucleus detector also outperformed existing nucleus detectors that were trained on at least some of the images from the target datasets. To establish the broad utility of the proposed loss, we also confirmed that it led to more accurate ventricle segmentation in MRI as compared to the other loss functions. Our GPU-enabled pre-trained nucleus detection software is also ready to process whole slide images right out of the box and is usably fast.

Submitted 9 August, 2020; originally announced August 2020.

arXiv:2006.13914 [pdf, ps, other]

A Reference Governor for Overshoot Mitigation of Tracking Control Systems

Authors: C. Freiheit, D. M. Anand, H. R. Ossareh

Abstract: This paper presents a novel reference governor scheme for overshoot mitigation in tracking control systems. Our proposed scheme, referred to as the Reference Governor with Dynamic Constraint (RG-DC), recasts the overshoot mitigation problem as a constraint management problem. The outcome of this reformulation is a dynamic Maximal Admissible Set (MAS), which varies in real-time as a function of the reference signal and the tracking output. The RG-DC employs the dynamic MAS to modify the reference signal to mitigate or, if possible, prevent overshoot. We present several properties of the dynamic MAS and the algorithms required to compute it. We also investigate the stability and recursive feasibility of the RG-DC, and present an interesting property of RG-DC regarding its effect on the governed system's frequency response. Simulation results demonstrate the efficacy of the approach, and also highlight its limitations. This paper serves as an extension of our earlier paper on this topic.

Submitted 24 June, 2020; originally announced June 2020.

arXiv:2006.09464 [pdf, other]

doi 10.1109/BIBE50027.2020.00060

Visualization for Histopathology Images using Graph Convolutional Neural Networks

Authors: Mookund Sureka, Abhijeet Patil, Deepak Anand, Amit Sethi

Abstract: With the increase in the use of deep learning for computer-aided diagnosis in medical images, the criticism of the black-box nature of the deep learning models is also on the rise. The medical community needs interpretable models for both due diligence and advancing the understanding of disease and treatment mechanisms. In histology, in particular, while there is rich detail available at the cellular level and that of spatial relationships between cells, it is difficult to modify convolutional neural networks to point out the relevant visual features. We adopt an approach to model histology tissue as a graph of nuclei and develop a graph convolutional network framework based on attention mechanism and node occlusion for disease diagnosis. The proposed method highlights the relative contribution of each cell nucleus in the whole-slide image. Our visualizations of such networks, trained to distinguish between invasive and in-situ breast cancers and between Gleason 3 and 4 prostate cancers, generate interpretable visual maps that correspond well with our understanding of the structures that are important to experts for their diagnosis.

Submitted 16 June, 2020; originally announced June 2020.

Comments: 5 pages, 3 Figures

arXiv:1908.08004 [pdf, other]

Pixel-wise Segmentation of Right Ventricle of Heart

Authors: Yaman Dang, Deepak Anand, Amit Sethi

Abstract: One of the first steps in the diagnosis of most cardiac diseases, such as pulmonary hypertension and coronary heart disease, is the segmentation of ventricles from cardiac magnetic resonance (MRI) images. Manual segmentation of the right ventricle requires diligence and time, while its automated segmentation is challenging due to shape variations and ill-defined borders. We propose a deep learning based method for the accurate segmentation of the right ventricle, which does not require post-processing yet achieves the state-of-the-art performance of 0.86 Dice coefficient and 6.73 mm Hausdorff distance on the RVSC-MICCAI 2012 dataset. We use a novel adaptive cost function to counter extreme class-imbalance in the dataset. We present a comprehensive comparative study of loss functions, architectures, and ensembling techniques to build a principled approach for biomedical segmentation tasks.

Submitted 21 August, 2019; originally announced August 2019.

Comments: Accepted at IEEE TENCON 2019

arXiv:1908.05020 [pdf, other]

Histographs: Graphs in Histopathology

Authors: Shrey Gadiya, Deepak Anand, Amit Sethi

Abstract: Spatial arrangements of cells of various types, such as tumor infiltrating lymphocytes and the advancing edge of a tumor, are important features for detecting and characterizing cancers. However, convolutional neural networks (CNNs) do not explicitly extract intricate features of the spatial arrangements of the cells from histopathology images. In this work, we propose to classify cancers using graph convolutional networks (GCNs) by modeling a tissue section as a multi-attributed spatial graph of its constituent cells. Cells are detected using their nuclei in H&E-stained tissue images, and each cell's appearance is captured as a multi-attributed high-dimensional vertex feature. The spatial relations between neighboring cells are captured as edge features based on their distances in a graph. We demonstrate the utility of this approach by obtaining classification accuracy that is competitive with CNNs, specifically Inception-v3, on two tasks (cancerous versus non-cancerous, and in situ versus invasive) on the BACH breast cancer dataset.

Submitted 14 August, 2019; originally announced August 2019.

Comments: 5 pages, 1 figure

arXiv:1905.08990 [pdf, other]

MIST: A Novel Training Strategy for Low-latency Scalable Neural Net Decoders

Authors: Kumar Yashashwi, Deepak Anand, Sibi Raj B Pillai, Prasanna Chaporkar, K Ganesh

Abstract: In this paper, we propose a low latency, robust and scalable neural net based decoder for convolutional and low-density parity-check (LDPC) coding schemes. The proposed decoders are demonstrated to have bit error rate (BER) and block error rate (BLER) performances on par with the state-of-the-art neural net based decoders while achieving more than 8 times higher decoding speed. The enhanced decoding speed is due to the use of a convolutional neural network (CNN) as opposed to the recurrent neural network (RNN) used in the best known neural net based decoders. This contradicts existing doctrine that only RNN based decoders can provide a performance close to the optimal ones. The key ingredient to our approach is a novel Mixed-SNR Independent Samples based Training (MIST), which allows for training of the CNN with only 1% of possible datawords, even for block lengths as high as 1000. The proposed decoder is robust as, once trained, the same decoder can be used for a wide range of SNR values. Finally, in the presence of channel outages, the proposed decoders outperform the best known decoders, viz. the unquantized Viterbi decoder for convolutional codes, and belief propagation for LDPC. This gives the CNN decoder a significant advantage in 5G millimeter wave systems, where channel outages are prevalent.

Submitted 22 May, 2019; originally announced May 2019.

Showing 1–8 of 8 results for author: Anand, D