Search | arXiv e-print repository

Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis

Authors: Vinotha R, Hepsiba D, L. D. Vijay Anand, Deepak John Reji

Abstract: Neural Text-to-speech (TTS) synthesis is a powerful technology that can generate speech using neural networks. One of the most remarkable features of TTS synthesis is its capability to produce speech in the voice of different speakers. This paper introduces voice cloning and speech synthesis https://pypi.org/project/voice-cloning/ an open-source python package for hel** speech disorders to commu… ▽ More Neural Text-to-speech (TTS) synthesis is a powerful technology that can generate speech using neural networks. One of the most remarkable features of TTS synthesis is its capability to produce speech in the voice of different speakers. This paper introduces voice cloning and speech synthesis https://pypi.org/project/voice-cloning/ an open-source python package for hel** speech disorders to communicate more effectively as well as for professionals seeking to integrate voice cloning or speech synthesis capabilities into their projects. This package aims to generate synthetic speech that sounds like the natural voice of an individual, but it does not replace the natural human voice. The architecture of the system comprises a speaker verification system, a synthesizer, a vocoder, and noise reduction. Speaker verification system trained on a varied set of speakers to achieve optimal generalization performance without relying on transcriptions. Synthesizer is trained using both audio and transcriptions that generate Mel spectrogram from a text and vocoder which converts the generated Mel Spectrogram into corresponding audio signal. Then the audio signal is processed by a noise reduction algorithm to eliminate unwanted noise and enhance speech clarity. The performance of synthesized speech from seen and unseen speakers are then evaluated using subjective and objective evaluation such as Mean Opinion Score (MOS), Gross Pitch Error (GPE), and Spectral distortion (SD). The model can create speech in distinct voices by including speaker characteristics that are chosen randomly. △ Less

Submitted 16 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

arXiv:2310.18642 [pdf]

One-shot Localization and Segmentation of Medical Images with Foundation Models

Authors: Deepa Anand, Gurunath Reddy M, Vanika Singhal, Dattesh D. Shanbhag, Shriram KS, Uday Patil, Chitresh Bhushan, Kavitha Manickam, Dawei Gui, Rakesh Mullick, Avinash Gopal, Parminder Bhatia, Taha Kass-Hout

Abstract: Recent advances in Vision Transformers (ViT) and Stable Diffusion (SD) models with their ability to capture rich semantic features of the image have been used for image correspondence tasks on natural images. In this paper, we examine the ability of a variety of pre-trained ViT (DINO, DINOv2, SAM, CLIP) and SD models, trained exclusively on natural images, for solving the correspondence problems o… ▽ More Recent advances in Vision Transformers (ViT) and Stable Diffusion (SD) models with their ability to capture rich semantic features of the image have been used for image correspondence tasks on natural images. In this paper, we examine the ability of a variety of pre-trained ViT (DINO, DINOv2, SAM, CLIP) and SD models, trained exclusively on natural images, for solving the correspondence problems on medical images. While many works have made a case for in-domain training, we show that the models trained on natural images can offer good performance on medical images across different modalities (CT,MR,Ultrasound) sourced from various manufacturers, over multiple anatomical regions (brain, thorax, abdomen, extremities), and on wide variety of tasks. Further, we leverage the correspondence with respect to a template image to prompt a Segment Anything (SAM) model to arrive at single shot segmentation, achieving dice range of 62%-90% across tasks, using just one image as reference. We also show that our single-shot method outperforms the recently proposed few-shot segmentation method - UniverSeg (Dice range 47%-80%) on most of the semantic segmentation tasks(six out of seven) across medical imaging modalities. △ Less

Submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted at NeurIPS 2023 R0-FoMo Workshop

arXiv:2208.12996 [pdf, other]

doi 10.1145/3556558.3558580

A Federated Learning-enabled Smart Street Light Monitoring Application: Benefits and Future Challenges

Authors: Diya Anand, Ioannis Mavromatis, Pietro Carnelli, Aftab Khan

Abstract: Data-enabled cities are recently accelerated and enhanced with automated learning for improved Smart Cities applications. In the context of an Internet of Things (IoT) ecosystem, the data communication is frequently costly, inefficient, not scalable and lacks security. Federated Learning (FL) plays a pivotal role in providing privacy-preserving and communication efficient Machine Learning (ML) fra… ▽ More Data-enabled cities are recently accelerated and enhanced with automated learning for improved Smart Cities applications. In the context of an Internet of Things (IoT) ecosystem, the data communication is frequently costly, inefficient, not scalable and lacks security. Federated Learning (FL) plays a pivotal role in providing privacy-preserving and communication efficient Machine Learning (ML) frameworks. In this paper we evaluate the feasibility of FL in the context of a Smart Cities Street Light Monitoring application. FL is evaluated against benchmarks of centralised and (fully) personalised machine learning techniques for the classification task of the lampposts operation. Incorporating FL in such a scenario shows minimal performance reduction in terms of the classification task, but huge improvements in the communication cost and the privacy preserving. These outcomes strengthen FL's viability and potential for IoT applications. △ Less

Submitted 27 August, 2022; originally announced August 2022.

Comments: Accepted for publication at ACM MobiCom 2022 - (MORSE Workshop)

arXiv:2012.05940 [pdf, other]

A Simplistic Machine Learning Approach to Contact Tracing

Authors: Carlos Gómez, Niamh Belton, Boi Quach, Jack Nicholls, Devanshu Anand

Abstract: This report is based on the modified NIST challenge, Too Close For Too Long, provided by the SFI Centre for Machine Learning (ML-Labs). The modified challenge excludes the time calculation (too long) aspect. By handcrafting features from phone instrumental data we develop two machine learning models, a GBM and an MLP, to estimate distance between two phones. Our method is able to outperform the le… ▽ More This report is based on the modified NIST challenge, Too Close For Too Long, provided by the SFI Centre for Machine Learning (ML-Labs). The modified challenge excludes the time calculation (too long) aspect. By handcrafting features from phone instrumental data we develop two machine learning models, a GBM and an MLP, to estimate distance between two phones. Our method is able to outperform the leading NIST challenge result by the Hong Kong University of Science and Technology (HKUST) by a significant margin. △ Less

Submitted 10 December, 2020; originally announced December 2020.

Comments: 6 pages, 7 tables, 2 figures

arXiv:2008.03750 [pdf, other]

Switching Loss for Generalized Nucleus Detection in Histopathology

Authors: Deepak Anand, Gaurav Patel, Yaman Dang, Amit Sethi

Abstract: The accuracy of deep learning methods for two foundational tasks in medical image analysis -- detection and segmentation -- can suffer from class imbalance. We propose a `switching loss' function that adaptively shifts the emphasis between foreground and background classes. While the existing loss functions to address this problem were motivated by the classification task, the switching loss is ba… ▽ More The accuracy of deep learning methods for two foundational tasks in medical image analysis -- detection and segmentation -- can suffer from class imbalance. We propose a `switching loss' function that adaptively shifts the emphasis between foreground and background classes. While the existing loss functions to address this problem were motivated by the classification task, the switching loss is based on Dice loss, which is better suited for segmentation and detection. Furthermore, to get the most out the training samples, we adapt the loss with each mini-batch, unlike previous proposals that adapt once for the entire training set. A nucleus detector trained using the proposed loss function on a source dataset outperformed those trained using cross-entropy, Dice, or focal losses. Remarkably, without retraining on target datasets, our pre-trained nucleus detector also outperformed existing nucleus detectors that were trained on at least some of the images from the target datasets. To establish a broad utility of the proposed loss, we also confirmed that it led to more accurate ventricle segmentation in MRI as compared to the other loss functions. Our GPU-enabled pre-trained nucleus detection software is also ready to process whole slide images right out-of-the-box and is usably fast. △ Less

Submitted 9 August, 2020; originally announced August 2020.

arXiv:2006.09464 [pdf, other]

doi 10.1109/BIBE50027.2020.00060

Visualization for Histopathology Images using Graph Convolutional Neural Networks

Authors: Mookund Sureka, Abhijeet Patil, Deepak Anand, Amit Sethi

Abstract: With the increase in the use of deep learning for computer-aided diagnosis in medical images, the criticism of the black-box nature of the deep learning models is also on the rise. The medical community needs interpretable models for both due diligence and advancing the understanding of disease and treatment mechanisms. In histology, in particular, while there is rich detail available at the cellu… ▽ More With the increase in the use of deep learning for computer-aided diagnosis in medical images, the criticism of the black-box nature of the deep learning models is also on the rise. The medical community needs interpretable models for both due diligence and advancing the understanding of disease and treatment mechanisms. In histology, in particular, while there is rich detail available at the cellular level and that of spatial relationships between cells, it is difficult to modify convolutional neural networks to point out the relevant visual features. We adopt an approach to model histology tissue as a graph of nuclei and develop a graph convolutional network framework based on attention mechanism and node occlusion for disease diagnosis. The proposed method highlights the relative contribution of each cell nucleus in the whole-slide image. Our visualization of such networks trained to distinguish between invasive and in-situ breast cancers, and Gleason 3 and 4 prostate cancers generate interpretable visual maps that correspond well with our understanding of the structures that are important to experts for their diagnosis. △ Less

Submitted 16 June, 2020; originally announced June 2020.

Comments: 5 pages, 3 Figures

arXiv:2003.08573 [pdf, other]

Uncertainty Estimation in Cancer Survival Prediction

Authors: Hrushikesh Loya, Pranav Poduval, Deepak Anand, Neeraj Kumar, Amit Sethi

Abstract: Survival models are used in various fields, such as the development of cancer treatment protocols. Although many statistical and machine learning models have been proposed to achieve accurate survival predictions, little attention has been paid to obtain well-calibrated uncertainty estimates associated with each prediction. The currently popular models are opaque and untrustworthy in that they oft… ▽ More Survival models are used in various fields, such as the development of cancer treatment protocols. Although many statistical and machine learning models have been proposed to achieve accurate survival predictions, little attention has been paid to obtain well-calibrated uncertainty estimates associated with each prediction. The currently popular models are opaque and untrustworthy in that they often express high confidence even on those test cases that are not similar to the training samples, and even when their predictions are wrong. We propose a Bayesian framework for survival models that not only gives more accurate survival predictions but also quantifies the survival uncertainty better. Our approach is a novel combination of variational inference for uncertainty estimation, neural multi-task logistic regression for estimating nonlinear and time-varying risk models, and an additional sparsity-inducing prior to work with high dimensional data. △ Less

Submitted 25 March, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

Comments: 5 pages, Accepted at AI4AH Workshop at ICLR 2020

arXiv:2003.00823 [pdf, other]

Breast Cancer Histopathology Image Classification and Localization using Multiple Instance Learning

Authors: Abhijeet Patil, Dipesh Tamboli, Swati Meena, Deepak Anand, Amit Sethi

Abstract: Breast cancer has the highest mortality among cancers in women. Computer-aided pathology to analyze microscopic histopathology images for diagnosis with an increasing number of breast cancer patients can bring the cost and delays of diagnosis down. Deep learning in histopathology has attracted attention over the last decade of achieving state-of-the-art performance in classification and localizati… ▽ More Breast cancer has the highest mortality among cancers in women. Computer-aided pathology to analyze microscopic histopathology images for diagnosis with an increasing number of breast cancer patients can bring the cost and delays of diagnosis down. Deep learning in histopathology has attracted attention over the last decade of achieving state-of-the-art performance in classification and localization tasks. The convolutional neural network, a deep learning framework, provides remarkable results in tissue images analysis, but lacks in providing interpretation and reasoning behind the decisions. We aim to provide a better interpretation of classification results by providing localization on microscopic histopathology images. We frame the image classification problem as weakly supervised multiple instance learning problem where an image is collection of patches i.e. instances. Attention-based multiple instance learning (A-MIL) learns attention on the patches from the image to localize the malignant and normal regions in an image and use them to classify the image. We present classification and localization results on two publicly available BreakHIS and BACH dataset. The classification and visualization results are compared with other recent techniques. The proposed method achieves better localization results without compromising classification accuracy. △ Less

Submitted 16 February, 2020; originally announced March 2020.

Comments: Accepted in 2019 5th IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE) and Awarded as best paper

arXiv:1908.08004 [pdf, other]

Pixel-wise Segmentation of Right Ventricle of Heart

Authors: Yaman Dang, Deepak Anand, Amit Sethi

Abstract: One of the first steps in the diagnosis of most cardiac diseases, such as pulmonary hypertension, coronary heart disease is the segmentation of ventricles from cardiac magnetic resonance (MRI) images. Manual segmentation of the right ventricle requires diligence and time, while its automated segmentation is challenging due to shape variations and illdefined borders. We propose a deep learning base… ▽ More One of the first steps in the diagnosis of most cardiac diseases, such as pulmonary hypertension, coronary heart disease is the segmentation of ventricles from cardiac magnetic resonance (MRI) images. Manual segmentation of the right ventricle requires diligence and time, while its automated segmentation is challenging due to shape variations and illdefined borders. We propose a deep learning based method for the accurate segmentation of right ventricle, which does not require post-processing and yet it achieves the state-of-the-art performance of 0.86 Dice coefficient and 6.73 mm Hausdorff distance on RVSC-MICCAI 2012 dataset. We use a novel adaptive cost function to counter extreme class-imbalance in the dataset. We present a comprehensive comparative study of loss functions, architectures, and ensembling techniques to build a principled approach for biomedical segmentation tasks. △ Less

Submitted 21 August, 2019; originally announced August 2019.

Comments: Accepted at IEEE TENCON 2019

arXiv:1908.05020 [pdf, other]

Histographs: Graphs in Histopathology

Authors: Shrey Gadiya, Deepak Anand, Amit Sethi

Abstract: Spatial arrangement of cells of various types, such as tumor infiltrating lymphocytes and the advancing edge of a tumor, are important features for detecting and characterizing cancers. However, convolutional neural networks (CNNs) do not explicitly extract intricate features of the spatial arrangements of the cells from histopathology images. In this work, we propose to classify cancers using gra… ▽ More Spatial arrangement of cells of various types, such as tumor infiltrating lymphocytes and the advancing edge of a tumor, are important features for detecting and characterizing cancers. However, convolutional neural networks (CNNs) do not explicitly extract intricate features of the spatial arrangements of the cells from histopathology images. In this work, we propose to classify cancers using graph convolutional networks (GCNs) by modeling a tissue section as a multi-attributed spatial graph of its constituent cells. Cells are detected using their nuclei in H&E stained tissue image, and each cell's appearance is captured as a multi-attributed high-dimensional vertex feature. The spatial relations between neighboring cells are captured as edge features based on their distances in a graph. We demonstrate the utility of this approach by obtaining classification accuracy that is competitive with CNNs, specifically, Inception-v3, on two tasks-cancerous versus non-cancerous and in situ versus invasive-on the BACH breast cancer dataset. △ Less

Submitted 14 August, 2019; originally announced August 2019.

Comments: 5 pages, 1 figure

arXiv:1906.06968 [pdf]

Scrubbing Sensitive PHI Data from Medical Records made Easy by SpaCy -- A Scalable Model Implementation Comparisons

Authors: Rashmi Jain, Dinah Samuel Anand, Vijayalakshmi Janakiraman

Abstract: De-identification of clinical records is an extremely important process which enables the use of the wealth of information present in them. There are a lot of techniques available for this but none of the method implementation has evaluated the scalability, which is an important benchmark. We evaluated numerous deep learning techniques such as BiLSTM-CNN, IDCNN, CRF, BiLSTM-CRF, SpaCy, etc. on bot… ▽ More De-identification of clinical records is an extremely important process which enables the use of the wealth of information present in them. There are a lot of techniques available for this but none of the method implementation has evaluated the scalability, which is an important benchmark. We evaluated numerous deep learning techniques such as BiLSTM-CNN, IDCNN, CRF, BiLSTM-CRF, SpaCy, etc. on both the performance and efficiency. We propose that the SpaCy model implementation for scrubbing sensitive PHI data from medical records is both well performing and extremely efficient compared to other published models. △ Less

Submitted 17 June, 2019; originally announced June 2019.

Comments: 9 Pages, 7 Figures, 2 Tables

ACM Class: I.2.7; I.5.4

arXiv:1905.08990 [pdf, other]

MIST: A Novel Training Strategy for Low-latency Scalable Neural Net Decoders

Authors: Kumar Yashashwi, Deepak Anand, Sibi Raj B Pillai, Prasanna Chaporkar, K Ganesh

Abstract: In this paper, we propose a low latency, robust and scalable neural net based decoder for convolutional and low-density parity-check (LPDC) coding schemes. The proposed decoders are demonstrated to have bit error rate (BER) and block error rate (BLER) performances at par with the state-of-the-art neural net based decoders while achieving more than 8 times higher decoding speed. The enhanced decodi… ▽ More In this paper, we propose a low latency, robust and scalable neural net based decoder for convolutional and low-density parity-check (LPDC) coding schemes. The proposed decoders are demonstrated to have bit error rate (BER) and block error rate (BLER) performances at par with the state-of-the-art neural net based decoders while achieving more than 8 times higher decoding speed. The enhanced decoding speed is due to the use of convolutional neural network (CNN) as opposed to recurrent neural network (RNN) used in the best known neural net based decoders. This contradicts existing doctrine that only RNN based decoders can provide a performance close to the optimal ones. The key ingredient to our approach is a novel Mixed-SNR Independent Samples based Training (MIST), which allows for training of CNN with only 1\% of possible datawords, even for block length as high as 1000. The proposed decoder is robust as, once trained, the same decoder can be used for a wide range of SNR values. Finally, in the presence of channel outages, the proposed decoders outperform the best known decoders, {\it viz.} unquantized Viterbi decoder for convolutional code, and belief propagation for LDPC. This gives the CNN decoder a significant advantage in 5G millimeter wave systems, where channel outages are prevalent. △ Less

Submitted 22 May, 2019; originally announced May 2019.

arXiv:1901.03088 [pdf, other]

Fast GPU-Enabled Color Normalization for Digital Pathology

Authors: Goutham Ramakrishnan, Deepak Anand, Amit Sethi

Abstract: Normalizing unwanted color variations due to differences in staining processes and scanner responses has been shown to aid machine learning in computational pathology. Of the several popular techniques for color normalization, structure preserving color normalization (SPCN) is well-motivated, convincingly tested, and published with its code base. However, SPCN makes occasional errors in color basi… ▽ More Normalizing unwanted color variations due to differences in staining processes and scanner responses has been shown to aid machine learning in computational pathology. Of the several popular techniques for color normalization, structure preserving color normalization (SPCN) is well-motivated, convincingly tested, and published with its code base. However, SPCN makes occasional errors in color basis estimation leading to artifacts such as swap** the color basis vectors between stains or giving a colored tinge to the background with no tissue. We made several algorithmic improvements to remove these artifacts. Additionally, the original SPCN code is not readily usable on gigapixel whole slide images (WSIs) due to long run times, use of proprietary software platform and libraries, and its inability to automatically handle WSIs. We completely rewrote the software such that it can automatically handle images of any size in popular WSI formats. Our software utilizes GPU-acceleration and open-source libraries that are becoming ubiquitous with the advent of deep learning. We also made several other small improvements and achieved a multifold overall speedup on gigapixel images. Our algorithm and software is usable right out-of-the-box by the computational pathology community. △ Less

Submitted 10 January, 2019; originally announced January 2019.

arXiv:1811.00052 [pdf, other]

Some New Layer Architectures for Graph CNN

Authors: Shrey Gadiya, Deepak Anand, Amit Sethi

Abstract: While convolutional neural networks (CNNs) have recently made great strides in supervised classification of data structured on a grid (e.g. images composed of pixel grids), in several interesting datasets, the relations between features can be better represented as a general graph instead of a regular grid. Although recent algorithms that adapt CNNs to graphs have shown promising results, they mos… ▽ More While convolutional neural networks (CNNs) have recently made great strides in supervised classification of data structured on a grid (e.g. images composed of pixel grids), in several interesting datasets, the relations between features can be better represented as a general graph instead of a regular grid. Although recent algorithms that adapt CNNs to graphs have shown promising results, they mostly neglect learning explicit operations for edge features while focusing on vertex features alone. We propose new formulations for convolutional, pooling, and fully connected layers for neural networks that make more comprehensive use of the information available in multi-dimensional graphs. Using these layers led to an improvement in classification accuracy over the state-of-the-art methods on benchmark graph datasets. △ Less

Submitted 31 October, 2018; originally announced November 2018.

Comments: 5 pages, 1 figure, submitted to ICASSP 2019 Special Session on Learning Methods in Complex and Hypercomplex Domains, Brighton, United Kingdom, May 12-17, 2019

arXiv:1802.08080 [pdf, other]

Classification of Breast Cancer Histology using Deep Learning

Authors: Aditya Golatkar, Deepak Anand, Amit Sethi

Abstract: Breast Cancer is a major cause of death worldwide among women. Hematoxylin and Eosin (H&E) stained breast tissue samples from biopsies are observed under microscopes for the primary diagnosis of breast cancer. In this paper, we propose a deep learning-based method for classification of H&E stained breast tissue images released for BACH challenge 2018 by fine-tuning Inception-v3 convolutional neura… ▽ More Breast Cancer is a major cause of death worldwide among women. Hematoxylin and Eosin (H&E) stained breast tissue samples from biopsies are observed under microscopes for the primary diagnosis of breast cancer. In this paper, we propose a deep learning-based method for classification of H&E stained breast tissue images released for BACH challenge 2018 by fine-tuning Inception-v3 convolutional neural network (CNN) proposed by Szegedy et al. These images are to be classified into four classes namely, i) normal tissue, ii) benign tumor, iii) in-situ carcinoma and iv) invasive carcinoma. Our strategy is to extract patches based on nuclei density instead of random or grid sampling, along with rejection of patches that are not rich in nuclei (non-epithelial) regions for training and testing. Every patch (nuclei-dense region) in an image is classified in one of the four above mentioned categories. The class of the entire image is determined using majority voting over the nuclear classes. We obtained an average four class accuracy of 85% and an average two class (non-cancer vs. carcinoma) accuracy of 93%, which improves upon a previous benchmark by Araujo et al. △ Less

Submitted 25 July, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

Comments: 8 pages. Published at ICIAR 2018, Portugal

Showing 1–15 of 15 results for author: Anand, D