Search | arXiv e-print repository

Neural Machine Translation For Low Resource Languages

Authors: Vakul Goyle, Parvathy Krishnaswamy, Kannan Girija Ravikumar, Utsa Chattopadhyay, Kartikay Goyle

Abstract: Neural Machine translation is a challenging task due to the inherent complex nature and the fluidity that natural languages bring. Nonetheless, in recent years, it has achieved state-of-the-art performance in several language pairs. Although, a lot of traction can be seen in the areas of multilingual neural machine translation (MNMT) in the recent years, there are no comprehensive survey done to i… ▽ More Neural Machine translation is a challenging task due to the inherent complex nature and the fluidity that natural languages bring. Nonetheless, in recent years, it has achieved state-of-the-art performance in several language pairs. Although, a lot of traction can be seen in the areas of multilingual neural machine translation (MNMT) in the recent years, there are no comprehensive survey done to identify what approaches work well. The goal of this paper is to investigate the realm of low resource languages and build a Neural Machine Translation model to achieve state-of-the-art results. The paper looks to build upon the mBART language model and explore strategies to augment it with various NLP and Deep Learning techniques like back translation and transfer learning. This implementation tries to unpack the architecture of the NMT application and determine the different components which offers us opportunities to amend the said application within the purview of the low resource languages problem space. △ Less

Submitted 17 April, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

arXiv:2302.07549 [pdf, other]

Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications

Authors: Milashini Nambiar, Supriyo Ghosh, Priscilla Ong, Yu En Chan, Yong Mong Bee, Pavitra Krishnaswamy

Abstract: There is increasing interest in data-driven approaches for recommending optimal treatment strategies in many chronic disease management and critical care applications. Reinforcement learning methods are well-suited to this sequential decision-making problem, but must be trained and evaluated exclusively on retrospective medical record datasets as direct online exploration is unsafe and infeasible.… ▽ More There is increasing interest in data-driven approaches for recommending optimal treatment strategies in many chronic disease management and critical care applications. Reinforcement learning methods are well-suited to this sequential decision-making problem, but must be trained and evaluated exclusively on retrospective medical record datasets as direct online exploration is unsafe and infeasible. Despite this requirement, the vast majority of treatment optimization studies use off-policy RL methods (e.g., Double Deep Q Networks (DDQN) or its variants) that are known to perform poorly in purely offline settings. Recent advances in offline RL, such as Conservative Q-Learning (CQL), offer a suitable alternative. But there remain challenges in adapting these approaches to real-world applications where suboptimal examples dominate the retrospective dataset and strict safety constraints need to be satisfied. In this work, we introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training. We perform extensive experiments on two real-world tasks for diabetes and sepsis treatment optimization to compare performance of the proposed approach against prominent off-policy and offline RL baselines (DDQN and CQL). Across a range of principled and clinically relevant metrics, we show that our proposed approach enables substantial improvements in expected health outcomes and in accordance with relevant practice and safety guidelines. △ Less

Submitted 13 June, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2209.05424 [pdf, other]

Towards More Efficient Data Valuation in Healthcare Federated Learning using Ensembling

Authors: Sourav Kumar, A. Lakshminarayanan, Ken Chang, Feri Guretno, Ivan Ho Mien, Jayashree Kalpathy-Cramer, Pavitra Krishnaswamy, Praveer Singh

Abstract: Federated Learning (FL) wherein multiple institutions collaboratively train a machine learning model without sharing data is becoming popular. Participating institutions might not contribute equally, some contribute more data, some better quality data or some more diverse data. To fairly rank the contribution of different institutions, Shapley value (SV) has emerged as the method of choice. Exact… ▽ More Federated Learning (FL) wherein multiple institutions collaboratively train a machine learning model without sharing data is becoming popular. Participating institutions might not contribute equally, some contribute more data, some better quality data or some more diverse data. To fairly rank the contribution of different institutions, Shapley value (SV) has emerged as the method of choice. Exact SV computation is impossibly expensive, especially when there are hundreds of contributors. Existing SV computation techniques use approximations. However, in healthcare where the number of contributing institutions are likely not of a colossal scale, computing exact SVs is still exorbitantly expensive, but not impossible. For such settings, we propose an efficient SV computation technique called SaFE (Shapley Value for Federated Learning using Ensembling). We empirically show that SaFE computes values that are close to exact SVs, and that it performs better than current SV approximations. This is particularly relevant in medical imaging setting where widespread heterogeneity across institutions is rampant and fast accurate data valuation is required to determine the contribution of each participant in multi-institutional collaborative learning. △ Less

Submitted 12 September, 2022; originally announced September 2022.

Comments: 3rd MICCAI Workshop on Distributed, Collaborative and Federated Learning

arXiv:2209.01858 [pdf, other]

Consistency-Based Semi-supervised Evidential Active Learning for Diagnostic Radiograph Classification

Authors: Shafa Balaram, Cuong M. Nguyen, Ashraf Kassim, Pavitra Krishnaswamy

Abstract: Deep learning approaches achieve state-of-the-art performance for classifying radiology images, but rely on large labelled datasets that require resource-intensive annotation by specialists. Both semi-supervised learning and active learning can be utilised to mitigate this annotation burden. However, there is limited work on combining the advantages of semi-supervised and active learning approache… ▽ More Deep learning approaches achieve state-of-the-art performance for classifying radiology images, but rely on large labelled datasets that require resource-intensive annotation by specialists. Both semi-supervised learning and active learning can be utilised to mitigate this annotation burden. However, there is limited work on combining the advantages of semi-supervised and active learning approaches for multi-label medical image classification. Here, we introduce a novel Consistency-based Semi-supervised Evidential Active Learning framework (CSEAL). Specifically, we leverage predictive uncertainty based on theories of evidence and subjective logic to develop an end-to-end integrated approach that combines consistency-based semi-supervised learning with uncertainty-based active learning. We apply our approach to enhance four leading consistency-based semi-supervised learning methods: Pseudo-labelling, Virtual Adversarial Training, Mean Teacher and NoTeacher. Extensive evaluations on multi-label Chest X-Ray classification tasks demonstrate that CSEAL achieves substantive performance improvements over two leading semi-supervised active learning baselines. Further, a class-wise breakdown of results shows that our approach can substantially improve accuracy on rarer abnormalities with fewer labelled samples. △ Less

Submitted 5 September, 2022; originally announced September 2022.

Comments: Preprint submitted to MICCAI. Accepted in May 2022

arXiv:2206.02428 [pdf, other]

Domain-specific Language Pre-training for Dialogue Comprehension on Clinical Inquiry-Answering Conversations

Authors: Zhengyuan Liu, Pavitra Krishnaswamy, Nancy F. Chen

Abstract: There is growing interest in the automated extraction of relevant information from clinical dialogues. However, it is difficult to collect and construct large annotated resources for clinical dialogue tasks. Recent developments in natural language processing suggest that large-scale pre-trained language backbones could be leveraged for such machine comprehension and information extraction tasks. Y… ▽ More There is growing interest in the automated extraction of relevant information from clinical dialogues. However, it is difficult to collect and construct large annotated resources for clinical dialogue tasks. Recent developments in natural language processing suggest that large-scale pre-trained language backbones could be leveraged for such machine comprehension and information extraction tasks. Yet, due to the gap between pre-training and downstream clinical domains, it remains challenging to exploit the generic backbones for domain-specific applications. Therefore, in this work, we propose a domain-specific language pre-training, to improve performance on downstream tasks like dialogue comprehension. Aside from the common token-level masking pre-training method, according to the nature of human conversations and interactive flow of multi-topic inquiry-answering dialogues, we further propose sample generation strategies with speaker and utterance manipulation. The conversational pre-training guides the language backbone to reconstruct the utterances coherently based on the remaining context, thus bridging the gap between general and specific domains. Experiments are conducted on a clinical conversation dataset for symptom checking, where nurses inquire and discuss symptom information with patients. We empirically show that the neural model with our proposed approach brings improvement in the dialogue comprehension task, and can achieve favorable results in the low resource training scenario. △ Less

Submitted 6 June, 2022; originally announced June 2022.

Comments: W3PHIAI-2022

arXiv:2108.04423 [pdf, other]

doi 10.1016/j.media.2021.102148

Semi-supervised classification of radiology images with NoTeacher: A Teacher that is not Mean

Authors: Balagopal Unnikrishnan, Cuong Nguyen, Shafa Balaram, Chao Li, Chuan Sheng Foo, Pavitra Krishnaswamy

Abstract: Deep learning models achieve strong performance for radiology image classification, but their practical application is bottlenecked by the need for large labeled training datasets. Semi-supervised learning (SSL) approaches leverage small labeled datasets alongside larger unlabeled datasets and offer potential for reducing labeling cost. In this work, we introduce NoTeacher, a novel consistency-bas… ▽ More Deep learning models achieve strong performance for radiology image classification, but their practical application is bottlenecked by the need for large labeled training datasets. Semi-supervised learning (SSL) approaches leverage small labeled datasets alongside larger unlabeled datasets and offer potential for reducing labeling cost. In this work, we introduce NoTeacher, a novel consistency-based SSL framework which incorporates probabilistic graphical models. Unlike Mean Teacher which maintains a teacher network updated via a temporal ensemble, NoTeacher employs two independent networks, thereby eliminating the need for a teacher network. We demonstrate how NoTeacher can be customized to handle a range of challenges in radiology image classification. Specifically, we describe adaptations for scenarios with 2D and 3D inputs, uni and multi-label classification, and class distribution mismatch between labeled and unlabeled portions of the training data. In realistic empirical evaluations on three public benchmark datasets spanning the workhorse modalities of radiology (X-Ray, CT, MRI), we show that NoTeacher achieves over 90-95% of the fully supervised AUROC with less than 5-15% labeling budget. Further, NoTeacher outperforms established SSL methods with minimal hyperparameter tuning, and has implications as a principled and practical option for semisupervised learning in radiology applications. △ Less

Submitted 9 August, 2021; originally announced August 2021.

Comments: Preprint submitted to Medical Image Analysis. Accepted in June 2021

MSC Class: 41A05; 41A10; 65D05; 65D17

arXiv:2008.05571 [pdf, other]

Self-Path: Self-supervision for Classification of Pathology Images with Limited Annotations

Authors: Navid Alemi Koohbanani, Balagopal Unnikrishnan, Syed Ali Khurram, Pavitra Krishnaswamy, Nasir Rajpoot

Abstract: While high-resolution pathology images lend themselves well to `data hungry' deep learning algorithms, obtaining exhaustive annotations on these images is a major challenge. In this paper, we propose a self-supervised CNN approach to leverage unlabeled data for learning generalizable and domain invariant representations in pathology images. The proposed approach, which we term as Self-Path, is a m… ▽ More While high-resolution pathology images lend themselves well to `data hungry' deep learning algorithms, obtaining exhaustive annotations on these images is a major challenge. In this paper, we propose a self-supervised CNN approach to leverage unlabeled data for learning generalizable and domain invariant representations in pathology images. The proposed approach, which we term as Self-Path, is a multi-task learning approach where the main task is tissue classification and pretext tasks are a variety of self-supervised tasks with labels inherent to the input data. We introduce novel domain specific self-supervision tasks that leverage contextual, multi-resolution and semantic features in pathology images for semi-supervised learning and domain adaptation. We investigate the effectiveness of Self-Path on 3 different pathology datasets. Our results show that Self-Path with the domain-specific pretext tasks achieves state-of-the-art performance for semi-supervised learning when small amounts of labeled data are available. Further, we show that Self-Path improves domain adaptation for classification of histology image patches when there is no labeled data available for the target domain. This approach can potentially be employed for other applications in computational pathology, where annotation budget is often limited or large amount of unlabeled image data is available. △ Less

Submitted 12 August, 2020; originally announced August 2020.

arXiv:1910.07150 [pdf, other]

Joint Learning of Word and Label Embeddings for Sequence Labelling in Spoken Language Understanding

Authors: Jiewen Wu, Luis Fernando D'Haro, Nancy F. Chen, Pavitra Krishnaswamy, Rafael E. Banchs

Abstract: We propose an architecture to jointly learn word and label embeddings for slot filling in spoken language understanding. The proposed approach encodes labels using a combination of word embeddings and straightforward word-label association from the training data. Compared to the state-of-the-art methods, our approach does not require label embeddings as part of the input and therefore lends itself… ▽ More We propose an architecture to jointly learn word and label embeddings for slot filling in spoken language understanding. The proposed approach encodes labels using a combination of word embeddings and straightforward word-label association from the training data. Compared to the state-of-the-art methods, our approach does not require label embeddings as part of the input and therefore lends itself nicely to a wide range of model architectures. In addition, our architecture computes contextual distances between words and labels to avoid adding contextual windows, thus reducing memory footprint. We validate the approach on established spoken dialogue datasets and show that it can achieve state-of-the-art performance with much fewer trainable parameters. △ Less

Submitted 15 October, 2019; originally announced October 2019.

Comments: Accepted for publication at ASRU 2019

arXiv:1903.03530 [pdf, other]

Fast Prototy** a Dialogue Comprehension System for Nurse-Patient Conversations on Symptom Monitoring

Authors: Zhengyuan Liu, Hazel Lim, Nur Farah Ain Binte Suhaimi, Shao Chuen Tong, Sharon Ong, Angela Ng, Sheldon Lee, Michael R. Macdonald, Savitha Ramasamy, Pavitra Krishnaswamy, Wai Leng Chow, Nancy F. Chen

Abstract: Data for human-human spoken dialogues for research and development are currently very limited in quantity, variety, and sources; such data are even scarcer in healthcare. In this work, we investigate fast prototy** of a dialogue comprehension system by leveraging on minimal nurse-to-patient conversations. We propose a framework inspired by nurse-initiated clinical symptom monitoring conversation… ▽ More Data for human-human spoken dialogues for research and development are currently very limited in quantity, variety, and sources; such data are even scarcer in healthcare. In this work, we investigate fast prototy** of a dialogue comprehension system by leveraging on minimal nurse-to-patient conversations. We propose a framework inspired by nurse-initiated clinical symptom monitoring conversations to construct a simulated human-human dialogue dataset, embodying linguistic characteristics of spoken interactions like thinking aloud, self-contradiction, and topic drift. We then adopt an established bidirectional attention pointer network on this simulated dataset, achieving more than 80% F1 score on a held-out test set from real-world nurse-to-patient conversations. The ability to automatically comprehend conversations in the healthcare domain by exploiting only limited data has implications for improving clinical workflows through red flag symptom detection and triaging capabilities. We demonstrate the feasibility for efficient and effective extraction, retrieval and comprehension of symptom checking information discussed in multi-turn human-human spoken conversations. △ Less

Submitted 5 April, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

Comments: 8 pages. To appear in NAACL 2019

arXiv:1901.10074 [pdf, other]

CaRENets: Compact and Resource-Efficient CNN for Homomorphic Inference on Encrypted Medical Images

Authors: ** Chao, Ahmad Al Badawi, Balagopal Unnikrishnan, Jie Lin, Chan Fook Mun, James M. Brown, J. Peter Campbell, Michael Chiang, Jayashree Kalpathy-Cramer, Vijay Ramaseshan Chandrasekhar, Pavitra Krishnaswamy, Khin Mi Mi Aung

Abstract: Convolutional neural networks (CNNs) have enabled significant performance leaps in medical image classification tasks. However, translating neural network models for clinical applications remains challenging due to data privacy issues. Fully Homomorphic Encryption (FHE) has the potential to address this challenge as it enables the use of CNNs on encrypted images. However, current HE technology pos… ▽ More Convolutional neural networks (CNNs) have enabled significant performance leaps in medical image classification tasks. However, translating neural network models for clinical applications remains challenging due to data privacy issues. Fully Homomorphic Encryption (FHE) has the potential to address this challenge as it enables the use of CNNs on encrypted images. However, current HE technology poses immense computational and memory overheads, particularly for high-resolution images such as those seen in the clinical context. We present CaRENets: Compact and Resource-Efficient CNNs for high performance and resource-efficient inference on high-resolution encrypted images in practical applications. At the core, CaRENets comprises a new FHE compact packing scheme that is tightly integrated with CNN functions. CaRENets offers dual advantages of memory efficiency (due to compact packing of images and CNN activations) and inference speed (due to the reduction in the number of ciphertexts created and the associated mathematical operations) over standard interleaved packing schemes. We apply CaRENets to perform homomorphic abnormality detection with 80-bit security level in two clinical conditions - Retinopathy of Prematurity (ROP) and Diabetic Retinopathy (DR). The ROP dataset comprises 96 x 96 grayscale images, while the DR dataset comprises 256 x 256 RGB images. We demonstrate over 45x improvement in memory efficiency and 4-5x speedup in inference over the interleaved packing schemes. As our approach enables memory-efficient low-latency HE inference without imposing additional communication burden, it has implications for practical and secure deep learning inference in clinical imaging. △ Less

Submitted 28 January, 2019; originally announced January 2019.

arXiv:1812.07832 [pdf, other]

Semi-Supervised Deep Learning for Abnormality Classification in Retinal Images

Authors: Bruno Lecouat, Ken Chang, Chuan-Sheng Foo, Balagopal Unnikrishnan, James M. Brown, Houssam Zenati, Andrew Beers, Vijay Chandrasekhar, Jayashree Kalpathy-Cramer, Pavitra Krishnaswamy

Abstract: Supervised deep learning algorithms have enabled significant performance gains in medical image classification tasks. But these methods rely on large labeled datasets that require resource-intensive expert annotation. Semi-supervised generative adversarial network (GAN) approaches offer a means to learn from limited labeled data alongside larger unlabeled datasets, but have not been applied to dis… ▽ More Supervised deep learning algorithms have enabled significant performance gains in medical image classification tasks. But these methods rely on large labeled datasets that require resource-intensive expert annotation. Semi-supervised generative adversarial network (GAN) approaches offer a means to learn from limited labeled data alongside larger unlabeled datasets, but have not been applied to discern fine-scale, sparse or localized features that define medical abnormalities. To overcome these limitations, we propose a patch-based semi-supervised learning approach and evaluate performance on classification of diabetic retinopathy from funduscopic images. Our semi-supervised approach achieves high AUC with just 10-20 labeled training images, and outperforms the supervised baselines by upto 15% when less than 30% of the training dataset is labeled. Further, our method implicitly enables interpretation of the SSL predictions. As this approach enables good accuracy, resolution and interpretability with lower annotation burden, it sets the pathway for scalable applications of deep learning in clinical imaging. △ Less

Submitted 19 December, 2018; originally announced December 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/227

arXiv:1803.02043 [pdf, other]

Online Deep Learning: Growing RBM on the fly

Authors: Savitha Ramasamy, Kanagasabai Rajaraman, Pavitra Krishnaswamy, Vijay Chandrasekhar

Abstract: We propose a novel online learning algorithm for Restricted Boltzmann Machines (RBM), namely, the Online Generative Discriminative Restricted Boltzmann Machine (OGD-RBM), that provides the ability to build and adapt the network architecture of RBM according to the statistics of streaming data. The OGD-RBM is trained in two phases: (1) an online generative phase for unsupervised feature representat… ▽ More We propose a novel online learning algorithm for Restricted Boltzmann Machines (RBM), namely, the Online Generative Discriminative Restricted Boltzmann Machine (OGD-RBM), that provides the ability to build and adapt the network architecture of RBM according to the statistics of streaming data. The OGD-RBM is trained in two phases: (1) an online generative phase for unsupervised feature representation at the hidden layer and (2) a discriminative phase for classification. The online generative training begins with zero neurons in the hidden layer, adds and updates the neurons to adapt to statistics of streaming data in a single pass unsupervised manner, resulting in a feature representation best suited to the data. The discriminative phase is based on stochastic gradient descent and associates the represented features to the class labels. We demonstrate the OGD-RBM on a set of multi-category and binary classification problems for data sets having varying degrees of class-imbalance. We first apply the OGD-RBM algorithm on the multi-class MNIST dataset to characterize the network evolution. We demonstrate that the online generative phase converges to a stable, concise network architecture, wherein individual neurons are inherently discriminative to the class labels despite unsupervised training. We then benchmark OGD-RBM performance to other machine learning, neural network and ClassRBM techniques for credit scoring applications using 3 public non-stationary two-class credit datasets with varying degrees of class-imbalance. We report that OGD-RBM improves accuracy by 2.5-3% over batch learning techniques while requiring at least 24%-70% fewer neurons and fewer training samples. This online generative training approach can be extended greedily to multiple layers for training Deep Belief Networks in non-stationary data mining applications without the need for a priori fixed architectures. △ Less

Submitted 6 March, 2018; originally announced March 2018.

Comments: 14 pages, 4 figures, 2 tables

arXiv:1706.08041 [pdf, other]

doi 10.1073/pnas.1705414114

Sparsity Enables Estimation of both Subcortical and Cortical Activity from MEG and EEG

Authors: Pavitra Krishnaswamy, Gabriel Obregon-Henao, Jyrki Ahveninen, Sheraz Khan, Behtash Babadi, Juan Eugenio Iglesias, Matti S. Hamalainen, Patrick L. Purdon

Abstract: Subcortical structures play a critical role in brain function. However, options for assessing electrophysiological activity in these structures are limited. Electromagnetic fields generated by neuronal activity in subcortical structures can be recorded non-invasively using magnetoencephalography (MEG) and electroencephalography (EEG). However, these subcortical signals are much weaker than those d… ▽ More Subcortical structures play a critical role in brain function. However, options for assessing electrophysiological activity in these structures are limited. Electromagnetic fields generated by neuronal activity in subcortical structures can be recorded non-invasively using magnetoencephalography (MEG) and electroencephalography (EEG). However, these subcortical signals are much weaker than those due to cortical activity. In addition, we show here that it is difficult to resolve subcortical sources, because distributed cortical activity can explain the MEG and EEG patterns due to deep sources. We then demonstrate that if the cortical activity can be assumed to be spatially sparse, both cortical and subcortical sources can be resolved with M/EEG. Building on this insight, we develop a novel hierarchical sparse inverse solution for M/EEG. We assess the performance of this algorithm on realistic simulations and auditory evoked response data and show that thalamic and brainstem sources can be correctly estimated in the presence of cortical activity. Our analysis and method suggest new opportunities and offer practical tools for characterizing electrophysiological activity in the subcortical structures of the human brain. △ Less

Submitted 25 June, 2017; originally announced June 2017.

Comments: 12 pages with 6 figures

MSC Class: 62-07; 15A29 (Primary); and 62P10; 92-08; 68W01 (Secondary) ACM Class: G.1.3; I.5.4

Showing 1–13 of 13 results for author: Krishnaswamy, P