-
Development of Machine Learning Classifiers for Blood-based Diagnosis and Prognosis of Suspected Acute Infections and Sepsis
Authors:
Ljubomir Buturovic,
Michael Mayhew,
Roland Luethy,
Kirindi Choi,
Uros Midic,
Nandita Damaraju,
Yehudit Hasin-Brumshtein,
Amitesh Pratap,
Rhys M. Adams,
Joao Fonseca,
Ambika Srinath,
Paul Fleming,
Claudia Pereira,
Oliver Liesenfeld,
Purvesh Khatri,
Timothy Sweeney
Abstract:
We applied machine learning to the unmet medical need of rapid and accurate diagnosis and prognosis of acute infections and sepsis in emergency departments. Our solution consists of a Myrna (TM) Instrument and embedded TriVerity (TM) classifiers. The instrument measures abundances of 29 messenger RNAs in a patient's blood, which are subsequently used as features for machine learning. The classifiers convert the input features to an intuitive test report comprising the separate likelihoods of (1) a bacterial infection, (2) a viral infection, and (3) severity (need for Intensive Care Unit-level care). In internal validation, the system achieved AUROC = 0.83 on the three-class disease diagnosis (bacterial, viral, or non-infected) and AUROC = 0.77 on binary prognosis of disease severity. The Myrna, TriVerity system was granted breakthrough device designation by the United States Food and Drug Administration (FDA). This engineering manuscript teaches the standard and novel machine learning methods used to translate an academic research concept to a clinical product aimed at improving patient care, and discusses lessons learned.
Submitted 2 July, 2024;
originally announced July 2024.
-
Machine Translation by Projecting Text into the Same Phonetic-Orthographic Space Using a Common Encoding
Authors:
Amit Kumar,
Shantipriya Parida,
Ajay Pratap,
Anil Kumar Singh
Abstract:
The use of subword embedding has proved to be a major innovation in Neural Machine Translation (NMT). It helps NMT to learn better context vectors for Low Resource Languages (LRLs) so as to predict the target words by better modelling the morphologies of the two languages and also the morphosyntax transfer. Even so, their performance for translation in the Indian-language-to-Indian-language scenario is still not as good as for resource-rich languages. One reason for this is the relative morphological richness of Indian languages, while another is that most of them fall into the extremely low resource or zero-shot categories. Since most major Indian languages use Indic or Brahmi origin scripts, the text written in them is highly phonetic in nature and phonetically similar in terms of abstract letters and their arrangements. We use these characteristics of Indian languages and their scripts to propose an approach based on common multilingual Latin-based encodings (WX notation) that take advantage of language similarity while addressing the morphological complexity issue in NMT. These multilingual Latin-based encodings in NMT, together with Byte Pair Encoding (BPE), allow us to better exploit their phonetic and orthographic as well as lexical similarities to improve the translation quality by projecting different but similar languages onto the same orthographic-phonetic character space. We verify the proposed approach through experiments on similar language pairs (Gujarati-Hindi, Marathi-Hindi, Nepali-Hindi, Maithili-Hindi, Punjabi-Hindi, and Urdu-Hindi) under low resource conditions. The proposed approach shows an improvement in a majority of cases, in one case as much as ~10 BLEU points compared to baseline techniques for similar language pairs. We also get up to ~1 BLEU point improvement on distant and zero-shot language pairs.
Submitted 21 May, 2023;
originally announced May 2023.
-
Exploiting Multilingualism in Low-resource Neural Machine Translation via Adversarial Learning
Authors:
Amit Kumar,
Ajay Pratap,
Anil Kumar Singh
Abstract:
Generative Adversarial Networks (GAN) offer a promising approach for Neural Machine Translation (NMT). However, feeding multiple morphologically rich languages into a single model during training reduces the NMT's performance. In GAN, similar to bilingual models, multilingual NMT only considers one reference translation for each sentence during model training. This single reference translation limits the GAN model from learning sufficient information about the source sentence representation. Thus, in this article, we propose a Denoising Adversarial Auto-encoder-based Sentence Interpolation (DAASI) approach to perform sentence interpolation by learning the intermediate latent representation of the source and target sentences of multilingual language pairs. Apart from latent representation, we also use the Wasserstein-GAN approach for the multilingual NMT model by incorporating the model-generated sentences of multiple languages for reward computation. This computed reward optimizes the performance of the GAN-based multilingual model in an effective manner. We run experiments on low-resource language pairs and find that our approach outperforms the existing state-of-the-art approaches for multilingual NMT with a performance gain of up to 4 BLEU points. Moreover, we use our trained model on zero-shot language pairs under an unsupervised scenario and show the robustness of the proposed approach.
Submitted 31 March, 2023;
originally announced March 2023.
-
Exploiting Language Relatedness in Machine Translation Through Domain Adaptation Techniques
Authors:
Amit Kumar,
Rupjyoti Baruah,
Ajay Pratap,
Mayank Swarnkar,
Anil Kumar Singh
Abstract:
One of the significant challenges of Machine Translation (MT) is the scarcity of large amounts of data, mainly parallel sentence-aligned corpora. If the evaluation is as rigorous as for resource-rich languages, both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) can produce good results with such large amounts of data. However, it is challenging to improve the quality of MT output for low resource languages, especially in NMT and SMT. In order to tackle the challenges faced by MT, we present a novel approach that uses a scaled similarity score of sentences, especially for related languages, based on a 5-gram KenLM language model with Kneser-Ney smoothing, to filter in-domain data from out-of-domain corpora and thereby boost the translation quality of MT. Furthermore, we employ other domain adaptation techniques such as multi-domain, fine-tuning and iterative back-translation approaches to compare against our novel approach on the Hindi-Nepali language pair for NMT and SMT. Our approach succeeds in increasing ~2 BLEU points on the multi-domain approach, ~3 BLEU points on fine-tuning for NMT and ~2 BLEU points on the iterative back-translation approach.
Submitted 3 March, 2023;
originally announced March 2023.
-
BPFISH: Blockchain and Privacy-preserving FL Inspired Smart Healthcare
Authors:
Moirangthem Biken Singh,
Ajay Pratap
Abstract:
This paper proposes a Federated Learning (FL)-based smart healthcare system where Medical Centers (MCs) train the local model using the data collected from patients and send the model weights to the miners in a blockchain-based robust framework without sharing raw data, keeping privacy preservation in consideration. We formulate an optimization problem by maximizing the utility and minimizing the loss function considering energy consumption and FL process delay of MCs for learning effective models on distributed healthcare data underlying a blockchain-based framework. We propose a solution in two stages: first, we offer a stable matching-based association algorithm to maximize the utility of both miners and MCs, and then we solve loss minimization using the Stochastic Gradient Descent (SGD) algorithm employing FL under Differential Privacy (DP) and blockchain technology. Moreover, we incorporate blockchain technology to provide tamper-resistant and decentralized model weight sharing in the proposed FL-based framework. The effectiveness of the proposed model is shown through simulation on real-world healthcare data in comparison with other state-of-the-art techniques.
Submitted 27 July, 2022; v1 submitted 24 July, 2022;
originally announced July 2022.
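The BPFISH abstract mentions a stable matching-based association between MCs and miners but does not give the algorithm. As a rough illustration only, the sketch below uses the classic Gale-Shapley deferred-acceptance procedure for a one-to-one matching; the function name, the preference lists, and the one-to-one, equal-sized setting are all assumptions made for this example, not details from the paper.

```python
def stable_matching(proposer_prefs, acceptor_prefs):
    """Gale-Shapley deferred acceptance (hypothetical helper).

    Assumes equally many proposers and acceptors, each with a complete
    preference list ordered from most to least preferred.
    Returns a dict mapping each proposer to its matched acceptor.
    """
    # rank[a][p] is the position of proposer p in acceptor a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next acceptor to try
    engaged = {}                                  # acceptor -> current proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p                 # acceptor was unmatched
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p                 # acceptor prefers the new proposer
            free.append(current)
        else:
            free.append(p)                 # proposal rejected, try next choice later

    return {p: a for a, p in engaged.items()}


# Illustrative preferences: two medical centers (proposers), two miners (acceptors).
mc_prefs = {"mc1": ["m1", "m2"], "mc2": ["m1", "m2"]}
miner_prefs = {"m1": ["mc2", "mc1"], "m2": ["mc1", "mc2"]}
print(stable_matching(mc_prefs, miner_prefs))
```

In a utility-driven setting such as the one the abstract describes, the preference lists would presumably be derived from each side's utility function; that derivation is not shown here.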
-
Covid-19 Spread Detection and Controlling with Fog-based Infection Probability Evaluation Model
Authors:
Suraj Mahawar,
Ajay Pratap
Abstract:
COVID-19 has created a pandemic around the world, paused the path of building the future, and is still ongoing with no long-term solution in sight. Vaccine distribution is too slow compared to the spread of COVID-19. Hence, it is important to be aware and take precautions in time, without delay and without waiting long after getting infected with the virus. Currently used technology is more advanced than ever before. Almost everyone has access to at least one mobile device with an Internet connection. Therefore, we propose a Fog Server (FS) based system that can be used to create awareness about the spread of COVID-19 within the surroundings of individuals, utilizing the concept of Hidden Markov Models (HMM) and Bluetooth contact tracing, in polynomial computational time complexity. Moreover, we evaluate the effectiveness of the proposed model through real-world data analysis under different simulation parameter settings.
Submitted 20 May, 2021;
originally announced June 2021.
-
Criticality and Utility-aware Fog Computing System for Remote Health Monitoring
Authors:
Moirangthem Biken Singh,
Navneet Taunk,
Naveen Kumar Mall,
Ajay Pratap
Abstract:
The growing remote health monitoring system allows constant monitoring of the patient's condition and performance of preventive and control check-ups outside medical facilities. However, the real-time smart-healthcare application poses a delay constraint that has to be solved efficiently. Fog computing is emerging as an efficient solution for such real-time applications. Moreover, different medical centers are getting attracted to the growing IoT-based remote healthcare system in order to make a profit by hiring Fog computing resources. However, there is a need for an efficient algorithmic model for allocation of limited fog computing resources in the criticality-aware smart-healthcare system considering the profit of medical centers. Thus, the objective of this work is to maximize the system utility calculated as a linear combination of the profit of the medical center and the loss of patients. To measure profit, we propose a flat-pricing-based model. Further, we propose a swapping-based heuristic to maximize the system utility. The proposed heuristic is tested on various parameters and shown to perform close to the optimal with criticality-awareness at its core. Through extensive simulations, we show that the proposed heuristic achieves an average utility of $96\%$ of the optimal, in polynomial time complexity.
Submitted 2 April, 2022; v1 submitted 24 May, 2021;
originally announced May 2021.
-
Indicators of retention in remote digital health studies: A cross-study evaluation of 100,000 participants
Authors:
Abhishek Pratap,
Elias Chaibub Neto,
Phil Snyder,
Carl Stepnowsky,
Noémie Elhadad,
Daniel Grant,
Matthew H. Mohebbi,
Sean Mooney,
Christine Suver,
John Wilbanks,
Lara Mangravite,
Patrick Heagerty,
Pat Arean,
Larsson Omberg
Abstract:
Digital technologies such as smartphones are transforming the way scientists conduct biomedical research using real-world data. Several remotely-conducted studies have recruited thousands of participants over a span of a few months. Unfortunately, these studies are hampered by substantial participant attrition, calling into question the representativeness of the collected data, including the generalizability of findings from these studies. We report the challenges in retention and recruitment in eight remote digital health studies comprising over 100,000 participants who participated for more than 850,000 days, completing close to 3.5 million remote health evaluations. Survival modeling surfaced several factors significantly associated (P < 1e-16) with an increase in median retention time: i) clinician referral (increase of 40 days), ii) effect of compensation (22 days), iii) clinical conditions of interest to the study (7 days) and iv) older adults (4 days). Additionally, four distinct patterns of daily app usage behavior that were also associated (P < 1e-10) with participant demographics were identified. Most studies were not able to recruit a representative sample, either demographically or regionally. Together, these findings can help inform recruitment and retention strategies to enable equitable participation of populations in future digital health research.
Submitted 2 October, 2019;
originally announced October 2019.
-
On Maximizing Task Throughput in IoT-enabled 5G Networks under Latency and Bandwidth Constraints
Authors:
Ajay Pratap,
Ragini Gupta,
Venkata Sriram Siddhardh Nadendla,
Sajal K. Das
Abstract:
Fog computing in 5G networks has played a significant role in increasing the number of users in a given network. However, Internet-of-Things (IoT) has driven system designers towards designing heterogeneous networks to support diverse demands (tasks with different priority values) with different latency and data rate constraints. In this paper, our goal is to maximize the total number of tasks served by a heterogeneous network, labeled task throughput, in the presence of data rate and latency constraints and device preferences regarding computational needs. Since our original problem is intractable, we propose an efficient solution based on graph-coloring techniques. We demonstrate the effectiveness of our proposed algorithm using numerical results, real-world experiments on a laboratory testbed, and a comparison with the state-of-the-art algorithm.
Submitted 10 April, 2019;
originally announced May 2019.
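The abstract above says the proposed solution is based on graph-coloring techniques without spelling them out. As a generic illustration of the idea, and not the paper's algorithm, the sketch below applies greedy coloring to a small conflict graph. Reading vertices as tasks, edges as "cannot share a resource", and colors as resource slots is an assumption made for this example, as are the function name and the highest-degree-first ordering.

```python
def greedy_coloring(adjacency):
    """Greedy graph coloring (hypothetical helper).

    adjacency maps each vertex to the set of its neighbors (undirected graph).
    Each vertex receives the smallest non-negative integer color that no
    already-colored neighbor uses. Greedy coloring is a heuristic and does
    not guarantee the minimum number of colors in general.
    """
    colors = {}
    # Visit high-degree vertices first; ties are broken by vertex name.
    order = sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))
    for v in order:
        used = {colors[n] for n in adjacency[v] if n in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


# Illustrative conflict graph: t2 conflicts with both t1 and t3.
conflicts = {"t1": {"t2"}, "t2": {"t1", "t3"}, "t3": {"t2"}}
assignment = greedy_coloring(conflicts)
print(assignment)
```

Under the assumed interpretation, tasks that receive the same color could be served on the same resource slot, since no two of them conflict.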