-
Logic-dependent emergence of multistability, hysteresis, and biphasic dynamics in a minimal positive feedback network with an autoloop
Authors:
Akriti Srivastava,
Mubasher Rashid
Abstract:
Cellular decision-making (CDM) is a dynamic phenomenon often controlled by regulatory networks defining interactions between genes and transcription factor proteins. Traditional studies have focussed on molecular switches such as positive feedback circuits that exhibit at most bistability. However, higher-order dynamics such as tristability are also prominent in many biological processes. It is thus imperative to identify a minimal circuit that can alone explain mono-, bi-, and tristable dynamics. In this work, we consider a two-component positive feedback network with an autoloop and explore these regimes of stability for different degrees of multimerization and choices of Boolean logic functions. We report that this network can exhibit numerous dynamical scenarios such as bi- and tristability, hysteresis, and biphasic kinetics, explaining both abrupt cell state transitions and smooth state swaps without a step-like switch. Specifically, with monomeric regulation and competitive OR logic, the circuit exhibits mono- and bistability and biphasic dynamics, whereas with non-competitive AND and OR logics only monostability can be achieved. To obtain bistability in the latter cases, we show that the autoloop must have (at least) dimeric regulation. In pursuit of higher-order stability, we show that tristability occurs with higher degrees of multimerization and only with non-competitive OR logic. Our results, backed by rigorous analytical calculations and numerical examples, thus explain the association between multistability, multimerization, and logic in this minimal circuit. Since this circuit underlies various biological processes, including the epithelial-mesenchymal transition, which often drives carcinoma metastasis, these results can offer crucial inputs for controlling cell state transitions by manipulating multimerization and the logic of regulation in cells.
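As a rough illustration of how regulatory logic and multimerization enter such models, the sketch below uses Hill-function forms that are common in the literature for competitive and non-competitive logic. These exact forms, the parameter values, and all function names are assumptions, not the paper's equations. The last function replaces the two-component circuit with a single self-activating node (a deliberately simplified, hypothetical stand-in) and counts its stable steady states. It shows how monomeric self-activation (n = 1) yields one state, while dimeric self-activation (n = 2) can yield two.

```python
import numpy as np

def hill(x, k=1.0, n=1):
    """Activating Hill function x^n / (k^n + x^n); n plays the role of the multimerization degree."""
    return x**n / (k**n + x**n)

def noncompetitive_and(x, y, kx=1.0, ky=1.0, nx=1, ny=1):
    # Assumed form: independent binding, both activators must be bound.
    return hill(x, kx, nx) * hill(y, ky, ny)

def noncompetitive_or(x, y, kx=1.0, ky=1.0, nx=1, ny=1):
    # Assumed form: independent binding, either bound activator suffices.
    hx, hy = hill(x, kx, nx), hill(y, ky, ny)
    return hx + hy - hx * hy

def competitive_or(x, y, kx=1.0, ky=1.0, nx=1, ny=1):
    # Assumed form: activators compete for one site, so they share a denominator.
    ax, ay = (x / kx) ** nx, (y / ky) ** ny
    return (ax + ay) / (1.0 + ax + ay)

def stable_states_autoloop(n, basal=0.02, vmax=3.0, k=1.0, x_max=10.0, num=100_001):
    """Count stable steady states of the hypothetical 1-D model
    dx/dt = basal + vmax * hill(x, k, n) - x, by locating + to - sign changes
    of the rate on a fine grid (a stable root is where the rate turns negative)."""
    x = np.linspace(0.0, x_max, num)
    signs = np.sign(basal + vmax * hill(x, k, n) - x)
    return int(np.sum((signs[:-1] > 0) & (signs[1:] < 0)))

print(noncompetitive_and(1.0, 1.0), noncompetitive_or(1.0, 1.0), competitive_or(1.0, 1.0))
print(stable_states_autoloop(n=1), stable_states_autoloop(n=2))  # 1 2
```

At equal, half-saturating inputs the three logic functions give 0.25, 0.75 and 2/3. The grid scan is only a quick screen; it is no substitute for the analytical steady-state analysis described in the abstract.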
Submitted 8 April, 2024;
originally announced April 2024.
-
Dynamics-based Feature Augmentation of Graph Neural Networks for Variant Emergence Prediction
Authors:
Majd Al Aawar,
Srikar Mutnuri,
Mansooreh Montazerin,
Ajitesh Srivastava
Abstract:
During the COVID-19 pandemic, a major driver of new surges has been the emergence of new variants. When a new variant emerges in one or more countries, other nations monitor its spread in preparation for its potential arrival. The impact of the new variant and the timings of epidemic peaks in a country highly depend on when the variant arrives. The current methods for predicting the spread of new variants rely on statistical modeling; however, these methods work only when the new variant has already arrived in the region of interest and has a significant prevalence. Can we predict when a variant existing elsewhere will arrive in a given region? To address this question, we propose a variant-dynamics-informed Graph Neural Network (GNN) approach. First, we derive the dynamics of variant prevalence across pairs of regions (countries) that apply to a large class of epidemic models. The dynamics motivate the introduction of certain features in the GNN. We demonstrate that our proposed dynamics-informed GNN outperforms all the baselines, including the currently pervasive framework of Physics-Informed Neural Networks (PINNs). To advance research in this area, we introduce a benchmarking tool to assess a user-defined model's prediction performance across 87 countries and 36 variants.
Submitted 28 May, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
Spatio-Temporal Attention in Multi-Granular Brain Chronnectomes for Detection of Autism Spectrum Disorder
Authors:
James Orme-Rogers,
Ajitesh Srivastava
Abstract:
The traditional methods for detecting autism spectrum disorder (ASD) are expensive, subjective, and time-consuming, often taking years for a diagnosis, with many children growing well into adolescence and even adulthood before the disorder is finally confirmed. Recently, graph-based learning techniques have demonstrated impressive results on resting-state functional magnetic resonance imaging (rs-fMRI) data from the Autism Brain Imaging Data Exchange (ABIDE). We introduce IMAGIN, a multI-granular, Multi-Atlas spatio-temporal attention Graph Isomorphism Network, which we use to learn graph representations of dynamic functional brain connectivity (chronnectome), as opposed to static connectivity (connectome). The experimental results demonstrate that IMAGIN achieves a 5-fold cross-validation accuracy of 79.25%, which surpasses the current state-of-the-art by 1.5%. In addition, analysis of the spatial and temporal attention scores provides further validation for the neural basis of autism.
Submitted 29 October, 2022;
originally announced November 2022.
-
Shape-based Evaluation of Epidemic Forecasts
Authors:
Ajitesh Srivastava,
Satwant Singh,
Fiona Lee
Abstract:
Infectious disease forecasting for ongoing epidemics has been traditionally performed, communicated, and evaluated as numerical targets - 1, 2, 3, and 4 week ahead cases, deaths, and hospitalizations. While there is great value in predicting these numerical targets to assess the burden of the disease, we argue that there is also value in communicating the future trend (description of the shape) of the epidemic -- for instance, if the cases will remain flat or a surge is expected. To ensure that what is being communicated is useful, we need to be able to evaluate how well the predicted shape matches the ground truth shape. Instead of treating this as a classification problem (one out of $n$ shapes), we define a transformation of the numerical forecasts into a "shapelet"-space representation. In this representation, each dimension corresponds to the similarity of the shape with one of the shapes of interest (a shapelet). We prove that this representation satisfies the property that two shapes that one would consider similar are mapped close to each other, and vice versa. We demonstrate that our representation is able to reasonably capture the trends in COVID-19 cases and deaths time-series. With this representation, we define an evaluation measure and a measure of agreement among multiple models. We also define the shapelet-space ensemble of multiple models as the mean of their shapelet-space representations. We show that this ensemble is able to accurately predict the shape of the future trend for COVID-19 cases and trends. We also show that the agreement between models can provide a good indicator of the reliability of the forecast.
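A minimal sketch of a shapelet-space transformation, assuming each 4-week forecast is summarized by its week-over-week relative changes and that similarity to a shapelet is 1 / (1 + Euclidean distance). The shapelet set, the similarity function, and all names here are hypothetical, not the paper's definitions. The ensemble follows the abstract: the mean of the models' shapelet-space vectors.

```python
import numpy as np

# Hypothetical shapelets, expressed as week-over-week relative changes over a 4-week horizon.
SHAPELETS = {
    "flat": np.array([0.0, 0.0, 0.0]),
    "increasing": np.array([0.2, 0.2, 0.2]),
    "decreasing": np.array([-0.2, -0.2, -0.2]),
    "peak": np.array([0.2, 0.0, -0.2]),
}
NAMES = list(SHAPELETS)

def to_shapelet_space(forecast):
    """Map a 4-week numerical forecast (positive counts) to one similarity score per shapelet."""
    y = np.asarray(forecast, dtype=float)
    rel_change = np.diff(y) / y[:-1]
    return np.array([1.0 / (1.0 + np.linalg.norm(rel_change - s)) for s in SHAPELETS.values()])

def shapelet_ensemble(forecasts):
    """Ensemble = mean of the individual models' shapelet-space representations."""
    return np.mean([to_shapelet_space(f) for f in forecasts], axis=0)

def dominant_shape(vector):
    """Name of the shapelet with the highest similarity score."""
    return NAMES[int(np.argmax(vector))]

rising = [100, 120, 144, 172.8]
steady = [50, 50, 50, 50]
print(dominant_shape(to_shapelet_space(rising)))  # increasing
print(dominant_shape(to_shapelet_space(steady)))  # flat
```

Working with relative changes makes the representation scale-free, so a forecast of 100 cases and one of 10,000 cases with the same trend map to the same point. The price is that counts must be positive.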
Submitted 11 November, 2022; v1 submitted 8 September, 2022;
originally announced September 2022.
-
Robust Scenario Interpretation from Multi-model Prediction Efforts
Authors:
Yuanhao Lu,
Ajitesh Srivastava
Abstract:
Multi-model prediction efforts in infectious disease modeling and climate modeling involve multiple teams independently producing projections under various scenarios. Often these scenarios are produced by the presence and absence of a decision in the future, e.g., no vaccinations (scenario A) vs vaccinations (scenario B) available in the future. The models submit probabilistic projections for each of the scenarios. Obtaining a confidence interval on the impact of the decision (e.g., number of deaths averted) is important for decision making. However, obtaining tight bounds only from the probabilistic projections for the individual scenarios is difficult, as the joint probability is not known. Further, the models may not be able to generate the joint probability distribution due to various reasons including the need to rewrite simulations, and storage and transfer requirements.
Without asking the submitting models for additional work, we aim to estimate a non-trivial bound on the outcomes due to the decision variable. We first prove, under a key assumption, that an $\alpha$-confidence interval on the difference of scenario predictions can be obtained given only the quantiles of the predictions. Then we show how to estimate a confidence interval after relaxing that assumption. We use our approach to estimate confidence intervals on the reduction in cases, deaths, and hospitalizations due to vaccinations based on model submissions to the US Scenario Modeling Hub.
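For contrast with the approach above, the sketch below computes the simplest assumption-free bound. If each scenario's central interval has coverage at least 1 - alpha/2, a union bound gives joint coverage at least 1 - alpha. The difference A - B therefore lies in [a_lo - b_hi, a_hi - b_lo] with probability at least 1 - alpha, whatever the joint distribution. This is not the authors' method and is typically much wider than what they obtain. It also treats alpha as the miscoverage level, which is an assumption about the notation. Function names and the quantile values in the example are hypothetical.

```python
import numpy as np

def central_interval(levels, values, coverage):
    """Central interval read off a quantile forecast by linear interpolation."""
    tail = (1.0 - coverage) / 2.0
    if tail < levels[0] or 1.0 - tail > levels[-1]:
        raise ValueError("required quantile levels lie outside the submitted range")
    return float(np.interp(tail, levels, values)), float(np.interp(1.0 - tail, levels, values))

def difference_interval(levels, quantiles_a, quantiles_b, alpha):
    """Interval containing A - B with probability >= 1 - alpha, with no joint-distribution assumption."""
    a_lo, a_hi = central_interval(levels, quantiles_a, 1.0 - alpha / 2.0)
    b_lo, b_hi = central_interval(levels, quantiles_b, 1.0 - alpha / 2.0)
    return a_lo - b_hi, a_hi - b_lo

levels = [0.025, 0.25, 0.5, 0.75, 0.975]
deaths_no_vaccine = [900, 960, 1000, 1040, 1100]  # hypothetical scenario A quantiles
deaths_vaccine = [400, 460, 500, 540, 600]        # hypothetical scenario B quantiles
print(difference_interval(levels, deaths_no_vaccine, deaths_vaccine, alpha=0.1))
```

Here the interval on deaths averted is roughly (300, 700). Any tighter interval needs some assumption about how the two scenario outcomes co-vary, which is where the key assumption in the abstract comes in.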
Submitted 9 August, 2022;
originally announced August 2022.
-
The Variations of SIkJalpha Model for COVID-19 Forecasting and Scenario Projections
Authors:
Ajitesh Srivastava
Abstract:
We proposed the SIkJalpha model at the beginning of the COVID-19 pandemic (early 2020). Since then, as the pandemic evolved, more complexities were added to capture crucial factors and variables that can assist with projecting desired future scenarios. Throughout the pandemic, multi-model collaborative efforts have been organized to predict short-term outcomes (cases, deaths, and hospitalizations) of COVID-19 and long-term scenario projections. We have been participating in five such efforts. This paper presents the evolution of the SIkJalpha model and its many versions that have been used to submit to these collaborative efforts since the beginning of the pandemic. Specifically, we show that the SIkJalpha model is an approximation of a class of epidemiological models. We demonstrate how the model can be used to incorporate various complexities, including under-reporting, multiple variants, waning of immunity, and contact rates, and to generate probabilistic outputs.
Submitted 18 September, 2023; v1 submitted 6 July, 2022;
originally announced July 2022.
-
Targeted Neural Dynamical Modeling
Authors:
Cole Hurwitz,
Akash Srivastava,
Kai Xu,
Justin Jude,
Matthew G. Perich,
Lee E. Miller,
Matthias H. Hennig
Abstract:
Latent dynamics models have emerged as powerful tools for modeling and interpreting neural population activity. Recently, there has been a focus on incorporating simultaneously measured behaviour into these models to further disentangle sources of neural variability in their latent space. These approaches, however, are limited in their ability to capture the underlying neural dynamics (e.g. linear) and in their ability to relate the learned dynamics back to the observed behaviour (e.g. no time lag). To this end, we introduce Targeted Neural Dynamical Modeling (TNDM), a nonlinear state-space model that jointly models the neural activity and external behavioural variables. TNDM decomposes neural dynamics into behaviourally relevant and behaviourally irrelevant dynamics; the relevant dynamics are used to reconstruct the behaviour through a flexible linear decoder and both sets of dynamics are used to reconstruct the neural activity through a linear decoder with no time lag. We implement TNDM as a sequential variational autoencoder and validate it on simulated recordings and recordings taken from the premotor and motor cortex of a monkey performing a center-out reaching task. We show that TNDM is able to learn low-dimensional latent dynamics that are highly predictive of behaviour without sacrificing its fit to the neural data.
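The decoder structure described above can be sketched with plain NumPy. Behaviour is read out only from the behaviourally relevant factors, while neural log-firing-rates are read out from both sets of factors at the same time step. We assume here that the "flexible" behaviour decoder may mix relevant factors across a few time lags. The shapes, weights and names are hypothetical, and the sequential variational autoencoder that infers the latents is omitted.

```python
import numpy as np

def decode_behaviour(z_rel, w_lags):
    """Linear behaviour readout from relevant factors only.
    w_lags has shape (n_lags, d_rel, n_behav): behaviour at time t mixes
    z_rel[t - lag] for lag = 0..n_lags-1 (the assumed 'flexibility')."""
    T = z_rel.shape[0]
    out = np.zeros((T, w_lags.shape[2]))
    for lag, w in enumerate(w_lags):
        out[lag:] += z_rel[: T - lag] @ w
    return out

def decode_neural_log_rates(z_rel, z_irr, w_neural, bias):
    """Linear readout of log firing rates from both factor sets, with no time lag."""
    return np.concatenate([z_rel, z_irr], axis=1) @ w_neural + bias

rng = np.random.default_rng(0)
T, d_rel, d_irr, n_neurons, n_behav, n_lags = 50, 2, 3, 20, 2, 4  # hypothetical sizes
z_rel = rng.standard_normal((T, d_rel))  # behaviourally relevant latent trajectory
z_irr = rng.standard_normal((T, d_irr))  # behaviourally irrelevant latent trajectory
behaviour = decode_behaviour(z_rel, rng.standard_normal((n_lags, d_rel, n_behav)))
w_neural = rng.standard_normal((d_rel + d_irr, n_neurons))
bias = rng.standard_normal(n_neurons)
rates = np.exp(decode_neural_log_rates(z_rel, z_irr, w_neural, bias))  # e.g. Poisson rates
```

Because the behaviour decoder never sees `z_irr`, any variability captured there cannot explain behaviour. That is the sense in which the decomposition separates behaviourally relevant from irrelevant dynamics.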
Submitted 27 October, 2021;
originally announced October 2021.
-
Implementing Stepped Pooled Testing for Rapid COVID-19 Detection
Authors:
Abhishek Srivastava,
Anurag Mishra,
Trusha Jayant Parekh,
Sampreeti Jena
Abstract:
COVID-19, a viral respiratory pandemic, has rapidly spread throughout the globe. Large-scale and rapid testing of the population is required to contain the disease, but such testing is prohibitive in terms of resources, cost and time. Recently, RT-PCR-based pooled testing has emerged as a promising way to boost testing efficiency. We introduce a stepped pooled testing strategy, a probability-driven approach that significantly reduces the number of tests required to identify infected individuals in a large population. Our comprehensive methodology incorporates the effect of false negative and false positive rates to accurately determine not only the efficiency of pooling but also its accuracy. Under various plausible scenarios, we show that this approach significantly reduces the cost of testing and also reduces the effective false positive rate of tests when compared to a strategy of testing every individual of a population. We also outline an optimization strategy to obtain the pool size that maximizes the efficiency of pooling given the diagnostic protocol parameters and local infection conditions.
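For orientation, the classic two-stage Dorfman scheme gives the expected number of tests per person in closed form. In that scheme a pool is tested, and every member is retested individually if the pool is positive. The sketch below is that baseline, not the authors' stepped strategy. It assumes pool sensitivity and specificity equal the individual-test values (no dilution effect), and its names and parameter values are hypothetical. Scanning pool sizes then gives the efficiency-maximizing pool size for a given prevalence.

```python
def expected_tests_per_person(pool_size, prevalence, sensitivity=1.0, specificity=1.0):
    """Dorfman pooling: one pooled test shared by n people, plus n individual
    retests whenever the pool tests positive (true or false positive)."""
    n, p = pool_size, prevalence
    if n == 1:
        return 1.0  # plain individual testing
    p_all_negative = (1.0 - p) ** n
    p_pool_positive = sensitivity * (1.0 - p_all_negative) + (1.0 - specificity) * p_all_negative
    return 1.0 / n + p_pool_positive

def best_pool_size(prevalence, max_size=50, **test_params):
    """Pool size in 1..max_size minimizing the expected tests per person."""
    return min(range(1, max_size + 1),
               key=lambda n: expected_tests_per_person(n, prevalence, **test_params))

print(best_pool_size(0.01))                               # 11
print(round(expected_tests_per_person(11, 0.01), 4))      # 0.1956
```

At 1% prevalence with perfect tests, pools of 11 need about 0.2 tests per person, roughly a five-fold saving over individual testing. Lower specificity raises the pool-positive probability and shifts the optimum, which is why error rates matter in the optimization described above.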
Submitted 19 July, 2020;
originally announced July 2020.
-
Fast and Accurate Forecasting of COVID-19 Deaths Using the SIkJ$α$ Model
Authors:
Ajitesh Srivastava,
Tianjian Xu,
Viktor K. Prasanna
Abstract:
Forecasting the effect of COVID-19 is essential to design policies that may prepare us to handle the pandemic. Many methods have already been proposed, particularly to forecast reported cases and deaths at the country and state levels. Many of these methods are based on traditional epidemiological models, which rely on simulations or Bayesian inference to learn many parameters simultaneously. This makes them prone to over-fitting and slow execution. We propose an extension to our model SIkJ$α$ to forecast deaths and show that it can consider the effect of many complexities of the epidemic process and yet be simplified to a few parameters that are learned using fast linear regressions. We also present an evaluation of our method against seven approaches currently being used by the CDC, based on their two-week-ahead forecasts at various times during the pandemic. We demonstrate that our method achieves a lower root mean squared error than these seven approaches during the majority of the evaluation period. Further, on a 2-core desktop machine, our approach takes only 3.18 s to tune hyper-parameters, learn parameters and generate 100 days of forecasts of reported cases and deaths for all the states in the US. The total execution time for 184 countries is 11.83 s and for all the US counties (more than 3,000) is 101.03 s.
Submitted 12 July, 2020; v1 submitted 10 July, 2020;
originally announced July 2020.
-
Data-driven Identification of Number of Unreported Cases for COVID-19: Bounds and Limitations
Authors:
Ajitesh Srivastava,
Viktor K. Prasanna
Abstract:
Accurate forecasts for COVID-19 are necessary for better preparedness and resource management. Specifically, planning a response over one or several months requires accurate long-term forecasts, which is particularly challenging as model errors accumulate over time. A critical factor that can hinder accurate long-term forecasts is the number of unreported/asymptomatic cases. While there have been early serology tests to estimate this number, more tests need to be conducted for more reliable results. To identify the number of unreported/asymptomatic cases, we take an epidemiological, data-driven approach. We show that we can identify lower bounds on this ratio, or upper bounds on actual cases as a multiple of reported cases. To do so, we propose an extension of our prior heterogeneous infection rate model that incorporates unreported/asymptomatic cases. We prove that the number of unreported cases can be reliably estimated only from a certain time period of the epidemic data. In doing so, we construct an algorithm called the Fixed Infection Rate method, which identifies a reliable bound on the learned ratio. We also propose two heuristics to learn this ratio and show their effectiveness on simulated data. We use our approaches to identify upper bounds on the ratio of actual to reported cases for New York City and several US states. Our results demonstrate with high confidence that the actual number of cases cannot be more than 35 times the reported cases in New York, 40 times in Illinois, 38 times in Massachusetts and 29 times in New Jersey.
Submitted 9 July, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Agent-Level Pandemic Simulation (ALPS) for Analyzing Effects of Lockdown Measures
Authors:
Anuj Srivastava
Abstract:
This paper develops an agent-level simulation model, termed ALPS, for simulating the spread of an infectious disease in a confined community. The mechanism of transmission is agent-to-agent contact, using parameters reported for the COVID-19 pandemic. The main goal of the ALPS simulation is to analyze the effects of preventive measures, namely the imposition and lifting of lockdown norms, on the rates of infections, fatalities and recoveries. The model assumptions and choices represent a balance between the competing demands of being realistic and being efficient for real-time inference. The model quantifies the gains in reducing casualties from imposing and maintaining restrictive measures.
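A toy agent-level sketch of the kind of simulation described. Each infected agent meets a number of random agents per day, and a lockdown window lowers that number. All parameter values, state codes and names are hypothetical; they are not the COVID-19 parameters used in ALPS, and the sketch ignores spatial structure, incubation, and other details a realistic model would need.

```python
import numpy as np

S, I, R, D = 0, 1, 2, 3  # hypothetical state codes: susceptible, infected, recovered, dead

def simulate(n_agents=1000, days=120, contacts=10, lockdown_contacts=2,
             lockdown_days=range(20, 60), p_transmit=0.03, p_recover=0.1,
             p_die=0.002, n_seed=5, seed=0):
    """Return a (days, 4) array of daily counts of agents in states S, I, R, D."""
    rng = np.random.default_rng(seed)
    state = np.full(n_agents, S)
    state[rng.choice(n_agents, size=n_seed, replace=False)] = I
    history = []
    for day in range(days):
        k = lockdown_contacts if day in lockdown_days else contacts
        infected = np.flatnonzero(state == I)
        # Each infected agent meets k random agents; each contact transmits w.p. p_transmit.
        met = rng.integers(0, n_agents, size=(infected.size, k))
        reached = met[rng.random(met.shape) < p_transmit]
        newly_infected = reached[state[reached] == S]
        # Currently infected agents die or recover with fixed daily probabilities.
        u = rng.random(infected.size)
        state[infected[u < p_die]] = D
        state[infected[(u >= p_die) & (u < p_die + p_recover)]] = R
        state[newly_infected] = I
        history.append(np.bincount(state, minlength=4))
    return np.array(history)

with_lockdown = simulate()
no_lockdown = simulate(lockdown_days=range(0))
print("peak infected:", with_lockdown[:, I].max(), "vs", no_lockdown[:, I].max())
print("deaths:", with_lockdown[-1, D], "vs", no_lockdown[-1, D])
```

Comparing runs that differ only in `lockdown_days` (and averaging over several seeds) is one simple way to quantify the gain from a lockdown in a model of this kind.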
Submitted 25 April, 2020;
originally announced April 2020.
-
Learning to Forecast and Forecasting to Learn from the COVID-19 Pandemic
Authors:
Ajitesh Srivastava,
Viktor K. Prasanna
Abstract:
Accurate forecasts of COVID-19 are central to resource management and building strategies to deal with the epidemic. We propose a heterogeneous infection rate model with human mobility for epidemic modeling, a preliminary version of which we have successfully used during DARPA Grand Challenge 2014. By linearizing the model and using weighted least squares, our model is able to quickly adapt to changing trends and provide extremely accurate predictions of confirmed cases at the level of countries and states of the United States. We show that during the earlier part of the epidemic, using travel data improves the predictions. Training the model to forecast also enables learning characteristics of the epidemic. In particular, we show that changes in model parameters over time can help us quantify how well a state or a country has responded to the epidemic. The variations in parameters also allow us to forecast different scenarios, such as what would happen if we were to disregard social distancing suggestions.
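Suppose new cases on day t+1 are a weighted sum of case increments over k earlier windows of J days, $\Delta I_{t+1} = \sum_{i=1}^{k} \beta_i \,(I_{t-(i-1)J} - I_{t-iJ})$, with $I$ the cumulative reported cases. This form is our reading, taking the nonlinearity exponent as 1 and the susceptible fraction as close to 1. The coefficients $\beta_i$ can then be fit by weighted least squares with geometrically decaying weights, so that recent days dominate and the fit adapts to changing trends. This is an illustrative sketch, not the authors' exact formulation. The values of k, J and the decay factor, and all function names, are assumptions.

```python
import numpy as np

def lagged_features(cum_cases, k, J):
    """Rows for days t = k*J .. len-2: features I[t-(i-1)J] - I[t-iJ] for i = 1..k,
    target I[t+1] - I[t] (new cases on the following day)."""
    I = np.asarray(cum_cases, dtype=float)
    ts = np.arange(k * J, len(I) - 1)
    X = np.column_stack([I[ts - (i - 1) * J] - I[ts - i * J] for i in range(1, k + 1)])
    y = I[ts + 1] - I[ts]
    return X, y, ts

def fit_wls(cum_cases, k=2, J=7, decay=0.9):
    """Weighted least squares: row t gets weight decay**(t_last - t)."""
    X, y, ts = lagged_features(cum_cases, k, J)
    sw = np.sqrt(decay ** (ts[-1] - ts))  # scale rows by sqrt(weight), then solve ordinary LS
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def simulate_cumulative(beta, k=2, J=7, days=60):
    """Noise-free synthetic cumulative cases generated by the same assumed linear form."""
    I = [10.0 * t + t**2 for t in range(k * J + 1)]  # arbitrary seed history
    for t in range(k * J, days):
        new = sum(b * (I[t - (i - 1) * J] - I[t - i * J]) for i, b in enumerate(beta, start=1))
        I.append(I[t] + new)
    return np.array(I)

print(fit_wls(simulate_cumulative([0.1, 0.05])))  # close to [0.1, 0.05]
```

Forecasting then simply iterates the fitted recurrence forward. Refitting as new data arrive, and tracking how the fitted coefficients drift, is one way to read off how an epidemic's dynamics are changing.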
Submitted 4 May, 2020; v1 submitted 23 April, 2020;
originally announced April 2020.
-
On the Inhibition of COVID-19 Protease by Indian Herbal Plants: An In Silico Investigation
Authors:
Ambrish Kumar Srivastava,
Abhishek Kumar,
Neeraj Misra
Abstract:
COVID-19 has quickly spread across the globe, becoming a pandemic. This disease has a variable impact in different countries depending on their cultural norms, mitigation efforts and health infrastructure. In India, a majority of people rely upon traditional Indian medicine to treat human maladies because of its lower cost, easier availability and absence of side effects. These medicines are made from herbal plants. This study aims to assess Indian herbal plants in the pursuit of potential COVID-19 inhibitors using in silico approaches. We have considered 18 extracted compounds from 11 different species of these plants. Our calculated lipophilicity, aqueous solubility and binding affinity of the extracted compounds suggest inhibition potentials in the following order: harsingar > aloe vera > giloy > turmeric > neem > ashwagandha > red onion > tulsi > cannabis > black pepper. On comparing the binding affinity with that of hydroxychloroquine, we note that the inhibition potentials of the extracts of harsingar, aloe vera and giloy are very promising. Therefore, we believe that these findings will open further possibilities and accelerate work towards finding an antidote for this malady.
Submitted 5 April, 2020;
originally announced April 2020.
-
Predicting Onset of Dementia in Parkinson's Disease Patients
Authors:
Dhruv Agarwal,
Abhishek Srivastava,
Edward W Huang
Abstract:
Alzheimer's disease (AD) and Parkinson's disease (PD) are the two most common neurodegenerative disorders in humans. Because a significant percentage of patients have clinical and pathological features of both diseases, it has been hypothesized that the patho-cascades of the two diseases overlap. Despite this evidence, these two diseases are rarely studied in a joint manner. In this paper, we utilize clinical, imaging, genetic, and biospecimen features to cluster AD and PD patients into the same feature space. By training a machine learning classifier on the combined feature space, we predict the disease stage of patients two years after their baseline visits. We observed a considerable improvement in the prediction accuracy of Parkinson's dementia patients due to combined training on Alzheimer's and Parkinson's patients, thereby affirming the claim that these two diseases can be jointly studied.
Submitted 2 June, 2019;
originally announced June 2019.
-
Scalable Spike Source Localization in Extracellular Recordings using Amortized Variational Inference
Authors:
Cole L. Hurwitz,
Kai Xu,
Akash Srivastava,
Alessio P. Buccino,
Matthias H. Hennig
Abstract:
Determining the positions of neurons in an extracellular recording is useful for investigating functional properties of the underlying neural circuitry. In this work, we present a Bayesian modelling approach for localizing the source of individual spikes on high-density microelectrode arrays. To allow for scalable inference, we implement our model as a variational autoencoder and perform amortized variational inference. We evaluate our method on both biophysically realistic simulated and real extracellular datasets, demonstrating that it is more accurate than heuristic localization methods such as center of mass and can improve spike sorting performance over them.
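The center-of-mass heuristic mentioned as a baseline is easy to state: a spike's location is the amplitude-weighted mean of the recording channels' positions. The sketch below assumes 2-D channel coordinates and uses absolute peak amplitudes as weights, optionally restricted to the strongest channels. These choices and the names are illustrative, not the paper's exact baseline.

```python
import numpy as np

def center_of_mass(channel_xy, amplitudes, n_channels=None):
    """Amplitude-weighted mean of channel positions.
    channel_xy: (n, 2) channel coordinates; amplitudes: (n,) peak amplitudes
    (often negative for extracellular spikes, so magnitudes are used as weights)."""
    xy = np.asarray(channel_xy, dtype=float)
    w = np.abs(np.asarray(amplitudes, dtype=float))
    if n_channels is not None:
        keep = np.argsort(w)[-n_channels:]  # indices of the strongest channels
        xy, w = xy[keep], w[keep]
    return (w[:, None] * xy).sum(axis=0) / w.sum()

channels = [(0, 0), (10, 0), (0, 10), (10, 10)]
print(center_of_mass(channels, [-3.0, -1.0, 0.0, 0.0]))  # [2.5 0. ]
```

Because the estimate is a convex combination of channel positions, it can never fall outside the probe's footprint or off the probe plane. This is one reason a model-based approach can be more accurate.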
Submitted 26 January, 2022; v1 submitted 29 May, 2019;
originally announced May 2019.
-
Discovering Common Change-Point Patterns in Functional Connectivity Across Subjects
Authors:
Mengyu Dai,
Zhengwu Zhang,
Anuj Srivastava
Abstract:
This paper studies change-points in human brain functional connectivity (FC) and seeks patterns that are common across multiple subjects under identical external stimuli. FC relates to the similarity of fMRI responses across different brain regions when the brain is simply resting or performing a task. While the dynamic nature of FC is well accepted, this paper develops a formal statistical test for finding change-points in time series associated with FC. It represents short-term connectivity by a symmetric positive-definite matrix, and uses a Riemannian metric on this space to develop a graphical method for detecting change-points in a time series of such matrices. It also provides a graphical representation of estimated FC for stationary subintervals in between the detected change-points. Furthermore, it uses a temporal alignment of the test statistic, viewed as a real-valued function over time, to remove inter-subject variability and to discover common change-point patterns across subjects. This method is illustrated using data from the Human Connectome Project (HCP) database for multiple subjects and tasks.
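One standard Riemannian metric on symmetric positive-definite matrices is the affine-invariant distance $d(A, B) = \lVert \log(A^{-1/2} B A^{-1/2}) \rVert_F$. Which metric the paper uses is not stated in this abstract, so treat this choice as an assumption. A NumPy sketch via eigendecompositions (function names are hypothetical):

```python
import numpy as np

def _apply_to_eigenvalues(sym, fn):
    """Return V diag(fn(w)) V^T for a symmetric matrix with eigendecomposition V diag(w) V^T."""
    w, V = np.linalg.eigh(sym)
    return (V * fn(w)) @ V.T

def affine_invariant_distance(A, B):
    """||log(A^{-1/2} B A^{-1/2})||_F for symmetric positive-definite A and B."""
    A_inv_sqrt = _apply_to_eigenvalues(A, lambda w: 1.0 / np.sqrt(w))
    M = A_inv_sqrt @ B @ A_inv_sqrt
    M = (M + M.T) / 2.0  # remove round-off asymmetry before the symmetric eigensolver
    return float(np.sqrt(np.sum(np.log(np.linalg.eigvalsh(M)) ** 2)))

print(affine_invariant_distance(np.eye(2), 2 * np.eye(2)))  # sqrt(2) * ln 2, about 0.980
```

The metric is unchanged when both matrices are transformed as $G A G^\top$ and $G B G^\top$ for any invertible $G$, so it does not depend on how the regional signals are scaled or linearly mixed. In a change-point setting, one would compute such distances between FC matrices estimated on successive windows. The paper's formal test is not reproduced here.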
Submitted 26 April, 2019;
originally announced April 2019.
-
Early Prediction of Acute Kidney Injury in Critical Care Setting Using Clinical Notes
Authors:
Yikuan Li,
Liang Yao,
Chengsheng Mao,
Anand Srivastava,
Xiaoqian Jiang,
Yuan Luo
Abstract:
Acute kidney injury (AKI) in critically ill patients is associated with significant morbidity and mortality. Development of novel methods to identify patients with AKI earlier will allow for testing of novel strategies to prevent or reduce the complications of AKI. We developed data-driven prediction models to estimate the risk of new AKI onset. We generated models from clinical notes within the first 24 hours following intensive care unit (ICU) admission extracted from Medical Information Mart for Intensive Care III (MIMIC-III). From the clinical notes, we generated clinically meaningful word and concept representations and embeddings, respectively. Five supervised learning classifiers and a knowledge-guided deep learning architecture were used to construct prediction models. The best configuration yielded a competitive AUC of 0.779. Our work suggests that natural language processing of clinical notes can be applied to assist clinicians in identifying the risk of incident AKI onset in critically ill patients upon admission to the ICU.
Submitted 9 November, 2018; v1 submitted 6 November, 2018;
originally announced November 2018.
-
Accurate, Fast and Lightweight Clustering of de novo Transcriptomes using Fragment Equivalence Classes
Authors:
Avi Srivastava,
Hirak Sarkar,
Laraib Malik,
Rob Patro
Abstract:
Motivation: De novo transcriptome assembly of non-model organisms is the first major step for many RNA-seq analysis tasks. Current methods for de novo assembly often report a large number of contiguous sequences (contigs), which may be fractured and incomplete sequences instead of full-length transcripts. Dealing with a large number of such contigs can slow and complicate downstream analysis.
Results: We present a method for clustering contigs from de novo transcriptome assemblies based upon the relationships exposed by multi-mapping sequencing fragments. Specifically, we cast the problem of clustering contigs as one of clustering a sparse graph that is induced by equivalence classes of fragments that map to subsets of the transcriptome. Leveraging recent developments in efficient read mapping and transcript quantification, we have developed RapClust, a tool implementing this approach that is capable of accurately clustering most large de novo transcriptomes in a matter of minutes, while simultaneously providing accurate estimates of expression for the resulting clusters. We compare RapClust against a number of tools commonly used for de novo transcriptome clustering. Using de novo assemblies of organisms for which reference genomes are available, we assess the accuracy of these different methods in terms of the quality of the resulting clusterings, and the concordance of differential expression tests with those based on ground truth clusters. We find that RapClust produces clusters of comparable or better quality than existing state-of-the-art approaches, and does so substantially faster. RapClust also confers a large benefit in terms of space usage, as it produces only succinct intermediate files - usually on the order of a few megabytes - even when processing hundreds of millions of reads.
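To make the graph construction concrete, the sketch below links every pair of contigs that share a fragment equivalence class and returns the connected components via union-find. This is a simplified stand-in. RapClust's own edge weighting, filtering and clustering step are not reproduced here, and plain connected components would tend to over-merge contigs linked only by a few spurious multi-mapping fragments. The input format and function name are hypothetical.

```python
def cluster_contigs(equivalence_classes, contigs):
    """Group contigs linked, directly or transitively, by a shared equivalence class.

    equivalence_classes: iterable of tuples of contig names, one tuple per class of
    fragments that all map to exactly that set of contigs (hypothetical input format).
    """
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps the trees shallow
            c = parent[c]
        return c

    for eq_class in equivalence_classes:
        first, *rest = eq_class
        for other in rest:
            root_a, root_b = find(first), find(other)
            if root_a != root_b:
                parent[root_b] = root_a  # union: contigs in one class share an edge

    clusters = {}
    for c in contigs:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())

classes = [("c1", "c2"), ("c2", "c3"), ("c4",)]
print(cluster_contigs(classes, ["c1", "c2", "c3", "c4", "c5"]))
# [['c1', 'c2', 'c3'], ['c4'], ['c5']]
```

Because the graph is induced by equivalence classes rather than by individual reads, its size scales with the number of distinct classes rather than with the read count. This is consistent with the small intermediate files reported above.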
Submitted 12 April, 2016;
originally announced April 2016.