-
Conditional Generative Models are Sufficient to Sample from Any Causal Effect Estimand
Authors:
Md Musfiqur Rahman,
Matt Jordan,
Murat Kocaoglu
Abstract:
Causal inference from observational data has recently found many applications in machine learning. While sound and complete algorithms exist to compute causal effects, many of these algorithms require explicit access to conditional likelihoods over the observational distribution, which is difficult to estimate in the high-dimensional regime, such as with images. To alleviate this issue, researchers have approached the problem by simulating causal relations with neural models and obtained impressive results. However, none of these existing approaches can be applied to generic scenarios such as causal graphs on image data with latent confounders, or obtain conditional interventional samples. In this paper, we show that any identifiable causal effect given an arbitrary causal graph can be computed through push-forward computations of conditional generative models. Based on this result, we devise a diffusion-based approach to sample from any (conditional) interventional distribution on image data. To showcase our algorithm's performance, we conduct experiments on a Colored MNIST dataset having both the treatment ($X$) and the target variables ($Y$) as images and obtain interventional samples from $P(y|do(x))$. As an application of our algorithm, we evaluate two large conditional generative models that are pre-trained on the CelebA dataset by analyzing the strength of spurious correlations and the level of disentanglement they achieve.
Submitted 12 February, 2024;
originally announced February 2024.
-
Pavement Performance Evaluation Models for South Carolina
Authors:
Md Mostaqur Rahman,
Majbah Uddin,
Sarah L Gassman
Abstract:
This paper develops pavement performance evaluation models using data from primary and interstate highway systems in the state of South Carolina, USA. Twenty pavement sections were selected from across the state, and historical pavement performance data for those sections were collected. A total of 8 models were developed based on regression techniques: 4 for Asphalt Concrete (AC) pavements and 4 for Jointed Plain Concrete Pavements (JPCP). Four performance indicators are considered as response variables in the statistical analysis: Present Serviceability Index (PSI), Pavement Distress Index (PDI), Pavement Quality Index (PQI), and International Roughness Index (IRI). Annual Average Daily Traffic (AADT), Free Flow Speed (FFS), precipitation, temperature, and soil type (soil Type A from the Blue Ridge and Piedmont Region, and soil Type B from the Coastal Plain and Sediment Region) are considered as predictor variables. Results showed that AADT, FFS, and precipitation have statistically significant effects on PSI and IRI for both JPCP and AC pavements. Temperature showed a significant effect only on PDI and PQI (p < 0.01) for AC pavements. Regarding soil type, Type B soil produced statistically higher PDI and PQI (p < 0.01) than Type A soil on AC pavements, whereas Type B soil produced statistically higher IRI and PSI (p < 0.001) than Type A soil on JPCP pavements. Using the developed models, local transportation agencies could estimate future pavement performance as well as future corrective actions, such as maintenance and rehabilitation.
Submitted 1 February, 2024;
originally announced February 2024.
-
Modular Learning of Deep Causal Generative Models for High-dimensional Causal Inference
Authors:
Md Musfiqur Rahman,
Murat Kocaoglu
Abstract:
Pearl's causal hierarchy establishes a clear separation between observational, interventional, and counterfactual questions. Researchers proposed sound and complete algorithms to compute identifiable causal queries at a given level of the hierarchy using the causal structure and data from the lower levels of the hierarchy. However, most of these algorithms assume that we can accurately estimate the probability distribution of the data, which is an impractical assumption for high-dimensional variables such as images. On the other hand, modern generative deep learning architectures can be trained to learn how to accurately sample from such high-dimensional distributions. Especially with the recent rise of foundation models for images, it is desirable to leverage pre-trained models to answer causal queries with such high-dimensional data. To address this, we propose a sequential training algorithm that, given the causal structure and a pre-trained conditional generative model, can train a deep causal generative model, which utilizes the pre-trained model and can provably sample from identifiable interventional and counterfactual distributions. Our algorithm, called Modular-DCM, uses adversarial training to learn the network weights, and to the best of our knowledge, is the first algorithm that can make use of pre-trained models and provably sample from any identifiable causal query in the presence of latent confounders with high-dimensional data. We demonstrate the utility of our algorithm using semi-synthetic and real-world datasets containing images as variables in the causal structure.
Submitted 2 January, 2024;
originally announced January 2024.
-
Towards Causal Deep Learning for Vulnerability Detection
Authors:
Md Mahbubur Rahman,
Ira Ceka,
Chengzhi Mao,
Saikat Chakraborty,
Baishakhi Ray,
Wei Le
Abstract:
Deep learning vulnerability detection has shown promising results in recent years. However, an important challenge that still prevents it from being useful in practice is that the models are not robust under perturbation and do not generalize well to out-of-distribution (OOD) data, e.g., when a trained model is applied to unseen projects in the real world. We hypothesize that this is because the model learns non-robust features, e.g., variable names, that have spurious correlations with labels. When the perturbed and OOD datasets no longer have the same spurious features, the model prediction fails. To address this challenge, we introduce causality into deep learning vulnerability detection. Our approach, CausalVul, consists of two phases. First, we design novel perturbations to discover spurious features that the model may use to make predictions. Second, we apply causal learning algorithms, specifically do-calculus, on top of existing deep learning models to systematically remove the use of spurious features and thus promote causal-based prediction. Our results show that CausalVul consistently improved model accuracy, robustness, and OOD performance for all the state-of-the-art models and datasets we experimented with. To the best of our knowledge, this is the first work that introduces do-calculus-based causal learning to software engineering models and shows that it is indeed useful for improving model accuracy, robustness, and generalization. Our replication package is located at https://figshare.com/s/0ffda320dcb96c249ef2.
Submitted 14 January, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Pseudo value-based Deep Neural Networks for Multi-state Survival Analysis
Authors:
Md Mahmudur Rahman,
Sanjay Purushotham
Abstract:
Multi-state survival analysis (MSA) uses multi-state models for the analysis of time-to-event data. In medical applications, MSA can provide insights about the complex disease progression in patients. A key challenge in MSA is the accurate subject-specific prediction of multi-state model quantities such as transition probability and state occupation probability in the presence of censoring. Traditional multi-state methods such as Aalen-Johansen (AJ) estimators and Cox-based methods are respectively limited by Markov and proportional hazards assumptions and are infeasible for making subject-specific predictions. Neural ordinary differential equations for MSA relax these assumptions but are computationally expensive and do not directly model the transition probabilities. To address these limitations, we propose a new class of pseudo-value-based deep learning models for multi-state survival analysis, where we show that pseudo values - designed to handle censoring - can be a natural replacement for estimating the multi-state model quantities when derived from a consistent estimator. In particular, we provide an algorithm to derive pseudo values from consistent estimators to directly predict the multi-state survival quantities from the subject's covariates. Empirical results on synthetic and real-world datasets show that our proposed models achieve state-of-the-art results under various censoring settings.
Submitted 11 July, 2022;
originally announced July 2022.
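To make the idea of pseudo values in the abstract above more concrete, here is a minimal sketch of the standard jackknife construction. It is not the paper's algorithm: as a simplifying assumption it uses the single-event Kaplan-Meier survival estimator as the consistent estimator, not a multi-state estimator such as Aalen-Johansen, and the function names `kaplan_meier` and `pseudo_values` are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of the survival probability S(t).

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if right-censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    # Multiply the conditional survival factors at each distinct event time <= t.
    for u in np.unique(times[events]):
        if u > t:
            break
        at_risk = np.sum(times >= u)
        deaths = np.sum((times == u) & events)
        surv *= 1.0 - deaths / at_risk
    return surv

def pseudo_values(times, events, t):
    """Jackknife pseudo values: n * S(t) - (n - 1) * S_{-i}(t) for each subject i."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = len(times)
    full = kaplan_meier(times, events, t)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # leave subject i out
        loo = kaplan_meier(times[keep], events[keep], t)
        out[i] = n * full - (n - 1) * loo
    return out

if __name__ == "__main__":
    # Illustrative toy data (not from the paper); 0 marks a censored subject.
    print(pseudo_values([1, 2, 3, 4, 5], [1, 0, 1, 1, 0], t=3.5))
```

Each subject receives one pseudo value per time point, defined even when that subject is censored, which is what allows it to serve as a regression target for a model that takes the subject's covariates as input. Without censoring, the pseudo value reduces to the indicator that the subject survived past `t`.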
-
FedPseudo: Pseudo value-based Deep Learning Models for Federated Survival Analysis
Authors:
Md Mahmudur Rahman,
Sanjay Purushotham
Abstract:
Survival analysis, or time-to-event analysis, is an important problem in healthcare since it has a wide-ranging impact on patients and palliative care. Many survival analysis methods have assumed that the survival data is centrally available, either from one medical center or through data sharing among multiple centers. However, the sensitivity of patient attributes and strict privacy laws have increasingly forbidden the sharing of healthcare data. To address this challenge, the research community has looked at decentralized training and sharing of model parameters using the Federated Learning (FL) paradigm. In this paper, we study the use of FL for performing survival analysis on distributed healthcare datasets. Recently, the popular Cox proportional hazards (CPH) models have been adapted for FL settings; however, due to their linearity and proportional hazards assumptions, CPH models result in suboptimal performance, especially for non-linear, non-iid, and heavily censored survival datasets. To overcome the challenges of existing federated survival analysis methods, we leverage the predictive accuracy of deep learning models and the power of pseudo values to propose a first-of-its-kind pseudo value-based deep learning model for federated survival analysis (FSA) called FedPseudo. Furthermore, we introduce a novel approach to deriving pseudo values for survival probability in FL settings that speeds up the computation of pseudo values. Extensive experiments on synthetic and real-world datasets show that our pseudo value-based FL framework achieves performance similar to that of the best centrally trained deep survival analysis model. Moreover, our proposed FL approach obtains the best results for various censoring settings.
Submitted 11 July, 2022;
originally announced July 2022.
-
Pandemic Vulnerability Index of US Cities: A Hybrid Knowledge-based and Data-driven Approach
Authors:
Md. Shahinoor Rahman,
Kamal Chandra Paul,
Md. Mokhlesur Rahman,
Jim Samuel,
Jean-Claude Thill,
Md. Amjad Hossain,
G. G. Md. Nawaz Ali
Abstract:
Cities become mission-critical zones during pandemics and it is vital to develop a better understanding of the factors that are associated with infection levels. The COVID-19 pandemic has impacted many cities severely; however, there is significant variance in its impact across cities. Pandemic infection levels are associated with inherent features of cities (e.g., population size, density, mobility patterns, socioeconomic condition, and health environment), which need to be better understood. Intuitively, the infection levels are expected to be higher in big urban agglomerations, but the measurable influence of a specific urban feature is unclear. The present study examines 41 variables and their potential influence on COVID-19 cases and fatalities. The study uses a multi-method approach to study the influence of variables, classified as demographic, socioeconomic, mobility and connectivity, urban form and density, and health and environment dimensions. This study develops an index dubbed the PVI-CI for classifying the pandemic vulnerability levels of cities, grouping them into five vulnerability classes, from very high to very low. Furthermore, clustering and outlier analysis provide insights into the spatial clustering of cities with high and low vulnerability scores. This study provides strategic insights into levels of influence of key variables upon the spread of infections as well as fatalities, along with an objective ranking for the vulnerability of cities. Thus it provides critical wisdom needed for urban healthcare policy and resource management. The pandemic vulnerability index calculation method and the process present a blueprint for the development of similar indices for cities in other countries, leading to a better understanding and improved pandemic management for urban areas and post-pandemic urban planning across the world.
Submitted 11 March, 2022;
originally announced March 2022.
-
Whole MILC: generalizing learned dynamics across tasks, datasets, and populations
Authors:
Usman Mahmood,
Md Mahfuzur Rahman,
Alex Fedorov,
Noah Lewis,
Zening Fu,
Vince D. Calhoun,
Sergey M. Plis
Abstract:
Behavioral changes are the earliest signs of a mental disorder, but arguably, the dynamics of brain function are affected even earlier. Consequently, the spatio-temporal structure of disorder-specific dynamics is crucial for early diagnosis and for understanding the disorder mechanism. A common way of learning discriminatory features relies on training a classifier and evaluating feature importance. Classical classifiers based on handcrafted features are quite powerful, but suffer from the curse of dimensionality when applied to the large input dimensions of spatio-temporal data. Deep learning algorithms could handle the problem, and model introspection could highlight discriminatory spatio-temporal regions, but they need far more samples to train. In this paper, we present a novel self-supervised training schema that reinforces whole sequence mutual information local to context (whole MILC). We pre-train the whole MILC model on unlabeled and unrelated healthy control data. We test our model on three different disorders, (i) schizophrenia, (ii) autism, and (iii) Alzheimer's, and on four different studies. Our algorithm outperforms existing self-supervised pre-training methods and provides classification results competitive with classical machine learning algorithms. Importantly, whole MILC enables attribution of subject diagnosis to specific spatio-temporal regions in the fMRI signal.
Submitted 18 June, 2021; v1 submitted 29 July, 2020;
originally announced July 2020.
-
Human Activity Recognition from Wearable Sensor Data Using Self-Attention
Authors:
Saif Mahmud,
M Tanjid Hasan Tonmoy,
Kishor Kumar Bhaumik,
A K M Mahbubur Rahman,
M Ashraful Amin,
Mohammad Shoyaib,
Muhammad Asif Hossain Khan,
Amin Ahsan Ali
Abstract:
Human Activity Recognition from body-worn sensor data poses an inherent challenge in capturing spatial and temporal dependencies of time-series signals. In this regard, existing recurrent, convolutional, and hybrid models for activity recognition struggle to capture spatio-temporal context from the feature space of the sensor reading sequence. To address this complex problem, we propose a self-attention based neural network model that forgoes recurrent architectures and utilizes different types of attention mechanisms to generate a higher dimensional feature representation used for classification. We performed extensive experiments on four popular publicly available HAR datasets: PAMAP2, Opportunity, Skoda and USC-HAD. Our model achieves significant performance improvement over recent state-of-the-art models in both benchmark test subjects and Leave-one-subject-out evaluation. We also observe that the sensor attention maps produced by our model are able to capture the importance of the modality and placement of the sensors in predicting the different activity classes.
Submitted 17 March, 2020;
originally announced March 2020.
-
iPromoter-BnCNN: a Novel Branched CNN Based Predictor for Identifying and Classifying Sigma Promoters
Authors:
Ruhul Amin,
Chowdhury Rafeed Rahman,
Md. Habibur Rahman Sifat,
Md Nazmul Khan Liton,
Md. Moshiur Rahman,
Swakkhar Shatabda,
Sajid Ahmed
Abstract:
A promoter is a short region of DNA that is responsible for initiating transcription of specific genes. Computational tools for the automatic identification of promoters are in high demand. Promoters can be of different types according to their function. Promoters may show both intra- and inter-class variation and similarity in terms of consensus sequences. Accurate classification of the various types of sigma promoters remains a challenge. We present iPromoter-BnCNN for identification and accurate classification of six types of promoters: sigma24, sigma28, sigma32, sigma38, sigma54, and sigma70. It is a Convolutional Neural Network (CNN) based classifier that combines local features related to monomer nucleotide sequence, trimer nucleotide sequence, dimer structural properties and trimer structural properties through the use of parallel branching. We conducted experiments on a benchmark dataset and compared our tool with two state-of-the-art tools to show its superior performance on 5-fold cross-validation. Moreover, we tested our classifier on an independent test dataset. Our proposed tool iPromoter-BnCNN web server is freely available at http://103.109.52.8/iPromoter-BnCNN. The runnable source code can be found at https://colab.research.google.com/drive/1yWWh7BXhsm8U4PODgPqlQRy23QGjF2DZ.
Submitted 16 June, 2020; v1 submitted 21 December, 2019;
originally announced December 2019.
-
Transfer Learning of fMRI Dynamics
Authors:
Usman Mahmood,
Md Mahfuzur Rahman,
Alex Fedorov,
Zening Fu,
Sergey Plis
Abstract:
As a mental disorder progresses, it may affect brain structure, but brain function expressed in brain dynamics is affected much earlier. Capturing the moment when brain dynamics express the disorder is crucial for early diagnosis. The traditional approach to this problem via training classifiers either proceeds from handcrafted features or requires large datasets to combat the $m \gg n$ problem that arises when a high dimensional fMRI volume has only a single label that carries learning signal. Large datasets may not be available for a study of each disorder, and rare disorder types or sub-populations may not warrant them. In this paper, we demonstrate a self-supervised pre-training method that enables us to pre-train directly on fMRI dynamics of healthy control subjects and transfer the learning to much smaller datasets of schizophrenia. Not only do we enable classification of the disorder directly based on fMRI dynamics in small data, but we also significantly speed up the learning when possible. This is encouraging evidence of informative transfer learning across datasets and diagnostic categories.
Submitted 16 November, 2019;
originally announced November 2019.
-
GLIMPS: A Greedy Mixed Integer Approach for Super Robust Matched Subspace Detection
Authors:
Md Mahfuzur Rahman,
Daniel Pimentel-Alarcon
Abstract:
Due to the diverse nature of data acquisition and modern applications, many contemporary problems involve a high dimensional datum $x \in \mathbb{R}^d$ whose entries often lie in a union of subspaces, and the goal is to find out which entries of $x$ match a particular subspace $U$, a task classically called matched subspace detection. Consequently, entries that match one subspace are considered inliers with respect to that subspace, while all other entries are considered outliers. The proportion of outliers relative to each subspace varies based on the degree of coordinates from subspaces. This problem is combinatorial and NP-hard in nature and has been studied extensively in recent years. Existing approaches can solve the problem when outliers are sparse. However, if outliers are abundant, or in other words if $x$ contains coordinates from a fair number of subspaces, this problem cannot be solved with acceptable accuracy or within a reasonable amount of time. This paper proposes a two-stage approach called Greedy Linear Integer Mixed Programmed Selector (GLIMPS) for this abundant-outliers setting, which combines a greedy algorithm and a mixed integer formulation and can tolerate over 80% outliers, outperforming the state-of-the-art.
Submitted 29 October, 2019;
originally announced October 2019.
-
Unfolding the Structure of a Document using Deep Learning
Authors:
Muhammad Mahbubur Rahman,
Tim Finin
Abstract:
Understanding and extracting information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be multi-themed, complex, noisy and cover diverse topics. We describe a framework that can analyze large documents and help people and computer systems locate desired information in them. We aim to automatically identify and classify different sections of documents and understand their purpose within the document. A key contribution of our research is modeling and extracting the logical and semantic structure of electronic documents using deep learning techniques. We evaluate the effectiveness and robustness of our framework through extensive experiments on two collections: more than one million scholarly articles from arXiv and a collection of requests for proposal documents from government sources.
Submitted 29 September, 2019;
originally announced October 2019.
-
Understanding and representing the semantics of large structured documents
Authors:
Muhammad Mahbubur Rahman,
Tim Finin
Abstract:
Understanding large, structured documents like scholarly articles, requests for proposals or business reports is a complex and difficult task. It involves discovering a document's overall purpose and subject(s), understanding the function and meaning of its sections and subsections, and extracting low level entities and facts about them. In this research, we present a deep learning based document ontology to capture the general purpose semantic structure and domain specific semantic concepts from a large number of academic articles and business documents. The ontology is able to describe different functional parts of a document, which can be used to enhance semantic indexing for a better understanding by human beings and machines. We evaluate our models through extensive experiments on datasets of scholarly articles from arXiv and Request for Proposal documents.
Submitted 24 July, 2018;
originally announced July 2018.