Search | arXiv e-print repository

GRACE-C: Generalized Rate Agnostic Causal Estimation via Constraints

Authors: Mohammadsajad Abavisani, David Danks, Sergey Plis

Abstract: Graphical structures estimated by causal learning algorithms from time series data can provide misleading causal information if the causal timescale of the generating process fails to match the measurement timescale of the data. Existing algorithms provide limited resources to respond to this challenge, and so researchers must either use models that they know are likely misleading, or else forego causal learning entirely. Existing methods face up to four distinct shortfalls, as they might 1) require that the difference between causal and measurement timescales is known; 2) only handle a very small number of random variables when the timescale difference is unknown; 3) only apply to pairs of variables; or 4) be unable to find a solution given statistical noise in the data. This research addresses these challenges. Our approach combines constraint programming with both theoretical insights into the problem structure and prior information about admissible causal interactions to achieve a speed-up of multiple orders of magnitude. The resulting system maintains theoretical guarantees while scaling to significantly larger sets of random variables (>100) without knowledge of timescale differences. This method is also robust to edge misidentification and can use parametric connection strengths, while optionally finding the optimal solution among many possible ones.

Submitted 21 May, 2024; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: Published in the International Conference on Learning Representations (Spotlight)

arXiv:2004.07407 [pdf, other]

Radiologist-Level COVID-19 Detection Using CT Scans with Detail-Oriented Capsule Networks

Authors: Aryan Mobiny, Pietro Antonio Cicalese, Samira Zare, Pengyu Yuan, Mohammadsajad Abavisani, Carol C. Wu, Jitesh Ahuja, Patricia M. de Groot, Hien Van Nguyen

Abstract: Radiographic images offer an alternative method for the rapid screening and monitoring of Coronavirus Disease 2019 (COVID-19) patients. This approach is limited by the shortage of radiology experts who can provide a timely interpretation of these images. Motivated by this challenge, our paper proposes a novel learning architecture, called Detail-Oriented Capsule Networks (DECAPS), for the automatic diagnosis of COVID-19 from Computed Tomography (CT) scans. Our network combines the strength of Capsule Networks with several architecture improvements meant to boost classification accuracies. First, DECAPS uses an Inverted Dynamic Routing mechanism which increases model stability by preventing the passage of information from non-descriptive regions. Second, DECAPS employs a Peekaboo training procedure which uses a two-stage patch crop and drop strategy to encourage the network to generate activation maps for every target concept. The network then uses the activation maps to focus on regions of interest and combines both coarse and fine-grained representations of the data. Finally, we use a data augmentation method based on conditional generative adversarial networks to deal with the issue of data scarcity. Our model achieves 84.3% precision, 91.5% recall, and 96.1% area under the ROC curve, significantly outperforming state-of-the-art methods. We compare the performance of the DECAPS model with three experienced, well-trained thoracic radiologists and show that the architecture significantly outperforms them. While further studies on larger datasets are required to confirm this finding, our results imply that architectures like DECAPS can be used to assist radiologists in the CT scan mediated diagnosis of COVID-19.

Submitted 15 April, 2020; originally announced April 2020.

arXiv:2004.04917 [pdf, other]

Multimodal Categorization of Crisis Events in Social Media

Authors: Mahdi Abavisani, Liwei Wu, Shengli Hu, Joel Tetreault, Alejandro Jaimes

Abstract: Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real-time. Emergency response is one such area that stands to gain from these advances. By processing billions of texts and images a minute, events can be automatically detected to enable emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection techniques in this area have focused on image-only or text-only approaches, limiting detection performance and impacting the quality of information delivered to crisis response teams. In this paper, we present a new multimodal fusion method that leverages both images and texts as input. In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities on a sample-by-sample basis. In addition, we employ a multimodal graph-based approach to stochastically transition between embeddings of different multimodal pairs during training, both to better regularize the learning process and to deal with limited training data by constructing new matched pairs from different samples. We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.

Submitted 10 April, 2020; originally announced April 2020.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR 2020)

ACM Class: I.5.4

Journal ref: Conference on Computer Vision and Pattern Recognition (CVPR 2020)

arXiv:1908.00704 [pdf, other]

Greedy AutoAugment

Authors: Alireza Naghizadeh, Mohammadsajad Abavisani, Dimitris N. Metaxas

Abstract: A major problem in data augmentation is to ensure that the generated new samples cover the search space. This is a challenging problem and requires exploration for data augmentation policies to ensure their effectiveness in covering the search space. In this paper, we propose Greedy AutoAugment as a highly efficient search algorithm to find the best augmentation policies. We use a greedy approach to reduce the exponential growth of the number of possible trials to linear growth. The Greedy Search also helps us to lead the search towards the sub-policies with better results, which eventually helps to increase the accuracy. The proposed method can be used as a reliable addition to the current artificial neural networks. Our experiments on four datasets (Tiny ImageNet, CIFAR-10, CIFAR-100, and SVHN) show that Greedy AutoAugment provides better accuracy, while using 360 times fewer computational resources.

Submitted 6 October, 2020; v1 submitted 2 August, 2019; originally announced August 2019.

Comments: Pattern Recognition Letters (2020)

arXiv:1904.11093 [pdf, other]

doi 10.1109/LSP.2019.2913022

Deep Sparse Representation-based Classification

Authors: Mahdi Abavisani, Vishal M. Patel

Abstract: We present a transductive deep learning-based formulation for the sparse representation-based classification (SRC) method. The proposed network consists of a convolutional autoencoder along with a fully-connected layer. The role of the autoencoder network is to learn robust deep features for classification. On the other hand, the fully-connected layer, which is placed in between the encoder and the decoder networks, is responsible for finding the sparse representation. The estimated sparse codes are then used for classification. Various experiments on three different datasets show that the proposed network leads to sparse representations that give better classification results than state-of-the-art SRC methods. The source code is available at: github.com/mahdiabavisani/DSRC.

Submitted 24 April, 2019; originally announced April 2019.

MSC Class: 68T45; 62H30

ACM Class: I.5.3; I.2.10

Journal ref: IEEE Signal Processing Letters, 2019

arXiv:1812.06145 [pdf, other]

Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition with Multimodal Training

Authors: Mahdi Abavisani, Hamid Reza Vaezi Joze, Vishal M. Patel

Abstract: We present an efficient approach for leveraging the knowledge from multiple modalities in training unimodal 3D convolutional neural networks (3D-CNNs) for the task of dynamic hand gesture recognition. Instead of explicitly combining multimodal information, which is commonplace in many state-of-the-art methods, we propose a different framework in which we embed the knowledge of multiple modalities in individual networks so that each unimodal network can achieve an improved performance. In particular, we dedicate separate networks per available modality and enforce them to collaborate and learn to develop networks with common semantics and better representations. We introduce a "spatiotemporal semantic alignment" loss (SSA) to align the content of the features from different networks. In addition, we regularize this loss with our proposed "focal regularization parameter" to avoid negative knowledge transfer. Experimental results show that our framework improves the test time recognition accuracy of unimodal networks, and provides the state-of-the-art performance on various dynamic hand gesture recognition datasets.

Submitted 12 August, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

MSC Class: 68T45; 62H30

ACM Class: I.5.3; I.2.10

Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1165-1174

arXiv:1804.06498 [pdf, other]

doi 10.1109/JSTSP.2018.2875385

Deep Multimodal Subspace Clustering Networks

Authors: Mahdi Abavisani, Vishal M. Patel

Abstract: We present convolutional neural network (CNN) based approaches for unsupervised multimodal subspace clustering. The proposed framework consists of three main stages - multimodal encoder, self-expressive layer, and multimodal decoder. The encoder takes multimodal data as input and fuses them to a latent space representation. The self-expressive layer is responsible for enforcing the self-expressiveness property and acquiring an affinity matrix corresponding to the data points. The decoder reconstructs the original input data. The network uses the distance between the decoder's reconstruction and the original input in its training. We investigate early, late and intermediate fusion techniques and propose three different encoders corresponding to them for spatial fusion. The self-expressive layers and multimodal decoders are essentially the same for different spatial fusion-based approaches. In addition to various spatial fusion-based methods, an affinity fusion-based network is also proposed in which the self-expressive layer corresponding to different modalities is enforced to be the same. Extensive experiments on three datasets show that the proposed methods significantly outperform the state-of-the-art multimodal subspace clustering methods.

Submitted 4 January, 2019; v1 submitted 17 April, 2018; originally announced April 2018.

MSC Class: 68T45; 62H30

ACM Class: I.5.3; I.2.10

Journal ref: IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 6, pp. 1601-1614, Dec. 2018

arXiv:1711.09334 [pdf, other]

In2I : Unsupervised Multi-Image-to-Image Translation Using Generative Adversarial Networks

Authors: Pramuditha Perera, Mahdi Abavisani, Vishal M. Patel

Abstract: In unsupervised image-to-image translation, the goal is to learn the mapping between an input image and an output image using a set of unpaired training images. In this paper, we propose an extension of the unsupervised image-to-image translation problem to the multiple-input setting. Given a set of paired images from multiple modalities, a transformation is learned to translate the input into a specified domain. For this purpose, we introduce a Generative Adversarial Network (GAN) based framework along with a multi-modal generator structure and a new loss term, latent consistency loss. Through various experiments we show that leveraging multiple inputs generally improves the visual quality of the translated images. Moreover, we show that the proposed method outperforms current state-of-the-art unsupervised image-to-image translation methods.

Submitted 25 November, 2017; originally announced November 2017.

Showing 1–8 of 8 results for author: Abavisani, M