doi 10.1109/ICIP49359.2023.10221899

IKD+: Reliable Low Complexity Deep Models For Retinopathy Classification

Authors: Shreyas Bhat Brahmavar, Rohit Rajesh, Tirtharaj Dash, Lovekesh Vig, Tanmay Tulsidas Verlekar, Md Mahmudul Hasan, Tariq Khan, Erik Meijering, Ashwin Srinivasan

Abstract: Deep neural network (DNN) models for retinopathy have estimated predictive accuracies in the mid-to-high 90%. However, the following aspects remain unaddressed: State-of-the-art models are complex and require substantial computational infrastructure to train and deploy; The reliability of predictions can vary widely. In this paper, we focus on these aspects and propose a form of iterative knowledge distillation (IKD), called IKD+, that incorporates a tradeoff between size, accuracy and reliability. We investigate the functioning of IKD+ using two widely used techniques for estimating model calibration (Platt-scaling and temperature-scaling), using the best-performing model available, which is an ensemble of EfficientNets with approximately 100M parameters. We demonstrate that IKD+ equipped with temperature-scaling results in models that show up to approximately 500-fold decreases in the number of parameters than the original ensemble without a significant loss in accuracy. In addition, calibration scores (reliability) for the IKD+ models are as good as or better than the base mode

Submitted 3 March, 2023; originally announced March 2023.

Comments: Submitted to IEEE International Conference on Image Processing (ICIP 2023)

Journal ref: IEEE International Conference on Image Processing (ICIP 2023)

arXiv:2302.09833 [pdf, other]

doi 10.1109/EMBC40787.2023.10340659

Domain-Specific Pre-training Improves Confidence in Whole Slide Image Classification

Authors: Soham Rohit Chitnis, Sidong Liu, Tirtharaj Dash, Tanmay Tulsidas Verlekar, Antonio Di Ieva, Shlomo Berkovsky, Lovekesh Vig, Ashwin Srinivasan

Abstract: Whole Slide Images (WSIs) or histopathology images are used in digital pathology. WSIs pose great challenges to deep learning models for clinical diagnosis, owing to their size and lack of pixel-level annotations. With the recent advancements in computational pathology, newer multiple-instance learning-based models have been proposed. Multiple-instance learning for WSIs necessitates creating patches and uses the encoding of these patches for diagnosis. These models use generic pre-trained models (ResNet-50 pre-trained on ImageNet) for patch encoding. The recently proposed KimiaNet, a DenseNet121 model pre-trained on TCGA slides, is a domain-specific pre-trained model. This paper shows the effect of domain-specific pre-training on WSI classification. To investigate the effect of domain-specific pre-training, we considered the current state-of-the-art multiple-instance learning models, 1) CLAM, an attention-based model, and 2) TransMIL, a self-attention-based model, and evaluated the models' confidence and predictive performance in detecting primary brain tumors - gliomas. Domain-specific pre-training improves the confidence of the models and also achieves a new state-of-the-art performance of WSI-based glioma subtype classification, showing a high clinical applicability in assisting glioma diagnosis. We will publicly share our code and experimental results at https://github.com/soham-chitnis10/WSI-domain-specific.

Submitted 3 May, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: Accepted in EMBC 2023

Journal ref: Annu Int Conf IEEE Eng Med Biol Soc (EMBC 2023)

arXiv:2212.10005 [pdf, other]

Calibrating Deep Neural Networks using Explicit Regularisation and Dynamic Data Pruning

Authors: Ramya Hebbalaguppe, Rishabh Patra, Tirtharaj Dash, Gautam Shroff, Lovekesh Vig

Abstract: Deep neural networks (DNN) are prone to miscalibrated predictions, often exhibiting a mismatch between the predicted output and the associated confidence scores. Contemporary model calibration techniques mitigate the problem of overconfident predictions by pushing down the confidence of the winning class while increasing the confidence of the remaining classes across all test samples. However, from a deployment perspective, an ideal model is desired to (i) generate well-calibrated predictions for high-confidence samples with predicted probability say >0.95, and (ii) generate a higher proportion of legitimate high-confidence samples. To this end, we propose a novel regularization technique that can be used with classification losses, leading to state-of-the-art calibrated predictions at test time; From a deployment standpoint in safety-critical applications, only high-confidence samples from a well-calibrated model are of interest, as the remaining samples have to undergo manual inspection. Predictive confidence reduction of these potentially "high-confidence samples" is a downside of existing calibration approaches. We mitigate this by proposing a dynamic train-time data pruning strategy that prunes low-confidence samples every few epochs, providing an increase in "confident yet calibrated samples". We demonstrate state-of-the-art calibration performance across image classification benchmarks, reducing training time without much compromise in accuracy. We provide insights into why our dynamic pruning strategy that prunes low-confidence training samples leads to an increase in high-confidence samples at test time.

Submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted at the IEEE Winter Conference on Applications of Computer Vision (WACV), algorithms track. 8 pages main paper; 3 pages supplementary material

arXiv:2209.08750 [pdf, other]

Knowledge-based Analogical Reasoning in Neuro-symbolic Latent Spaces

Authors: Vishwa Shah, Aditya Sharma, Gautam Shroff, Lovekesh Vig, Tirtharaj Dash, Ashwin Srinivasan

Abstract: Analogical Reasoning problems challenge both connectionist and symbolic AI systems as these entail a combination of background knowledge, reasoning and pattern recognition. While symbolic systems ingest explicit domain knowledge and perform deductive reasoning, they are sensitive to noise and require inputs be mapped to preset symbolic features. Connectionist systems on the other hand can directly ingest rich input spaces such as images, text or speech and recognize patterns even with noisy inputs. However, connectionist models struggle to include explicit domain knowledge for deductive reasoning. In this paper, we propose a framework that combines the pattern recognition abilities of neural networks with symbolic reasoning and background knowledge for solving a class of Analogical Reasoning problems where the set of attributes and possible relations across them are known a priori. We take inspiration from the 'neural algorithmic reasoning' approach [DeepMind 2020] and use problem-specific background knowledge by (i) learning a distributed representation based on a symbolic model of the problem (ii) training neural-network transformations reflective of the relations involved in the problem and finally (iii) training a neural network encoder from images to the distributed representation in (i). These three elements enable us to perform search-based reasoning using neural networks as elementary functions manipulating distributed representations. We test this on visual analogy problems in RAVENs Progressive Matrices, and achieve accuracy competitive with human performance and, in certain cases, superior to initial end-to-end neural-network based approaches. While recent neural models trained at scale yield SOTA, our novel neuro-symbolic reasoning approach is a promising direction for this problem, and is arguably more general, especially for problems where domain knowledge is available.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 13 pages, 4 figures, Accepted at 16th International Workshop on Neural-Symbolic Learning and Reasoning as part of the 2nd International Joint Conference on Learning & Reasoning (IJCLR 2022)

arXiv:2209.02449 [pdf, other]

Efficient quantum non-fungible tokens for blockchain

Authors: Subhash Shankar Pandey, Tadasha Dash, Prasanta K. Panigrahi, Ahmed Farouk

Abstract: Blockchain is a decentralized system that allows transaction transmission and storage according to the roles of the Consensus algorithm and Smart contracts. Non-fungible tokens (NFTs) consolidate the best characteristics of blockchain technology to deliver unique and bona fide tokens, each with distinctive attributes with non-fungible resources. Unfortunately, current classical NFTs are suffering from high costs regarding the consumed power of mining and lack of security. Therefore, this paper presents a new protocol for preparing quantum non-fungible tokens where a quantum state representing NFT is mounted on a blockchain instead of physically giving it to the owner. The proposed scheme is simulated and analyzed against various attacks and proves its ability to secure against them. Furthermore, the presented protocol provides reliable and cheaper NFTs than the classical one.

Submitted 2 September, 2022; originally announced September 2022.

Comments: 8 pages, 12 figures

arXiv:2206.09258 [pdf, other]

Machine Learning in Sports: A Case Study on Using Explainable Models for Predicting Outcomes of Volleyball Matches

Authors: Abhinav Lalwani, Aman Saraiya, Apoorv Singh, Aditya Jain, Tirtharaj Dash

Abstract: Machine Learning has become an integral part of engineering design and decision making in several domains, including sports. Deep Neural Networks (DNNs) have been the state-of-the-art methods for predicting outcomes of professional sports events. However, apart from getting highly accurate predictions on these sports events outcomes, it is necessary to answer questions such as "Why did the model predict that Team A would win Match X against Team B?" DNNs are inherently black-box in nature. Therefore, it is required to provide high-quality, interpretable, and understandable explanations for a model's prediction in sports. This paper explores a two-phased Explainable Artificial Intelligence (XAI) approach to predict outcomes of matches in the Brazilian volleyball League (SuperLiga). In the first phase, we directly use the interpretable rule-based ML models that provide a global understanding of the model's behaviors based on Boolean Rule Column Generation (BRCG; extracts simple AND-OR classification rules) and Logistic Regression (LogReg; allows to estimate the feature importance scores). In the second phase, we construct non-linear models such as Support Vector Machine (SVM) and Deep Neural Network (DNN) to obtain predictive performance on the volleyball matches' outcomes. We construct the "post-hoc" explanations for each data instance using ProtoDash, a method that finds prototypes in the training dataset that are most similar to the test instance, and SHAP, a method that estimates the contribution of each feature on the model's prediction. We evaluate the SHAP explanations using the faithfulness metric. Our results demonstrate the effectiveness of the explanations for the model's predictions.

Submitted 18 June, 2022; originally announced June 2022.

Comments: 9 pages, 1 figure, accepted to 2nd International Conference on Sports Engineering (ICSE 2021)

arXiv:2206.00738 [pdf, other]

doi 10.1007/s10994-023-06399-6

Composition of Relational Features with an Application to Explaining Black-Box Predictors

Authors: Ashwin Srinivasan, A Baskar, Tirtharaj Dash, Devanshu Shah

Abstract: Relational machine learning programs like those developed in Inductive Logic Programming (ILP) offer several advantages: (1) The ability to model complex relationships amongst data instances; (2) The use of domain-specific relations during model construction; and (3) The models constructed are human-readable, which is often one step closer to being human-understandable. However, these ILP-like methods have not been able to capitalise fully on the rapid hardware, software and algorithmic developments fuelling current developments in deep neural networks. In this paper, we treat relational features as functions and use the notion of generalised composition of functions to derive complex functions from simpler ones. We formulate the notion of a set of $\text{M}$-simple features in a mode language $\text{M}$ and identify two composition operators ($ρ_1$ and $ρ_2$) from which all possible complex features can be derived. We use these results to implement a form of "explainable neural network" called Compositional Relational Machines, or CRMs, which are labelled directed-acyclic graphs. The vertex-label for any vertex $j$ in the CRM contains a feature-function $f_j$ and a continuous activation function $g_j$. If $j$ is a "non-input" vertex, then $f_j$ is the composition of features associated with vertices in the direct predecessors of $j$. Our focus is on CRMs in which input vertices (those without any direct predecessors) all have $\text{M}$-simple features in their vertex-labels. We provide a randomised procedure for constructing and learning such CRMs. Using a notion of explanations based on the compositional structure of features in a CRM, we provide empirical evidence on synthetic data of the ability to identify appropriate explanations; and demonstrate the use of CRMs as 'explanation machines' for black-box models that do not provide explanations for their predictions.

Submitted 6 May, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: 47 pages; Revision1 for Machine Learning Journal (MLJ)

MSC Class: 68T07; 68T05; 68T27; 68T30 ACM Class: I.2.6

Journal ref: Mach Learn (2023)

arXiv:2111.10361 [pdf, other]

Solving Visual Analogies Using Neural Algorithmic Reasoning

Authors: Atharv Sonwane, Gautam Shroff, Lovekesh Vig, Ashwin Srinivasan, Tirtharaj Dash

Abstract: We consider a class of visual analogical reasoning problems that involve discovering the sequence of transformations by which pairs of input/output images are related, so as to analogously transform future inputs. This program synthesis task can be easily solved via symbolic search. Using a variation of the 'neural analogical reasoning' approach of (Velickovic and Blundell 2021), we instead search for a sequence of elementary neural network transformations that manipulate distributed representations derived from a symbolic space, to which input images are directly encoded. We evaluate the extent to which our 'neural reasoning' approach generalizes for images with unseen shapes and positions.

Submitted 19 November, 2021; originally announced November 2021.

Comments: 20 pages. Contains extended abstract accepted at the AAAI-22 Student Abstract and Poster Program along with relevant supplementary material

arXiv:2110.09947 [pdf, other]

Using Program Synthesis and Inductive Logic Programming to solve Bongard Problems

Authors: Atharv Sonwane, Sharad Chitlangia, Tirtharaj Dash, Lovekesh Vig, Gautam Shroff, Ashwin Srinivasan

Abstract: The ability to recognise and make analogies is often used as a measure or test of human intelligence. The ability to solve Bongard problems is an example of such a test. It has also been postulated that the ability to rapidly construct novel abstractions is critical to being able to solve analogical problems. Given an image, the ability to construct a program that would generate that image is one form of abstraction, as exemplified in the Dreamcoder project. In this paper, we present a preliminary examination of whether programs constructed by Dreamcoder can be used for analogical reasoning to solve certain Bongard problems. We use Dreamcoder to discover programs that generate the images in a Bongard problem and represent each of these as a sequence of state transitions. We decorate the states using positional information in an automated manner and then encode the resulting sequence into logical facts in Prolog. We use inductive logic programming (ILP) to learn an (interpretable) theory for the abstract concept involved in an instance of a Bongard problem. Experiments on synthetically created Bongard problems for concepts such as 'above/below' and 'clockwise/counterclockwise' demonstrate that our end-to-end system can solve such problems. We study the importance and completeness of each component of our approach, highlighting its current limitations and pointing to directions for improvement in our formulation as well as in elements of any Dreamcoder-like program synthesis system used for such an approach.

Submitted 19 October, 2021; originally announced October 2021.

Comments: Equal contribution from first two authors. Accepted at the 10th International Workshop on Approaches and Applications of Inductive Programming as a Work In Progress Report

arXiv:2107.10295 [pdf, other]

doi 10.1038/s41598-021-04590-0

A Review of Some Techniques for Inclusion of Domain-Knowledge into Deep Neural Networks

Authors: Tirtharaj Dash, Sharad Chitlangia, Aditya Ahuja, Ashwin Srinivasan

Abstract: We present a survey of ways in which existing scientific knowledge is included when constructing models with neural networks. The inclusion of domain-knowledge is of special interest not just to constructing scientific assistants, but also, many other areas that involve understanding data using human-machine collaboration. In many such instances, machine-based model construction may benefit significantly from being provided with human-knowledge of the domain encoded in a sufficiently precise form. This paper examines the inclusion of domain-knowledge by means of changes to: the input, the loss-function, and the architecture of deep networks. The categorisation is for ease of exposition: in practice we expect a combination of such changes will be employed. In each category, we describe techniques that have been shown to yield significant changes in the performance of deep neural networks.

Submitted 21 December, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

Comments: 16 pages; Accepted at Nature Scientific Reports. arXiv admin note: substantial text overlap with arXiv:2103.00180

MSC Class: 68T07 (Primary); 68T05; 68T01 (Secondary) ACM Class: I.2.6; I.2.4

Journal ref: Sci Rep 12, 1040 (2022)

arXiv:2105.10709 [pdf, other]

doi 10.1007/s10994-021-06090-8

Inclusion of Domain-Knowledge into GNNs using Mode-Directed Inverse Entailment

Authors: Tirtharaj Dash, Ashwin Srinivasan, A Baskar

Abstract: We present a general technique for constructing Graph Neural Networks (GNNs) capable of using multi-relational domain knowledge. The technique is based on mode-directed inverse entailment (MDIE) developed in Inductive Logic Programming (ILP). Given a data instance $e$ and background knowledge $B$, MDIE identifies a most-specific logical formula $\bot_B(e)$ that contains all the relational information in $B$ that is related to $e$. We represent $\bot_B(e)$ by a "bottom-graph" that can be converted into a form suitable for GNN implementations. This transformation allows a principled way of incorporating generic background knowledge into GNNs: we use the term 'BotGNN' for this form of graph neural networks. For several GNN variants, using real-world datasets with substantial background knowledge, we show that BotGNNs perform significantly better than both GNNs without background knowledge and a recently proposed simplified technique for including domain knowledge into GNNs. We also provide experimental evidence comparing BotGNNs favourably to multi-layer perceptrons (MLPs) that use features representing a "propositionalised" form of the background knowledge; and BotGNNs to a standard ILP based on the use of most-specific clauses. Taken together, these results point to BotGNNs as capable of combining the computational efficacy of GNNs with the representational versatility of ILP.

Submitted 14 August, 2021; v1 submitted 22 May, 2021; originally announced May 2021.

Comments: Revised version; submitted to Machine Learning Journal (MLJ)

MSC Class: 68T07; 68T05; 68T27; 68T30 ACM Class: I.2.6

Journal ref: Mach Learn (2021)

arXiv:2105.09448 [pdf, other]

doi 10.1145/3476883.3520216

Superpixel-based Knowledge Infusion in Deep Neural Networks for Image Classification

Authors: Gunjan Chhablani, Abheesht Sharma, Harshit Pandey, Tirtharaj Dash

Abstract: Superpixels are higher-order perceptual groups of pixels in an image, often carrying much more information than the raw pixels. There is an inherent relational structure to the relationship among different superpixels of an image such as adjacent superpixels are neighbours of each other. Our interest here is to treat these relative positions of various superpixels as relational information of an image. This relational information can convey higher-order spatial information about the image, such as the relationship between superpixels representing two eyes in an image of a cat. That is, two eyes are placed adjacent to each other in a straight line or the mouth is below the nose. Our motive in this paper is to assist computer vision models, specifically those based on Deep Neural Networks (DNNs), by incorporating this higher-order information from superpixels. We construct a hybrid model that leverages (a) Convolutional Neural Network (CNN) to deal with spatial information in an image and (b) Graph Neural Network (GNN) to deal with relational superpixel information in the image. The proposed model is learned using a generic hybrid loss function. Our experiments are extensive, and we evaluate the predictive performance of our proposed hybrid vision model on seven different image classification datasets from a variety of domains such as digit and object recognition, biometrics, medical imaging. The results demonstrate that the relational superpixel information processed by a GNN can improve the performance of a standard CNN-based vision system.

Submitted 23 February, 2022; v1 submitted 19 May, 2021; originally announced May 2021.

Comments: ACM Proc. format: 5 pages; Accepted at ACM SE'22, April 18-20, 2022, Virtual Event, USA

Journal ref: Proceedings of the 2022 ACM Southeast Conference, April 2022, Pages 243-247

arXiv:2103.00180 [pdf, other]

Incorporating Domain Knowledge into Deep Neural Networks

Authors: Tirtharaj Dash, Sharad Chitlangia, Aditya Ahuja, Ashwin Srinivasan

Abstract: We present a survey of ways in which domain-knowledge has been included when constructing models with neural networks. The inclusion of domain-knowledge is of special interest not just to constructing scientific assistants, but also, many other areas that involve understanding data using human-machine collaboration. In many such instances, machine-based model construction may benefit significantly from being provided with human-knowledge of the domain encoded in a sufficiently precise form. This paper examines two broad approaches to encode such knowledge--as logical and numerical constraints--and describes techniques and results obtained in several sub-categories under each of these approaches.

Submitted 15 March, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

Comments: Submitted to IJCAI-2021 Survey Track (6+2 pages)

MSC Class: 68T07 (Primary); 68T05; 68T01 (Secondary) ACM Class: I.2.6; I.2.4

arXiv:2102.12255 [pdf, other]

doi 10.18653/v1/2021.semeval-1.21

LRG at SemEval-2021 Task 4: Improving Reading Comprehension with Abstract Words using Augmentation, Linguistic Features and Voting

Authors: Abheesht Sharma, Harshit Pandey, Gunjan Chhablani, Yash Bhartia, Tirtharaj Dash

Abstract: In this article, we present our methodologies for SemEval-2021 Task-4: Reading Comprehension of Abstract Meaning. Given a fill-in-the-blank-type question and a corresponding context, the task is to predict the most suitable word from a list of 5 options. There are three sub-tasks within this task: Imperceptibility (subtask-I), Non-Specificity (subtask-II), and Intersection (subtask-III). We use encoders of transformers-based models pre-trained on the masked language modelling (MLM) task to build our Fill-in-the-blank (FitB) models. Moreover, to model imperceptibility, we define certain linguistic features, and to model non-specificity, we leverage information from hypernyms and hyponyms provided by a lexical database. Specifically, for non-specificity, we try out augmentation techniques, and other statistical techniques. We also propose variants, namely Chunk Voting and Max Context, to take care of input length restrictions for BERT, etc. Additionally, we perform a thorough ablation study, and use Integrated Gradients to explain our predictions on a few samples. Our best submissions achieve accuracies of 75.31% and 77.84%, on the test sets for subtask-I and subtask-II, respectively. For subtask-III, we achieve accuracies of 65.64% and 62.27%.

Submitted 26 June, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

Comments: 10 pages, 4 figures, SemEval-2021 Workshop, ACL-IJCNLP 2021

Journal ref: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), 2021, Online

arXiv:2012.10787 [pdf, other]

Constructing and Evaluating an Explainable Model for COVID-19 Diagnosis from Chest X-rays

Authors: Rishab Khincha, Soundarya Krishnan, Tirtharaj Dash, Lovekesh Vig, Ashwin Srinivasan

Abstract: In this paper, our focus is on constructing models to assist a clinician in the diagnosis of COVID-19 patients in situations where it is easier and cheaper to obtain X-ray data than to obtain high-quality images like those from CT scans. Deep neural networks have repeatedly been shown to be capable of constructing highly predictive models for disease detection directly from image data. However, their use in assisting clinicians has repeatedly hit a stumbling block due to their black-box nature. Some of this difficulty can be alleviated if predictions were accompanied by explanations expressed in clinically relevant terms. In this paper, deep neural networks are used to extract domain-specific features (morphological features like ground-glass opacity and disease indications like pneumonia) directly from the image data. Predictions about these features are then used to construct a symbolic model (a decision tree) for the diagnosis of COVID-19 from chest X-rays, accompanied with two kinds of explanations: visual (saliency maps, derived from the neural stage), and textual (logical descriptions, derived from the symbolic stage). A radiologist rates the usefulness of the visual and textual explanations. Our results demonstrate that neural models can be employed usefully in identifying domain-specific features from low-level image data; that textual explanations in terms of clinically relevant features may be useful; and that visual explanations will need to be clinically meaningful to be useful.

Submitted 12 February, 2021; v1 submitted 19 December, 2020; originally announced December 2020.

arXiv:2010.13900 [pdf, other]

doi 10.1007/s10994-021-05966-z

Incorporating Symbolic Domain Knowledge into Graph Neural Networks

Authors: Tirtharaj Dash, Ashwin Srinivasan, Lovekesh Vig

Abstract: Our interest is in scientific problems with the following characteristics: (1) Data are naturally represented as graphs; (2) The amount of data available is typically small; and (3) There is significant domain-knowledge, usually expressed in some symbolic form. These kinds of problems have been addressed effectively in the past by Inductive Logic Programming (ILP), by virtue of 2 important characteristics: (a) The use of a representation language that easily captures the relation encoded in graph-structured data, and (b) The inclusion of prior information encoded as domain-specific relations, that can alleviate problems of data scarcity, and construct new relations. Recent advances have seen the emergence of deep neural networks specifically developed for graph-structured data (Graph-based Neural Networks, or GNNs). While GNNs have been shown to be able to handle graph-structured data, less has been done to investigate the inclusion of domain-knowledge. Here we investigate this aspect of GNNs empirically by employing an operation we term "vertex-enrichment" and denote the corresponding GNNs as "VEGNNs". Using over 70 real-world datasets and substantial amounts of symbolic domain-knowledge, we examine the result of vertex-enrichment across 5 different variants of GNNs. Our results provide support for the following: (a) Inclusion of domain-knowledge by vertex-enrichment can significantly improve the performance of a GNN. That is, the performance of VEGNNs is significantly better than GNNs across all GNN variants; (b) The inclusion of domain-specific relations constructed using ILP improves the performance of VEGNNs, across all GNN variants. Taken together, the results provide evidence that it is possible to incorporate symbolic domain knowledge into a GNN, and that ILP can play an important role in providing high-level relationships that are not easily discovered by a GNN.

Submitted 19 February, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

Comments: Accepted in Machine Learning Journal (MLJ)

Journal ref: Mach Learn 110, 1609-1636 (2021)

arXiv:1911.06704 [pdf, other]

doi 10.1049/cit2.12002

Performance evaluation of deep neural networks for forecasting time-series with multiple structural breaks and high volatility

Authors: Rohit Kaushik, Shikhar Jain, Siddhant Jain, Tirtharaj Dash

Abstract: The problem of automatic and accurate forecasting of time-series data has always been an interesting challenge for the machine learning and forecasting community. A majority of the real-world time-series problems have non-stationary characteristics that make the understanding of trend and seasonality difficult. Our interest in this paper is to study the applicability of the popular deep neural networks (DNN) as function approximators for non-stationary time-series forecasting (TSF). We evaluate the following DNN models: Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN), RNN with Long-Short Term Memory (LSTM-RNN), and RNN with Gated-Recurrent Unit (GRU-RNN). These DNN methods have been evaluated over 10 popular Indian financial stocks data. Further, the performance evaluation of these DNNs has been carried out in multiple independent runs for two settings of forecasting: (1) single-step forecasting, and (2) multi-step forecasting. These DNN methods show convincing performance for single-step forecasting (one-day ahead forecast). For the multi-step forecasting (multiple days ahead forecast), we have evaluated the methods for different forecast periods. The performance of these methods demonstrates that long forecast periods have an adverse effect on performance.

Submitted 25 July, 2020; v1 submitted 14 November, 2019; originally announced November 2019.

Comments: Preprint (18 pages)

Journal ref: CAAI Trans. Intell. Technol. 6(3), 265-280 (2021)

arXiv:1809.05611 [pdf, other]

A study on the use of Boundary Equilibrium GAN for Approximate Frontalization of Unconstrained Faces to aid in Surveillance

Authors: Wazeer Zulfikar, Sebastin Santy, Sahith Dambekodi, Tirtharaj Dash

Abstract: Face frontalization is the process of synthesizing frontal facing views of faces given their angled poses. We implement a generative adversarial network (GAN) with spherical linear interpolation (Slerp) for frontalization of unconstrained facial images. Our special focus is on the generation of approximate frontal faces from the side-posed images captured by surveillance cameras. Specifically, the present work is a comprehensive study on the implementation of an auto-encoder based Boundary Equilibrium GAN (BEGAN) to generate frontal faces using an interpolation of a side view face and its mirrored view. To increase the quality of the interpolated output we implement a BEGAN with Slerp. This approach could produce a promising output along with a faster and more stable training for the model. The BEGAN model additionally has a balanced generator-discriminator combination, which prevents mode collapse along with a global convergence measure. It is expected that such an approximate face generation model would be able to replace face composites used in surveillance and crime detection.

Submitted 14 September, 2018; originally announced September 2018.

arXiv:1612.00671 [pdf, ps, other]

Reliable Evaluation of Neural Network for Multiclass Classification of Real-world Data

Authors: Siddharth Dinesh, Tirtharaj Dash

Abstract: This paper presents a systematic evaluation of Neural Network (NN) for classification of real-world data. In the field of machine learning, it is often seen that a single parameter, 'predictive accuracy', is used for evaluating the performance of a classifier model. However, this parameter might not be considered reliable given a dataset with a very high level of skewness. To demonstrate such behavior, seven different types of datasets have been used to evaluate a Multilayer Perceptron (MLP) using twelve (12) different parameters which include micro- and macro-level estimation. In the present study, the most common problem of prediction called 'multiclass' classification has been considered. The results that are obtained for different parameters for each of the datasets could demonstrate interesting findings to support the usability of this set of performance evaluation parameters.

Submitted 30 November, 2016; originally announced December 2016.

Report number: TR-2016-STUDY-1

arXiv:1601.03481 [pdf]

A Fuzzy MLP Approach for Non-linear Pattern Classification

Authors: Tirtharaj Dash, H. S. Behera

Abstract: In case of decision making problems, classification of pattern is a complex and crucial task. Pattern classification using multilayer perceptron (MLP) trained with back propagation learning becomes much more complex with increase in number of layers, number of nodes and number of epochs, and ultimately increases computational time [31]. In this paper, an attempt has been made to use fuzzy MLP and its learning algorithm for pattern classification. The time and space complexities of the algorithm have been analyzed. A training performance comparison has been carried out between MLP and the proposed fuzzy-MLP model by considering six cases. Results are noted against different learning rates ranging from 0 to 1. A new performance evaluation factor 'convergence gain' has been introduced. It is observed that the number of epochs drastically reduced and performance increased compared to MLP. The average and minimum gain has been found to be 93% and 75% respectively. The best gain is found to be 95% and is obtained by setting the learning rate to 0.55.

Submitted 19 September, 2015; originally announced January 2016.

Comments: The final version of this paper has been published in "International Conference on Communication and Computing (ICC-2014)" [http://www.elsevierst.com/conference_book_download_chapter.php?cbid=86#chapter41]

Journal ref: In Proc: K.R. Venugopal, S.C. Lingareddy (eds.) International Conference on Communication and Computing (ICC- 2014), Bangalore, India (June 12-14, 2014), Computer Networks and Security, 314-323

arXiv:1306.4672 [pdf]

A Novel Approach for Intelligent Robot Path Planning

Authors: Tirtharaj Dash, Goutam Mishra, Tanistha Nayak

Abstract: Path planning of Robot is one of the challenging fields in the area of Robotics research. In this paper, we propose a novel algorithm to find a path between starting and ending positions for an intelligent system. An intelligent system is considered to be a device/robot having an antenna connected with a sensor-detector system. The proposed algorithm is based on the Neural Network training concept. The considered neural network is adaptive to the knowledge bases. However, implementation of this algorithm is slightly expensive due to the hardware it requires. From detailed analysis, it can be proved that the resulting path of this algorithm is efficient.

Submitted 19 June, 2013; originally announced June 2013.

Comments: appeared in: Proceedings of National Conference on Artificial Intelligence, Robotics and Embedded Systems (AIRES) - 2012, Andhra University, Visakhapatnam (29-30 June, 2012), pp. 388-391

arXiv:1306.4629 [pdf]

Non-Correlated Character Recognition using Artificial Neural Network

Authors: Tirtharaj Dash, Tanistha Nayak

Abstract: This paper investigates a method of Handwritten English Character Recognition using Artificial Neural Network (ANN). This work has been done in an offline environment for non-correlated characters, which do not possess any linear relationships among them. We test whether the particular tested character belongs to a cluster or not. The implementation is carried out in Matlab environment and successfully tested. Fifty-two sets of English alphabets are used to train the ANN and test the network. The algorithms are tested with 26 capital letters and 26 small letters. The testing result showed that the proposed ANN based algorithm showed a maximum recognition rate of 85%.

Submitted 19 June, 2013; originally announced June 2013.

Comments: appeared in: proceedings of National Conference on Dynamics and Prospects of Data Mining: Theory and Practices (DPDM)-2012; September 30, 2012, India; Publisher: OITS-BLS, Balasore Chapter; Proceeding ISBN: 987-93-81361-31-6, pp. 79-83

Journal ref: proc. National Conference on Dynamics and Prospects of Data Mining: Theory and Practices (DPDM)-2012; September 30, 2012, India; ISBN: 987-93-81361-31-6, pp. 79-83

arXiv:1306.4627 [pdf]

Parallel Algorithm for Longest Common Subsequence in a String

Authors: Tirtharaj Dash, Tanistha Nayak

Abstract: In the area of Pattern Recognition and Matching, finding a Longest Common Subsequence plays an important role. In this paper, we have proposed one algorithm based on parallel computation. We have used OpenMP API package as middleware to send the data to different processors. We have tested our algorithm in a system having four processors and 2 GB physical memory. The best result showed that the parallel algorithm increases the performance (speed of computation) by 3.22.

Submitted 19 June, 2013; originally announced June 2013.

Comments: appeared in: Proceedings of National Conference on Artificial Intelligence, Robotics and Embedded Systems (AIRES) - 2012, Andhra University, Visakhapatnam (29-30 June, 2012), pp. 66-69

arXiv:1306.4622 [pdf]

Solution to Quadratic Equation Using Genetic Algorithm

Authors: Tanistha Nayak, Tirtharaj Dash

Abstract: Solving the Quadratic equation is of intrinsic interest as it is the simplest nonlinear equation. A novel approach for solving Quadratic Equation based on Genetic Algorithms (GAs) is presented. Genetic Algorithms (GAs) are a technique to solve problems which need optimization. Trial solutions have been generated by this method. Many examples have been worked out, and in most cases we find out the exact solution. We have discussed the effect of different parameters on the performance of the developed algorithm. The results are concluded after rigorous testing on different equations.

Submitted 19 June, 2013; originally announced June 2013.

Comments: appeared in: Conf. Proceedings of National Conference on Artificial Intelligence, Robotics and Embedded Systems (AIRES-2012), Andhra University, Vishakhapatnam, India (29-30 June, 2012), pp. 10-13

arXiv:1306.4621 [pdf]

English Character Recognition using Artificial Neural Network

Authors: Tirtharaj Dash, Tanistha Nayak

Abstract: This work focuses on development of an Offline Hand Written English Character Recognition algorithm based on Artificial Neural Network (ANN). The ANN implemented in this work has a single output neuron which shows whether the tested character belongs to a particular cluster or not. The implementation is carried out completely in 'C' language. Ten sets of English alphabets (small-26, capital-26) were used to train the ANN and 5 sets of English alphabets were used to test the network. The characters were collected from different persons over a duration of about 25 days. The algorithm was tested with 5 capital letters and 5 small letter sets. However, the result showed that the algorithm recognized English alphabet patterns with maximum accuracy of 92.59% and False Rejection Rate (FRR) of 0%.

Submitted 19 June, 2013; originally announced June 2013.

Comments: appeared in Proceedings of National Conference on Artificial Intelligence, Robotics and Embedded Systems (AIRES-2012), Andhra University, Vishakhapatnam, India (29-30 June, 2012), pp. 7-9

arXiv:1306.4592 [pdf]

Time Efficient Approach To Offline Hand Written Character Recognition Using Associative Memory Net

Authors: Tirtharaj Dash

Abstract: In this paper, an efficient Offline Hand Written Character Recognition algorithm is proposed based on Associative Memory Net (AMN). The AMN used in this work is basically auto associative. The implementation is carried out completely in 'C' language. To make the system perform to its best with minimal computation time, a Parallel algorithm is also developed using an API package OpenMP. Characters are mainly English alphabets (Small (26), Capital (26)) collected from system (52) and from different persons (52). The characters collected from system are used to train the AMN and characters collected from different persons are used for testing the recognition ability of the net. The detailed analysis showed that the network recognizes the hand written characters with recognition rate of 72.20% in average case. However, in best case, it recognizes the collected hand written characters with 88.5%. The developed network consumes 3.57 sec (average) in Serial implementation and 1.16 sec (average) in Parallel implementation using OpenMP.

Submitted 19 June, 2013; originally announced June 2013.

Journal ref: International Journal of Computing and Business Research (IJCBR) ISSN (Online) : 2229-6166; Volume 3, Issue 3; September 2012

Showing 1–26 of 26 results for author: Dash, T