
BERTologyNavigator: Advanced Question Answering with BERT-based Semantics

Abstract: The development and integration of knowledge graphs and language models is significant for artificial intelligence and natural language processing. In this study, we introduce BERTologyNavigator, a two-phase system that combines relation extraction techniques and BERT embeddings to navigate the relationships within the DBLP Knowledge Graph (KG). In the first phase, our approach extracts one-hop relations and labelled candidate pairs. In the second phase, it uses BERT's CLS embeddings and additional heuristics to select relations. Our system reaches an F1 score of 0.2175 on the DBLP QuAD Final test dataset for Scholarly QALD and an F1 score of 0.98 on a subset of the DBLP QuAD test dataset during the QA phase.

Submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted in Scholarly QALD Challenge @ ISWC 2023

ACM Class: I.2.4; I.2.7

Journal ref: Joint Proceedings of Scholarly QALD 2023 and SemREC 2023 co-located with 22nd International Semantic Web Conference ISWC 2023. Athens, Greece, November 6-10, 2023
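The second phase selects a relation by comparing BERT CLS embeddings. The sketch below is illustrative only: it assumes the CLS vectors for the question and for each candidate one-hop relation label have already been computed, and it assumes cosine similarity as the ranking score. The paper's additional heuristics are not reproduced, and the relation labels and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_relation(question_vec, relation_vecs):
    """Return (label, score) of the candidate relation most similar to the question.

    relation_vecs maps a relation label to its (assumed precomputed) CLS embedding.
    """
    best_label, best_score = None, float("-inf")
    for label, vec in relation_vecs.items():
        score = cosine(question_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical 3-d stand-ins for real 768-d CLS vectors.
candidates = {
    "authoredBy": [1.0, 0.0, 0.0],
    "publishedIn": [0.0, 1.0, 0.0],
    "yearOfPublication": [0.0, 0.0, 1.0],
}
label, score = select_relation([0.9, 0.1, 0.0], candidates)
```

Here the question vector points mostly along the first axis, so `authoredBy` is selected. In a real pipeline the vectors would come from a BERT encoder and the top-scoring relation would then be filtered by further heuristics.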

arXiv:2111.05546

Biomarker Gene Identification for Breast Cancer Classification

Authors: Sheetal Rajpal, Ankit Rajpal, Manoj Agarwal, Naveen Kumar

Abstract: BACKGROUND: Breast cancer has emerged as one of the most prevalent cancers among women, with a high mortality rate. Because breast cancer is heterogeneous, the differentially expressed genes associated with its subtypes need to be identified for timely diagnosis and treatment. OBJECTIVE: To identify a small gene set for each of the four breast cancer subtypes that could act as its signature, the paper proposes a novel algorithm for gene signature identification. METHODS: The present work uses interpretable AI methods to investigate the predictions made by the deep neural network employed for subtype classification, and thereby identifies biomarkers in the TCGA breast cancer RNA sequencing data. RESULTS: The proposed algorithm led to the discovery of a set of 43 differentially expressed gene signatures. We achieved a competitive average 10-fold accuracy of 0.91 using a neural network classifier. Further, gene set analysis revealed several relevant pathways, such as GRB7 events in ERBB2 signaling and the p53 signaling pathway. Using the Pearson correlation matrix, we noted that the subtype-specific genes are correlated within each subtype. CONCLUSIONS: The proposed technique enables us to find a concise and clinically relevant gene signature set.

Submitted 29 November, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

arXiv:2111.03923

Deep Learning Based Model for Breast Cancer Subtype Classification

Authors: Sheetal Rajpal, Virendra Kumar, Manoj Agarwal, Naveen Kumar

Abstract: Breast cancer has long been a prominent cause of mortality among women. Diagnosis, therapy, and prognosis are now possible thanks to RNA sequencing tools capable of recording gene expression data. Because molecular subtyping is closely related to devising clinical strategy and prognosis, this paper focuses on using gene expression data to classify breast cancer into four subtypes, namely Basal, Her2, LumA, and LumB. In stage 1, we propose a deep learning-based model that uses an autoencoder to reduce dimensionality, shrinking the feature set from 20,530 gene expression values to 500. This encoded representation is passed to the deep neural network of stage 2, which classifies patients into the four molecular subtypes of breast cancer. By deploying the combined network of stages 1 and 2, we attain a mean 10-fold test accuracy of 0.907 on the TCGA breast cancer dataset. As the boxplot of classification accuracy over 10 different runs shows, the proposed framework is fairly robust. Compared to related work reported in the literature, we achieve a competitive outcome. In conclusion, the proposed two-stage deep learning-based model accurately classifies four breast cancer subtypes, highlighting the autoencoder's capacity to learn a compact representation and the neural network classifier's ability to correctly label breast cancer patients.

Submitted 9 November, 2021; v1 submitted 6 November, 2021; originally announced November 2021.

Comments: Paper has been accepted for publication in ICACET 2021

arXiv:2007.08637

COV-ELM classifier: An Extreme Learning Machine based identification of COVID-19 using Chest X-Ray Images

Authors: Sheetal Rajpal, Manoj Agarwal, Ankit Rajpal, Navin Lakhyani, Arpita Saggar, Naveen Kumar

Abstract: Coronaviruses constitute a family of viruses that give rise to respiratory diseases. As COVID-19 is highly contagious, early diagnosis of COVID-19 is crucial for an effective treatment strategy. However, the RT-PCR test, which is considered the gold standard for diagnosing COVID-19, suffers from a high false-negative rate. Chest X-ray (CXR) image analysis has emerged as a feasible and effective diagnostic technique towards this objective. In this work, we formulate COVID-19 classification as a three-class problem that distinguishes between the COVID-19, normal, and pneumonia classes. We propose a three-stage framework named COV-ELM. Stage one deals with preprocessing and transformation, while stage two deals with feature extraction. The extracted features are passed as input to the ELM in the third stage, which identifies COVID-19. The choice of ELM in this work is motivated by its faster convergence, better generalization capability, and shorter training time compared with conventional gradient-based learning algorithms. As larger and more diverse datasets become available, ELM can be retrained more quickly than its gradient-based competitors. The proposed model achieved a macro-average F1-score of 0.95 and an overall sensitivity of $0.94 \pm 0.02$ at a 95% confidence interval. When compared to state-of-the-art machine learning algorithms, COV-ELM is found to outperform its competitors in this three-class classification scenario.
Further, LIME has been integrated with the proposed COV-ELM model to generate annotated CXR images. The annotations are based on the superpixels that contributed to distinguishing between the different classes. The superpixels were observed to correspond to the regions of the human lungs that are clinically observed in COVID-19 and pneumonia cases.

Submitted 28 September, 2021; v1 submitted 16 July, 2020; originally announced July 2020.
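The abstract attributes ELM's short training time to avoiding gradient-based learning. A minimal sketch of a generic single-hidden-layer ELM classifier makes this concrete: input weights are random and fixed, and only the output weights are solved, in closed form, with a pseudoinverse. The sigmoid activation, hidden-layer size, and random initialization are assumptions for illustration, and the sketch assumes COV-ELM's stage-one and stage-two features have already been extracted into a matrix `X`.

```python
import numpy as np

class ELM:
    """Minimal single-hidden-layer Extreme Learning Machine classifier (illustrative sketch)."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Fixed random projection followed by a sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # One-hot targets: one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Output weights in closed form via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

Because `fit` performs one matrix pseudoinverse rather than many gradient steps, retraining on a larger dataset is a single linear-algebra solve, which matches the retraining argument in the abstract.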

arXiv:2004.08982

doi 10.1002/mrm.28491

Fully Self-Gated Whole-Heart 4D Flow Imaging from a Five-Minute Scan

Authors: Aaron Pruitt, Adam Rich, Yingmin Liu, Ning **, Lee Potter, Matthew Tong, Saurabh Rajpal, Orlando Simonetti, Rizwan Ahmad

Abstract: Purpose: To develop and validate an acquisition and processing technique that enables fully self-gated 4D flow imaging with whole-heart coverage in a fixed five-minute scan. Theory and Methods: The data are acquired continuously using Cartesian sampling and sorted into respiratory and cardiac bins using the self-gating signal. The reconstruction is performed using a recently proposed Bayesian method called ReVEAL4D. ReVEAL4D is validated using data from eight healthy volunteers and two patients and compared with a compressed sensing technique, L1-SENSE. Results: In healthy subjects, compared to 2D phase-contrast MRI (2D-PC), flow quantification from ReVEAL4D shows no significant bias. In contrast, L1-SENSE significantly underestimates peak velocity and peak flow rate. Compared to traditional parallel MRI-based 4D flow imaging, ReVEAL4D demonstrates small but significant biases in net flow and peak flow rate, with no significant bias in peak velocity. L1-SENSE underestimates all three indices significantly and more markedly. In patients, flow quantification from ReVEAL4D agrees well with the 2D-PC reference, whereas L1-SENSE markedly underestimates peak velocity. Conclusions: The combination of highly accelerated five-minute Cartesian acquisition, self-gating, and ReVEAL4D enables whole-heart 4D flow imaging with accurate flow quantification.

Submitted 5 August, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

arXiv:1803.09196

Learning Type-Aware Embeddings for Fashion Compatibility

Authors: Mariya I. Vasileva, Bryan A. Plummer, Krishna Dusad, Shreya Rajpal, Ranjitha Kumar, David Forsyth

Abstract: Outfits in online fashion data are composed of items of many different types (e.g., top, bottom, shoes) that share some stylistic relationship with one another. A representation for building outfits requires a method that can learn both notions of similarity (for example, when two tops are interchangeable) and compatibility (items of possibly different types that can go together in an outfit). This paper presents an approach to learning an image embedding that respects item type and jointly learns notions of item similarity and compatibility in an end-to-end model. To evaluate the learned representation, we crawled 68,306 outfits created by users on the Polyvore website. Our approach obtains a 3-5% improvement over the state of the art on outfit compatibility prediction and fill-in-the-blank tasks, both on our dataset and on an established smaller dataset, while supporting a variety of useful queries.

Submitted 27 July, 2018; v1 submitted 24 March, 2018; originally announced March 2018.

Comments: Accepted at ECCV 2018
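One way to make an embedding "respect item type" is to compare two items only within a subspace selected for their pair of types. The sketch below assumes that design: a shared embedding plus a per-type-pair mask, with compatibility measured as masked Euclidean distance. It is an illustration in the spirit of the abstract, not the paper's model; the type names, vectors, and masks are hypothetical, and a trained model would learn the masks end to end rather than fix them by hand.

```python
import math

def masked_distance(x, y, mask):
    """Euclidean distance between x and y after element-wise weighting by mask."""
    return math.sqrt(sum((m * (a - b)) ** 2 for a, b, m in zip(x, y, mask)))

def type_pair(type_a, type_b):
    """Unordered key for a pair of item types, e.g. ('bottom', 'top')."""
    return tuple(sorted((type_a, type_b)))

def compatibility_distance(item_a, item_b, masks):
    """Distance in the subspace chosen for this type pair; smaller means more compatible."""
    mask = masks[type_pair(item_a["type"], item_b["type"])]
    return masked_distance(item_a["vec"], item_b["vec"], mask)

# Hypothetical 4-d embeddings and hand-set masks (a trained model would learn these).
MASKS = {
    ("bottom", "top"): [1.0, 1.0, 0.0, 0.0],
    ("shoes", "top"): [0.0, 0.0, 1.0, 1.0],
}
top = {"type": "top", "vec": [1.0, 0.0, 0.0, 0.0]}
jeans = {"type": "bottom", "vec": [1.0, 0.0, 5.0, 5.0]}
sneakers = {"type": "shoes", "vec": [9.0, 9.0, 3.0, 4.0]}
```

The top and the jeans are far apart in the full space, yet their distance in the top/bottom subspace is zero, because the mask ignores dimensions irrelevant to that type pair. This is the sense in which a single embedding can support both within-type similarity and cross-type compatibility.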

arXiv:1702.03488

Octopus: A Framework for Cost-Quality-Time Optimization in Crowdsourcing

Authors: Karan Goel, Shreya Rajpal, Mausam

Abstract: We present Octopus, an AI agent that jointly balances three conflicting task objectives on a micro-crowdsourcing marketplace: the quality of work, the total cost incurred, and the time to completion. Previous control agents have mostly focused on cost-quality or cost-time tradeoffs, but not on directly controlling all three in concert. A naive formulation of three-objective optimization is intractable; Octopus instead takes a hierarchical POMDP approach, with three different components responsible for setting the pay per task, selecting the next task, and controlling task-level quality. We demonstrate that Octopus significantly outperforms existing state-of-the-art approaches in real experiments. We also deploy Octopus on Amazon Mechanical Turk, showing its ability to manage tasks in a real-world dynamic setting.

Submitted 15 August, 2017; v1 submitted 11 February, 2017; originally announced February 2017.

Comments: 10 pages, to appear in HCOMP 2017

arXiv:1203.2247

An Optimum Time Quantum Using Linguistic Synthesis for Round Robin Scheduling Algorithm

Authors: Supriya Raheja, Reena Dadhich, Smita Rajpal

Abstract: In the Round Robin CPU scheduling algorithm, the main concerns are the size of the time quantum and the resulting increase in waiting and turnaround time. Decisions about these are usually based on parameters that are assumed to be precise. However, in many cases the values of these parameters are vague and imprecise. The performance of fuzzy logic depends on its ability to deal with linguistic variables. With this intent, this paper attempts to generate an optimal time quantum dynamically from parameters that are treated as linguistic variables. The paper also includes a Mamdani fuzzy inference system using a trapezoidal membership function, resulting in the LRRTQ fuzzy inference system. We present an algorithm to improve the performance of the Round Robin scheduling algorithm. Numerical analysis of the proposed algorithm based on LRRTQ results shows that it improves system performance by reducing unnecessary context switches and by providing reasonable turnaround time.

Submitted 10 March, 2012; originally announced March 2012.

Comments: International Journal of Soft Computing, Feb 2012
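The quantities this paper optimizes (waiting time, turnaround time, and context switches) all depend on the time quantum. The sketch below simulates plain Round Robin with a fixed quantum to show that trade-off; it assumes all processes arrive at time 0 and counts a context switch only when the CPU moves to a different process. It also includes a generic trapezoidal membership function, the shape the abstract names for the fuzzy inference system. The paper's actual fuzzy rules, linguistic variables, and breakpoints are not reproduced, and any breakpoints used with `trapezoid` here are hypothetical.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate Round Robin for processes that all arrive at t = 0.

    Returns (avg_waiting, avg_turnaround, context_switches).
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    remaining = list(burst_times)
    finish = [0] * len(burst_times)
    queue = deque(range(len(burst_times)))
    clock, switches, prev = 0, 0, None
    while queue:
        pid = queue.popleft()
        if prev is not None and pid != prev:
            switches += 1  # CPU hands over to a different process
        prev = pid
        run = min(quantum, remaining[pid])
        clock += run
        remaining[pid] -= run
        if remaining[pid] > 0:
            queue.append(pid)  # preempted: back of the ready queue
        else:
            finish[pid] = clock
    # With arrival at t = 0, turnaround equals completion time.
    waiting = [f - b for f, b in zip(finish, burst_times)]
    n = len(burst_times)
    return sum(waiting) / n, sum(finish) / n, switches

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear ramps in between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)
```

For bursts `[24, 3, 3]` with a quantum of 4, the average waiting time is 17/3 and there are 3 context switches; shrinking the quantum to 1 raises the switch count to 9. A fuzzy approach like LRRTQ would pick the quantum from membership degrees such as those returned by `trapezoid` instead of fixing it in advance.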

arXiv:1112.3435

An Alternative Interpretation of Linguistic Variables as Linguistic Finite Automata

Authors: Supriya Raheja, Reena Dhadich, Smita Rajpal

Abstract: Linguistic variables represent crisp information in a form and precision appropriate for the problem. For example, to answer the question "How are you?" one may say "I am fine."; linguistic variables such as "fine" are common in everyday speech. In this paper, an alternative interpretation of linguistic variables is introduced with the notion of a linguistic description of a value or set of values. The use of linguistic variables in many applications reduces the overall computational complexity of the application. Linguistic variables have been shown to be particularly useful in complex nonlinear applications. Here we apply the concept of reasoning with linguistic quantifiers to define Linguistic Finite Automata, along with the expansion of $\delta^{\Box}$ and $\lambda^{\Box}$ over $\delta$ and $\lambda$.

Submitted 15 December, 2011; originally announced December 2011.

Comments: International Journal of Computer Science & Issues, Aug 2011

arXiv:1006.4551

Vagueness of Linguistic variable

Authors: Supriya Raheja, Smita Rajpal

Abstract: This work lies in the area of computer science that focuses on creating machines capable of behaviors that humans consider intelligent. The ability to create intelligent machines has intrigued humans since ancient times, and today, with the advent of the computer and 50 years of research into various programming techniques, the dream of smart machines is becoming a reality. Researchers are creating systems that can mimic human thought, understand speech, beat the best human chess player, and accomplish countless other feats never before possible. The human ability to estimate information is shown most clearly in the use of natural languages. When a person uses words of a natural language to evaluate qualitative attributes, for example, that person introduces uncertainty, in the form of vagueness, into his or her own estimates. Vague sets, vague judgments, and vague conclusions arise wherever and whenever a reasoning subject exists and is interested in something. Vague set theory arose as a response to the imprecision of the language that a reasoning subject speaks. The language of a reasoning subject is generated by vague events, which are created by reason and operated on by the mind. The theory of vague sets is an attempt to find an approximation of vague groupings that is more convenient than classical set theory in situations where natural language plays a significant role.
Such a theory was proposed by Gau and Buehrer. In our paper, we describe how the vagueness of linguistic variables can be resolved using vague set theory. This paper is mainly intended for one of the directions of eventology (the theory of random vague events), which arose within probability theory and pursues the unique purpose of describing the movement of reason eventologically.

Submitted 23 June, 2010; originally announced June 2010.

Comments: IEEE Publication Format, https://sites.google.com/site/journalofcomputing/

Journal ref: Journal of Computing, Vol. 2, No. 6, June 2010, NY, USA, ISSN 2151-9617
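In Gau and Buehrer's vague sets, an element's membership is not a single number but an interval $[t, 1 - f]$, where $t$ is the truth-membership (evidence for) and $f$ is the false-membership (evidence against), with $t + f \le 1$. The width $1 - t - f$ measures how undecided the membership is. The sketch below models one such vague value; the class and method names are hypothetical, and the small tolerance in the validity check is an implementation choice to absorb floating-point rounding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VagueValue:
    """Vague membership value [t, 1 - f] for a single element.

    t: truth-membership (evidence for membership)
    f: false-membership (evidence against membership); requires t + f <= 1.
    """
    t: float
    f: float

    def __post_init__(self):
        # Small tolerance so values such as 0.7 and 0.3 are not rejected by rounding.
        if not (0.0 <= self.t <= 1.0 and 0.0 <= self.f <= 1.0
                and self.t + self.f <= 1.0 + 1e-12):
            raise ValueError("need 0 <= t, f <= 1 and t + f <= 1")

    @classmethod
    def from_fuzzy(cls, mu):
        """An ordinary fuzzy membership mu is the special case t = mu, f = 1 - mu."""
        return cls(mu, 1.0 - mu)

    @property
    def interval(self):
        """Lower and upper bounds on the degree of membership."""
        return (self.t, 1.0 - self.f)

    @property
    def vagueness(self):
        """Interval width: 0 for a fuzzy value, larger when evidence is lacking."""
        return 1.0 - self.t - self.f
```

For instance, if a hypothetical judgment that someone is "fine" has 0.6 evidence for and 0.1 against, its membership lies in $[0.6, 0.9]$ with vagueness 0.3, whereas a fuzzy value built with `from_fuzzy` has zero vagueness.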

Showing 1–10 of 10 results for author: Rajpal, S