-
SemPat: Using Hyperproperty-based Semantic Analysis to Generate Microarchitectural Attack Patterns
Authors:
Adwait Godbole,
Yatin A. Manerkar,
Sanjit A. Seshia
Abstract:
Microarchitectural security verification of software has seen the emergence of two broad classes of approaches. The first is based on semantic security properties (e.g., non-interference) which are verified for a given program and a specified abstract model of the hardware microarchitecture. The second is based on attack patterns, which, if found in a program execution, indicates the presence of a…
▽ More
Microarchitectural security verification of software has seen the emergence of two broad classes of approaches. The first is based on semantic security properties (e.g., non-interference) which are verified for a given program and a specified abstract model of the hardware microarchitecture. The second is based on attack patterns, which, if found in a program execution, indicates the presence of an exploit. While the former uses a formal specification that can capture several gadget variants targeting the same vulnerability, it is limited by the scalability of verification. Patterns, while more scalable, must be currently constructed manually, as they are narrower in scope and sensitive to gadget-specific structure.
This work develops a technique that, given a non-interference-based semantic security hyperproperty, automatically generates attack patterns up to a certain complexity parameter (called the skeleton size). Thus, we combine the advantages of both approaches: security can be specified by a hyperproperty that uniformly captures several gadget variants, while automatically generated patterns can be used for scalable verification. We implement our approach in a tool and demonstrate the ability to generate new patterns, (e.g., for SpectreV1, SpectreV4) and improved scalability using the generated patterns over hyperproperty-based verification.
△ Less
Submitted 8 June, 2024;
originally announced June 2024.
-
Mobile Contactless Palmprint Recognition: Use of Multiscale, Multimodel Embeddings
Authors:
Steven A. Grosz,
Akash Godbole,
Anil K. Jain
Abstract:
Contactless palmprints are comprised of both global and local discriminative features. Most prior work focuses on extracting global features or local features alone for palmprint matching, whereas this research introduces a novel framework that combines global and local features for enhanced palmprint matching accuracy. Leveraging recent advancements in deep learning, this study integrates a visio…
▽ More
Contactless palmprints are comprised of both global and local discriminative features. Most prior work focuses on extracting global features or local features alone for palmprint matching, whereas this research introduces a novel framework that combines global and local features for enhanced palmprint matching accuracy. Leveraging recent advancements in deep learning, this study integrates a vision transformer (ViT) and a convolutional neural network (CNN) to extract complementary local and global features. Next, a mobile-based, end-to-end palmprint recognition system is developed, referred to as Palm-ID. On top of the ViT and CNN features, Palm-ID incorporates a palmprint enhancement module and efficient dimensionality reduction (for faster matching). Palm-ID balances the trade-off between accuracy and latency, requiring just 18ms to extract a template of size 516 bytes, which can be efficiently searched against a 10,000 palmprint gallery in 0.33ms on an AMD EPYC 7543 32-Core CPU utilizing 128-threads. Cross-database matching protocols and evaluations on large-scale operational datasets demonstrate the robustness of the proposed method, achieving a TAR of 98.06% at FAR=0.01% on a newly collected, time-separated dataset. To show a practical deployment of the end-to-end system, the entire recognition pipeline is embedded within a mobile device for enhanced user privacy and security.
△ Less
Submitted 15 January, 2024;
originally announced January 2024.
-
Overcoming Memory Weakness with Unified Fairness
Authors:
Parosh Aziz Abdulla,
Mohamed Faouzi Atig,
Adwait Godbole,
Shankaranarayanan Krishna,
Mihir Vahanwala
Abstract:
We consider the verification of liveness properties for concurrent programs running on weak memory models. To that end, we identify notions of fairness that preclude demonic non-determinism, are motivated by practical observations, and are amenable to algorithmic techniques. We provide both logical and stochastic definitions of our fairness notions and prove that they are equivalent in the context…
▽ More
We consider the verification of liveness properties for concurrent programs running on weak memory models. To that end, we identify notions of fairness that preclude demonic non-determinism, are motivated by practical observations, and are amenable to algorithmic techniques. We provide both logical and stochastic definitions of our fairness notions and prove that they are equivalent in the context of liveness verification. In particular, we show that our fairness allows us to reduce the liveness problem (repeated control state reachability) to the problem of simple control state reachability. We show that this is a general phenomenon by develo** a uniform framework which serves as the formal foundation of our fairness definition and can be instantiated to a wide landscape of memory models. These models include SC, TSO, PSO, (Strong/Weak) Release-Acquire, Strong Coherence, FIFO-consistency, and RMO.
△ Less
Submitted 27 May, 2023;
originally announced May 2023.
-
SCENE: Self-Labeled Counterfactuals for Extrapolating to Negative Examples
Authors:
Deqing Fu,
Ameya Godbole,
Robin Jia
Abstract:
Detecting negatives (such as non-entailment relationships, unanswerable questions, and false claims) is an important and challenging aspect of many natural language understanding tasks. Though manually collecting challenging negative examples can help models detect them, it is both costly and domain-specific. In this work, we propose Self-labeled Counterfactuals for Extrapolating to Negative Examp…
▽ More
Detecting negatives (such as non-entailment relationships, unanswerable questions, and false claims) is an important and challenging aspect of many natural language understanding tasks. Though manually collecting challenging negative examples can help models detect them, it is both costly and domain-specific. In this work, we propose Self-labeled Counterfactuals for Extrapolating to Negative Examples (SCENE), an automatic method for synthesizing training data that greatly improves models' ability to detect challenging negative examples. In contrast with standard data augmentation, which synthesizes new examples for existing labels, SCENE can synthesize negative examples zero-shot from only positive ones. Given a positive example, SCENE perturbs it with a mask infilling model, then determines whether the resulting example is negative based on a self-training heuristic. With access to only answerable training examples, SCENE can close 69.6% of the performance gap on SQuAD 2.0, a dataset where half of the evaluation examples are unanswerable, compared to a model trained on SQuAD 2.0. Our method also extends to boolean question answering and recognizing textual entailment, and improves generalization from SQuAD to ACE-whQA, an out-of-domain extractive QA benchmark.
△ Less
Submitted 27 January, 2024; v1 submitted 13 May, 2023;
originally announced May 2023.
-
Child Palm-ID: Contactless Palmprint Recognition for Children
Authors:
Akash Godbole,
Steven A. Grosz,
Anil K. Jain
Abstract:
Effective distribution of nutritional and healthcare aid for children, particularly infants and toddlers, in some of the least developed and most impoverished countries of the world, is a major problem due to the lack of reliable identification documents. Biometric authentication technology has been investigated to address child recognition in the absence of reliable ID documents. We present a mob…
▽ More
Effective distribution of nutritional and healthcare aid for children, particularly infants and toddlers, in some of the least developed and most impoverished countries of the world, is a major problem due to the lack of reliable identification documents. Biometric authentication technology has been investigated to address child recognition in the absence of reliable ID documents. We present a mobile-based contactless palmprint recognition system, called Child Palm-ID, which meets the requirements of usability, hygiene, cost, and accuracy for child recognition. Using a contactless child palmprint database, Child-PalmDB1, consisting of 19,158 images from 1,020 unique palms (in the age range of 6 mos. to 48 mos.), we report a TAR=94.11% @ FAR=0.1%. The proposed Child Palm-ID system is also able to recognize adults, achieving a TAR=99.4% on the CASIA contactless palmprint database and a TAR=100% on the COEP contactless adult palmprint database, both @ FAR=0.1%. These accuracies are competitive with the SOTA provided by COTS systems. Despite these high accuracies, we show that the TAR for time-separated child-palmprints is only 78.1% @ FAR=0.1%.
△ Less
Submitted 9 May, 2023;
originally announced May 2023.
-
Parameterized Verification under TSO with Data Types
Authors:
Parosh Aziz Abdulla,
Mohamed Faouzi Atig,
Florian Furbach,
Adwait Godbole,
Yacoub G. Hendi,
Shankaranarayanan Krishna,
Stephan Spengler
Abstract:
We consider parameterized verification of systems executing according to the total store ordering (TSO) semantics. The processes manipulate abstract data types over potentially infinite domains. We present a framework that translates the reachability problem for such systems to the reachability problem for register machines enriched with the given abstract data type. We use the translation to obta…
▽ More
We consider parameterized verification of systems executing according to the total store ordering (TSO) semantics. The processes manipulate abstract data types over potentially infinite domains. We present a framework that translates the reachability problem for such systems to the reachability problem for register machines enriched with the given abstract data type. We use the translation to obtain tight complexity bounds for TSO-based parameterized verification over several abstract data types, such as push-down automata, ordered multi push-down automata, one-counter nets, one-counter automata, and Petri nets. We apply the framework to get complexity bounds for higher order stack and counter variants as well.
△ Less
Submitted 12 February, 2023; v1 submitted 4 February, 2023;
originally announced February 2023.
-
Child PalmID: Contactless Palmprint Recognition
Authors:
Anil K. Jain,
Akash Godbole,
Anjoo Bhatnagar,
Prem Sewak Sudhish
Abstract:
Develo** and least developed countries face the dire challenge of ensuring that each child in their country receives required doses of vaccination, adequate nutrition and proper medication. International agencies such as UNICEF, WHO and WFP, among other organizations, strive to find innovative solutions to determine which child has received the benefits and which have not. Biometric recognition…
▽ More
Develo** and least developed countries face the dire challenge of ensuring that each child in their country receives required doses of vaccination, adequate nutrition and proper medication. International agencies such as UNICEF, WHO and WFP, among other organizations, strive to find innovative solutions to determine which child has received the benefits and which have not. Biometric recognition systems have been sought out to help solve this problem. To that end, this report establishes a baseline accuracy of a commercial contactless palmprint recognition system that may be deployed for recognizing children in the age group of one to five years old. On a database of contactless palmprint images of one thousand unique palms from 500 children, we establish SOTA authentication accuracy of 90.85% @ FAR of 0.01%, rank-1 identification accuracy of 99.0% (closed set), and FPIR=0.01 @ FNIR=0.3 for open-set identification using PalmMobile SDK from Armatura.
△ Less
Submitted 14 December, 2022;
originally announced December 2022.
-
Benchmarking Long-tail Generalization with Likelihood Splits
Authors:
Ameya Godbole,
Robin Jia
Abstract:
In order to reliably process natural language, NLP systems must generalize to the long tail of rare utterances. We propose a method to create challenging benchmarks that require generalizing to the tail of the distribution by re-splitting existing datasets. We create 'Likelihood Splits' where examples that are assigned lower likelihood by a pre-trained language model (LM) are placed in the test se…
▽ More
In order to reliably process natural language, NLP systems must generalize to the long tail of rare utterances. We propose a method to create challenging benchmarks that require generalizing to the tail of the distribution by re-splitting existing datasets. We create 'Likelihood Splits' where examples that are assigned lower likelihood by a pre-trained language model (LM) are placed in the test set, and more likely examples are in the training set. This simple approach can be customized to construct meaningful train-test splits for a wide range of tasks. Likelihood Splits surface more challenges than random splits: relative error rates of state-of-the-art models increase by 59% for semantic parsing on Spider, 93% for natural language inference on SNLI, and 33% for yes/no question answering on BoolQ, on our splits compared with the corresponding random splits. Moreover, Likelihood Splits create fairer benchmarks than adversarial filtering; when the LM used to create the splits is also employed as the task model, our splits do not unfairly penalize the LM.
△ Less
Submitted 2 May, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Learning an Ensemble of Deep Fingerprint Representations
Authors:
Akash Godbole,
Karthik Nandakumar,
Anil K. Jain
Abstract:
Deep neural networks (DNNs) have shown incredible promise in learning fixed-length representations from fingerprints. Since the representation learning is often focused on capturing specific prior knowledge (e.g., minutiae), there is no universal representation that comprehensively encapsulates all the discriminatory information available in a fingerprint. While learning an ensemble of representat…
▽ More
Deep neural networks (DNNs) have shown incredible promise in learning fixed-length representations from fingerprints. Since the representation learning is often focused on capturing specific prior knowledge (e.g., minutiae), there is no universal representation that comprehensively encapsulates all the discriminatory information available in a fingerprint. While learning an ensemble of representations can mitigate this problem, two critical challenges need to be addressed: (i) How to extract multiple diverse representations from the same fingerprint image? and (ii) How to optimally exploit these representations during the matching process? In this work, we train multiple instances of DeepPrint (a state-of-the-art DNN-based fingerprint encoder) on different transformations of the input image to generate an ensemble of fingerprint embeddings. We also propose a feature fusion technique that distills these multiple representations into a single embedding, which faithfully captures the diversity present in the ensemble without increasing the computational complexity. The proposed approach has been comprehensively evaluated on five databases containing rolled, plain, and latent fingerprints (NIST SD4, NIST SD14, NIST SD27, NIST SD302, and FVC2004 DB2A) and statistically significant improvements in accuracy have been consistently demonstrated across a range of verification as well as closed- and open-set identification settings. The proposed approach serves as a wrapper capable of improving the accuracy of any DNN-based recognition system.
△ Less
Submitted 2 September, 2022;
originally announced September 2022.
-
Automated Conversion of Axiomatic to Operational Models: Theory and Practice
Authors:
Adwait Godbole,
Yatin A. Manerkar,
Sanjit A. Seshia
Abstract:
A system may be modelled as an operational model (which has explicit notions of state and transitions between states) or an axiomatic model (which is specified entirely as a set of invariants). Most formal methods techniques (e.g., IC3, invariant synthesis, etc) are designed for operational models and are largely inaccessible to axiomatic models. Furthermore, no prior method exists to automaticall…
▽ More
A system may be modelled as an operational model (which has explicit notions of state and transitions between states) or an axiomatic model (which is specified entirely as a set of invariants). Most formal methods techniques (e.g., IC3, invariant synthesis, etc) are designed for operational models and are largely inaccessible to axiomatic models. Furthermore, no prior method exists to automatically convert axiomatic models to operational ones, so operational equivalents to axiomatic models had to be manually created and proven equivalent.
In this paper, we advance the state-of-the-art in axiomatic to operational model conversion. We show that general axioms in the $μ$spec axiomatic modelling framework cannot be translated to equivalent finite-state operational models. We also derive restrictions on the space of $μ$spec axioms that enable the feasible generation of equivalent finite-state operational models for them. As for practical results, we develop a methodology for automatically translating $μ$spec axioms to equivalent finite-state automata-based operational models. We demonstrate the efficacy of our method by using the models generated by our procedure to prove the correctness of ordering properties on three RTL designs.
△ Less
Submitted 13 August, 2022;
originally announced August 2022.
-
UCLID5: Multi-Modal Formal Modeling, Verification, and Synthesis
Authors:
Elizabeth Polgreen,
Kevin Cheang,
Pranav Gaddamadugu,
Adwait Godbole,
Kevin Laeufer,
Shaokai Lin,
Yatin A. Manerkar,
Federico Mora,
Sanjit A. Seshia
Abstract:
UCLID5 is a tool for the multi-modal formal modeling, verification, and synthesis of systems. It enables one to tackle verification problems for heterogeneous systems such as combinations of hardware and software, or those that have multiple, varied specifications, or systems that require hybrid modes of modeling. A novel aspect of \uclid is an emphasis on the use of syntax-guided and inductive sy…
▽ More
UCLID5 is a tool for the multi-modal formal modeling, verification, and synthesis of systems. It enables one to tackle verification problems for heterogeneous systems such as combinations of hardware and software, or those that have multiple, varied specifications, or systems that require hybrid modes of modeling. A novel aspect of \uclid is an emphasis on the use of syntax-guided and inductive synthesis to automate steps in modeling and verification. This tool paper presents new developments in the \uclid tool including new language features, integration with new techniques for syntax-guided synthesis and satisfiability solving, support for hyperproperties and combinations of axiomatic and operational modeling, demonstrations on new problem classes, and a robust implementation.
△ Less
Submitted 7 August, 2022;
originally announced August 2022.
-
On Demographic Bias in Fingerprint Recognition
Authors:
Akash Godbole,
Steven A. Grosz,
Karthik Nandakumar,
Anil K. Jain
Abstract:
Fingerprint recognition systems have been deployed globally in numerous applications including personal devices, forensics, law enforcement, banking, and national identity systems. For these systems to be socially acceptable and trustworthy, it is critical that they perform equally well across different demographic groups. In this work, we propose a formal statistical framework to test for the exi…
▽ More
Fingerprint recognition systems have been deployed globally in numerous applications including personal devices, forensics, law enforcement, banking, and national identity systems. For these systems to be socially acceptable and trustworthy, it is critical that they perform equally well across different demographic groups. In this work, we propose a formal statistical framework to test for the existence of bias (demographic differentials) in fingerprint recognition across four major demographic groups (white male, white female, black male, and black female) for two state-of-the-art (SOTA) fingerprint matchers operating in verification and identification modes. Experiments on two different fingerprint databases (with 15,468 and 1,014 subjects) show that demographic differentials in SOTA fingerprint recognition systems decrease as the matcher accuracy increases and any small bias that may be evident is likely due to certain outlier, low-quality fingerprint images.
△ Less
Submitted 19 May, 2022;
originally announced May 2022.
-
Knowledge Base Question Answering by Case-based Reasoning over Subgraphs
Authors:
Rajarshi Das,
Ameya Godbole,
Ankita Naik,
Elliot Tower,
Robin Jia,
Manzil Zaheer,
Hannaneh Hajishirzi,
Andrew McCallum
Abstract:
Question answering (QA) over knowledge bases (KBs) is challenging because of the diverse, essentially unbounded, types of reasoning patterns needed. However, we hypothesize in a large KB, reasoning patterns required to answer a query type reoccur for various entities in their respective subgraph neighborhoods. Leveraging this structural similarity between local neighborhoods of different subgraphs…
▽ More
Question answering (QA) over knowledge bases (KBs) is challenging because of the diverse, essentially unbounded, types of reasoning patterns needed. However, we hypothesize in a large KB, reasoning patterns required to answer a query type reoccur for various entities in their respective subgraph neighborhoods. Leveraging this structural similarity between local neighborhoods of different subgraphs, we introduce a semiparametric model (CBR-SUBG) with (i) a nonparametric component that for each query, dynamically retrieves other similar $k$-nearest neighbor (KNN) training queries along with query-specific subgraphs and (ii) a parametric component that is trained to identify the (latent) reasoning patterns from the subgraphs of KNN queries and then apply them to the subgraph of the target query. We also propose an adaptive subgraph collection strategy to select a query-specific compact subgraph, allowing us to scale to full Freebase KB containing billions of facts. We show that CBR-SUBG can answer queries requiring subgraph reasoning patterns and performs competitively with the best models on several KBQA benchmarks. Our subgraph collection strategy also produces more compact subgraphs (e.g. 55\% reduction in size for WebQSP while increasing answer recall by 4.85\%)\footnote{Code, model, and subgraphs are available at \url{https://github.com/rajarshd/CBR-SUBG}}.
△ Less
Submitted 17 June, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
Probabilistic Total Store Ordering
Authors:
Parosh Aziz Abdulla,
Mohamed Faouzi Atig,
Raj Aryan Agarwal,
Adwait Godbole,
Krishna S
Abstract:
We present $\textit{Probabilistic Total Store Ordering (PTSO)}$ -- a probabilistic extension of the classical TSO semantics. For a given (finite-state) program, the operational semantics of PTSO induces an infinite-state Markov chain. We resolve the inherent non-determinism due to process schedulings and memory updates according to given probability distributions. We provide a comprehensive set of…
▽ More
We present $\textit{Probabilistic Total Store Ordering (PTSO)}$ -- a probabilistic extension of the classical TSO semantics. For a given (finite-state) program, the operational semantics of PTSO induces an infinite-state Markov chain. We resolve the inherent non-determinism due to process schedulings and memory updates according to given probability distributions. We provide a comprehensive set of results showing the decidability of several properties for PTSO, namely (i) Almost-Sure (Repeated) Reachability: whether a run, starting from a given initial configuration, almost surely visits (resp. almost surely repeatedly visits) a given set of target configurations. (ii) Almost-Never (Repeated) Reachability: whether a run from the initial configuration, almost never visits (resp. almost never repeatedly visits) the target. (iii) Approximate Quantitative (Repeated) Reachability: to approximate, up to an arbitrary degree of precision, the measure of runs that start from the initial configuration and (repeatedly) visit the target. (iv) Expected Average Cost: to approximate, up to an arbitrary degree of precision, the expected average cost of a run from the initial configuration to the target. We derive our results through a nontrivial combination of results from the classical theory of (infinite-state) Markov chains, the theories of decisive and eager Markov chains, specific techniques from combinatorics, as well as, decidability and complexity results for the classical (non-probabilistic) TSO semantics. As far as we know, this is the first work that considers probabilistic verification of programs running on weak memory models.
△ Less
Submitted 25 January, 2022;
originally announced January 2022.
-
Case-based Reasoning for Natural Language Queries over Knowledge Bases
Authors:
Rajarshi Das,
Manzil Zaheer,
Dung Thai,
Ameya Godbole,
Ethan Perez,
Jay-Yoon Lee,
Lizhen Tan,
Lazaros Polymenakos,
Andrew McCallum
Abstract:
It is often challenging to solve a complex problem from scratch, but much easier if we can access other similar problems with their solutions -- a paradigm known as case-based reasoning (CBR). We propose a neuro-symbolic CBR approach (CBR-KBQA) for question answering over large knowledge bases. CBR-KBQA consists of a nonparametric memory that stores cases (question and logical forms) and a paramet…
▽ More
It is often challenging to solve a complex problem from scratch, but much easier if we can access other similar problems with their solutions -- a paradigm known as case-based reasoning (CBR). We propose a neuro-symbolic CBR approach (CBR-KBQA) for question answering over large knowledge bases. CBR-KBQA consists of a nonparametric memory that stores cases (question and logical forms) and a parametric model that can generate a logical form for a new question by retrieving cases that are relevant to it. On several KBQA datasets that contain complex questions, CBR-KBQA achieves competitive performance. For example, on the ComplexWebQuestions dataset, CBR-KBQA outperforms the current state of the art by 11\% on accuracy. Furthermore, we show that CBR-KBQA is capable of using new cases \emph{without} any further training: by incorporating a few human-labeled examples in the case memory, CBR-KBQA is able to successfully generate logical forms containing unseen KB entities as well as relations.
△ Less
Submitted 7 November, 2021; v1 submitted 18 April, 2021;
originally announced April 2021.
-
Safety Verification of Parameterized Systems under Release-Acquire
Authors:
Adwait Godbole,
Shankara Narayanan Krishna,
Roland Meyer
Abstract:
We study the safety verification problem for parameterized systems under the release-acquire (RA) semantics. It has been shown that the problem is intractable for systems with unlimited access to atomic compare-and-swap (CAS) instructions. We show that, from a verification perspective where approximate results help, this is overly pessimistic. We study parameterized systems consisting of an unboun…
▽ More
We study the safety verification problem for parameterized systems under the release-acquire (RA) semantics. It has been shown that the problem is intractable for systems with unlimited access to atomic compare-and-swap (CAS) instructions. We show that, from a verification perspective where approximate results help, this is overly pessimistic. We study parameterized systems consisting of an unbounded number of environment threads executing identical but CAS-free programs and a fixed number of distinguished threads that are unrestricted.
Our first contribution is a new semantics that considerably simplifies RA but is still equivalent for the above systems as far as safety verification is concerned. We apply this (general) result to two subclasses of our model. We show that safety verification is only \pspace-complete for the bounded model checking problem where the distinguished threads are loop-free. Interestingly, we can still afford the unbounded environment. We show that the complexity jumps to \nexp-complete for thread-modular verification where an unrestricted distinguished `ego' thread interacts with an environment of CAS-free threads plus loop-free distinguished threads (as in the earlier setting). Besides the usefulness for verification, the results are strong in that they delineate the tractability border for an established semantics.
△ Less
Submitted 5 May, 2022; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Probabilistic Case-based Reasoning for Open-World Knowledge Graph Completion
Authors:
Rajarshi Das,
Ameya Godbole,
Nicholas Monath,
Manzil Zaheer,
Andrew McCallum
Abstract:
A case-based reasoning (CBR) system solves a new problem by retrieving `cases' that are similar to the given problem. If such a system can achieve high accuracy, it is appealing owing to its simplicity, interpretability, and scalability. In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs). Our approach predicts attributes for an entity by gathering…
▽ More
A case-based reasoning (CBR) system solves a new problem by retrieving `cases' that are similar to the given problem. If such a system can achieve high accuracy, it is appealing owing to its simplicity, interpretability, and scalability. In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs). Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB. Our probabilistic model estimates the likelihood that a path is effective at answering a query about the given entity. The parameters of our model can be efficiently computed using simple path statistics and require no iterative optimization. Our model is non-parametric, growing dynamically as new entities and relations are added to the KB. On several benchmark datasets our approach significantly outperforms other rule learning approaches and performs comparably to state-of-the-art embedding-based approaches. Furthermore, we demonstrate the effectiveness of our model in an "open-world" setting where new entities arrive in an online fashion, significantly outperforming state-of-the-art approaches and nearly matching the best offline method. Code available at https://github.com/ameyagodbole/Prob-CBR
△ Less
Submitted 9 October, 2020; v1 submitted 7 October, 2020;
originally announced October 2020.
-
The Decidability of Verification under Promising 2.0
Authors:
Parosh Aziz Abdulla,
Mohamed Faouzi Atig,
Adwait Godbole,
Shankaranarayanan Krishna,
Viktor Vafeiadis
Abstract:
In PLDI'20, Lee et al. introduced the \emph{promising } semantics PS 2.0 of the C++ concurrency that captures most of the common program transformations while satisfying the DRF guarantee. The reachability problem for finite-state programs under PS 2.0 with only release-acquire accesses is already known to be undecidable. Therefore, we address, in this paper, the reachability problem for programs…
▽ More
In PLDI'20, Lee et al. introduced the \emph{promising } semantics PS 2.0 of the C++ concurrency that captures most of the common program transformations while satisfying the DRF guarantee. The reachability problem for finite-state programs under PS 2.0 with only release-acquire accesses is already known to be undecidable. Therefore, we address, in this paper, the reachability problem for programs running under PS 2.0 with relaxed accesses together with promises. We show that this problem is undecidable even in the case where the input program has finite state. Given this undecidability result, we consider the fragment of PS 2.0 with only relaxed accesses allowing bounded number of promises. We show that under this restriction, the reachability is decidable, albeit very expensive: it is non-primitive recursive. Given this high complexity with bounded number of promises and the undecidability result for the RA fragment of PS 2.0, we consider a bounded version of the reachability problem. To this end, we bound both the number of promises and the "view-switches", i.e, the number of times the processes may switch their local views of the global memory. We provide a code-to-code translation from an input program under PS 2.0, with relaxed and release-acquire memory accesses along with promises, to a program under SC. This leads to a reduction of the bounded reachability problem under PS 2.0 to the bounded context-switching problem under SC. We have implemented a prototype tool and tested it on a set of benchmarks, demonstrating that many bugs in programs can be found using a small bound.
△ Less
Submitted 16 October, 2020; v1 submitted 20 July, 2020;
originally announced July 2020.
-
A Simple Approach to Case-Based Reasoning in Knowledge Bases
Authors:
Rajarshi Das,
Ameya Godbole,
Shehzaad Dhuliawala,
Manzil Zaheer,
Andrew McCallum
Abstract:
We present a surprisingly simple yet accurate approach to reasoning in knowledge graphs (KGs) that requires \emph{no training}, and is reminiscent of case-based reasoning in classical artificial intelligence (AI). Consider the task of finding a target entity given a source entity and a binary relation. Our non-parametric approach derives crisp logical rules for each query by finding multiple \text…
▽ More
We present a surprisingly simple yet accurate approach to reasoning in knowledge graphs (KGs) that requires \emph{no training}, and is reminiscent of case-based reasoning in classical artificial intelligence (AI). Consider the task of finding a target entity given a source entity and a binary relation. Our non-parametric approach derives crisp logical rules for each query by finding multiple \textit{graph path patterns} that connect similar source entities through the given relation. Using our method, we obtain new state-of-the-art accuracy, outperforming all previous models, on NELL-995 and FB-122. We also demonstrate that our model is robust in low data settings, outperforming recently proposed meta-learning approaches
△ Less
Submitted 18 July, 2020; v1 submitted 25 June, 2020;
originally announced June 2020.
-
Containment of Simple Regular Path Queries
Authors:
Diego Figueira,
Adwait Godbole,
S. Krishna,
Wim Martens,
Matthias Niewerth,
Tina Trautner
Abstract:
Testing containment of queries is a fundamental reasoning task in knowledge representation. We study here the containment problem for Conjunctive Regular Path Queries (CRPQs), a navigational query language extensively used in ontology and graph database querying. While it is known that containment of CRPQs is expspace-complete in general, we focus here on severely restricted fragments, which are k…
▽ More
Testing containment of queries is a fundamental reasoning task in knowledge representation. We study here the containment problem for Conjunctive Regular Path Queries (CRPQs), a navigational query language extensively used in ontology and graph database querying. While it is known that containment of CRPQs is expspace-complete in general, we focus here on severely restricted fragments, which are known to be highly relevant in practice according to several recent studies. We obtain a detailed overview of the complexity of the containment problem, depending on the features used in the regular expressions of the queries, with completeness results for np, pitwo, pspace or expspace.
△ Less
Submitted 9 March, 2020;
originally announced March 2020.
-
Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering
Authors:
Ameya Godbole,
Dilip Kavarthapu,
Rajarshi Das,
Zhiyu Gong,
Abhishek Singhal,
Hamed Zamani,
Mo Yu,
Tian Gao,
Xiaoxiao Guo,
Manzil Zaheer,
Andrew McCallum
Abstract:
Multi-hop question answering (QA) requires an information retrieval (IR) system that can find \emph{multiple} supporting evidence needed to answer the question, making the retrieval process very challenging. This paper introduces an IR technique that uses information of entities present in the initially retrieved evidence to learn to `\emph{hop}' to other relevant evidence. In a setting, with more…
▽ More
Multi-hop question answering (QA) requires an information retrieval (IR) system that can find \emph{multiple} supporting evidence needed to answer the question, making the retrieval process very challenging. This paper introduces an IR technique that uses information of entities present in the initially retrieved evidence to learn to `\emph{hop}' to other relevant evidence. In a setting, with more than \textbf{5 million} Wikipedia paragraphs, our approach leads to significant boost in retrieval performance. The retrieved evidence also increased the performance of an existing QA model (without any training) on the \hotpot benchmark by \textbf{10.59} F1.
△ Less
Submitted 17 September, 2019;
originally announced September 2019.
-
Controlling a population
Authors:
Nathalie Bertrand,
Miheer Dewaskar,
Blaise Genest,
Hugo Gimbert,
Adwait Amit Godbole
Abstract:
We introduce a new setting where a population of agents, each modelled by a finite-state system, are controlled uniformly: the controller applies the same action to every agent. The framework is largely inspired by the control of a biological system, namely a population of yeasts, where the controller may only change the environment common to all cells. We study a synchronisation problem for such…
▽ More
We introduce a new setting where a population of agents, each modelled by a finite-state system, are controlled uniformly: the controller applies the same action to every agent. The framework is largely inspired by the control of a biological system, namely a population of yeasts, where the controller may only change the environment common to all cells. We study a synchronisation problem for such populations: no matter how individual agents react to the actions of the controller, the controller aims at driving all agents synchronously to a target state. The agents are naturally represented by a non-deterministic finite state automaton (NFA), the same for every agent, and the whole system is encoded as a 2-player game. The first player (Controller) chooses actions, and the second player (Agents) resolves non-determinism for each agent. The game with m agents is called the m -population game. This gives rise to a parameterized control problem (where control refers to 2 player games), namely the population control problem: can Controller control the m-population game for all m in N whatever Agents does?
△ Less
Submitted 26 July, 2019; v1 submitted 2 July, 2018;
originally announced July 2018.
-
Siamese Neural Networks with Random Forest for detecting duplicate question pairs
Authors:
Ameya Godbole,
Aman Dalmia,
Sunil Kumar Sahu
Abstract:
Determining whether two given questions are semantically similar is a fairly challenging task given the different structures and forms that the questions can take. In this paper, we use Gated Recurrent Units(GRU) in combination with other highly used machine learning algorithms like Random Forest, Adaboost and SVM for the similarity prediction task on a dataset released by Quora, consisting of abo…
▽ More
Determining whether two given questions are semantically similar is a fairly challenging task given the different structures and forms that the questions can take. In this paper, we use Gated Recurrent Units(GRU) in combination with other highly used machine learning algorithms like Random Forest, Adaboost and SVM for the similarity prediction task on a dataset released by Quora, consisting of about 400k labeled question pairs. We got the best result by using the Siamese adaptation of a Bidirectional GRU with a Random Forest classifier, which landed us among the top 24% in the competition Quora Question Pairs hosted on Kaggle.
△ Less
Submitted 28 January, 2018; v1 submitted 22 January, 2018;
originally announced January 2018.
-
$t$-Covering Arrays Generated by a Tiling Probability Model
Authors:
Michael S. Donders,
Anant P. Godbole
Abstract:
A $t-\a$ covering array is an $m\times n$ matrix, with entries from an alphabet of size $α$, such that for any choice of $t$ rows, and any ordered string of $t$ letters of the alphabet, there exists a column such that the "values" of the rows in that column match those of the string of letters. We use the Lovász Local Lemma in conjunction with a new tiling-based probability model to improve the up…
▽ More
A $t-\a$ covering array is an $m\times n$ matrix, with entries from an alphabet of size $α$, such that for any choice of $t$ rows, and any ordered string of $t$ letters of the alphabet, there exists a column such that the "values" of the rows in that column match those of the string of letters. We use the Lovász Local Lemma in conjunction with a new tiling-based probability model to improve the upper bound on the smallest number of columns $N=N(m,t,α)$ of a $t-\a$ covering array.
△ Less
Submitted 1 November, 2010;
originally announced November 2010.