Search | arXiv e-print repository

GenAIPABench: A Benchmark for Generative AI-based Privacy Assistants

Authors: Aamir Hamid, Hemanth Reddy Samidi, Tim Finin, Primal Pappachan, Roberto Yus

Abstract: Privacy policies of websites are often lengthy and intricate. Privacy assistants help simplify policies and make them more accessible and user-friendly. The emergence of generative AI (genAI) offers new opportunities to build privacy assistants that can answer users' questions about privacy policies. However, genAI's reliability is a concern due to its potential for producing inaccurate information. This study introduces GenAIPABench, a benchmark for evaluating Generative AI-based Privacy Assistants (GenAIPAs). GenAIPABench includes: 1) A set of questions about privacy policies and data protection regulations, with annotated answers for various organizations and regulations; 2) Metrics to assess the accuracy, relevance, and consistency of responses; and 3) A tool for generating prompts to introduce privacy documents and varied privacy questions to test system robustness. We evaluated three leading genAI systems (ChatGPT-4, Bard, and Bing AI) using GenAIPABench to gauge their effectiveness as GenAIPAs. Our results demonstrate significant promise in genAI capabilities in the privacy domain while also highlighting challenges in managing complex queries, ensuring consistency, and verifying source accuracy.

Submitted 18 December, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

arXiv:2308.05890 [pdf, other]

A Study of the Landscape of Privacy Policies of Smart Devices

Authors: Aamir Hamid, Hemanth Reddy Samidi, Tim Finin, Primal Pappachan, Roberto Yus

Abstract: As the adoption of smart devices continues to permeate all aspects of our lives, user privacy concerns have become more pertinent than ever. Privacy policies outline the data handling practices of these devices. Prior work in the domains of websites and mobile apps has shown that privacy policies are rarely read and understood by users. In these domains, automatic analysis of privacy policies has been shown to help give users appropriate insights. However, there is a lack of such an analysis in the domain of smart device privacy policies. This paper presents a comprehensive study of the landscape of privacy policies of smart devices. We introduce a methodology that addresses the unique challenges of smart devices by finding information about them, their manufacturers, and their privacy policies on the Web. Our methodology utilizes state-of-the-art analysis techniques to assess the readability and privacy of smart device policies and compares them with the policies of e-commerce websites and mobile applications. Overall, we analyzed 4,556 smart devices, 2,211 manufacturers, and 819 privacy policies. Despite smart devices having access to more intrusive data about their users (using sensors such as cameras and microphones), more than 1,167 of the analyzed manufacturers did not have policies available. The study highlights that significant improvement is required in communicating the data management practices of smart devices.

Submitted 13 December, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

arXiv:2306.10044 [pdf]

A Practical Entity Linking System for Tables in Scientific Literature

Authors: Varish Mulwad, Tim Finin, Vijay S. Kumar, Jenny Weisenberg Williams, Sharad Dixit, Anupam Joshi

Abstract: Entity linking is an important step towards constructing knowledge graphs that facilitate advanced question answering over scientific documents, including the retrieval of relevant information included in tables within these documents. This paper introduces a general-purpose system for linking entities to items in the Wikidata knowledge base. It describes how we adapt this system for linking domain-specific entities, especially for those entities embedded within tables drawn from COVID-19-related scientific literature. We describe the setup of an efficient offline instance of the system that enables our entity-linking approach to be more feasible in practice. As part of a broader approach to infer the semantic meaning of scientific tables, we leverage the structural and semantic characteristics of the tables to improve overall entity linking performance.

Submitted 11 June, 2023; originally announced June 2023.

Journal ref: 3rd Workshop on Scientific Document Understanding at AAAI-2023

arXiv:2208.02042 [pdf, other]

Quantum-Assisted Greedy Algorithms

Authors: Ramin Ayanzadeh, John E Dorband, Milton Halem, Tim Finin

Abstract: We show how to leverage quantum annealers (QAs) to better select candidates in greedy algorithms. Unlike conventional greedy algorithms that employ problem-specific heuristics for making locally optimal choices at each stage, we use QAs that sample from the ground state of problem-dependent Hamiltonians at cryogenic temperatures and use retrieved samples to estimate the probability distribution of problem variables. More specifically, we look at each spin of the Ising model as a random variable and contract all problem variables whose corresponding uncertainties are negligible. Our empirical results on a D-Wave 2000Q quantum processor demonstrate that the proposed quantum-assisted greedy algorithm (QAGA) scheme can find notably better solutions compared to the state-of-the-art techniques in the realm of quantum annealing.

Submitted 3 August, 2022; originally announced August 2022.

Comments: in Proceedings of the 2022 International Geoscience and Remote Sensing Symposium (IGARSS)
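
The contraction step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `threshold` value, and the energy convention E(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j are assumptions made for illustration, and the samples are hard-coded rather than drawn from a quantum annealer.

```python
from typing import Dict, List, Tuple

Couplings = Dict[Tuple[int, int], float]


def confident_spins(samples: List[List[int]], threshold: float = 0.9) -> Dict[int, int]:
    """Return {spin index: value} for spins whose sampled mean is nearly unanimous.

    `threshold` is a hypothetical cut-off on |mean|; spins below it stay free.
    """
    fixed: Dict[int, int] = {}
    for i in range(len(samples[0])):
        mean = sum(s[i] for s in samples) / len(samples)  # lies in [-1, 1]
        if abs(mean) >= threshold:
            fixed[i] = 1 if mean > 0 else -1
    return fixed


def contract(h: Dict[int, float], J: Couplings, fixed: Dict[int, int]):
    """Substitute fixed spins into the Ising model and return the reduced problem."""
    new_h = {i: b for i, b in h.items() if i not in fixed}
    new_J: Couplings = {}
    offset = sum(h.get(i, 0.0) * v for i, v in fixed.items())
    for (i, j), w in J.items():
        if i in fixed and j in fixed:
            offset += w * fixed[i] * fixed[j]           # constant energy term
        elif i in fixed:
            new_h[j] = new_h.get(j, 0.0) + w * fixed[i]  # coupling becomes a bias
        elif j in fixed:
            new_h[i] = new_h.get(i, 0.0) + w * fixed[j]
        else:
            new_J[(i, j)] = w                            # both spins still free
    return new_h, new_J, offset


# Hard-coded stand-in for samples returned by an annealer (values are +1/-1).
samples = [[1, -1, 1], [1, 1, 1], [1, -1, 1], [1, 1, 1]]
h = {0: 0.5, 1: -1.0, 2: 0.0}
J = {(0, 1): 1.0, (1, 2): -2.0, (0, 2): 0.5}

fixed = confident_spins(samples)
reduced_h, reduced_J, offset = contract(h, J, fixed)
print(fixed, reduced_h, reduced_J, offset)
```

In this toy instance, spins 0 and 2 take the same value in every sample and are fixed, leaving a one-spin problem; a full greedy scheme would presumably resample the reduced problem and repeat.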

arXiv:2208.01703 [pdf, other]

CAPD: A Context-Aware, Policy-Driven Framework for Secure and Resilient IoBT Operations

Authors: Sai Sree Laya Chukkapalli, Anupam Joshi, Tim Finin, Robert F. Erbacher

Abstract: The Internet of Battlefield Things (IoBT) will advance the operational effectiveness of infantry units. However, this requires autonomous assets such as sensors, drones, combat equipment, and uncrewed vehicles to collaborate, securely share information, and be resilient to adversary attacks in contested multi-domain operations. CAPD addresses this problem by providing a context-aware, policy-driven framework supporting data and knowledge exchange among autonomous entities in a battlespace. We propose an IoBT ontology that facilitates controlled information sharing to enable semantic interoperability between systems. Its key contributions include providing a knowledge graph with a shared semantic schema, integration with background knowledge, efficient mechanisms for enforcing data consistency and drawing inferences, and supporting attribute-based access control. The sensors in the IoBT provide data that create populated knowledge graphs based on the ontology. This paper describes using CAPD to detect and mitigate adversary actions. CAPD enables situational awareness using reasoning over the sensed data and SPARQL queries. For example, adversaries can cause sensor failure or hijacking and disrupt the tactical networks to degrade video surveillance. In such instances, CAPD uses an ontology-based reasoner to see how alternative approaches can still support the mission. Depending on bandwidth availability, the reasoner initiates the creation of a reduced frame rate grayscale video by active transcoding or transmits only still images. This ability to reason over the mission sensed environment and attack context permits the autonomous IoBT system to exhibit resilience in contested conditions.

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2208.01693 [pdf, other]

Recognizing and Extracting Cybersecurity-relevant Entities from Text

Authors: Casey Hanks, Michael Maiden, Priyanka Ranade, Tim Finin, Anupam Joshi

Abstract: Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools and create methods for continuous integration of new information extracted from text.

Submitted 2 August, 2022; originally announced August 2022.

Journal ref: Workshop on Machine Learning for Cybersecurity, 2022 International Conference on Machine Learning

arXiv:2102.04351 [pdf, other]

Generating Fake Cyber Threat Intelligence Using Transformer-Based Models

Authors: Priyanka Ranade, Aritran Piplai, Sudip Mittal, Anupam Joshi, Tim Finin

Abstract: Cyber-defense systems are being developed to automatically ingest Cyber Threat Intelligence (CTI) that contains semi-structured data and/or text to populate knowledge graphs. A potential risk is that fake CTI can be generated and spread through Open-Source Intelligence (OSINT) communities or on the Web to effect a data poisoning attack on these systems. Adversaries can use fake CTI examples as training input to subvert cyber defense systems, forcing the model to learn incorrect inputs to serve their malicious needs. In this paper, we automatically generate fake CTI text descriptions using transformers. We show that, given an initial prompt sentence, a public language model like GPT-2, with fine-tuning, can generate plausible CTI text capable of corrupting cyber-defense systems. We utilize the generated fake CTI text to perform a data poisoning attack on a Cybersecurity Knowledge Graph (CKG) and a cybersecurity corpus. The poisoning attack introduced adverse impacts such as returning incorrect reasoning outputs, representation poisoning, and corruption of other dependent AI-based cyber defense systems. We evaluate with traditional approaches and conduct a human evaluation study with cybersecurity professionals and threat hunters. Based on the study, professional threat hunters were equally likely to consider our fake generated CTI as true.

Submitted 18 June, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

Comments: In Proceedings of International Joint Conference on Neural Networks 2021 (IJCNN 2021), July 2021

arXiv:2010.00115 [pdf, other]

Multi-Qubit Correction for Quantum Annealers

Authors: Ramin Ayanzadeh, John Dorband, Milton Halem, Tim Finin

Abstract: We present multi-qubit correction (MQC) as a novel postprocessing method for quantum annealers that views the evolution in an open system as a Gibbs sampler and reduces a set of excited states to a new synthetic state with a lower energy value. After sampling from the ground state of a given (Ising) Hamiltonian, MQC compares pairs of excited states to recognize virtual tunnels (i.e., groups of qubits that, by changing their states simultaneously, can yield a new state with a lower energy value) and successively converges to the ground state. Experimental results using D-Wave 2000Q quantum annealers demonstrate that MQC finds samples with notably lower energy values and improves the reproducibility of results when compared to recent hardware/software advances in the realm of quantum annealing, such as spin-reversal transforms, classical postprocessing techniques, and increased inter-sample delay between successive measurements.

Submitted 10 July, 2021; v1 submitted 30 September, 2020; originally announced October 2020.

arXiv:2006.04682 [pdf, ps, other]

An Ensemble Approach for Compressive Sensing with Quantum

Authors: Ramin Ayanzadeh, Milton Halem, Tim Finin

Abstract: We leverage the idea of a statistical ensemble to improve the quality of quantum annealing based binary compressive sensing. Since executing quantum machine instructions on a quantum annealer can result in an excited state, rather than the ground state of the given Hamiltonian, we use different penalty parameters to generate multiple distinct quadratic unconstrained binary optimization (QUBO) functions whose ground state(s) represent a potential solution of the original problem. We then employ the attained samples from minimizing all corresponding (different) QUBOs to estimate the solution of the problem of binary compressive sensing. Our experiments, on a D-Wave 2000Q quantum processor, demonstrated that the proposed ensemble scheme is notably less sensitive to the calibration of the penalty parameter that controls the trade-off between the feasibility and sparsity of recoveries.

Submitted 8 June, 2020; originally announced June 2020.

arXiv:2003.03072 [pdf, other]

Improving Neural Named Entity Recognition with Gazetteers

Authors: Chan Hee Song, Dawn Lawrie, Tim Finin, James Mayfield

Abstract: The goal of this work is to improve the performance of a neural named entity recognition system by adding input features that indicate a word is part of a name included in a gazetteer. This article describes how to generate gazetteers from the Wikidata knowledge graph as well as how to integrate the information into a neural NER system. Experiments reveal that the approach yields performance gains in two distinct languages: a high-resource, word-based language, English, and a high-resource, character-based language, Chinese. Experiments were also performed in a low-resource language, Russian, on a newly annotated Russian NER corpus from Reddit tagged with four core types and twelve extended types. This article reports a baseline score. It is a longer version of a paper in the 33rd FLAIRS conference (Song et al. 2020).

Submitted 6 March, 2020; originally announced March 2020.

Comments: Short version accepted to the 33rd FLAIRS conference
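
The kind of input feature described in the abstract above, a per-token flag indicating membership in a gazetteer name, can be sketched as follows. This is only an illustration under stated assumptions: the function name `gazetteer_features`, the `max_len` window, the lowercased matching, and the toy gazetteer are all hypothetical, and the paper's actual feature encoding and Wikidata extraction are not reproduced here.

```python
from typing import List, Set, Tuple


def gazetteer_features(
    tokens: List[str],
    gazetteer: Set[Tuple[str, ...]],
    max_len: int = 4,
) -> List[int]:
    """Return one flag per token: 1 if the token lies inside a gazetteer name, else 0.

    Names are stored as tuples of lowercased tokens; `max_len` (an assumed
    parameter) bounds the length of the n-grams that are looked up.
    """
    flags = [0] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(tokens):
                break
            if tuple(lowered[start:end]) in gazetteer:
                for k in range(start, end):
                    flags[k] = 1  # every token of the matched name is marked
    return flags


# Toy gazetteer; in practice the names would come from a knowledge graph.
toy_gazetteer = {("new", "york"), ("baltimore",)}
sentence = "She moved from Baltimore to New York".split()
print(gazetteer_features(sentence, toy_gazetteer))
```

The resulting flag vector would then be concatenated with, or embedded alongside, the usual word features of the NER model; how best to do that is a design choice the sketch leaves open.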

arXiv:2001.00234 [pdf, other]

Reinforcement Quantum Annealing: A Quantum-Assisted Learning Automata Approach

Authors: Ramin Ayanzadeh, Milton Halem, Tim Finin

Abstract: We introduce the reinforcement quantum annealing (RQA) scheme in which an intelligent agent interacts with a quantum annealer that plays the stochastic environment role of learning automata and tries to iteratively find better Ising Hamiltonians for the given problem of interest. As a proof-of-concept, we propose a novel approach for reducing the NP-complete problem of Boolean satisfiability (SAT) to minimizing Ising Hamiltonians and show how to apply the RQA for increasing the probability of finding the global optimum. Our experimental results on two different benchmark SAT problems (namely factoring pseudo-prime numbers and random SAT with phase transitions), using a D-Wave 2000Q quantum processor, demonstrated that RQA finds notably better solutions with fewer samples, compared to state-of-the-art techniques in the realm of quantum annealing.

Submitted 1 January, 2020; originally announced January 2020.

arXiv:1912.02362 [pdf, other]

Quantum-Assisted Greedy Algorithms

Authors: Ramin Ayanzadeh, Milton Halem, John Dorband, Tim Finin

Abstract: We show how to leverage quantum annealers to better select candidates in greedy algorithms. Unlike conventional greedy algorithms that employ problem-specific heuristics for making locally optimal choices at each stage, we use quantum annealers that sample from the ground state(s) of problem-dependent Ising Hamiltonians at cryogenic temperatures and use retrieved samples to estimate the probability distribution of problem variables. More specifically, we look at each spin of the Ising model as a random variable and contract all problem variables whose corresponding uncertainties are negligible. Our empirical results, on a D-Wave 2000Q quantum processor, revealed that the proposed quantum-assisted greedy algorithm (QAGA) can find notably better solutions (i.e., samples with lower energy value), compared to the state-of-the-art techniques in the realm of quantum annealing.

Submitted 4 February, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

arXiv:1910.03678 [pdf, other]

Unfolding the Structure of a Document using Deep Learning

Authors: Muhammad Mahbubur Rahman, Tim Finin

Abstract: Understanding and extracting information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be multi-themed, complex, noisy and cover diverse topics. We describe a framework that can analyze large documents and help people and computer systems locate desired information in them. We aim to automatically identify and classify different sections of documents and understand their purpose within the document. A key contribution of our research is modeling and extracting the logical and semantic structure of electronic documents using deep learning techniques. We evaluate the effectiveness and robustness of our framework through extensive experiments on two collections: more than one million scholarly articles from arXiv and a collection of requests for proposal documents from government sources.

Submitted 29 September, 2019; originally announced October 2019.

Comments: 16 pages, 16 figures and 10 tables. arXiv admin note: text overlap with arXiv:1709.00770

arXiv:1905.02895 [pdf, other]

Cyber-All-Intel: An AI for Security related Threat Intelligence

Authors: Sudip Mittal, Anupam Joshi, Tim Finin

Abstract: Keeping up with threat intelligence is a must for a security analyst today. There is a volume of information present in 'the wild' that affects an organization. We need to develop an artificial intelligence system that scours the intelligence sources to keep the analyst updated about various threats that pose a risk to her organization. A security analyst who is better 'tapped in' can be more effective. In this paper, we present Cyber-All-Intel, an artificial intelligence system to aid a security analyst. It is a system for knowledge extraction, representation and analytics in an end-to-end pipeline grounded in the cybersecurity informatics domain. It uses multiple knowledge representations, such as vector spaces and knowledge graphs, in a 'VKG structure' to store incoming intelligence. The system also uses neural network models to proactively improve its knowledge. We have also created a query engine and an alert system that can be used by an analyst to find actionable cybersecurity insights.

Submitted 7 May, 2019; originally announced May 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1708.03310

arXiv:1903.03650 [pdf, other]

SAT-based Compressive Sensing

Authors: Ramin Ayanzadeh, Milton Halem, Tim Finin

Abstract: We propose to reduce the original well-posed problem of compressive sensing to weighted-MAX-SAT. Compressive sensing is a novel randomized data acquisition approach that linearly samples sparse or compressible signals at a rate much below the Nyquist-Shannon sampling rate. The original problem of compressive sensing in sparse recovery is NP-hard; therefore, in addition to restrictions for the uniqueness of the sparse solution, the coding matrix must also satisfy additional stringent constraints (usually the restricted isometry property, RIP) so that we can handle it by its convex or nonconvex relaxations. In practice, such constraints are not only intractable to verify but also invalid in broad applications. We first divide the well-posed problem of compressive sensing into relaxed sub-problems and represent them as separate SAT instances in conjunctive normal form (CNF). After merging the resulting sub-problems, we assign weights to all clauses in such a way that the aggregated weighted-MAX-SAT can guarantee successful recovery of the original signal. The only requirement in our approach is the solution uniqueness of the associated problems, which is notably looser. As a proof of concept, we demonstrate the applicability of our approach in tackling the original problem of binary compressive sensing with binary design matrices. Experimental results demonstrate the superiority of the proposed SAT-based compressive sensing over $\ell_1$-minimization in the robust recovery of sparse binary signals. SAT-based compressive sensing on average requires 8.3% fewer measurements for exact recovery of highly sparse binary signals ($s/N \approx 0.1$). When $s/N \approx 0.5$, $\ell_1$-minimization on average requires 22.2% more measurements for exact reconstruction of the binary signals. Thus, the proposed SAT-based compressive sensing is less sensitive to the sparsity of the original signals.

Submitted 25 May, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

arXiv:1902.03077 [pdf, other]

Knowledge Graph Fact Prediction via Knowledge-Enriched Tensor Factorization

Authors: Ankur Padia, Kostantinos Kalpakis, Francis Ferraro, Tim Finin

Abstract: We present a family of novel methods for embedding knowledge graphs into real-valued tensors. These tensor-based embeddings capture the ordered relations that are typical in the knowledge graphs represented by semantic web languages like RDF. Unlike many previous models, our methods can easily use prior background knowledge provided by users or extracted automatically from existing knowledge graphs. In addition to providing more robust methods for knowledge graph embedding, we provide a provably-convergent, linear tensor factorization algorithm. We demonstrate the efficacy of our models for the task of predicting new facts across eight different knowledge graphs, achieving between 5% and 50% relative improvement over existing state-of-the-art knowledge graph embedding techniques. Our empirical evaluation shows that all of the tensor decomposition models perform well when the average degree of an entity in a graph is high, with constraint-based models doing better on graphs with a small number of highly similar relations and regularization-based models dominating for graphs with relations of varying degrees of similarity.

Submitted 8 February, 2019; originally announced February 2019.

Comments: accepted by the Journal of Web Semantics, to appear 2019

arXiv:1901.00088 [pdf]

Quantum Annealing Based Binary Compressive Sensing with Matrix Uncertainty

Authors: Ramin Ayanzadeh, Seyedahmad Mousavi, Milton Halem, Tim Finin

Abstract: Compressive sensing is a novel approach that linearly samples sparse or compressible signals at a rate much below the Nyquist-Shannon sampling rate and outperforms traditional signal processing techniques in acquiring and reconstructing such signals. Compressive sensing with matrix uncertainty is an extension of the standard compressive sensing problem that appears in various applications including but not limited to cognitive radio sensing, calibration of the antenna, and deconvolution. The original problem of compressive sensing is NP-hard, so the traditional techniques, such as convex and nonconvex relaxations and greedy algorithms, apply stringent constraints on the measurement matrix to indirectly handle this problem in the realm of classical computing. We propose well-posed approaches for both binary compressive sensing and binary compressive sensing with matrix uncertainty problems that are tractable by quantum annealers. Our approach formulates an Ising model whose ground state represents a sparse solution for the binary compressive sensing problem and then employs an alternating minimization scheme to tackle the binary compressive sensing with matrix uncertainty problem. This setting only requires the solution uniqueness of the considered problem to have a successful recovery process, and therefore the required conditions on the measurement matrix are notably looser. As a proof of concept, we can demonstrate the applicability of the proposed approach on the D-Wave quantum annealers; however, we can adapt our method to employ other modern computing phenomena, such as adiabatic quantum computers (in general), CMOS annealers, optical parametric oscillators, and neuromorphic computing.

Submitted 31 December, 2018; originally announced January 2019.

arXiv:1810.13223 [pdf, other]

SURFACE: Semantically Rich Fact Validation with Explanations

Authors: Ankur Padia, Francis Ferraro, Tim Finin

Abstract: Judging the veracity of a sentence making one or more claims is an important and challenging problem with many dimensions. The recent FEVER task asked participants to classify input sentences as either SUPPORTED, REFUTED or NotEnoughInfo using Wikipedia as a source of true facts. SURFACE does this task and explains its decision through a selection of sentences from the trusted source. Our multi-task neural approach uses semantic lexical frames from FrameNet to jointly (i) find relevant evidential sentences in the trusted source and (ii) use them to classify the input sentence's veracity. An evaluation of our efficient three-parameter model on the FEVER dataset showed an improvement of 90% over the state-of-the-art baseline on retrieving relevant sentences and a 70% relative improvement in classification.

Submitted 31 October, 2018; originally announced October 2018.

arXiv:1808.04816 [pdf, other]

Jointly Identifying and Fixing Inconsistent Readings from Information Extraction Systems

Authors: Ankur Padia, Francis Ferraro, Tim Finin

Abstract: KGCleaner is a framework to identify and correct errors in data produced and delivered by an information extraction system. These tasks have been understudied and KGCleaner is the first to address both. We introduce a multi-task model that jointly learns to predict if an extracted relation is credible and repair it if not. We evaluate our approach and other models as instances of our framework on two collections: a Wikidata corpus of nearly 700K facts and 5M fact-relevant sentences and a collection of 30K facts from the 2015 TAC Knowledge Base Population task. For credibility classification, a parameter-efficient, simple shallow neural network can achieve an absolute performance gain of 30 $F_1$ points on Wikidata and comparable performance on TAC. For the repair task, a significant performance gain (more than twofold) can be obtained depending on the nature of the dataset and the models.

Submitted 26 January, 2023; v1 submitted 14 August, 2018; originally announced August 2018.

Comments: Accepted at Deep Learning Inside Out (DeeLIO) workshop at ACL 2022

arXiv:1808.00116 [pdf, other]

Cognitive Techniques for Early Detection of Cybersecurity Events

Authors: Sandeep Narayanan, Ashwinkumar Ganesan, Karuna Joshi, Tim Oates, Anupam Joshi, Tim Finin

Abstract: The early detection of cybersecurity events such as attacks is challenging given the constantly evolving threat landscape. Even with advanced monitoring, sophisticated attackers can spend as many as 146 days in a system before being detected. This paper describes a novel, cognitive framework that assists a security analyst by exploiting the power of semantically rich knowledge representation and reasoning with machine learning techniques. Our Cognitive Cybersecurity system ingests information from textual sources, and various agents representing host and network-based sensors, and represents this information in a knowledge graph. This graph uses terms from an extended version of the Unified Cybersecurity Ontology. The system reasons over the knowledge graph to derive better actionable intelligence to security administrators, thus decreasing their cognitive load and increasing their confidence in the system. We have developed a proof of concept framework for our approach and demonstrate its capabilities using a custom-built ransomware instance that is similar to WannaCry.

Submitted 31 July, 2018; originally announced August 2018.

arXiv:1807.10965 [pdf, other]

Ontology-Grounded Topic Modeling for Climate Science Research

Authors: Jennifer Sleeman, Tim Finin, Milton Halem

Abstract: In scientific disciplines where research findings have a strong impact on society, reducing the amount of time it takes to understand, synthesize and exploit the research is invaluable. Topic modeling is an effective technique for summarizing a collection of documents to find the main themes among them and to classify other documents that have a similar mixture of co-occurring words. We show how grounding a topic model with an ontology, extracted from a glossary of important domain phrases, improves the topics generated and makes them easier to understand. We apply this method to the climate science domain and evaluate it there. The result improves the topics generated and supports faster research understanding, discovery of social networks among researchers, and automatic ontology generation.

Submitted 30 July, 2018; v1 submitted 28 July, 2018; originally announced July 2018.

Comments: To appear in Proc. of Semantic Web for Social Good Workshop of the Int. Semantic Web Conf., Oct 2018 and published as part of the book "Emerging Topics in Semantic Technologies. ISWC 2018 Satellite Events", E. Demidova, A.J. Zaveri, E. Simperl (Eds.), ISBN: 978-3-89838-736-1, 2018, AKA Verlag Berlin, (edited authors)

ACM Class: I.2.4; I.2.6; I.2.7

arXiv:1807.09842 [pdf, other]

Understanding and representing the semantics of large structured documents

Authors: Muhammad Mahbubur Rahman, Tim Finin

Abstract: Understanding large, structured documents like scholarly articles, requests for proposals or business reports is a complex and difficult task. It involves discovering a document's overall purpose and subject(s), understanding the function and meaning of its sections and subsections, and extracting low level entities and facts about them. In this research, we present a deep learning based document ontology to capture the general purpose semantic structure and domain specific semantic concepts from a large number of academic articles and business documents. The ontology is able to describe different functional parts of a document, which can be used to enhance semantic indexing for a better understanding by human beings and machines. We evaluate our models through extensive experiments on datasets of scholarly articles from arXiv and Request for Proposal documents.

Submitted 24 July, 2018; originally announced July 2018.

Comments: 10 pages, 6 figures, 28 references and 2 tables

Journal ref: Semantic Deep Learning at ISWC 2018

arXiv:1709.00770 [pdf, other]

Understanding the Logical and Semantic Structure of Large Documents

Authors: Muhammad Mahbubur Rahman, Tim Finin

Abstract: Current language understanding approaches focus on small documents, such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents like legal briefs, proposals, technical manuals and research articles is still a challenging task. We describe a framework that can analyze a large document and help people find where particular information is in that document. We aim to automatically identify and classify semantic sections of documents and assign consistent and human-understandable labels to similar sections across documents. A key contribution of our research is modeling the logical and semantic structure of an electronic document. We apply machine learning techniques, including deep learning, in our prototype system. We also make available a dataset of information about a collection of scholarly articles from the arXiv eprints collection that includes a wide range of metadata for each article, including a table of contents, section labels, section summarizations and more. We hope that this dataset will be a useful resource for the machine learning and NLP communities in information retrieval, content-based question answering and language modeling.

Submitted 3 September, 2017; originally announced September 2017.

Comments: 10 pages, 15 figures and 6 tables

arXiv:1708.03310 [pdf, other]

Thinking, Fast and Slow: Combining Vector Spaces and Knowledge Graphs

Authors: Sudip Mittal, Anupam Joshi, Tim Finin

Abstract: Knowledge graphs and vector space models are robust knowledge representation techniques with individual strengths and weaknesses. Vector space models excel at determining similarity between concepts, but are severely constrained when evaluating complex dependency relations and other logic-based operations that are a strength of knowledge graphs. We describe the VKG structure that helps unify knowledge graphs and vector representation of entities, and enables powerful inference methods and search capabilities that combine their complementary strengths. We analogize this to thinking 'fast' in vector space along with thinking 'slow' and 'deeply' by reasoning over the knowledge graph. We have created a query processing engine that takes complex queries and decomposes them into subqueries optimized to run on the respective knowledge graph or vector view of a VKG. We show that the VKG structure can process specific queries that are not efficiently handled by vector spaces or knowledge graphs alone. We also demonstrate and evaluate the VKG structure and the query processing engine by developing a system called Cyber-All-Intel for knowledge extraction, representation and querying in an end-to-end pipeline grounded in the cybersecurity informatics domain.

Submitted 20 August, 2017; v1 submitted 10 August, 2017; originally announced August 2017.

arXiv:1506.00301 [pdf, ps, other]

Interactive Knowledge Base Population

Authors: Travis Wolfe, Mark Dredze, James Mayfield, Paul McNamee, Craig Harman, Tim Finin, Benjamin Van Durme

Abstract: Most work on building knowledge bases has focused on collecting entities and facts from as large a collection of documents as possible. We argue for and describe a new paradigm where the focus is on high-recall extraction over a small collection of documents under the supervision of a human expert, which we call Interactive Knowledge Base Population (IKBP).

Submitted 31 May, 2015; originally announced June 2015.

arXiv:cs/9809034 [pdf, ps, other]

Semantics and Conversations for an Agent Communication Language

Authors: Yannis Labrou, Tim Finin

Abstract: We address the issues of semantics and conversations for agent communication languages and the Knowledge Query Manipulation Language (KQML) in particular. Based on ideas from speech act theory, we present a semantic description for KQML that associates "cognitive" states of the agent with the use of the language's primitives (performatives). We have used this approach to describe the semantics for the whole set of reserved KQML performatives. Building on the semantics, we devise the conversation policies, i.e., a formal description of how KQML performatives may be combined into KQML exchanges (conversations), using a Definite Clause Grammar. Our research offers methods for a speech act theory-based semantic description of a language of communication acts and for the specification of the protocols associated with these acts. Languages of communication acts address the issue of communication among software applications at a level of abstraction that is useful to the emerging software agents paradigm.

Submitted 18 September, 1998; originally announced September 1998.

Comments: Also in "Readings in Agents", Michael Huhns and Munindar Singh (eds), Morgan Kaufmann Publishers, Inc

ACM Class: I.2.11

Journal ref: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97) August, 1997

Showing 1–26 of 26 results for author: Finin, T