-
Noisy Positive-Unlabeled Learning with Self-Training for Speculative Knowledge Graph Reasoning
Authors:
Ruijie Wang,
Baoyu Li,
Yichen Lu,
Dachun Sun,
Jinning Li,
Yuchen Yan,
Shengzhong Liu,
Hanghang Tong,
Tarek F. Abdelzaher
Abstract:
This paper studies the speculative reasoning task on real-world knowledge graphs (KGs) that suffer from both the false negative issue (i.e., potentially true facts being excluded) and the false positive issue (i.e., unreliable or outdated facts being included). State-of-the-art methods fall short in speculative reasoning ability because they assume that the correctness of a fact is determined solely by its presence in the KG, which makes them vulnerable to false negatives and false positives. The new reasoning task is formulated as a noisy Positive-Unlabeled learning problem. We propose a variational framework, namely nPUGraph, that jointly estimates the correctness of both collected and uncollected facts (which we call the label posterior) and updates model parameters during training. The label posterior estimation facilitates speculative reasoning from two perspectives. First, it improves the robustness of a label posterior-aware graph encoder against false positive links. Second, it identifies missing facts to provide high-quality grounds for reasoning. The two are unified in a simple yet effective self-training procedure. Extensive experiments on three benchmark KGs and one Twitter dataset with various degrees of false negative/positive cases demonstrate the effectiveness of nPUGraph.
Submitted 12 June, 2023;
originally announced June 2023.
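The label-posterior idea above can be illustrated with a toy sketch. This is not nPUGraph's variational framework or graph encoder; it assumes a simple class-conditional noise model with two hypothetical parameters, `c` (the probability that a true fact is collected into the KG) and `eps` (the probability that a false fact is collected), and applies Bayes' rule to turn a model's current belief into a posterior for both collected and uncollected facts. The hypothetical `self_train` loop then refits a logistic scorer on those soft labels, mirroring the self-training idea at a very small scale.

```python
import numpy as np


def label_posterior(prior, observed, c=0.7, eps=0.05):
    """Bayes-rule posterior P(fact is true | observation).

    prior    -- model's current belief P(true) per fact, shape (n,)
    observed -- 1 if the fact is present in the KG, else 0
    c        -- assumed P(collected | true); 1 - c models false negatives
    eps      -- assumed P(collected | false); models false positives
    (c and eps are hypothetical noise rates, not values from the paper.)
    """
    prior = np.asarray(prior, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    lik_true = np.where(observed, c, 1.0 - c)        # P(obs | true)
    lik_false = np.where(observed, eps, 1.0 - eps)   # P(obs | false)
    num = prior * lik_true
    return num / (num + (1.0 - prior) * lik_false)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def self_train(X, observed, rounds=5, steps=200, lr=0.5, c=0.7, eps=0.05):
    """Alternate between fitting a logistic scorer and re-estimating soft labels."""
    observed = np.asarray(observed, dtype=float)
    w = np.zeros(X.shape[1])
    targets = observed.copy()  # round 0: take KG membership at face value
    for _ in range(rounds):
        for _ in range(steps):
            # Gradient of binary cross-entropy with soft targets.
            grad = X.T @ (sigmoid(X @ w) - targets) / len(X)
            w -= lr * grad
        targets = label_posterior(sigmoid(X @ w), observed, c, eps)
    return w, targets


# Same prior, different observations: a collected fact gets a higher posterior.
print(label_posterior([0.5, 0.5], [1, 0]))  # approx. [0.933, 0.24]
```

Setting `eps > 0` is what separates this noisy setting from standard PU learning, where every observed positive is assumed correct: with `eps = 0`, any collected fact with a nonzero prior receives posterior 1.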
-
New Frontiers in IoT: Networking, Systems, Reliability, and Security Challenges
Authors:
Saurabh Bagchi,
Tarek F. Abdelzaher,
Ramesh Govindan,
Prashant Shenoy,
Akanksha Atrey,
Pradipta Ghosh,
Ran Xu
Abstract:
The field of IoT has blossomed and is positively influencing many application domains. In this paper, we bring out the unique challenges this field poses to research in computer systems and networking. These challenges arise from distinctive characteristics of IoT systems, such as the diversity of application domains in which they are used and the increasingly demanding protocols they are called upon to run (such as video and LIDAR processing) on constrained resources (on-node and network). We show how these open challenges can benefit from foundations laid in other areas, such as 5G cellular protocols, ML model reduction, and device-edge-cloud offloading. We then discuss the unique challenges for reliability, security, and privacy posed by IoT systems due to their salient characteristics, which include heterogeneity of devices and protocols, dependence on the physical environment, and close coupling with humans. We again show how these open research challenges can benefit from reliability, security, and privacy advancements in other areas. We conclude by providing a vision for a desirable end state for IoT systems.
Submitted 14 May, 2020;
originally announced May 2020.
-
Hierarchical Overlapping Belief Estimation by Structured Matrix Factorization
Authors:
Chaoqi Yang,
Jinyang Li,
Ruijie Wang,
Shuochao Yao,
Huajie Shao,
Dongxin Liu,
Shengzhong Liu,
Tianshi Wang,
Tarek F. Abdelzaher
Abstract:
Much work on social media opinion polarization focuses on a flat categorization of stances (or orthogonal beliefs) of different communities from media traces. We extend this work in two important respects. First, we detect not only points of disagreement between communities, but also points of agreement. In other words, we estimate community beliefs in the presence of overlap. Second, in lieu of flat categorization, we consider hierarchical belief estimation, where communities might be hierarchically divided. For example, two opposing parties might disagree on core issues, but within a party, despite agreement on fundamentals, disagreement might occur on further details. We call the resulting combined problem a hierarchical overlapping belief estimation problem. To solve it, this paper develops a new class of unsupervised Non-negative Matrix Factorization (NMF) algorithms, which we call Belief Structured Matrix Factorization (BSMF). Our proposed unsupervised algorithm captures both the latent belief intersections and dissimilarities, as well as a hierarchical structure. We discuss the properties of the algorithm and evaluate it on both synthetic and real-world datasets. On the synthetic dataset, our model reduces error by 40%. On real Twitter traces, it improves accuracy by around 10%. The model also achieves 96.08% self-consistency in a sanity check.
Submitted 19 September, 2022; v1 submitted 13 February, 2020;
originally announced February 2020.
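BSMF adds belief-structure constraints that the abstract does not spell out, so the sketch below shows only the plain building block it extends: standard NMF with Lee-Seung multiplicative updates for the squared Frobenius loss, X ≈ U V. Reading X as a users × posts matrix, U as user-to-belief affinities, and V as belief-to-post loadings is an illustrative assumption, and `nmf` is a hypothetical helper, not the authors' algorithm.

```python
import numpy as np


def nmf(X, k, iters=500, seed=0, tiny=1e-9):
    """Plain NMF: X (m x n, non-negative) ~= U (m x k) @ V (k x n).

    Uses Lee-Seung multiplicative updates, which keep U and V non-negative
    and do not increase the squared Frobenius error ||X - UV||^2.
    `tiny` only guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((k, n))
    for _ in range(iters):
        V *= (U.T @ X) / (U.T @ U @ V + tiny)
        U *= (X @ V.T) / (U @ V @ V.T + tiny)
    return U, V


# Toy users x posts matrix: two communities that both endorse the middle post.
X = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)
U, V = nmf(X, k=2)
print(np.round(U @ V, 2))
```

The shared middle post is the kind of agreement (overlap) between communities that BSMF is designed to model explicitly; plain NMF, as sketched here, has no notion of a hierarchy among the k factors.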
-
CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases
Authors:
Xiang Ren,
Zeqiu Wu,
Wenqi He,
Meng Qu,
Clare R. Voss,
Heng Ji,
Tarek F. Abdelzaher,
Jiawei Han
Abstract:
Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, entity relation extraction systems have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain and are vulnerable to errors cascading down the pipeline. In this paper, we investigate joint extraction of typed entities and relations with labeled data heuristically obtained from knowledge bases (i.e., distant supervision). Because our algorithm for type labeling via distant supervision is context-agnostic, the resulting noisy training data poses unique challenges for the task. We propose a novel domain-independent framework, called CoType, that runs a data-driven text segmentation algorithm to extract entity mentions, and jointly embeds entity mentions, relation mentions, text features, and type labels into two low-dimensional spaces (one for entity mentions and one for relation mentions), where, in each space, objects whose types are close also have similar representations. Using these learned embeddings, CoType then estimates the types of test (unlinkable) mentions. We formulate a joint optimization problem to learn embeddings from text corpora and knowledge bases, adopting a novel partial-label loss function for noisy labeled data and introducing an object "translation" function to capture the cross-constraints of entities and relations on each other. Experiments on three public datasets demonstrate the effectiveness of CoType across different domains (e.g., news, biomedical), with an average 25% improvement in F1 score over the next best method.
Submitted 2 June, 2017; v1 submitted 27 October, 2016;
originally announced October 2016.
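The two loss components named above can be sketched in a few lines. The partial-label loss below is a margin-based form often used for distantly supervised typing: the best-scoring candidate type (from the noisy KB-derived label set) should beat the best non-candidate type by a margin. The translation error follows the TransE-style idea that a head embedding plus a relation embedding should land near the tail embedding. Both helper functions, the dot-product scoring, and the margin value are assumptions for illustration; CoType's exact formulation may differ.

```python
import numpy as np


def partial_label_loss(mention_vec, type_vecs, candidate_ids, margin=1.0):
    """Hinge loss over a noisy candidate type set (hypothetical helper).

    Only the best candidate must outscore the best non-candidate, so the
    loss tolerates wrong types mixed into the distant-supervision labels.
    """
    scores = type_vecs @ mention_vec          # one dot-product score per type
    is_candidate = np.zeros(len(type_vecs), dtype=bool)
    is_candidate[candidate_ids] = True
    if is_candidate.all():                    # no non-candidate types to beat
        return 0.0
    best_pos = scores[is_candidate].max()
    best_neg = scores[~is_candidate].max()
    return float(max(0.0, margin - (best_pos - best_neg)))


def translation_error(head_vec, rel_vec, tail_vec):
    """TransE-style distance ||head + relation - tail|| (hypothetical helper)."""
    return float(np.linalg.norm(head_vec + rel_vec - tail_vec))


types = np.eye(3)                     # three toy type embeddings
mention = np.array([2.0, 0.0, 0.0])   # scores [2, 0, 0] against the types
print(partial_label_loss(mention, types, [0, 2]))  # 0.0: top type is a candidate
print(partial_label_loss(mention, types, [1]))     # 3.0: top type is not a candidate
```

Taking the max over candidates is what makes the loss "partial-label": a mention labeled with several KB types is not penalized for ignoring the types that do not fit its context.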