-
Query Rewriting and Optimization for Ontological Databases
Authors:
Georg Gottlob,
Giorgio Orsi,
Andreas Pieris
Abstract:
Ontological queries are evaluated against a knowledge base consisting of an extensional database and an ontology (i.e., a set of logical assertions and constraints which derive new intensional knowledge from the extensional database), rather than directly on the extensional database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this paper, we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent first-order query against the underlying extensional database. We present a novel query rewriting algorithm for rather general types of ontological constraints which is well-suited for practical implementations. In particular, we show how a conjunctive query against a knowledge base, expressed using linear and sticky existential rules, that is, members of the recently introduced Datalog+/- family of ontology languages, can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this rewriting process so as to produce small and cost-effective UCQ rewritings for an input query.
Submitted 12 May, 2014;
originally announced May 2014.
-
AMBER: Automatic Supervision for Multi-Attribute Extraction
Authors:
Tim Furche,
Georg Gottlob,
Giovanni Grasso,
Giorgio Orsi,
Christian Schallhart,
Cheng Wang
Abstract:
The extraction of multi-attribute objects from the deep web is the bridge between the unstructured web and structured data. Existing approaches either induce wrappers from a set of human-annotated pages or leverage repeated structures on the page without supervision. What the former lack in automation, the latter lack in accuracy. Thus accurate, automatic multi-attribute object extraction has remained an open challenge.
AMBER overcomes both limitations through mutual supervision between the repeated structure and automatically produced annotations. Previous approaches based on automatic annotations have suffered from low quality due to the inherent noise in the annotations and have attempted to compensate by exploring multiple candidate wrappers. In contrast, AMBER compensates for this noise by integrating repeated structure analysis with annotation-based induction: the repeated structure limits the search space for wrapper induction, and conversely, annotations allow the repeated structure analysis to distinguish noise from relevant data. Both low recall and low precision in the annotations are mitigated to achieve almost human quality (more than 98 percent) multi-attribute object extraction.
To achieve this accuracy, AMBER needs to be trained once for an entire domain. AMBER bootstraps its training from a small, possibly noisy set of attribute instances and a few unannotated sites of the domain.
Submitted 22 October, 2012;
originally announced October 2012.
-
The Ontological Key: Automatically Understanding and Integrating Forms to Access the Deep Web
Authors:
Tim Furche,
Georg Gottlob,
Giovanni Grasso,
Xiaonan Guo,
Giorgio Orsi,
Christian Schallhart
Abstract:
Forms are our gates to the web. They enable us to access the deep content of web sites. Automatic form understanding provides applications, ranging from crawlers over meta-search engines to service integrators, with a key to this content. Yet, it has received little attention other than as a component in specific applications such as crawlers or meta-search engines. No comprehensive approach to form understanding exists, let alone one that produces rich models for semantic services or integration with linked open data.
In this paper, we present OPAL, the first comprehensive approach to form understanding and integration. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems OPAL pushes the state of the art: for form labeling, it combines features from the text, structure, and visual rendering of a web page. In extensive experiments on the ICQ and TEL-8 benchmarks and a set of 200 modern web forms, OPAL outperforms previous approaches for form labeling by a significant margin. For form interpretation, OPAL uses a schema (or ontology) of forms in a given domain. Thanks to this domain schema, it is able to produce nearly perfect (more than 97 percent accuracy in the evaluation domains) form interpretations. Yet, the effort to produce a domain schema is very low, as we provide a Datalog-based template language that eases the specification of such schemata and a methodology for deriving a domain schema largely automatically from an existing domain ontology. We demonstrate the value of the form interpretations in OPAL through a light-weight form integration system that successfully translates and distributes master queries to hundreds of forms with no error, yet is implemented with only a handful of translation rules.
Submitted 22 October, 2012;
originally announced October 2012.
-
Heuristic Ranking in Tightly Coupled Probabilistic Description Logics
Authors:
Thomas Lukasiewicz,
Maria Vanina Martinez,
Giorgio Orsi,
Gerardo I. Simari
Abstract:
The Semantic Web effort has steadily been gaining traction in recent years. In particular, Web search companies are now realizing that their products need to evolve towards having richer semantic search capabilities. Description logics (DLs) have been adopted as the formal underpinnings for Semantic Web languages used in describing ontologies. Reasoning under uncertainty has recently taken a leading role in this arena, given the nature of data found on the Web. In this paper, we present a probabilistic extension of the DL EL++ (which underlies the OWL2 EL profile) using Markov logic networks (MLNs) as probabilistic semantics. This extension is tightly coupled, meaning that probabilistic annotations in formulas can refer to objects in the ontology. We show that, even though the tightly coupled nature of our language means that many basic operations are data-intractable, we can leverage a sublanguage of MLNs that allows us to rank the atomic consequences of an ontology relative to their probability values (called ranking queries) even when these values are not fully computed. We present an anytime algorithm to answer ranking queries, and provide an upper bound on the error that it incurs, as well as a criterion to decide when results are guaranteed to be correct.
Submitted 16 October, 2012;
originally announced October 2012.
-
Ontological Queries: Rewriting and Optimization (Extended Version)
Authors:
Georg Gottlob,
Giorgio Orsi,
Andreas Pieris
Abstract:
Ontological queries are evaluated against an ontology rather than directly on a database. The evaluation and optimization of such queries is an intriguing new problem for database research.
In this paper we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent query against the underlying relational database. The focus here is on soundness and completeness. We review previous results and present a new rewriting algorithm for rather general types of ontological constraints.
In particular, we show how a conjunctive query against an ontology can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this process so as to produce small and cost-effective UCQ rewritings for an input query. We review existing optimization methods, and propose an effective new method that works for linear Datalog+/-, a class of Datalog-based rules that encompasses well-known description logics of the DL-Lite family.
Submitted 1 December, 2011;
originally announced December 2011.
-
Planck-LFI: Design and Performance of the 4 Kelvin Reference Load Unit
Authors:
Luca Valenziano,
Francesco Cuttaia,
Adriano De Rosa,
Luca Terenzi,
Alberto Brighenti,
GianPaolo Cazzola,
Anna Garbesi,
Sergio Mariotti,
Giordano Orsi,
Luca Pagan,
Francesco Cavaliere,
Roberto Lapini,
Matteo Biggi,
Enzo Panagin,
Battaglia Paola,
Chris Butler,
Marco Bersanelli,
Ocleto D'Arcangelo,
Steve Levin,
Nazzareno Mandolesi,
Aniello Mennella,
Gianluca Morgante,
Gabriele Morigi,
Maura Sandri,
Alessandro Simonetto
, et al. (13 additional authors not shown)
Abstract:
The LFI radiometers use a pseudo-correlation design where the signal from the sky is continuously compared with a stable reference signal, provided by a cryogenic reference load system. The reference unit is composed of small pyramidal horns, one for each radiometer, 22 in total, facing small absorbing targets, made of a commercial resin ECCOSORB CR (TM), cooled to approximately 4.5 K. Horns and targets are separated by a small gap to allow thermal decoupling. Target and horn design is optimized for each of the LFI bands, centered at 70, 44 and 30 GHz. Pyramidal horns are either machined inside the radiometer 20 K module or connected via external electro-formed bent waveguides. The requirement of high stability of the reference signal imposed a careful design for the radiometric and thermal properties of the loads. Materials used for the manufacturing have been characterized for thermal, RF and mechanical properties. We describe in this paper the design and the performance of the reference system.
Submitted 26 January, 2010;
originally announced January 2010.