-
HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
Authors:
Akari Asai,
Sara Evensen,
Behzad Golshan,
Alon Halevy,
Vivian Li,
Andrei Lopatenko,
Daniela Stepanov,
Yoshihiko Suhara,
Wang-Chiew Tan,
Yinzhan Xu
Abstract:
The science of happiness is an area of positive psychology concerned with understanding what behaviors make people happy in a sustainable fashion. Recently, there has been interest in developing technologies that help incorporate the findings of the science of happiness into users' daily lives by steering them towards behaviors that increase happiness. With the goal of building technology that can understand how people express their happy moments in text, we crowd-sourced HappyDB, a corpus of 100,000 happy moments that we make publicly available. This paper describes HappyDB and its properties, and outlines several important NLP problems that can be studied with the help of the corpus. We also apply several state-of-the-art analysis techniques to analyze HappyDB. Our results demonstrate the need for deeper NLP techniques to be developed, which makes HappyDB an exciting resource for follow-on research.
Submitted 25 January, 2018; v1 submitted 23 January, 2018;
originally announced January 2018.
-
A Lightweight Front-end Tool for Interactive Entity Population
Authors:
Hidekazu Oiwa,
Yoshihiko Suhara,
Jiyu Komiya,
Andrei Lopatenko
Abstract:
Entity population, the task of collecting entities that belong to a particular category, has attracted attention from vertical domains. There is still a high demand for creating entity dictionaries in vertical domains, which are not covered by existing knowledge bases. We develop a lightweight front-end tool for facilitating interactive entity population. We implement key components necessary for effective interactive entity population: 1) GUI-based dashboards to quickly modify an entity dictionary, and 2) entity highlighting on documents for quickly viewing the current progress. We aim to reduce user cost from beginning to end, including package installation and maintenance. The implementation enables users to use this tool in their web browsers without any additional packages, so they can focus on their task of creating entity dictionaries. Moreover, an entity expansion module is implemented as external APIs. This design makes it easy to continuously improve interactive entity population pipelines. We are making our demo publicly available (http://bit.ly/luwak-demo).
Submitted 1 August, 2017;
originally announced August 2017.
-
Complexity of Consistent Query Answering in Databases under Cardinality-Based and Incremental Repair Semantics (extended version)
Authors:
Andrei Lopatenko,
Leopoldo Bertossi
Abstract:
A database D may be inconsistent with respect to a given set IC of integrity constraints. Consistent Query Answering (CQA) is the problem of computing from D the answers to a query that are consistent with respect to IC. Consistent answers are invariant under all the repairs of D, i.e. the consistent instances that minimally depart from D. Three classes of repair have been considered in the literature: those that minimize, under set inclusion, the set of tuples in the symmetric difference; those that minimize the changes of attribute values; and those that minimize the cardinality of the set of tuples in the symmetric difference. The last class has not been systematically investigated. In this paper we obtain algorithmic and complexity theoretic results for CQA under this cardinality-based repair semantics. We do this in the usual, static setting, but also in a dynamic framework where a consistent database is affected by a sequence of updates, which may make it inconsistent. We also establish comparative results with the other two kinds of repairs in the dynamic case.
Submitted 23 May, 2016;
originally announced May 2016.
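The cardinality-based repair semantics described in the abstract above can be illustrated with a small brute-force sketch. This is not the authors' algorithm, only an illustration under stated assumptions: a single hypothetical relation Emp(name, dept, office), one functional dependency name -> dept as the only integrity constraint, and repairs obtained by tuple deletion only (which suffices for denial constraints such as this one). All function and relation names are made up for the example.

```python
from itertools import combinations

def violates_fd(instance):
    """True if two tuples share a name but disagree on dept (FD name -> dept)."""
    dept_of = {}
    for name, dept, _office in instance:
        if dept_of.setdefault(name, dept) != dept:
            return True
    return False

def cardinality_repairs(db):
    """Consistent subsets of db reachable with the fewest deletions.

    Brute force: try subset sizes from largest to smallest and stop at the
    first size that admits a consistent subset. The empty instance is always
    consistent, so the loop always returns.
    """
    facts = sorted(db)
    for size in range(len(facts), -1, -1):
        found = [frozenset(c) for c in combinations(facts, size)
                 if not violates_fd(c)]
        if found:
            return found

def consistent_answers(db, query):
    """Answers to `query` that hold in every cardinality-based repair."""
    answers = [query(repair) for repair in cardinality_repairs(db)]
    return set.intersection(*answers)

# Hypothetical inconsistent instance: "ann" appears with two departments.
db = {
    ("ann", "sales", 1),
    ("ann", "hr", 2),
    ("ann", "hr", 3),
    ("bob", "it", 4),
}

def dept_of_ann(instance):
    """Query: the departments recorded for ann."""
    return {dept for name, dept, _office in instance if name == "ann"}

print(cardinality_repairs(db))
print(consistent_answers(db, dept_of_ann))
```

In this instance the set-minimal repairs keep either the single "sales" tuple or both "hr" tuples for ann, so "hr" is not a consistent answer under set-based semantics. Only the second repair has minimum cardinality of change (one deletion), so under the cardinality-based semantics "hr" becomes a consistent answer.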
-
Complexity of Consistent Query Answering in Databases under Cardinality-Based and Incremental Repair Semantics
Authors:
Andrei Lopatenko,
Leopoldo Bertossi
Abstract:
Consistent Query Answering (CQA) is the problem of computing from a database the answers to a query that are consistent with respect to certain integrity constraints that the database, as a whole, may fail to satisfy. Consistent answers have been characterized as those that are invariant under certain minimal forms of restoration of the database consistency. We investigate algorithmic and complexity theoretic issues of CQA under database repairs that minimally depart, with respect to the cardinality of the symmetric difference, from the original database. We obtain first tight complexity bounds.
We also address the problem of incremental complexity of CQA, which naturally occurs when an originally consistent database becomes inconsistent after the execution of a sequence of update operations. Tight bounds on incremental complexity are provided for various semantics under denial constraints. Fixed parameter tractability is also investigated in this dynamic context, where the size of the update sequence becomes the relevant parameter.
Submitted 1 April, 2006;
originally announced April 2006.
-
Complexity and Approximation of Fixing Numerical Attributes in Databases Under Integrity Constraints
Authors:
L. Bertossi,
L. Bravo,
E. Franconi,
A. Lopatenko
Abstract:
Consistent query answering is the problem of computing the answers from a database that are consistent with respect to certain integrity constraints that the database as a whole may fail to satisfy. Those answers are characterized as those that are invariant under minimal forms of restoring the consistency of the database. In this context, we study the problem of repairing databases by fixing integer numerical values at the attribute level with respect to denial and aggregation constraints. We introduce a quantitative definition of database fix, and investigate the complexity of several decision and optimization problems, including DFP, i.e. the existence of fixes within a given distance from the original instance, and CQA, i.e. deciding consistency of answers to aggregate conjunctive queries under different semantics. We provide sharp complexity bounds, identify relevant tractable cases, and introduce approximation algorithms for some of those that are intractable. More specifically, we obtain results like undecidability of existence of fixes for aggregation constraints; MAXSNP-hardness of DFP, but a good approximation algorithm for a relevant special case; and intractability but good approximation for CQA for aggregate queries for one database atom denials (plus built-ins).
Submitted 28 October, 2005; v1 submitted 14 March, 2005;
originally announced March 2005.
-
A Robust and Computational Characterisation of Peer-to-Peer Database Systems
Authors:
Enrico Franconi,
Gabriel Kuper,
Andrei Lopatenko,
Luciano Serafini
Abstract:
In this paper we give a robust logical and computational characterisation of peer-to-peer database systems. We first define a precise model-theoretic semantics of a peer-to-peer system, which allows for local inconsistency handling. We then characterise the general computational properties for the problem of answering queries to such a peer-to-peer system. Finally, we devise tight complexity bounds and distributed procedures for the problem of answering queries in a few relevant special cases.
Submitted 6 August, 2003;
originally announced August 2003.
-
Information retrieval in Current Research Information Systems
Authors:
Andrei Lopatenko
Abstract:
In this paper we describe the requirements for research information systems and the problems that arise in the development of such systems. We show which problems can be solved by using knowledge markup technologies. An ontology for research information systems is proposed. An architecture for collecting research data and providing access to it is described.
Submitted 10 October, 2001;
originally announced October 2001.
-
Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS)
Authors:
A. Lopatenko
Abstract:
The most exciting challenge for CRIS is to create a service for research information that is widespread, distributed and up to date like Google, but at the same time structured and trusted, with complex search and navigation similar to today's CRIS applications. The core technology for such a "new" CRIS is semantic web technology, which integrates database contents with HTML and XML web pages so that they can be provided to the public interested in research. One possible way (at the moment the best) is to use RDF (Resource Description Framework), which is also recommended by the W3 consortium.
Submitted 29 July, 2001;
originally announced July 2001.