Open Domain Knowledge Extraction for Knowledge Graphs
Authors:
Kun Qian,
Anton Belyi,
Fei Wu,
Samira Khorshidi,
Azadeh Nikfarjam,
Rahul Khot,
Yisi Sang,
Katherine Luna,
Xianqi Chu,
Eric Choi,
Yash Govind,
Chloe Seivwright,
Yiwen Sun,
Ahmed Fakhry,
Theo Rekatsinas,
Ihab Ilyas,
Xiaoguang Qi,
Yunyao Li
Abstract:
The quality of a knowledge graph directly impacts the quality of downstream applications (e.g., the number of questions that can be answered using the graph). One ongoing challenge when building a knowledge graph is ensuring the completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from the open web at scale. ODKE utilizes a wide range of extraction models and supports both streaming and batch processing at different latencies. We reflect on the challenges we faced and the design decisions we made, and share lessons learned from building and deploying ODKE to grow an industry-scale open domain knowledge graph.
Submitted 30 October, 2023;
originally announced December 2023.
Toward a System Building Agenda for Data Integration
Authors:
AnHai Doan,
Adel Ardalan,
Jeffrey R. Ballard,
Sanjib Das,
Yash Govind,
Pradap Konda,
Han Li,
Erik Paulson,
Paul Suganthan G. C.,
Haojun Zhang
Abstract:
In this paper, we argue that the data management community should devote far more effort to building data integration (DI) systems in order to truly advance the field. Toward this goal, we make three contributions. First, we draw on our recent industrial experience to discuss the limitations of current DI systems. Second, we propose an agenda to build a new kind of DI system that addresses these limitations. These systems guide users through the DI workflow, step by step. They provide tools to address the "pain points" of each step, and these tools are built on top of the Python data science and Big Data ecosystem (PyData). We discuss how to foster an ecosystem of such tools within PyData, then use it to build DI systems for collaborative/cloud/crowd/lay user settings. Finally, we discuss ongoing work at Wisconsin, which suggests that these DI systems are highly promising and that building them raises many interesting research challenges.
Submitted 29 September, 2017;
originally announced October 2017.