or2yw: Modeling and Visualizing OpenRefine Histories as YesWorkflow Diagrams
Authors:
Nikolaus Nova Parulian,
Lan Li,
Bertram Ludaescher
Abstract:
OpenRefine is a popular open-source data cleaning tool. It allows users to export a previously executed data cleaning workflow in a JSON format for possible reuse on other datasets. We have developed or2yw, a novel tool that maps a JSON-formatted OpenRefine operation history to a YesWorkflow (YW) model, which then can be visualized and queried using the YW tool. The latter was originally developed to allow researchers a simple way to annotate their program scripts in order to reveal the workflow steps and dataflow dependencies implicit in those scripts. With or2yw the user can automatically generate YW models from OpenRefine operation histories, thus providing a 'workflow view' on a previously executed sequence of data cleaning operations.
The or2yw tool can generate different types of YesWorkflow models, e.g., a linear model which mirrors the sequential execution order of operations in OpenRefine, and a parallel model which reveals independent workflow branches, based on a simple analysis of dependencies between steps: if two operations are independent of each other (e.g., when the columns they read and write do not overlap) then these can be viewed as parallel steps in the data cleaning workflow. The resulting YW models can be understood as a form of prospective provenance, i.e., knowledge artifacts that can be queried and visualized (i) to help authors document their own data cleaning workflows, thereby increasing transparency, and (ii) to help other users, who might want to reuse such workflows, to understand them better.
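The column-overlap idea behind the parallel model can be illustrated with a minimal sketch. This is not the or2yw implementation: the Operation class, its reads/writes fields, and the function names are hypothetical, and the sketch assumes each operation's read and written columns have already been extracted from the OpenRefine history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Hypothetical stand-in for one step of an operation history."""
    name: str
    reads: frozenset   # columns the step reads (assumed to be known)
    writes: frozenset  # columns the step writes (assumed to be known)


def conflicts(earlier, later):
    """True if the two steps touch overlapping columns and at least one writes."""
    return bool(earlier.writes & (later.reads | later.writes)) or bool(
        earlier.reads & later.writes
    )


def dependencies(history):
    """Map each step index to the indices of earlier steps it depends on."""
    deps = {j: set() for j in range(len(history))}
    for j, later in enumerate(history):
        for i in range(j):
            if conflicts(history[i], later):
                deps[j].add(i)
    return deps


def parallel_stages(history):
    """Group step names into stages; steps in the same stage are independent."""
    deps = dependencies(history)
    stage = {}
    for j in range(len(history)):
        # A step runs one stage after the latest step it depends on.
        stage[j] = 1 + max((stage[i] for i in deps[j]), default=-1)
    grouped = {}
    for j, s in stage.items():
        grouped.setdefault(s, []).append(history[j].name)
    return [grouped[s] for s in sorted(grouped)]


# Made-up example history: two independent column edits, then a merge.
history = [
    Operation("trim_title", frozenset({"title"}), frozenset({"title"})),
    Operation("upper_author", frozenset({"author"}), frozenset({"author"})),
    Operation("merge", frozenset({"title", "author"}), frozenset({"citation"})),
]
stages = parallel_stages(history)
print(stages)
```

In this made-up history the two single-column edits share no columns and land in the same stage, while the merge step reads both columns and therefore follows them. A linear model would instead keep the three steps in their recorded order.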
Submitted 15 December, 2021;
originally announced December 2021.
Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation
Authors:
Chenkai Sun,
Weijiang Li,
Jinfeng Xiao,
Nikolaus Nova Parulian,
ChengXiang Zhai,
Heng Ji
Abstract:
Automated knowledge discovery from trending chemical literature is essential for more efficient biomedical research. How to extract detailed knowledge about chemical reactions from the core chemistry literature is a new emerging challenge that has not been well studied. In this paper, we study the new problem of fine-grained chemical entity typing, which poses interesting new challenges especially because of the complex name mentions frequently occurring in chemistry literature and graphic representation of entities. We introduce a new benchmark data set (CHEMET) to facilitate the study of the new task and propose a novel multi-modal representation learning framework to solve the problem of fine-grained chemical entity typing by leveraging external resources with chemical structures and using cross-modal attention to learn effective representation of text in the chemistry domain. Experiment results show that the proposed framework outperforms multiple state-of-the-art methods.
Submitted 29 August, 2021;
originally announced August 2021.