Semi-automatic staging area for high-quality structured data extraction from scientific literature
Authors:
Luca Foppiano,
Tomoya Mato,
Kensei Terashima,
Pedro Ortiz Suarez,
Taku Tou,
Chikako Sakai,
Wei-Sheng Wang,
Toshiyuki Amagasa,
Yoshihiko Takano,
Masashi Ishii
Abstract:
We propose a semi-automatic staging area for efficiently building an accurate database of experimental physical properties of superconductors from the literature, called SuperCon2, to enrich the existing manually built superconductor database SuperCon. Here we report our curation interface (SuperCon2 Interface) and a workflow that manages the state transitions of each examined record, in order to validate the dataset of superconductors collected from PDF documents using Grobid-superconductors in a previous work. This curation workflow allows both automatic and manual operations. The former include ``anomaly detection'', which scans new data to identify outliers, and a ``training data collector'' mechanism that collects training data examples based on manual corrections. This training data collection policy is effective in improving the machine-learning models with a reduced number of examples. For manual operations, the interface (SuperCon2 Interface) is designed to increase efficiency during manual correction by providing a smart interface and an enhanced PDF document viewer. We show that our interface significantly improves curation quality, boosting precision and recall compared with traditional ``manual correction''. Our semi-automatic approach would provide a solution for achieving a reliable database with text-data mining of scientific documents.
Submitted 16 November, 2023; v1 submitted 19 September, 2023;
originally announced September 2023.
Enlightening the chemistry of infalling envelopes and accretion disks around Sun-like protostars: the ALMA FAUST project
Authors:
C. Codella,
C. Ceccarelli,
C. Chandler,
N. Sakai,
S. Yamamoto,
the FAUST team
Abstract:
The huge variety of planetary systems discovered in recent decades likely depends on the early history of their formation. In this contribution we introduce the FAUST Large Program, which focuses specifically on the early history of Solar-like protostars and their chemical diversity at scales of $\sim$ 50 au, where planets are expected to form. In particular, the goal of the project is to reveal and quantify the variety of chemical composition of the envelope/disk system at scales of 50 au in a sample of Class 0 and I protostars representative of the chemical diversity observed at larger scales. For each source, we propose a set of molecules able to: (1) disentangle the components of the 50-2000 au envelope/disk system; (2) characterise the organic complexity in each of them; (3) probe their ionization structure; (4) measure their molecular deuteration. The output will be a homogeneous database of thousands of images from different lines and species, i.e., an unprecedented source-survey of the chemical diversity of Solar-like protostars. FAUST will provide the community with a legacy dataset that will be a milestone for astrochemistry and star formation studies.
Submitted 28 November, 2021;
originally announced November 2021.