-
FAME-MT Dataset: Formality Awareness Made Easy for Machine Translation Purposes
Authors:
Dawid Wiśniewski,
Zofia Rostek,
Artur Nowakowski
Abstract:
People use language for various purposes. Apart from sharing information, individuals may use it to express emotions or to show respect for another person. In this paper, we focus on the formality level of machine-generated translations and present FAME-MT -- a dataset consisting of 11.2 million translations between 15 European source languages and 8 European target languages classified to formal and informal classes according to target sentence formality. This dataset can be used to fine-tune machine translation models to ensure a given formality level for each European target language considered. We describe the dataset creation procedure, the analysis of the dataset's quality showing that FAME-MT is a reliable source of language register information, and we present a publicly available proof-of-concept machine translation model that uses the dataset to steer the formality level of the translation. Currently, it is the largest dataset of formality annotations, with examples expressed in 112 European language pairs. The dataset is published online: https://github.com/laniqo-public/fame-mt/ .
Submitted 20 May, 2024;
originally announced May 2024.
-
On a simple quartic family of Thue equations over imaginary quadratic number fields
Authors:
Benjamin Earp-Lynch,
Bernadette Faye,
Eva G. Goedhart,
Ingrid Vukusic,
Daniel P. Wisniewski
Abstract:
Let $t$ be any imaginary quadratic integer with $|t|\geq 100$. We prove that the inequality \[
|F_t(X,Y)|
= | X^4 - t X^3 Y - 6 X^2 Y^2 + t X Y^3 + Y^4 |
\leq 1 \] has only trivial solutions $(x,y)$ in integers of the same imaginary quadratic number field as $t$. Moreover, we prove results on the inequalities $|F_t(X,Y)| \leq C|t|$ and $|F_t(X,Y)| \leq |t|^{2 -\varepsilon}$. These results follow from an approximation result that is based on the hypergeometric method. The proofs in this paper require a fair amount of computations, for which the code (in Sage) is provided.
Submitted 27 March, 2023;
originally announced March 2023.
-
TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark
Authors:
Ania Wróblewska,
Agnieszka Kaliska,
Maciej Pawłowski,
Dawid Wiśniewski,
Witold Sosnowski,
Agnieszka Ławrynowicz
Abstract:
Food Computing is currently a fast-growing field of research. Natural language processing (NLP) is also increasingly essential in this field, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. We introduce a new dataset -- called \textit{TASTEset} -- to bridge this gap. In this dataset, Named Entity Recognition (NER) models are expected to find or infer various types of entities helpful in processing recipes, e.g.~food products, quantities and their units, names of cooking processes, physical quality of ingredients, their purpose, taste.
The dataset consists of 700 recipes with more than 13,000 entities to extract. We provide a few state-of-the-art baselines of named entity recognition models, which show that our dataset poses a solid challenge to existing models. The best model achieved, on average, 0.95 $F_1$ score, depending on the entity type -- from 0.781 to 0.982. We share the dataset and the task to encourage progress on more in-depth and complex information extraction from recipes.
Submitted 16 April, 2022;
originally announced April 2022.
-
BigCQ: A large-scale synthetic dataset of competency question patterns formalized into SPARQL-OWL query templates
Authors:
Dawid Wiśniewski,
Jędrzej Potoniec,
Agnieszka Ławrynowicz
Abstract:
Competency Questions (CQs) are used in many ontology engineering methodologies to collect requirements and track the completeness and correctness of an ontology being constructed. Although they are frequently suggested by ontology engineering methodologies, the publicly available datasets of CQs and their formalizations in ontology query languages are very scarce. Since first efforts to automate processes utilizing CQs are being made, it is of high importance to provide large and diverse datasets to fuel these solutions. In this paper, we present BigCQ, the biggest dataset of CQ templates with their formalizations into SPARQL-OWL query templates. BigCQ is created automatically from a dataset of frequently used axiom shapes. These pairs of CQ templates and query templates can be then materialized as actual CQs and SPARQL-OWL queries if filled with resource labels and IRIs from a given ontology. We describe the dataset in detail, provide a description of the process leading to the creation of the dataset and analyze how well the dataset covers real-world examples. We also publish the dataset as well as scripts transforming axiom shapes into pairs of CQ patterns and SPARQL-OWL templates, to make engineers able to adapt the process to their particular needs.
Submitted 20 May, 2021;
originally announced May 2021.
-
Contract Discovery: Dataset and a Few-Shot Semantic Retrieval Challenge with Competitive Baselines
Authors:
Łukasz Borchmann,
Dawid Wiśniewski,
Andrzej Gretkowski,
Izabela Kosmala,
Dawid Jurkiewicz,
Łukasz Szałkiewicz,
Gabriela Pałka,
Karol Kaczmarek,
Agnieszka Kaliska,
Filip Graliński
Abstract:
We propose a new shared task of semantic retrieval from legal texts, in which a so-called contract discovery is to be performed, where legal clauses are extracted from documents, given a few examples of similar clauses from other legal acts. The task differs substantially from conventional NLI and shared tasks on legal information extraction (e.g., one has to identify text span instead of a single document, page, or paragraph). The specification of the proposed task is followed by an evaluation of multiple solutions within the unified framework proposed for this branch of methods. It is shown that state-of-the-art pretrained encoders fail to provide satisfactory results on the task proposed. In contrast, Language Model-based solutions perform better, especially when unsupervised fine-tuning is applied. Besides the ablation studies, we addressed questions regarding detection accuracy for relevant text fragments depending on the number of examples available. In addition to the dataset and reference results, LMs specialized in the legal domain were made publicly available.
Submitted 8 October, 2020; v1 submitted 10 November, 2019;
originally announced November 2019.
-
Competency Questions and SPARQL-OWL Queries Dataset and Analysis
Authors:
Dawid Wisniewski,
Jedrzej Potoniec,
Agnieszka Lawrynowicz,
C. Maria Keet
Abstract:
Competency Questions (CQs) are natural language questions outlining and constraining the scope of knowledge represented by an ontology. Despite that CQs are a part of several ontology engineering methodologies, we have observed that the actual publication of CQs for the available ontologies is very limited and even scarcer is the publication of their respective formalisations in terms of, e.g., SPARQL queries. This paper aims to contribute to addressing the engineering shortcomings of using CQs in ontology development, to facilitate wider use of CQs. In order to understand the relation between CQs and the queries over the ontology to test the CQs on an ontology, we gather, analyse, and publicly release a set of 234 CQs and their translations to SPARQL-OWL for several ontologies in different domains developed by different groups. We analysed the CQs in two principal ways. The first stage focused on a linguistic analysis of the natural language text itself, i.e., a lexico-syntactic analysis without any presuppositions of ontology elements, and a subsequent step of semantic analysis in order to find patterns. This increased diversity of CQ sources resulted in a 5-fold increase of hitherto published patterns, to 106 distinct CQ patterns, which have a limited subset of few patterns shared across the CQ sets from the different ontologies. Next, we analysed the relation between the found CQ patterns and the 46 SPARQL-OWL query signatures, which revealed that one CQ pattern may be realised by more than one SPARQL-OWL query signature, and vice versa. We hope that our work will contribute to establishing common practices, templates, automation, and user tools that will support CQ formulation, formalisation, execution, and general management.
Submitted 23 November, 2018;
originally announced November 2018.
-
Many-body kinetics of dynamic nuclear polarization by the cross effect
Authors:
Alexander Karabanov,
Daniel Wiśniewski,
Federica Raimondi,
Igor Lesanovsky,
Walter Köckenberger
Abstract:
Dynamic nuclear polarization (DNP) is an out-of-equilibrium method for generating non-thermal spin polarization which provides large signal enhancements in modern diagnostic methods based on nuclear magnetic resonance. A particular instance is cross effect DNP, which involves the interaction of two coupled electrons with the nuclear spin ensemble. Here we develop a theory for this important DNP mechanism and show that the non-equilibrium nuclear polarization build-up is effectively driven by three-body incoherent Markovian dissipative processes involving simultaneous state changes of two electrons and one nucleus. Our theoretical approach allows for the first time simulations of the polarization dynamics on an individual spin level for ensembles consisting of hundreds of nuclear spins. The insight obtained by these simulations can be used to find optimal experimental conditions for cross effect DNP and to design tailored radical systems that provide optimal DNP efficiency.
Submitted 7 July, 2017;
originally announced July 2017.
-
Solid effect DNP polarization dynamics in a system of many spins
Authors:
Daniel Wiśniewski,
Alexander Karabanov,
Igor Lesanovsky,
Walter Köckenberger
Abstract:
We discuss the polarization dynamics during solid effect dynamic nuclear polarization (DNP) in a central spin model that consists of an electron surrounded by many nuclei. To this end we use a recently developed formalism and validate first its performance by comparing its predictions to results obtained by solving the Liouville von Neumann master equation. The use of a Monte Carlo method in our formalism makes it possible to significantly increase the number of spins considered in the model system. We then analyse the dependence of the nuclear bulk polarization on the presence of nuclei in the vicinity of the electron and demonstrate that increasing the minimal distance between nuclei and electrons leads to a rise of the nuclear bulk polarization. These observations have implications for the design of radicals that can lead to improved values of nuclear spin polarization. Furthermore, we discuss the potential to extend our formalism for more complex spin systems such as cross effect DNP.
Submitted 19 January, 2016;
originally announced January 2016.
-
Dynamic nuclear polarization as kinetically constrained diffusion
Authors:
Alexander Karabanov,
Daniel Wisniewski,
Igor Lesanovsky,
Walter Köckenberger
Abstract:
Dynamic nuclear polarization (DNP) is a promising strategy for generating a significantly increased non-thermal spin polarization in nuclear magnetic resonance (NMR) applications thereby circumventing the need for strong magnetic fields. Although much explored in recent experiments, a detailed theoretical understanding of the precise mechanism behind DNP is so far lacking. We address this issue by theoretically investigating solid effect DNP in a system where a single electron is coupled to an ensemble of interacting nuclei and which can be microscopically modelled by a quantum master equation. By deriving effective equations of motion that govern the polarization dynamics we show analytically that DNP can be understood as kinetically constrained spin diffusion. On the one hand this approach provides analytical insights into the mechanism and timescales underlying DNP. On the other hand it permits the numerical study of large ensembles which are typically intractable from the perspective of a quantum master equation. This paves the way for a detailed exploration of DNP dynamics which might form the basis for future NMR applications.
Submitted 14 March, 2015;
originally announced March 2015.
-
The Structure of Pre-transitional Protoplanetary Disks. II. Azimuthal Asymmetries, Different Radial Distributions of Large and Small Dust Grains in PDS~70
Authors:
J. Hashimoto,
T. Tsukagoshi,
J. M. Brown,
R. Dong,
Takayuki Muto,
Zhaohuan Zhu,
John P. Wisniewski,
N. Ohashi,
T. Kudo,
N. Kusakabe,
L. Abe,
E. Akiyama,
Wolfgang Brandner,
T. Brandt,
J. Carson,
Thayne Currie,
S. Egner,
M. Feldt,
C. A. Grady,
O. Guyon,
Y. Hayano,
M. Hayashi,
S. Hayashi,
Thomas Henning,
K. Hodapp
, et al. (32 additional authors not shown)
Abstract:
The formation scenario of a gapped disk, i.e., transitional disk, and its asymmetry is still under debate. Proposed scenarios such as disk-planet interaction, photoevaporation, grain growth, anticyclonic vortex, eccentricity, and their combinations would result in different radial distributions of the gas and the small (sub-$μ$m size) and large (millimeter size) dust grains as well as asymmetric structures in a disk. Optical/near-infrared (NIR) imaging observations and (sub-)millimeter interferometry can trace small and large dust grains, respectively; therefore multi-wavelength observations could help elucidate the origin of complicated structures of a disk. Here we report SMA observations of the dust continuum at 1.3~mm and $^{12}$CO~$J=2\rightarrow1$ line emission of the pre-transitional protoplanetary disk around the solar-mass star PDS~70. PDS~70, a weak-lined T Tauri star, exhibits a gap in the scattered light from its disk with a radius of $\sim$65~AU at NIR wavelengths. However, we found a larger gap in the disk with a radius of $\sim$80~AU at 1.3~mm. Emission from all three disk components (the gas and the small and large dust grains) in images exhibits a deficit in brightness in the central region of the disk, in particular, the dust-disk in small and large dust grains has asymmetric brightness. The contrast ratio of the flux density in the dust continuum between the peak position to the opposite side of the disk reaches 1.4. We suggest the asymmetries and different gap-radii of the disk around PDS~70 are potentially formed by several (unseen) accreting planets inducing dust filtration.
Submitted 13 November, 2014; v1 submitted 10 November, 2014;
originally announced November 2014.