
Transforming UNL graphs in OWL representations

Authors: David Rouquet, Valérie Bellynck, Christian Boitet, Vincent Berment

Abstract: Extracting formal knowledge (ontologies) from natural language is a challenge that can benefit from a (semi-)formal linguistic representation of texts at the semantic level. We propose to achieve such a representation by implementing the Universal Networking Language (UNL) specifications on top of RDF. Thus, the meaning of a statement in any language will be soundly expressed as an RDF-UNL graph that constitutes a middle ground between natural language and formal knowledge. In particular, we show that RDF-UNL graphs can support content extraction using generic SHACL rules and that reasoning on the extracted facts allows detecting incoherence in the original texts. This approach is being tested in the UNseL project, which aims at extracting ontological representations from system requirements/specifications in order to check that they are consistent, complete and unambiguous. Our RDF-UNL implementation and all code for the working examples of this paper are publicly available under the CeCILL-B license at https://gitlab.tetras-libre.fr/unl/rdf-unl

Submitted 13 January, 2022; originally announced January 2022.

arXiv:1902.08061 [pdf]

Development of a classifiers/quantifiers dictionary towards French-Japanese MT

Authors: Mutsuko Tomokiyo, Mathieu Mangeot, Christian Boitet

Abstract: Although classifier/quantifier (CQ) expressions appear frequently in everyday communications and written documents, they are described neither in classical bilingual paper dictionaries nor in machine-readable dictionaries. The paper describes a CQ dictionary, compiled from the corpus we have annotated, and its usage in the framework of French-Japanese machine translation (MT). CQ treatment in MT often causes problems of lexical ambiguity, difficulties in recognizing polylexical phrases during analysis, and doubtful output in transfer-generation, in particular for distant language pairs like French and Japanese. Our basic treatment of CQs is to annotate the corpus with UNL-UWs (Universal Networking Language-Universal Words), and then to produce a bilingual or multilingual dictionary of CQs, based on synonymy through identity of UWs.

Submitted 21 February, 2019; originally announced February 2019.

Journal ref: MT Summit 2017, Sep 2017, Nagoya, Japan

arXiv:0811.0579 [pdf]

UNL-French deconversion as transfer & generation from an interlingua with possible quality enhancement through offline human interaction

Authors: Gilles Sérasset, Christian Boitet

Abstract: We present the architecture of the UNL-French deconverter, which "generates" from the UNL interlingua by first "localizing" the UNL form for French, within UNL, and then applying slightly adapted but classical transfer and generation techniques, implemented in GETA's Ariane-G5 environment, supplemented by some UNL-specific tools. Online interaction can be used during deconversion to enhance output quality and is now used for development purposes. We show how interaction could be delayed and embedded in the postedition phase, which would then interact not directly with the output text, but indirectly with several components of the deconverter. Interacting online or offline can improve the quality not only of the utterance at hand, but also of the utterances processed later, as various preferences may be automatically changed to let the deconverter "learn".

Submitted 4 November, 2008; originally announced November 2008.

Journal ref: Machine Translation Summit VII, Singapore (1999)

arXiv:cmp-lg/9411010 [pdf, ps, other]

The "Whiteboard" Architecture: a way to integrate heterogeneous components of NLP systems

Authors: Christian Boitet, Mark Seligman

Abstract: We present a new software architecture for NLP systems made of heterogeneous components, and demonstrate an architectural prototype we have built at ATR in the context of Speech Translation.

Submitted 4 November, 1994; originally announced November 1994.

Comments: Postscript, 6 pages

Journal ref: COLING-94

Showing 1–4 of 4 results for author: Boitet, C