
Showing 1–44 of 44 results for author: Romary, L

  1. arXiv:2403.16609

    cs.CL

    Conversational Grounding: Annotation and Analysis of Grounding Acts and Grounding Units

    Authors: Biswesh Mohapatra, Seemab Hassan, Laurent Romary, Justine Cassell

    Abstract: Successful conversations often rest on common understanding, where all parties are on the same page about the information being shared. This process, known as conversational grounding, is crucial for building trustworthy dialog systems that can accurately keep track of and recall the shared information. The proficiencies of an agent in grounding the conveyed information significantly contribute to…

    Submitted 25 March, 2024; originally announced March 2024.

    Journal ref: LREC-COLING 2024

  2. arXiv:2306.15550

    cs.CL cs.AI

    CamemBERT-bio: Leveraging Continual Pre-training for Cost-Effective Models on French Biomedical Data

    Authors: Rian Touchent, Laurent Romary, Eric de la Clergerie

    Abstract: Clinical data in hospitals are increasingly accessible for research through clinical data warehouses. However, these documents are unstructured, and it is therefore necessary to extract information from medical reports to conduct clinical studies. Transfer learning with BERT-like models such as CamemBERT has allowed major advances for French, especially for named entity recognition. However, these m…

    Submitted 3 April, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted to LREC-COLING 2024

  3. arXiv:2201.06642

    cs.CL

    Towards a Cleaner Document-Oriented Multilingual Crawled Corpus

    Authors: Julien Abadji, Pedro Ortiz Suarez, Laurent Romary, Benoît Sagot

    Abstract: The need for large raw corpora has dramatically increased in recent years with the introduction of transfer learning and semi-supervised learning methods to Natural Language Processing. While there have been some recent attempts to manually curate the amount of data necessary to train large language models, the main way to obtain this data is still through automatic web crawling. In this p…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: 12 pages, 6 figures, 2 tables

  4. SuperMat: Construction of a linked annotated dataset from superconductors-related publications

    Authors: Luca Foppiano, Sae Dieb, Akira Suzuki, Pedro Baptista de Castro, Suguru Iwasaki, Azusa Uzuki, Miren Garbine Esparza Echevarria, Yan Meng, Kensei Terashima, Laurent Romary, Yoshihiko Takano, Masashi Ishii

    Abstract: A growing number of papers are published in the area of superconducting materials science. However, novel text and data mining (TDM) processes are still needed to efficiently access and exploit this accumulated knowledge, paving the way towards data-driven materials design. Herein, we present SuperMat (Superconductor Materials), an annotated corpus of linked data derived from scientific publicatio…

    Submitted 15 April, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

    Journal ref: STAM:M, 2021, VOL. 1, NO. 1, 34-44

  5. A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages

    Authors: Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot

    Abstract: We use the multilingual OSCAR corpus, extracted from Common Crawl via language classification, filtering and cleaning, to train monolingual contextualized word embeddings (ELMo) for five mid-resource languages. We then compare the performance of OSCAR-based and Wikipedia-based ELMo embeddings for these languages on the part-of-speech tagging and parsing tasks. We show that, despite the noise in th…

    Submitted 18 June, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020, Online

  6. arXiv:2005.13236

    cs.CL

    Establishing a New State-of-the-Art for French Named Entity Recognition

    Authors: Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller, Laurent Romary, Benoît Sagot

    Abstract: The French TreeBank developed at the University Paris 7 is the main source of morphosyntactic and syntactic annotations for French. However, it does not include explicit information related to named entities, which are among the most useful kinds of information for several natural language processing tasks and applications. Moreover, no large-scale French corpus with named entity annotations contains refere…

    Submitted 27 May, 2020; originally announced May 2020.

    Journal ref: LREC 2020 - 12th Language Resources and Evaluation Conference, May 2020, Marseille, France

  7. CamemBERT: a Tasty French Language Model

    Authors: Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot

    Abstract: Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have either been trained on English data or on the concatenation of data in multiple languages. This makes practical use of such models, in all languages except English, very limited. In this paper, we investigate the feasibility of training monolingual Transformer-based lan…

    Submitted 21 May, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: ACL 2020 long paper. Web site: https://camembert-model.fr

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020, Online

  8. arXiv:1906.02136

    cs.CL

    LMF Reloaded

    Authors: Laurent Romary, Mohamed Khemakhem, Fahad Khan, Jack Bowers, Nicoletta Calzolari, Monte George, Mandy Pet, Piotr Bański

    Abstract: Lexical Markup Framework (LMF) or ISO 24613 [1] is a de jure standard that provides a framework for modelling and encoding lexical information in retrodigitised print dictionaries and NLP lexical databases. An in-depth review is currently underway within the standardisation subcommittee, ISO-TC37/SC4/WG4, to find a more modular, flexible and durable follow-up to the original LMF standard publish…

    Submitted 23 May, 2019; originally announced June 2019.

    Comments: AsiaLex 2019: Past, Present and Future, Jun 2019, Istanbul, Turkey

  9. arXiv:1611.10122

    cs.CL

    Deep encoding of etymological information in TEI

    Authors: Jack Bowers, Laurent Romary

    Abstract: This paper aims to provide a comprehensive modeling and representation of etymological data in digital dictionaries. The purpose is to integrate in one coherent framework both digital representations of legacy dictionaries, and also born-digital lexical databases that are constructed manually or semi-automatically. We want to propose a systematic and coherent set of modeling principles for a varie…

    Submitted 30 November, 2016; originally announced November 2016.

  10. arXiv:1603.03170

    cs.CY cs.CL cs.DL

    Data fluidity in DARIAH -- pushing the agenda forward

    Authors: Laurent Romary, Mike Mertens, Anne Baillot

    Abstract: This paper provides both an update concerning the setting up of the European DARIAH infrastructure and a series of strong action lines related to the development of a data-centred strategy for the humanities in the coming years. In particular, we tackle various aspects of data management: data hosting, the setting up of a DARIAH seal of approval, the establishment of a charter between cultural herit…

    Submitted 24 March, 2016; v1 submitted 10 March, 2016; originally announced March 2016.

    Journal ref: BIBLIOTHEK Forschung und Praxis, De Gruyter, 2016, 39 (3), pp.350-357

  11. arXiv:1601.00533

    cs.OH

    Crowds for Clouds: Recent Trends in Humanities Research Infrastructures

    Authors: Tobias Blanke, Conny Kristel, Laurent Romary

    Abstract: The humanities have convincingly argued that they need transnational research opportunities and, through the digital transformation of their disciplines, they also have the means to pursue them on a previously unknown scale. The digital transformation of research and its resources means that many of the artifacts, documents, materials, etc. that interest humanities research can now be combined in new and…

    Submitted 27 December, 2015; originally announced January 2016.

    Journal ref: Agiati Benardou, Erik Champion, Costis Dallas, Lorna Hughes (eds.), Cultural Heritage Digital Tools and Infrastructures, 2016, ISBN 978-1-4724-4712-8

  12. arXiv:1510.07851

    cs.CL

    Standards for language resources in ISO -- Looking back at 13 fruitful years

    Authors: Laurent Romary

    Abstract: This paper provides an overview of the various projects carried out within ISO committee TC 37/SC 4 dealing with the management of language (digital) resources. On the basis of the technical experience gained in the committee and the wider standardization landscape the paper identifies some possible trends for the future.

    Submitted 27 October, 2015; originally announced October 2015.

    Comments: edition - die Terminologiefachzeitschrift, Deutscher Terminologie-Tag e.V. (DTT), 2015

  13. arXiv:1405.3925

    cs.CL

    Méthodes pour la représentation informatisée de données lexicales / Methoden der Speicherung lexikalischer Daten

    Authors: Laurent Romary, Andreas Witt

    Abstract: In recent years, new developments in the area of lexicography have altered not only the management, processing and publishing of lexicographical data, but also created new types of products such as electronic dictionaries and thesauri. These expand the range of possible uses of lexical data and support users with more flexibility, for instance in assisting human translation. In this article, we gi…

    Submitted 15 May, 2014; originally announced May 2014.

    Comments: This text comprises both a French and a German version

    Journal ref: Lexicographica 30 (2014)

  14. arXiv:1403.0052

    cs.CL

    TBX goes TEI -- Implementing a TBX basic extension for the Text Encoding Initiative guidelines

    Authors: Laurent Romary

    Abstract: This paper presents an attempt to customise the TEI (Text Encoding Initiative) guidelines in order to offer the possibility to incorporate TBX (TermBase eXchange) based terminological entries within any kind of TEI document. After presenting the general historical, conceptual and technical contexts, we describe the various design choices we had to make while creating this customisation, which in…

    Submitted 1 March, 2014; originally announced March 2014.

  15. arXiv:1301.2444

    cs.CL

    TEI and LMF crosswalks

    Authors: Laurent Romary

    Abstract: The present paper explores various arguments in favour of making the Text Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO standard 24613:2008 (LMF, Lexical Markup Framework). It also identifies the issues that would have to be resolved in order to reach an appropriate implementation of these ideas, in particular in terms of informational coverage. We show how the cus…

    Submitted 28 January, 2016; v1 submitted 11 January, 2013; originally announced January 2013.

    Journal ref: JLCL - Journal for Language Technology and Computational Linguistics, 2015, 30 (1)

  16. arXiv:1207.5328

    cs.CL

    A prototype for projecting HPSG syntactic lexica towards LMF

    Authors: Kais Haddar, Héla Fehri, Laurent Romary

    Abstract: The comparative evaluation of Arabic HPSG grammar lexica requires a deep study of their linguistic coverage. The complexity of this task results mainly from the heterogeneity of the descriptive components within those lexica (underlying linguistic resources and different data categories, for example). It is therefore essential to define more homogeneous representations, which in turn will enable u…

    Submitted 31 August, 2012; v1 submitted 23 July, 2012; originally announced July 2012.

    Journal ref: Journal of Language Technology and Computational Linguistics 27, 1 (2012) 21-46

  17. arXiv:1110.1758

    cs.CL

    Data formats for phonological corpora

    Authors: Laurent Romary, Andreas Witt

    Abstract: The goal of the present chapter is to explore the possibility of providing the research (but also the industrial) community that commonly uses spoken corpora with a stable portfolio of well-documented standardised formats that allow a high re-use rate of annotated spoken resources and, as a consequence, better interoperability across tools used to produce or exploit such resources.

    Submitted 4 March, 2012; v1 submitted 8 October, 2011; originally announced October 2011.

    Comments: Handbook of Corpus Phonology, Oxford University Press (2012)

  18. arXiv:1108.0631

    cs.CL

    Serialising the ISO SynAF Syntactic Object Model

    Authors: Laurent Romary, Amir Zeldes, Florian Zipser

    Abstract: This paper introduces an XML format developed to serialise the object model defined by the ISO Syntactic Annotation Framework SynAF. Based on widespread best practices, we adapt a popular XML format for syntactic annotation, TigerXML, with additional features to support a variety of syntactic phenomena including constituent and dependency structures, binding, and different node types such as compo…

    Submitted 15 September, 2014; v1 submitted 2 August, 2011; originally announced August 2011.

  19. arXiv:1105.3287

    cs.DL

    Scholarly Communication

    Authors: Laurent Romary

    Abstract: The chapter tackles the role of scholarly publication in the research process (quality, preservation) and looks at the consequences of new information technologies in the organization of the scholarly communication ecology. It will then show how new technologies have had an impact on the scholarly communication process and made it depart from the traditional publishing environment. Developments wi…

    Submitted 17 May, 2011; originally announced May 2011.

    Comments: To appear in Mehler, Romary, Gibbon (eds), Technical Communication, M. de Gruyter, Berlin (2011)

  20. arXiv:1011.0519

    cs.CL

    Stabilizing knowledge through standards - A perspective for the humanities

    Authors: Laurent Romary

    Abstract: It is common to consider that standards generate mixed feelings among scientists. They are often seen as not really reflecting the state of the art in a given domain and as a hindrance to scientific creativity. Still, scientists should theoretically be in the best position to bring their expertise into standard developments, being even more neutral on issues that may typically be related to competing ind…

    Submitted 2 November, 2010; originally announced November 2010.

    Journal ref: Going Digital: Evolutionary and Revolutionary Aspects of Digitization, Karl Grandin (Ed.) (2011)

  21. arXiv:1005.0839

    cs.DL

    Comparing Repository Types - Challenges and barriers for subject-based repositories, research repositories, national repository systems and institutional repositories in serving scholarly communication

    Authors: Chris Armbruster, Laurent Romary

    Abstract: After two decades of repository development, some conclusions may be drawn as to which type of repository and what kind of service best supports digital scholarly communication, and thus the production of new knowledge. Four types of publication repository may be distinguished, namely the subject-based repository, research repository, national repository system and institutional repository. Two im…

    Submitted 5 May, 2010; originally announced May 2010.

    Journal ref: International Journal of Digital Library Systems 1, 4 (2010) 61-73

  22. arXiv:1003.4187

    cs.DL

    Comparing Repository Types - Challenges and barriers for subject-based repositories, research repositories, national repository systems and institutional repositories in serving scholarly communication

    Authors: Chris Armbruster, Laurent Romary

    Abstract: After two decades of repository development, some conclusions may be drawn as to which type of repository and what kind of service best supports digital scholarly communication, and thus the production of new knowledge. Four types of publication repository may be distinguished, namely the subject-based repository, research repository, national repository system and institutional repository. Two im…

    Submitted 22 March, 2010; originally announced March 2010.

  23. arXiv:0912.2881

    cs.CL

    Representing human and machine dictionaries in Markup languages

    Authors: Lothar Lemnitzer, Laurent Romary, Andreas Witt

    Abstract: In this chapter we present the main issues in representing machine-readable dictionaries in XML, and in particular according to the Text Encoding Initiative (TEI) guidelines.

    Submitted 16 December, 2009; v1 submitted 15 December, 2009; originally announced December 2009.

    Journal ref: Dictionaries. An International Encyclopedia of Lexicography. Supplementary volume: Recent developments with special focus on computational lexicography, Ulrich Heid (Ed.) (2010)

  24. arXiv:0911.5116

    cs.CL

    Standardization of the formal representation of lexical information for NLP

    Authors: Laurent Romary

    Abstract: A survey of dictionary models and formats is presented, together with an overview of the corresponding recent standardisation activities.

    Submitted 26 November, 2009; originally announced November 2009.

    Journal ref: Dictionaries. An International Encyclopedia of Lexicography. Supplementary volume: Recent developments with special focus on computational lexicography (2010)

  25. arXiv:0911.1842

    cs.CL

    Standards for Language Resources

    Authors: Nancy Ide, Laurent Romary

    Abstract: The goal of this paper is two-fold: to present an abstract data model for linguistic annotations and its implementation using XML, RDF and related standards; and to outline the work of a newly formed committee of the International Organization for Standardization (ISO), ISO/TC 37/SC 4 Language Resource Management, which will use this work as its starting point.

    Submitted 10 November, 2009; originally announced November 2009.

    Comments: International conference with proceedings and peer review

    Report number: A01-R-287 || ide01b

    Journal ref: IRCS Workshop on Linguistic Databases, Philadelphia, United States (2001)

  26. arXiv:0910.2632

    cs.DL

    Communication scientifique : Pour le meilleur et pour le PEER

    Authors: Laurent Romary

    Abstract: This paper provides an overview (in French) of the European PEER project, focusing on its origins, the actual objectives and the technical deployment.

    Submitted 14 October, 2009; originally announced October 2009.

    Journal ref: Hermes (2009)

  27. arXiv:0909.4280

    cs.CL

    Towards Multimodal Content Representation

    Authors: Harry Bunt, Laurent Romary

    Abstract: Multimodal interfaces, combining the use of speech, graphics, gestures, and facial expressions in input and output, promise to provide new possibilities to deal with information in more effective and efficient ways, supporting for instance: - the understanding of possibly imprecise, partial or ambiguous multimodal input; - the generation of coordinated, cohesive, and coherent multimodal presenta…

    Submitted 23 September, 2009; originally announced September 2009.

    Comments: International conference with proceedings and peer review

    Report number: A02-R-095 || bunt02a

    Journal ref: LREC Workshop on International Standards of Terminology and Language Resources Management, Las Palmas, Spain (2002)

  28. arXiv:0909.2721

    cs.OH

    Dynamically Generated Interfaces in XML Based Architecture

    Authors: Minit Gupta, Laurent Romary

    Abstract: Providing on-line services on the Internet will require the definition of flexible interfaces that are capable of adapting to the user's characteristics. This is all the more important in the context of medical applications like home monitoring, where no two patients have the same medical profile. Still, the problem is not limited to the capacity of defining generic interfaces, as has been made…

    Submitted 15 September, 2009; originally announced September 2009.

    Comments: International conference with proceedings and peer review

    Report number: A01-R-293 || gupta01a

    Journal ref: User Interface Markup Language - UIML'2001, Paris, France (2001)

  29. arXiv:0909.2719

    cs.CL

    Standards for Language Resources

    Authors: Nancy Ide, Laurent Romary

    Abstract: This paper presents an abstract data model for linguistic annotations and its implementation using XML, RDF and related standards, and outlines the work of a newly formed committee of the International Organization for Standardization (ISO), ISO/TC 37/SC 4 Language Resource Management, which will use this work as its starting point. The primary motive for presenting the latter is to solicit the particip…

    Submitted 15 September, 2009; originally announced September 2009.

    Comments: International conference with proceedings and peer review

    Report number: A02-R-096 || ide02a

    Journal ref: Third International Conference on Language Resources and Evaluation - LREC 2002, Las Palmas, Spain (2002)

  30. arXiv:0909.2718

    cs.CL

    A Common XML-based Framework for Syntactic Annotations

    Authors: Nancy Ide, Laurent Romary, Tomaz Erjavec

    Abstract: It is widely recognized that the proliferation of annotation schemes runs counter to the need to re-use language resources, and that standards for linguistic annotation are becoming increasingly mandatory. To answer this need, we have developed a framework composed of an abstract model for a variety of different annotation types (e.g., morpho-syntactic tagging, syntactic annotation, co-referenc…

    Submitted 15 September, 2009; originally announced September 2009.

    Comments: International conference with proceedings and peer review

    Report number: A01-R-289 || ide01d

    Journal ref: 1st NLP and XML Workshop, Tokyo, Japan (2001)

  31. arXiv:0909.2715

    cs.CL

    Marking-up multiple views of a Text: Discourse and Reference

    Authors: Dan Cristea, Nancy Ide, Laurent Romary

    Abstract: We describe an encoding scheme for discourse structure and reference, based on the TEI Guidelines and the recommendations of the Corpus Encoding Specification (CES). A central feature of the scheme is a CES-based data architecture enabling the encoding of and access to multiple views of a marked-up document. We describe a tool architecture that supports the encoding scheme, and then show how we…

    Submitted 15 September, 2009; originally announced September 2009.

    Journal ref: First International Language Resources and Evaluation Conference, Granada, Spain (1998)

  32. arXiv:0909.2626

    cs.CL

    Reference Resolution within the Framework of Cognitive Grammar

    Authors: Susanne Salmon-Alt, Laurent Romary

    Abstract: Following the principles of Cognitive Grammar, we concentrate on a model for reference resolution that attempts to overcome the difficulties of previous approaches, based on the fundamental assumption that all reference (independent of the type of the referring expression) is accomplished via access to and restructuring of domains of reference rather than by direct linkage to the entities themselve…

    Submitted 14 September, 2009; originally announced September 2009.

    Comments: International conference with proceedings and peer review

    Report number: A01-R-057 || salmon-alt01a

    Journal ref: International Colloquium on Cognitive Science, San Sebastian, Spain (2001)

  33. arXiv:0909.2145

    cs.SE

    A general XML-based distributed software architecture for accessing and sharing ressources

    Authors: Samuel Cruz-Lara, Patrice Bonhomme, Christophe De Saint-Rat, Laurent Romary

    Abstract: This paper presents a general XML-based distributed software architecture with the aim of accessing and sharing resources in an open client/server environment. The paper is organized as follows: First, we introduce the idea of a "General Distributed Software Architecture". Second, we describe the general framework in which this architecture is used. Third, we describe the process of information…

    Submitted 11 September, 2009; originally announced September 2009.

    Comments: Conference with proceedings and peer review

    Report number: 99-R-368 || cruz-lara99a

    Journal ref: XML Finland'99, Helsinki, Finland (1999)

  34. arXiv:0908.4413

    cs.CL

    Multiple Retrieval Models and Regression Models for Prior Art Search

    Authors: Patrice Lopez, Laurent Romary

    Abstract: This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten diffe…

    Submitted 30 August, 2009; originally announced August 2009.

  35. arXiv:0907.2452

    cs.CL

    Pattern Based Term Extraction Using ACABIT System

    Authors: Koichi Takeuchi, Kyo Kageura, Teruo Koyama, Béatrice Daille, Laurent Romary

    Abstract: In this paper, we propose a pattern-based term extraction approach for Japanese, applying the ACABIT system originally developed for French. The proposed approach evaluates termhood using morphological patterns of basic terms and term variants. After extracting term candidates, the ACABIT system filters out non-terms from the candidates based on log-likelihood. This approach is suitable for Japanese ter…

    Submitted 14 July, 2009; originally announced July 2009.

    Journal ref: IEIC Technical Report 103, 280 (2003) 31-36

  36. Encoding models for scholarly literature

    Authors: Martin Holmes, Laurent Romary

    Abstract: We examine the issue of digital formats for document encoding, archiving and publishing, through the specific example of "born-digital" scholarly journal articles. We will begin by looking at the traditional workflow of journal editing and publication, and how these practices have made the transition into the online domain. We will examine the range of different file formats in which electronic…

    Submitted 3 June, 2009; originally announced June 2009.

    Journal ref: Publishing and digital libraries: Legal and organizational issues, Ioannis Iglezakis, Tatiana-Eleni Synodinou, Sarantos Kapidakis (Eds.) (2010)

  37. arXiv:0812.3563

    cs.DL

    Questions & Answers for TEI Newcomers

    Authors: Laurent Romary

    Abstract: This paper provides an introduction to the Text Encoding Initiative (TEI), aimed at newcomers who have to deal with a digital document project and are assessing whether the TEI environment can fulfil their needs. To this end, we avoid a strictly technical presentation of the TEI and concentrate on the actual issues that such projects face, with parallel made on th…

    Submitted 26 January, 2009; v1 submitted 18 December, 2008; originally announced December 2008.

    Journal ref: Jahrbuch für Computerphilologie 10 (2009)

  38. arXiv:0707.3270

    cs.CL

    A Formal Model of Dictionary Structure and Content

    Authors: Laurent Romary, Nancy Ide, Adam Kilgarriff

    Abstract: We show that a general model of lexical information conforms to an abstract model that reflects the hierarchy of information found in a typical dictionary entry. We show that this model can be mapped into a well-formed XML document, and how the XSL transformation language can be used to implement a semantics defined over the abstract model to enable extraction and manipulation of the information…

    Submitted 22 July, 2007; originally announced July 2007.

    Journal ref: Euralex 2000, Stuttgart, Germany (2000)

  39. arXiv:0707.3269

    cs.CL

    International Standard for a Linguistic Annotation Framework

    Authors: Laurent Romary, Nancy Ide

    Abstract: This paper describes the Linguistic Annotation Framework under development within ISO TC37 SC4 WG1. The Linguistic Annotation Framework is intended to serve as a basis for harmonizing existing language resources as well as developing new ones.

    Submitted 22 July, 2007; originally announced July 2007.

    Journal ref: Natural Language Engineering 10, 3-4 (09/2004) 211-225

  40. arXiv:0707.2886

    cs.DL

    OA@MPS - a colourful view

    Authors: Laurent Romary

    Abstract: The open access agenda of the Max Planck Society, initiator of the Berlin Declaration, envisions the support of both the green way and the golden way to open access. For the implementation of the green way the Max Planck Society through its newly established unit (Max Planck Digital Library) follows the idea of providing a centralized technical platform for publications and a local support for e…

    Submitted 19 July, 2007; originally announced July 2007.

    Journal ref: Zeitschrift für Bibliothekswesen und Bibliographie (15/08/2007) 7 pages

  41. arXiv:cs/0703091

    cs.AI cs.MM

    Multimodal Meaning Representation for Generic Dialogue Systems Architectures

    Authors: Frédéric Landragin, Alexandre Denis, Annalisa Ricci, Laurent Romary

    Abstract: A unified language for the communicative acts between agents is essential for the design of multi-agent architectures. Whatever the type of interaction (linguistic, multimodal, including particular aspects such as force feedback), whatever the type of application (command dialogue, request dialogue, database querying), the concepts are common and we need a generic meta-model. In order to tend…

    Submitted 16 March, 2007; originally announced March 2007.

    Journal ref: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004) (2004) 521-524

  42. arXiv:cs/0611026

    cs.CL

    Un modèle générique d'organisation de corpus en ligne: application à la FReeBank

    Authors: Susanne Salmon-Alt, Laurent Romary, Jean-Marie Pierrel

    Abstract: The few available French resources for evaluating linguistic models or algorithms on linguistic levels other than morpho-syntax are either insufficient from a quantitative as well as a qualitative point of view, or not freely accessible. Based on this fact, the FREEBANK project intends to create French corpora constructed using manually revised output from a hybrid Constraint Grammar parser and annot…

    Submitted 6 November, 2006; originally announced November 2006.

    Journal ref: Traitement Automatique des Langues (TAL) 45 (2006) 145-169

  43. arXiv:cs/0606006

    cs.CL

    Foundations of Modern Language Resource Archives

    Authors: Peter Wittenburg, Daan Broeder, Wolfgang Klein, Stephen Levinson, Laurent Romary

    Abstract: A number of serious reasons will convince an increasing number of researchers to store their relevant material in centers which we will call "language resource archives". They combine the duty of long-term preservation with the task of giving access to their material to different user groups. Access here is meant in the sense that an active interaction with the data will be ma…

    Submitted 1 June, 2006; originally announced June 2006.

  44. arXiv:cs/0604027

    cs.CL

    Unification of multi-lingual scientific terminological resources using the ISO 16642 standard. The TermSciences initiative

    Authors: Majid Khayari, Stéphane Schneider, Isabelle Kramer, Laurent Romary, the termsciences Collaboration

    Abstract: This paper presents the TermSciences portal, which deals with the implementation of a conceptual model that uses the recent ISO 16642 standard (Terminological Markup Framework). This standard turns out to be suitable for concept modeling since it allowed for organizing the original resources by concepts and to associate the various terms for a given concept. Additional structuring is produced by…

    Submitted 7 April, 2006; originally announced April 2006.

    Comments: 6 pages