
doi 10.1515/bfp-2018-0023

Improving (Re-)Usability of Musical Datasets: An Overview of the DOREMUS Project

Authors: Pasquale Lisena, Manel Achichi, Pierre Choffé, Cécile Cecconi, Konstantin Todorov, Bernard Jacquemin, Raphaël Troncy

Abstract: DOREMUS works on a better description of music by building new tools to link and explore the data of three French institutions. This paper gives an overview of the data model based on FRBRoo, explains the conversion and linking processes using linked data technologies and presents the prototypes created to consume the data according to the web users' needs.

Submitted 6 May, 2024; originally announced May 2024.

Journal ref: Bibliothek Forschung und Praxis, 2018, 42 (2), pp.194-205.

arXiv:2309.04184 [pdf, ps, other]

doi 10.4000/rfsic.14204

Receiving an algorithmic recommendation based on documentary filmmaking techniques

Authors: Samuel Gantier, Ève Givois, Bernard Jacquemin, Bouchra Atbane-El Houadi

Abstract: This article analyzes the reception of a novel algorithmic recommendation of documentary films by a panel of moviegoers of the Tënk platform. In order to propose an alternative to recommendations based on a thematic classification, the director or the production period, a set of metadata has been elaborated within the framework of this experimentation in order to characterize the great variety of "documentary filmmaking dispositifs". The goal is to investigate the different ways in which the platform's film lovers appropriate a personalized recommendation of 4 documentaries with similar or similar filmmaking dispositifs. To conclude, the contributions and limits of this proof of concept are discussed in order to sketch out avenues of reflection for improving the instrumented mediation of documentary films.

Submitted 8 September, 2023; originally announced September 2023.

Comments: in French language

Journal ref: Revue française des sciences de l'information et de la communication, 2023, 26

arXiv:1808.04124 [pdf, other]

doi 10.3166/dn.2017.00011

Methodology for identifying study sites in scientific corpus

Authors: Eric Kergosien, Marie-Noëlle Bessagnet, Maguelonne Teisseire, Joachim Schöpfel, Mohammad Amin Farvardin, Stéphane Chaudiron, Bernard Jacquemin, Annig Le Parc-Lacayrelle, Mathieu Roche, Christian Sallaberry, Jean-Philippe Tonneau, Marie-Noelle Bessagnet, Amin Farvardin, Annig Lacayrelle

Abstract: The TERRE-ISTEX project aims at identifying the evolution of research work in relation to study areas, disciplinary crossings and concrete research methods, based on the heterogeneous digital content available in scientific corpora. The project is divided into three main actions: (1) to identify the periods and places which have been the subject of empirical studies, and which reflect the publications resulting from the corpus analyzed, (2) to identify the themes addressed in these works and (3) to develop a web-based geographical information retrieval (GIR) tool. The first two actions involve approaches combining natural language processing patterns with text mining methods. By crossing the three dimensions (spatial, thematic and temporal) in a GIR engine, it will be possible to understand what research has been carried out on which territories and at what time. In the project, the experiments are carried out on a heterogeneous corpus including electronic theses and scientific articles from the ISTEX digital libraries and the CIRAD research center.

Submitted 13 August, 2018; originally announced August 2018.

Journal ref: Revue des Sciences et Technologies de l'Information - Série Document Numérique, Lavoisier, 2017, 20 (2-3), pp.11-30

arXiv:1806.03144 [pdf, ps, other]

Automatic Identification of Research Fields in Scientific Papers

Authors: Eric Kergosien, Amin Farvardin, Maguelonne Teisseire, Marie-Noëlle Bessagnet, Joachim Schöpfel, Stéphane Chaudiron, Bernard Jacquemin, Annig Le Parc-Lacayrelle, Mathieu Roche, Christian Sallaberry, Jean-Philippe Tonneau

Abstract: The TERRE-ISTEX project aims to identify scientific research dealing with specific geographical areas, based on heterogeneous digital content available in scientific papers. The project is divided into three main work packages: (1) identification of the periods and places of empirical studies, and which reflect the publications resulting from the analyzed text samples, (2) identification of the themes which appear in these documents, and (3) development of a web-based geographical information retrieval (GIR) tool. The first two work packages combine natural language processing patterns with text mining methods. The integration of the spatial, thematic and temporal dimensions in a GIR tool contributes to a better understanding of what kind of research has been carried out, of its topics and its geographical and historical coverage. Another original feature of the TERRE-ISTEX project is the heterogeneous character of the corpus, which includes PhD theses and scientific articles from the ISTEX digital libraries and the CIRAD research center.

Submitted 8 June, 2018; originally announced June 2018.

Journal ref: Proceedings of the Eleventh International Conference on Language Resources and Evaluation, pp.1902-1907, 2018, http://lrec2018.lrec-conf.org

arXiv:1010.6242 [pdf, ps, other]

GraphDuplex: visualisation simultanée de N réseaux couplés 2 par 2

Authors: Martine Hurault-Plantet, Elie Naulleau, Bernard Jacquemin

Abstract: While social network analysis often focuses on graph structure of social actors, an increasing number of communication networks now provide textual content within social activity (email, instant messaging, blogging, collaboration networks). We present an open source visualization software, GraphDuplex, which brings together social structure and textual content, adding a semantic dimension to social analysis. GraphDuplex eventually connects any number of social or semantic graphs together, and through dynamic queries enables user interaction and exploration across multiple graphs of different nature.

Submitted 29 October, 2010; originally announced October 2010.

Journal ref: Conférence en Recherche d'Information et Applications (CORIA 2009), Presqu'île de Giens, France (2009)

arXiv:1010.5584 [pdf, ps, other]

A derivational rephrasing experiment for question answering

Authors: Bernard Jacquemin

Abstract: In Knowledge Management, variations in information expressions have proven a real challenge. In particular, classical semantic relations (e.g. synonymy) do not connect words with different parts of speech. The proposed method tries to address this issue. It consists in building a derivational resource from a morphological derivation tool together with derivational guidelines from a dictionary, in order to store only correct derivatives. This resource, combined with a syntactic parser, a semantic disambiguator and some derivational patterns, helps to reformulate an original sentence while keeping the initial meaning in a convincing manner. This approach has been evaluated in three different ways: the precision of the derivatives produced from a lemma; its ability to provide well-formed reformulations from an original sentence, preserving the initial meaning; and its impact on the results when coping with a real issue, i.e. a question answering task. The evaluation of this approach through a question answering system shows the pros and cons of this system, while foreshadowing some interesting future developments.

Submitted 27 October, 2010; originally announced October 2010.

arXiv:0901.3990 [pdf, ps, other]

Du corpus au dictionnaire

Authors: Bernard Jacquemin, Sabine Ploux

Abstract: In this article, we propose an automatic process to build multilingual lexico-semantic resources. The goal of these resources is to browse semantically the textual information contained in texts of different languages. This method uses a mathematical model called Atlas sémantiques in order to represent the different senses of each word. It uses the linguistic relations between words to create graphs that are projected into a semantic space. These projections constitute semantic maps that denote the sense trends of each given word. This model is fed with syntactic relations between words extracted from a corpus. Therefore, the lexico-semantic resource produced describes all the words and all their meanings observed in the corpus. The sense trends are expressed by syntactic contexts that are typical for a given meaning. The links between each sense trend and the utterances used to build it are also stored in an index. Thus all the instances of a word in a particular sense are linked and can be browsed easily. By using several corpora in different languages, several resources are built that correspond with each other across languages. This makes it possible to browse information across languages thanks to translations of syntactic contexts (even if some of them are partial).

Submitted 26 January, 2009; originally announced January 2009.

Journal ref: Cahiers de Linguistique. Revue de sociolinguistique et de sociologie de la langue française 33, 1 (2008) 63-84

arXiv:0805.4754 [pdf, ps, other]

Managing conflicts between users in Wikipedia

Authors: Bernard Jacquemin, Aurélien Lauf, Céline Poudat, Martine Hurault-Plantet, Nicolas Auray

Abstract: Wikipedia is nowadays a widely used encyclopedia, and one of the most visible sites on the Internet. Its strong principle of collaborative work and free editing sometimes generates disputes due to disagreements between users. In this article we study how the Wikipedian community resolves conflicts and which roles Wikipedians choose in this process. We observed users' behavior both in the article talk pages and in the Arbitration Committee pages specifically dedicated to serious disputes. We first set up a typology of users according to their involvement in conflicts and their publishing and management activity in the encyclopedia. We then used those user types to describe users' behavior in contributing to articles that are tagged by the Wikipedian community as being in conflict with the official guidelines of Wikipedia, or conversely as being well featured.

Submitted 30 May, 2008; originally announced May 2008.

Comments: 12 pp

Journal ref: In BIS 2008 Workshop proceedings - 11th Conference on Business Information Systems (BIS 2008), Social Aspects of the Web Workshop (SAW 2008), Innsbruck, Austria (2008)

arXiv:0805.4722 [pdf, ps, other]

La fiabilité des informations sur le web

Authors: Bernard Jacquemin, Aurélien Lauf, Céline Poudat, Martine Hurault-Plantet, Nicolas Auray

Abstract: Online IR tools have to take into account new phenomena linked to the appearance of blogs, wikis and other collaborative publications. Among these collaborative sites, Wikipedia represents a crucial source of information. However, the quality of this information has recently been questioned. A better knowledge of the contributors' behaviors should help users navigate through information whose quality may vary from one source to another. In order to explore this idea, we present an analysis of the role of different types of contributors in the control of the publication of conflictual articles.

Submitted 30 May, 2008; originally announced May 2008.

Comments: 8 pp

Journal ref: In Actes de la Conférence en Recherche d'Information et Applications CORIA 2008 - Conférence en Recherche d'Information et Applications 2008, Trégastel, France (2008)

arXiv:0801.1179 [pdf, ps, other]

Corpus spécialisé et ressource de spécialité

Authors: Bernard Jacquemin, Sabine Ploux

Abstract: "Semantic Atlas" is a mathematical and statistical model for visualising word senses according to relations between words. The model, which has been applied to proximity relations from a corpus, has shown its ability to distinguish word senses as the corpus' contributors comprehend them. We propose to use the model and a specialised corpus in order to automatically create a specialised dictionary relative to the corpus' domain. A morpho-syntactic analysis performed on the corpus makes it possible to create the dictionary from syntactic relations between lexical units. The semantic resource can be used to navigate semantically - and not only lexically - through the corpus, to create classical dictionaries or for diachronic studies of the language.

Submitted 19 June, 2015; v1 submitted 8 January, 2008; originally announced January 2008.

Comments: 16 pages, in French

Journal ref: Appears in François Maniez; Pascaline Dury; Nathalie Arlin; Claire Rougemont. Corpus et dictionnaires de langues de spécialité, Presses Universitaires de Grenoble, pp.197-212, 2008

arXiv:cs/0703027 [pdf, ps, other]

Interroger un corpus par le sens

Authors: Bernard Jacquemin

Abstract: In textual knowledge management, statistical methods prevail. Nonetheless, some difficulties cannot be overcome by these methodologies. I propose a symbolic approach using a complete textual analysis to identify which analysis level can improve the answers provided by a system. The approach identifies word senses and relations between words and generates as many rephrasings as possible. Using synonyms and derivatives, the system provides new utterances without changing the original meaning of the sentences. In this way, information can be retrieved whatever the wording of the question or answer may be.

Submitted 31 May, 2008; v1 submitted 6 March, 2007; originally announced March 2007.

Comments: 13 pp

Journal ref: In "Mots, termes et contextes", Actes des septièmes Journées scientifiques du réseau de chercheurs Lexicologie, Terminologie, Traduction - Brussels, Belgium (2005)

arXiv:cs/0506049 [pdf, ps, other]

Exploitation de dictionnaires électroniques pour la désambiguïsation sémantique lexicale

Authors: Caroline Brun, Bernard Jacquemin, Frédérique Segond

Abstract: This paper presents a lexical disambiguation system, initially developed for English and now adapted to French. This system associates a word with its meaning in a given context using electronic dictionaries as semantically annotated corpora in order to extract semantic disambiguation rules. We describe the rule extraction and application process as well as the evaluation of the system. The results for French give us insight into some possible improvements of the nature and content of lexical resources adapted for disambiguation in this framework.

Submitted 12 June, 2005; originally announced June 2005.

Comments: 25 pp

ACM Class: H.3; H.4; H.5

Journal ref: Traitement Automatique des Langues (TAL) 42, no. 3 (2001) pp. 667-690

arXiv:cs/0506048 [pdf, ps, other]

Enriching a Text by Semantic Disambiguation for Information Extraction

Authors: Bernard Jacquemin, Caroline Brun, Claude Roux

Abstract: External linguistic resources have been used for a very long time in information extraction. These methods enrich a document with data that are semantically equivalent, in order to improve recall. For instance, some of these methods use synonym dictionaries. These dictionaries enrich a sentence with words that have a similar meaning. However, these methods present some serious drawbacks, since words are usually synonyms only in restricted contexts. The method we propose here consists of using word sense disambiguation (WSD) rules to restrict the selection of synonyms to only those that match a specific syntactico-semantic context. We show how WSD rules are built and how information extraction techniques can benefit from the application of these rules.

Submitted 12 June, 2005; originally announced June 2005.

Comments: 7 pp

ACM Class: H.3; H.4; H.5

Journal ref: LREC 2002 Workshop Proceedings "Using semantics for information retrieval and filtering" (2002) 45-51

arXiv:cs/0506047 [pdf, ps, other]

Analyse et expansion des textes en question-réponse

Authors: Bernard Jacquemin

Abstract: This paper presents an original methodology for question answering. We noticed that query expansion is often incorrect because of a bad understanding of the question. But a good automatic understanding of an utterance depends on the context length, and questions are often short. This methodology proposes to analyse the documents and to construct an informative structure from the results of the analysis and from a semantic text expansion. The linguistic analysis identifies words (tokenization and morphological analysis), links between words (syntactic analysis) and word sense (semantic disambiguation). The text expansion adds to each word the synonyms matching its sense and replaces the words in the utterances by derivatives, modifying the syntactic schema if necessary. In this way, whatever the enrichment may be, the text keeps the same meaning, but each piece of information matches many realisations. The questioning method consists in constructing a local informative structure without enrichment, and matching it with the documentary structure. If a sentence in the informative structure matches the question structure, this sentence is the answer to the question.

Submitted 12 June, 2005; originally announced June 2005.

Comments: 11 pp

ACM Class: H.3; H.4; H.5

Journal ref: Le poids des mots. Actes des 7es journées internationales d'Analyse statistique des Données Textuelles (2004) 1219

arXiv:cs/0506046 [pdf, ps, other]

Dictionaries merger for text expansion in question answering

Authors: Bernard Jacquemin

Abstract: This paper presents an original way to add new data to a reference dictionary from several other lexical resources, without losing any consistency. This operation is carried out in order to get lexical information classified by the sense of the entry. This classification makes it possible to enrich utterances (in QA: the queries) according to their meaning, and to reduce noise. An analysis of the problems encountered shows the interest of this method, and highlights the points that have to be tackled.

Submitted 12 June, 2005; originally announced June 2005.

Comments: 4 pp

ACM Class: H.3; H.5

Journal ref: Proceedings of COLING 2004 (2004) 1398

Showing 1–15 of 15 results for author: Jacquemin, B