-
Improving (Re-)Usability of Musical Datasets: An Overview of the DOREMUS Project
Authors:
Pasquale Lisena,
Manel Achichi,
Pierre Choffé,
Cécile Cecconi,
Konstantin Todorov,
Bernard Jacquemin,
Raphaël Troncy
Abstract:
DOREMUS works on a better description of music by building new tools to link and explore the data of three French institutions. This paper gives an overview of the data model based on FRBRoo, explains the conversion and linking processes using linked data technologies and presents the prototypes created to consume the data according to the web users' needs.
Submitted 6 May, 2024;
originally announced May 2024.
-
Receiving an algorithmic recommendation based on documentary filmmaking techniques
Authors:
Samuel Gantier,
Ève Givois,
Bernard Jacquemin,
Bouchra Atbane-El Houadi
Abstract:
This article analyzes the reception of a novel algorithmic recommendation of documentary films by a panel of moviegoers of the Tënk platform. To propose an alternative to recommendations based on a thematic classification, the director or the production period, a set of metadata was elaborated within the framework of this experiment in order to characterize the great variety of "documentary filmmaking dispositifs". The goal is to investigate the different ways in which the platform's film lovers appropriate a personalized recommendation of 4 documentaries with similar filmmaking dispositifs. To conclude, the contributions and limits of this proof of concept are discussed in order to sketch out avenues of reflection for improving the instrumented mediation of documentary films.
Submitted 8 September, 2023;
originally announced September 2023.
-
Methodology for identifying study sites in scientific corpus
Authors:
Eric Kergosien,
Marie-Noëlle Bessagnet,
Maguelonne Teisseire,
Joachim Schöpfel,
Mohammad Amin Farvardin,
Stéphane Chaudiron,
Bernard Jacquemin,
Annig Le Parc-Lacayrelle,
Mathieu Roche,
Christian Sallaberry,
Jean-Philippe Tonneau,
Marie-Noelle Bessagnet,
Amin Farvardin,
Annig Lacayrelle
Abstract:
The TERRE-ISTEX project aims at identifying the evolution of research work in relation to study areas, disciplinary crossings and concrete research methods, based on the heterogeneous digital content available in scientific corpora. The project is divided into three main actions: (1) to identify the periods and places which have been the subject of empirical studies, and which reflect the publications resulting from the corpus analyzed, (2) to identify the themes addressed in these works and (3) to develop a web-based geographical information retrieval (GIR) tool. The first two actions involve approaches combining natural language processing patterns with text mining methods. By crossing the three dimensions (spatial, thematic and temporal) in a GIR engine, it will be possible to understand what research has been carried out on which territories and at what time. In the project, the experiments are carried out on a heterogeneous corpus including electronic theses and scientific articles from the ISTEX digital libraries and the CIRAD research center.
Submitted 13 August, 2018;
originally announced August 2018.
-
Automatic Identification of Research Fields in Scientific Papers
Authors:
Eric Kergosien,
Amin Farvardin,
Maguelonne Teisseire,
Marie-Noëlle Bessagnet,
Joachim Schöpfel,
Stéphane Chaudiron,
Bernard Jacquemin,
Annig Le Parc-Lacayrelle,
Mathieu Roche,
Christian Sallaberry,
Jean-Philippe Tonneau
Abstract:
The TERRE-ISTEX project aims to identify scientific research dealing with specific geographical areas, based on heterogeneous digital content available in scientific papers. The project is divided into three main work packages: (1) identification of the periods and places of empirical studies which reflect the publications resulting from the analyzed text samples, (2) identification of the themes which appear in these documents, and (3) development of a web-based geographical information retrieval (GIR) tool. The first two work packages combine Natural Language Processing patterns with text mining methods. The integration of the spatial, thematic and temporal dimensions in a GIR tool contributes to a better understanding of what kind of research has been carried out, of its topics and of its geographical and historical coverage. Another original feature of the TERRE-ISTEX project is the heterogeneous character of the corpus, which includes PhD theses and scientific articles from the ISTEX digital libraries and the CIRAD research center.
Submitted 8 June, 2018;
originally announced June 2018.
-
GraphDuplex: visualisation simultanée de N réseaux couplés 2 par 2
Authors:
Martine Hurault-Plantet,
Elie Naulleau,
Bernard Jacquemin
Abstract:
While social network analysis often focuses on the graph structure of social actors, an increasing number of communication networks now provide textual content within social activity (email, instant messaging, blogging, collaboration networks). We present an open-source visualization tool, GraphDuplex, which brings together social structure and textual content, adding a semantic dimension to social analysis. GraphDuplex can connect any number of social or semantic graphs together and, through dynamic queries, enables user interaction and exploration across multiple graphs of different natures.
Submitted 29 October, 2010;
originally announced October 2010.
-
A derivational rephrasing experiment for question answering
Authors:
Bernard Jacquemin
Abstract:
In Knowledge Management, variations in information expressions have proven a real challenge. In particular, classical semantic relations (e.g. synonymy) do not connect words with different parts of speech. The proposed method tries to address this issue. It consists in building a derivational resource from a morphological derivation tool together with derivational guidelines from a dictionary in order to store only correct derivatives. This resource, combined with a syntactic parser, a semantic disambiguator and some derivational patterns, helps to reformulate an original sentence while keeping the initial meaning in a convincing manner. This approach has been evaluated in three different ways: the precision of the derivatives produced from a lemma; its ability to provide well-formed reformulations from an original sentence, preserving the initial meaning; and its impact on the results when coping with a real issue, i.e. a question answering task. The evaluation of this approach through a question answering system shows the pros and cons of this system, while foreshadowing some interesting future developments.
Submitted 27 October, 2010;
originally announced October 2010.
-
Du corpus au dictionnaire
Authors:
Bernard Jacquemin,
Sabine Ploux
Abstract:
In this article, we propose an automatic process to build multilingual lexico-semantic resources. The goal of these resources is to semantically browse the information contained in texts in different languages. This method uses a mathematical model called Atlas sémantiques to represent the different senses of each word. It uses the linguistic relations between words to create graphs that are projected into a semantic space. These projections constitute semantic maps that denote the sense trends of each given word. This model is fed with syntactic relations between words extracted from a corpus. Therefore, the lexico-semantic resource produced describes all the words and all their meanings observed in the corpus. The sense trends are expressed by syntactic contexts that are typical of a given meaning. The links between each sense trend and the utterances used to build it are also stored in an index. Thus all the instances of a word in a particular sense are linked and can be browsed easily. By using several corpora in different languages, several resources are built that correspond with each other across languages. This makes it possible to browse information across languages thanks to translations of syntactic contexts (even if some of them are partial).
Submitted 26 January, 2009;
originally announced January 2009.
-
Managing conflicts between users in Wikipedia
Authors:
Bernard Jacquemin,
Aurélien Lauf,
Céline Poudat,
Martine Hurault-Plantet,
Nicolas Auray
Abstract:
Wikipedia is nowadays a widely used encyclopedia, and one of the most visible sites on the Internet. Its strong principle of collaborative work and free editing sometimes generates disputes due to disagreements between users. In this article we study how the Wikipedian community resolves conflicts and which roles Wikipedians choose in this process. We observed users' behavior both in the article talk pages and in the Arbitration Committee pages specifically dedicated to serious disputes. We first set up a user typology according to their involvement in conflicts and their publishing and management activity in the encyclopedia. We then used those user types to describe users' behavior in contributing to articles that are tagged by the Wikipedian community as being in conflict with the official guidelines of Wikipedia, or conversely as being featured articles.
Submitted 30 May, 2008;
originally announced May 2008.
-
La fiabilité des informations sur le web
Authors:
Bernard Jacquemin,
Aurélien Lauf,
Céline Poudat,
Martine Hurault-Plantet,
Nicolas Auray
Abstract:
Online IR tools have to take into account new phenomena linked to the appearance of blogs, wikis and other collaborative publications. Among these collaborative sites, Wikipedia represents a crucial source of information. However, the quality of this information has recently been questioned. A better knowledge of the contributors' behaviors should help users navigate through information whose quality may vary from one source to another. In order to explore this idea, we present an analysis of the role of different types of contributors in the control of the publication of conflictual articles.
Submitted 30 May, 2008;
originally announced May 2008.
-
Corpus spécialisé et ressource de spécialité
Authors:
Bernard Jacquemin,
Sabine Ploux
Abstract:
"Semantic Atlas" is a mathematical and statistical model for visualising word senses according to relations between words. The model, which has been applied to proximity relations from a corpus, has shown its ability to distinguish word senses as the corpus' contributors comprehend them. We propose to use the model and a specialised corpus to automatically create a specialised dictionary relative to the corpus' domain. A morpho-syntactic analysis performed on the corpus makes it possible to create the dictionary from syntactic relations between lexical units. The semantic resource can be used to navigate semantically, and not only lexically, through the corpus, to create classical dictionaries or for diachronic studies of the language.
Submitted 19 June, 2015; v1 submitted 8 January, 2008;
originally announced January 2008.
-
Interroger un corpus par le sens
Authors:
Bernard Jacquemin
Abstract:
In textual knowledge management, statistical methods prevail. Nonetheless, some difficulties cannot be overcome by these methodologies. I propose a symbolic approach using a complete textual analysis to identify which analysis level can improve the answers provided by a system. The approach identifies word senses and relations between words and generates as many rephrasings as possible. Using synonyms and derivatives, the system provides new utterances without changing the original meaning of the sentences. In this way, a piece of information can be retrieved whatever the wording of the question or answer may be.
Submitted 31 May, 2008; v1 submitted 6 March, 2007;
originally announced March 2007.
-
Exploitation de dictionnaires électroniques pour la désambiguïsation sémantique lexicale
Authors:
Caroline Brun,
Bernard Jacquemin,
Frédérique Segond
Abstract:
This paper presents a lexical disambiguation system, initially developed for English and now adapted to French. This system associates a word with its meaning in a given context, using electronic dictionaries as semantically annotated corpora in order to extract semantic disambiguation rules. We describe the rule extraction and application process as well as the evaluation of the system. The results for French give us insight into some possible improvements to the nature and content of lexical resources adapted for disambiguation in this framework.
Submitted 12 June, 2005;
originally announced June 2005.
-
Enriching a Text by Semantic Disambiguation for Information Extraction
Authors:
Bernard Jacquemin,
Caroline Brun,
Claude Roux
Abstract:
External linguistic resources have been used for a very long time in information extraction. These methods enrich a document with data that are semantically equivalent, in order to improve recall. For instance, some of these methods use synonym dictionaries. These dictionaries enrich a sentence with words that have a similar meaning. However, these methods present some serious drawbacks, since words are usually synonyms only in restricted contexts. The method we propose here consists of using word sense disambiguation (WSD) rules to restrict the selection of synonyms to only those that match a specific syntactico-semantic context. We show how WSD rules are built and how information extraction techniques can benefit from the application of these rules.
Submitted 12 June, 2005;
originally announced June 2005.
-
Analyse et expansion des textes en question-réponse
Authors:
Bernard Jacquemin
Abstract:
This paper presents an original methodology for question answering. We noticed that query expansion is often incorrect because of a poor understanding of the question. But the correct automatic understanding of an utterance depends on the context length, and questions are often short. This methodology proposes to analyse the documents and to construct an informative structure from the results of the analysis and from a semantic text expansion. The linguistic analysis identifies words (tokenization and morphological analysis), links between words (syntactic analysis) and word senses (semantic disambiguation). The text expansion adds to each word the synonyms matching its sense and replaces the words in the utterances by derivatives, modifying the syntactic schema if necessary. In this way, whatever the enrichment may be, the text keeps the same meaning, but each piece of information matches many realisations. The questioning method consists in constructing a local informative structure without enrichment and matching it with the documentary structure. If a sentence in the informative structure matches the question structure, this sentence is the answer to the question.
Submitted 12 June, 2005;
originally announced June 2005.
-
Dictionaries merger for text expansion in question answering
Authors:
Bernard Jacquemin
Abstract:
This paper presents an original way to add new data to a reference dictionary from several other lexical resources, without losing any consistency. This operation is carried out in order to obtain lexical information classified by the sense of the entry. This classification makes it possible to enrich utterances (in QA: the queries) according to their meaning, and to reduce noise. An analysis of the problems encountered shows the value of this method and highlights the points that still have to be tackled.
Submitted 12 June, 2005;
originally announced June 2005.