Search | arXiv e-print repository

ERTIM@MC2: Diversified Argumentative Tweets Retrieval

Authors: Kévin Deturck, Parantapa Goswami, Damien Nouvel, Frédérique Segond

Abstract: In this paper, we present our participation in the CLEF MC2 2018 edition, task 2: Mining opinion argumentation. The task consists in detecting the most argumentative and diverse Tweets about some festivals in English and French from a massive multilingual collection. We measure the argumentativity of a Tweet by computing the number of argumentation compounds it contains. We consider an argumentation compound to be a combination of an opinion expression and its support with facts and a particular structure. Regarding diversity, we consider the number of festival aspects covered by Tweets. An initial step filters the original dataset to fit the language and topic requirements of the task. Then, we compute and integrate linguistic descriptors to detect claims and their respective justifications in Tweets. The final step extracts the most diverse arguments by clustering Tweets according to their textual content and selecting the most argumentative ones from each cluster. We conclude the paper by describing the different ways we combined the descriptors among the different runs we submitted and by discussing their results.

Submitted 17 April, 2023; originally announced April 2023.

Journal ref: CLEF 2018, 2018, Avignon, France

arXiv:1901.02055 [pdf]

doi 10.3166/RIA.32.287-312

SMILK, linking natural language and data from the web

Authors: Cédric Lopez, Molka Dhouib, Elena Cabrio, Catherine Faron Zucker, Fabien Gandon, Frédérique Segond

Abstract: As part of the SMILK Joint Lab, we studied the use of Natural Language Processing to: (1) enrich knowledge bases and link data on the web, and conversely (2) use this linked data to contribute to the improvement of text analysis and the annotation of textual content, and to support knowledge extraction. The evaluation focused on brand-related information retrieval in the field of cosmetics. This article describes each step of our approach: the creation of ProVoc, an ontology to describe products and brands; the automatic population of a knowledge base mainly based on ProVoc from heterogeneous textual resources; and the evaluation of an application that takes the form of a browser plugin providing additional knowledge to users browsing the web.

Submitted 20 December, 2018; originally announced January 2019.

Comments: in French

Journal ref: RIA - Revue d'Intelligence Artificielle, 2018

arXiv:1707.07568 [pdf, other]

CAp 2017 challenge: Twitter Named Entity Recognition

Authors: Cédric Lopez, Ioannis Partalas, Georgios Balikas, Nadia Derbas, Amélie Martin, Coralie Reutenauer, Frédérique Segond, Massih-Reza Amini

Abstract: The paper describes the CAp 2017 challenge. The challenge concerns the problem of Named Entity Recognition (NER) for tweets written in French. We first present the data preparation steps we followed for constructing the dataset released in the framework of the challenge. We begin by demonstrating why NER for tweets is a challenging problem, especially when the number of entities increases. We detail the annotation process and the necessary decisions we made. We provide statistics on the inter-annotator agreement, and we conclude the data description part with examples and statistics for the data. We then describe the participation in the challenge, where 8 teams participated, with a focus on the methods employed by the challenge participants and the scores achieved in terms of F1 measure. Importantly, the constructed dataset comprising ~6,000 tweets annotated for 13 types of entities, which to the best of our knowledge is the first such dataset in French, is publicly available at http://cap2017.imag.fr/competition.html.

Submitted 24 July, 2017; originally announced July 2017.

Comments: Presented at CAp 2017 (French Conference on Machine Learning)

arXiv:cs/0506049 [pdf, ps, other]

Exploitation de dictionnaires électroniques pour la désambiguïsation sémantique lexicale

Authors: Caroline Brun, Bernard Jacquemin, Frédérique Segond

Abstract: This paper presents a lexical disambiguation system, initially developed for English and now adapted to French. This system associates a word with its meaning in a given context, using electronic dictionaries as semantically annotated corpora in order to extract semantic disambiguation rules. We describe the rule extraction and application process as well as the evaluation of the system. The results for French give us insight into some possible improvements to the nature and content of lexical resources adapted for disambiguation in this framework.

Submitted 12 June, 2005; originally announced June 2005.

Comments: 25 pp

ACM Class: H.3; H.4; H.5

Journal ref: Traitement Automatique des Langues (TAL) 42, no. 3 (2001) pp. 667-690

Showing 1–4 of 4 results for author: Segond, F