Search | arXiv e-print repository

arXiv:2001.11382 [pdf, ps, other]

Intweetive Text Summarization

Authors: Jean Valère Cossu, Juan-Manuel Torres-Moreno, Eric SanJuan, Marc El-Bèze

Abstract: The amount of user-generated content from various social media allows analysts to gain a wide view of conversations on several topics related to their business. Nevertheless, keeping up to date with this amount of information is not humanly feasible. Automatic summarization then provides an interesting means to digest the dynamics and the mass volume of content. In this paper, we address the issue of tweet summarization, which remains scarcely explored. We propose to automatically generate summaries of micro-blog conversations dealing with the e-reputation of public figures. These summaries are generated using keyword queries or sample tweets and offer a focused view of the whole micro-blog network. Since the state of the art is lacking on this point, we conduct and evaluate our experiments over the multilingual CLEF RepLab Topic-Detection dataset according to an experimental evaluation process.

Submitted 16 January, 2020; originally announced January 2020.

Comments: 8 pages, 4 tables

Journal ref: International Journal of Computational Linguistics and Applications vol. 7, no. 1, 2016, pp. 67-83

arXiv:2001.10613 [pdf, other]

Predicting Personalized Academic and Career Roads: First Steps Toward a Multi-Uses Recommender System

Authors: Alexandre Nadjem, Juan-Manuel Torres-Moreno, Marc El-Bèze, Guillaume Marrel, Benoît Bonte

Abstract: Nobody knows what they will do in the future, and everyone would give a different answer to the question: how do you see yourself five years after your current job or diploma? In this paper we introduce concepts, large categories of fields of study or job domains, in order to represent the user's vision of their future trajectory. Then, we show how they can influence the prediction when proposing to the user a set of next steps to take.

Submitted 3 January, 2020; originally announced January 2020.

Comments: 4 pages, 3 figures, 4 tables

Journal ref: Digital Tools & Uses Congress (DTUC '18), pp. 1-4, 2018, Paris, France

arXiv:1903.07397 [pdf, other]

Un duel probabiliste pour départager deux présidents (LIA @ DEFT'2005)

Authors: Marc El-Bèze, Juan-Manuel Torres-Moreno, Frédéric Béchet

Abstract: We present a set of probabilistic models applied to binary classification as defined in the DEFT'05 challenge. The challenge consisted of a mixture of two different problems in Natural Language Processing: author identification (a sequence of François Mitterrand's sentences might have been inserted into a speech of Jacques Chirac) and thematic break detection (the subjects addressed by the two authors are supposed to be different). Markov chains, Bayes models and an adaptive process have been used to identify the paternity of these sequences. A probabilistic model of the internal coherence of speeches has been employed to identify thematic breaks. Adding this model has been shown to improve the quality of the results. A comparison with different approaches demonstrates the superiority of a strategy that combines learning, coherence and adaptation. Applied to the DEFT'05 test data, the results in terms of precision (0.890), recall (0.955) and F-score (0.925) are very promising.

Submitted 11 March, 2019; originally announced March 2019.

Comments: 27 figures, 1 table (in French)

Journal ref: RNTI (E10)776:1889-1918, 2007

arXiv:1812.07207 [pdf, other]

doi 10.1016/j.csl.2015.03.006

Multiple topic identification in human/human conversations

Authors: X. Bost, G. Senay, M. El-Bèze, R. De Mori

Abstract: The paper deals with the automatic analysis of real-life telephone conversations between an agent and a customer of a customer care service (CCS). The application domain is the public transportation system in Paris and the purpose is to collect statistics about customer problems in order to monitor the service and decide priorities on the intervention for improving user satisfaction. Of primary importance for the analysis is the detection of themes that are the object of customer problems. Themes are defined in the application requirements and are part of the application ontology that is implicit in the CCS documentation. Due to the variety of the customer population, the structure of conversations with an agent is unpredictable. A conversation may be about one or more themes. Theme mentions can be interleaved with mentions of facts that are irrelevant for the application purpose. Furthermore, in certain conversations theme mentions are localized in specific conversation segments, while in other conversations mentions cannot be localized. As a consequence, approaches to feature extraction with and without mention localization are considered. Application-domain-relevant themes identified by an automatic procedure are expressed by specific sentences whose words are hypothesized by an automatic speech recognition (ASR) system. The ASR system is error prone. The word error rates can be very high for many reasons, among them unpredictable background noise, speaker accent, and various types of speech disfluencies.
As the application task requires the composition of proportions of theme mentions, a sequential decision strategy is introduced in this paper for performing a survey of the large number of conversations made available in a given time period. The strategy has to sample the conversations to form a survey containing enough data analyzed with high accuracy so that proportions can be estimated with sufficient accuracy. Due to the unpredictable type of theme mentions, it is appropriate to consider methods for theme hypothesization based on global as well as local feature extraction. Two systems based on each type of feature extraction will be considered by the strategy. One of the four methods is novel. It is based on a new definition of density of theme mentions and on the localization of high-density zones whose boundaries do not need to be precisely detected. The sequential decision strategy starts by grouping theme hypotheses into sets of different expected accuracy and coverage levels. For those sets for which accuracy can be improved with a consequent increase of coverage, a new system with new features is introduced. Its execution is triggered only when specific preconditions are met on the hypotheses generated by the basic four systems. Experimental results are provided on a corpus collected in the call center of the Paris transportation system known as RATP. The results show that surveys with high accuracy and coverage can be composed with the proposed strategy and systems. This makes it possible to apply a previously published proportion estimation approach that takes into account hypothesization errors.

Submitted 29 December, 2018; v1 submitted 18 December, 2018; originally announced December 2018.

Journal ref: Computer Speech & Language, 2015, 34 (1), pp. 18-42

arXiv:1702.06510 [pdf, ps, other]

Algorithmes de classification et d'optimisation: participation du LIA/ADOC á DEFT'14

Authors: Luis Adrián Cabrera-Diego, Stéphane Huet, Bassam Jabaian, Alejandro Molina, Juan-Manuel Torres-Moreno, Marc El-Bèze, Barthélémy Durette

Abstract: This year, the DEFT campaign (Défi Fouilles de Textes) incorporates a task which aims at identifying the session in which articles of previous TALN conferences were presented. We describe the three statistical systems developed at LIA/ADOC for this task. A fusion of these systems enables us to obtain interesting results (micro-precision score of 0.76 measured on the test corpus).

Submitted 21 February, 2017; originally announced February 2017.

Comments: 8 pages, 3 tables, Conference paper (in French)

arXiv:1702.06478 [pdf, ps, other]

Systèmes du LIA à DEFT'13

Authors: Xavier Bost, Ilaria Brunetti, Luis Adrián Cabrera-Diego, Jean-Valère Cossu, Andréa Linhares, Mohamed Morchid, Juan-Manuel Torres-Moreno, Marc El-Bèze, Richard Dufour

Abstract: The 2013 Défi de Fouille de Textes (DEFT) campaign focuses on two types of language analysis tasks: document classification and information extraction in the specialized domain of cooking recipes. We present the systems that the LIA used in DEFT 2013. Our systems show interesting results, despite the complexity of the proposed tasks.

Submitted 21 February, 2017; originally announced February 2017.

Comments: 12 pages, 3 tables, (Paper in French)

Journal ref: Proceedings of the Ninth DEFT Workshop, DEFT2013, Les Sables-d'Olonne, France, 21st June 2013

arXiv:1501.01252 [pdf, other]

doi 10.15439/2014F336

Optimisation using Natural Language Processing: Personalized Tour Recommendation for Museums

Authors: Mayeul Mathias, Assema Moussa, Fen Zhou, Juan-Manuel Torres-Moreno, Marie-Sylvie Poli, Didier Josselin, Marc El-Bèze, Andréa Carneiro Linhares, Francoise Rigat

Abstract: This paper proposes a new method to provide personalized tour recommendations for museum visits. It combines an optimization of visitors' preference criteria with an automatic extraction of artwork importance from museum information, based on Natural Language Processing using textual energy. This project includes researchers from computer and social sciences. Some results are obtained with numerical experiments. They show that our model clearly improves the satisfaction of the visitor who follows the proposed tour. This work foreshadows some interesting outcomes and applications for on-demand personalized museum visits in the very near future.

Submitted 6 January, 2015; originally announced January 2015.

Comments: 8 pages, 4 figures; Proceedings of the 2014 Federated Conference on Computer Science and Information Systems pp. 439-446

arXiv:1004.3371 [pdf, other]

Improving Update Summarization by Revisiting the MMR Criterion

Authors: Florian Boudin, Juan-Manuel Torres-Moreno, Marc El-Bèze

Abstract: This paper describes a method for multi-document update summarization that relies on a double maximization criterion. A modified Maximal Marginal Relevance-like criterion, called Smmr, is used to select sentences that are close to the topic and, at the same time, distant from sentences used in already read documents. Summaries are then generated by assembling the highly ranked material and applying some rule-based linguistic post-processing in order to obtain length reduction and maintain coherency. Through participation in the Text Analysis Conference (TAC) 2008 evaluation campaign, we have shown that our method achieves promising results.

Submitted 20 April, 2010; originally announced April 2010.

Comments: 20 pages, 3 figures and 8 tables.

ACM Class: I.2.7
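
The selection idea in the abstract above (favor sentences close to the topic, penalize sentences close to already read material) can be illustrated with the classic linear form of Maximal Marginal Relevance. The sketch below is not the paper's Smmr criterion, whose exact formula is not given in this listing; the function names, the bag-of-words cosine similarity, and the trade-off weight `lam` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Lower-cased whitespace tokens as a bag-of-words vector (illustrative choice)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_update_sentences(topic, candidates, already_read, lam=0.7):
    """Rank candidates by a classic MMR-style score (hypothetical helper).

    score = lam * sim(sentence, topic)
            - (1 - lam) * max sim(sentence, already read sentence)
    """
    topic_vec = bow(topic)
    read_vecs = [bow(s) for s in already_read]
    scored = []
    for sentence in candidates:
        vec = bow(sentence)
        relevance = cosine(vec, topic_vec)
        # Redundancy is the similarity to the closest already read sentence.
        redundancy = max((cosine(vec, r) for r in read_vecs), default=0.0)
        scored.append((lam * relevance - (1 - lam) * redundancy, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored]


if __name__ == "__main__":
    ranked = rank_update_sentences(
        topic="flood damage in the city",
        candidates=[
            "the flood hit the city on monday morning",
            "flood damage estimates rose to two million",
            "the weather is nice",
        ],
        already_read=["the flood hit the city on monday"],
    )
    for sentence in ranked:
        print(sentence)
```

With these toy inputs, the sentence that nearly repeats the already read material is ranked below the one that brings new, on-topic information, which is the behavior an update summarizer is after.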

arXiv:0905.2990 [pdf, other]

Automatic Summarization System coupled with a Question-Answering System (QAAS)

Authors: Juan-Manuel Torres-Moreno, Pier-Luc St-Onge, Michel Gagnon, Marc El-Bèze, Patrice Bellot

Abstract: To select the most relevant sentences of a document, the Cortex summarizer uses an optimal decision algorithm that combines several metrics. The metrics weight and extract pertinent sentences using statistical and informational algorithms. This technique might improve a Question-Answering system, whose function is to provide an exact answer to a question in natural language. In this paper, we present the results obtained by coupling the Cortex summarizer with a Question-Answering system (QAAS). Two configurations have been evaluated. In the first one, a low compression level is selected and the summarization system is only used as a noise filter. In the second configuration, the system actually functions as a summarizer, with a very high level of compression. Our results on a French corpus demonstrate that coupling an Automatic Summarization system with a Question-Answering system is promising. The system has then been adapted to generate a customized summary depending on the specific question. Tests on a French multi-document corpus have been carried out, and the personalized QAAS system obtains the best performance.

Submitted 18 May, 2009; originally announced May 2009.

Comments: 28 pages, 11 figures

Showing 1–9 of 9 results for author: El-Bèze, M