
Showing 1–48 of 48 results for author: Rosso, P

  1. arXiv:2402.13222  [pdf, other]

    cs.CL

    RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian

    Authors: Adrian Cosma, Bogdan Iordache, Paolo Rosso

    Abstract: Recently, large language models (LLMs) have become increasingly powerful and have become capable of solving a plethora of tasks through proper instructions in natural language. However, the vast majority of testing suites assume that the instructions are written in English, the de facto prompting language. Code intelligence and problem solving still remain a difficult task, even for the most advan…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at LREC-COLING 2024

  2. arXiv:2402.01235  [pdf, other]

    eess.SP

    QSpeckleFilter: a Quantum Machine Learning approach for SAR speckle filtering

    Authors: Francesco Mauro, Alessandro Sebastianelli, Maria Pia Del Rosso, Paolo Gamba, Silvia Liberata Ullo

    Abstract: The use of Synthetic Aperture Radar (SAR) has greatly advanced our capacity for comprehensive Earth monitoring, providing detailed insights into terrestrial surface use and cover regardless of weather conditions, and at any time of day or night. However, SAR imagery quality is often compromised by speckle, a granular disturbance that poses challenges in producing accurate results without suitable…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: We have submitted this paper to IGARSS 2024

  3. arXiv:2401.02746  [pdf, other]

    cs.CV

    Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues

    Authors: David Gimeno-Gómez, Ana-Maria Bucur, Adrian Cosma, Carlos-David Martínez-Hinarejos, Paolo Rosso

    Abstract: Depression, a prominent contributor to global disability, affects a substantial portion of the population. Efforts to detect depression from social media texts have been prevalent, yet only a few works explored depression detection from user-generated video content. In this work, we address this research gap by proposing a simple and flexible multi-modal temporal model capable of discerning non-ve…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted at 46th European Conference on Information Retrieval (ECIR 2024)

  4. arXiv:2312.07228  [pdf]

    cs.CL

    Toxic language detection: a systematic review of Arabic datasets

    Authors: Imene Bensalem, Paolo Rosso, Hanane Zitouni

    Abstract: The detection of toxic language in the Arabic language has emerged as an active area of research in recent years, and reviewing the existing datasets employed for training the developed solutions has become a pressing need. This paper offers a comprehensive survey of Arabic datasets focused on online toxic language. We systematically gathered a total of 54 available datasets and their correspondin…

    Submitted 29 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  5. arXiv:2311.02025  [pdf, other]

    cs.CL

    Vicinal Risk Minimization for Few-Shot Cross-lingual Transfer in Abusive Language Detection

    Authors: Gretel Liz De la Peña Sarracén, Paolo Rosso, Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto

    Abstract: Cross-lingual transfer learning from high-resource to medium and low-resource languages has shown encouraging results. However, the scarcity of resources in target languages remains a challenge. In this work, we resort to data augmentation and continual pre-training for domain adaptation to improve cross-lingual abusive language detection. For data augmentation, we analyze two existing techniques…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023 (Main Conference)

  6. arXiv:2309.11285  [pdf, other]

    cs.CL cs.AI cs.LG

    Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains

    Authors: Areg Mikael Sarvazyan, José Ángel González, Marc Franco-Salvador, Francisco Rangel, Berta Chulvi, Paolo Rosso

    Abstract: This paper presents the overview of the AuTexTification shared task as part of the IberLEF 2023 Workshop in Iberian Languages Evaluation Forum, within the framework of the SEPLN 2023 conference. AuTexTification consists of two subtasks: for Subtask 1, participants had to determine whether a text is human-authored or has been generated by a large language model. For Subtask 2, participants had to a…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at SEPLN 2023

    Journal ref: Procesamiento del Lenguaje Natural, [S.l.], v. 71, p. 275-288, sep. 2023

  7. arXiv:2307.03377  [pdf, ps, other]

    cs.CL cs.LG

    Mitigating Negative Transfer with Task Awareness for Sexism, Hate Speech, and Toxic Language Detection

    Authors: Angel Felipe Magnossão de Paula, Paolo Rosso, Damiano Spina

    Abstract: This paper proposes a novelty approach to mitigate the negative transfer problem. In the field of machine learning, the common strategy is to apply the Single-Task Learning approach in order to train a supervised model to solve a specific task. Training a robust model requires a lot of data and a significant amount of computational resources, making this solution unfeasible in cases where data are…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: 8 pages, 2 figures, 5 tables, IJCNN 2023 conference

  8. arXiv:2303.09823  [pdf, other]

    cs.CL cs.AI cs.LG

    Transformers and Ensemble methods: A solution for Hate Speech Detection in Arabic languages

    Authors: Angel Felipe Magnossão de Paula, Imene Bensalem, Paolo Rosso, Wajdi Zaghouani

    Abstract: This paper describes our participation in the shared task of hate speech detection, which is one of the subtasks of the CERIST NLP Challenge 2022. Our experiments evaluate the performance of six transformer models and their combination using 2 ensemble approaches. The best results on the training set, in a five-fold cross validation scenario, were obtained by using the ensemble approach based on t…

    Submitted 17 March, 2023; originally announced March 2023.

    Comments: 7 pages, 3 tables

  9. arXiv:2301.05494  [pdf, other]

    cs.CL cs.IR

    Multilingual Detection of Check-Worthy Claims using World Languages and Adapter Fusion

    Authors: Ipek Baris Schlicht, Lucie Flek, Paolo Rosso

    Abstract: Check-worthiness detection is the task of identifying claims, worthy to be investigated by fact-checkers. Resource scarcity for non-world languages and model learning costs remain major challenges for the creation of models supporting multilingual check-worthiness detection. This paper proposes cross-training adapters on a subset of world languages, combined by adapter fusion, to detect claims eme…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: 17 pages, 11 table. It has been accepted as a full paper at ECIR 2023

  10. arXiv:2301.05453  [pdf, other]

    cs.CL

    It's Just a Matter of Time: Detecting Depression with Time-Enriched Multimodal Transformers

    Authors: Ana-Maria Bucur, Adrian Cosma, Paolo Rosso, Liviu P. Dinu

    Abstract: Depression detection from user-generated content on the internet has been a long-lasting topic of interest in the research community, providing valuable screening tools for psychologists. The ubiquitous use of social media platforms lays out the perfect avenue for exploring mental health manifestations in posts and interactions with other users. Current methods for depression detection from social…

    Submitted 6 February, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

    Comments: Accepted at ECIR 2023

  11. arXiv:2212.02352  [pdf, ps, other]

    cs.CL

    Fake News and Hate Speech: Language in Common

    Authors: Berta Chulvi, Alejandro Toselli, Paolo Rosso

    Abstract: In this paper we raise the research question of whether fake news and hate speech spreaders share common patterns in language. We compute a novel index, the ingroup vs outgroup index, in three different datasets and we show that both phenomena share an "us vs them" narrative.

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: 2 pages

  12. arXiv:2207.12406  [pdf, ps, other]

    cs.CL

    UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu

    Authors: Maaz Amjad, Grigori Sidorov, Alisa Zhila, Alexander Gelbukh, Paolo Rosso

    Abstract: This paper gives the overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language. This is a binary classification task in which the goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing. The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and…

    Submitted 24 July, 2022; originally announced July 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2207.11893

  13. arXiv:2207.11893  [pdf, other]

    cs.CL

    Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020

    Authors: Maaz Amjad, Grigori Sidorov, Alisa Zhila, Alexander Gelbukh, Paolo Rosso

    Abstract: This overview paper describes the first shared task on fake news detection in Urdu language. The task was posed as a binary classification task, in which the goal is to differentiate between real and fake news. We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing. The dataset contained news in five domains: (i) Health, (ii) Sports, (iii) Sho…

    Submitted 24 July, 2022; originally announced July 2022.

  14. The OpenMP Cluster Programming Model

    Authors: Hervé Yviquel, Marcio Pereira, Emílio Francesquini, Guilherme Valarini, Gustavo Leite, Pedro Rosso, Rodrigo Ceccato, Carla Cusihualpa, Vitoria Dias, Sandro Rigo, Alan Souza, Guido Araujo

    Abstract: Despite the various research initiatives and proposed programming models, efficient solutions for parallel programming in HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages (e.g., C++ and CUDA), and specialized runtimes (e.g., Charm++ and Legion). On the other hand, task parallelism has shown to be an efficient and seamless programmin…

    Submitted 13 August, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: 12 pages, 7 figures, 1 listing, to be published in the 51st International Conference on Parallel Processing Workshop Proceedings (ICPP Workshops 22)

    ACM Class: D.4.1; D.3.2

  15. arXiv:2207.00753  [pdf, other]

    cs.CL

    An End-to-End Set Transformer for User-Level Classification of Depression and Gambling Disorder

    Authors: Ana-Maria Bucur, Adrian Cosma, Liviu P. Dinu, Paolo Rosso

    Abstract: This work proposes a transformer architecture for user-level classification of gambling addiction and depression that is trainable end-to-end. As opposed to other methods that operate at the post level, we process a set of social media posts from a particular individual, to make use of the interactions between posts and eliminate label noise at the post level. We exploit the fact that, by not inje…

    Submitted 2 July, 2022; originally announced July 2022.

  16. arXiv:2206.06320  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI q-fin.ST

    Cryptocurrency Bubble Detection: A New Stock Market Dataset, Financial Task & Hyperbolic Models

    Authors: Ramit Sawhney, Shivam Agarwal, Vivek Mittal, Paolo Rosso, Vikram Nanda, Sudheer Chava

    Abstract: The rapid spread of information over social media influences quantitative trading and investments. The growing popularity of speculative trading of highly volatile assets such as cryptocurrencies and meme stocks presents a fresh challenge in the financial realm. Investigating such "bubbles" - periods of sudden anomalous behavior of markets are critical in better understanding investor behavior and…

    Submitted 11 May, 2022; originally announced June 2022.

    Comments: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

  17. arXiv:2205.06181  [pdf, other]

    cs.SI

    FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias

    Authors: Flora Sakketou, Joan Plepi, Riccardo Cervero, Henri-Jacques Geiss, Paolo Rosso, Lucie Flek

    Abstract: Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. In this paper, we introduce a new contemporary Reddit dataset for fake news spreader analysis, called FACTOID, monitoring political discussions on Reddit since the beginning of 2020. The dataset contains over 4K users with 3.4M Reddit posts, and includes, beyond the user…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Accepted to LREC 2022

  18. arXiv:2204.10841  [pdf, other]

    cs.CL cs.LG

    Detecting early signs of depression in the conversational domain: The role of transfer learning in low-resource scenarios

    Authors: Petr Lorenc, Ana-Sabina Uban, Paolo Rosso, Jan Šedivý

    Abstract: The high prevalence of depression in society has given rise to the need for new digital tools to assist in its early detection. To this end, existing research has mainly focused on detecting depression in the domain of social media, where there is a sufficient amount of data. However, with the rise of conversational agents like Siri or Alexa, the conversational domain is becoming more critical. Un…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: Accepted to The 27th International Conference on Natural Language & Information Systems (NLDB) 2022

  19. arXiv:2204.09481  [pdf, other]

    cs.CL cs.LG

    Unsupervised Ranking and Aggregation of Label Descriptions for Zero-Shot Classifiers

    Authors: Angelo Basile, Marc Franco-Salvador, Paolo Rosso

    Abstract: Zero-shot text classifiers based on label descriptions embed an input text and a set of labels into the same space: measures such as cosine similarity can then be used to select the most similar label description to the input text as the predicted label. In a true zero-shot setup, designing good label descriptions is challenging because no development set is available. Inspired by the literature o…

    Submitted 24 May, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: 6 pages, 2 figures

    MSC Class: I.2.7

  20. arXiv:2112.06080  [pdf, other]

    cs.IR cs.AI

    UPV at TREC Health Misinformation Track 2021 Ranking with SBERT and Quality Estimators

    Authors: Ipek Baris Schlicht, Angel Felipe Magnossão de Paula, Paolo Rosso

    Abstract: Health misinformation on search engines is a significant problem that could negatively affect individuals or public health. To mitigate the problem, TREC organizes a health misinformation track. This paper presents our submissions to this track. We use a BM25 and a domain-specific semantic search engine for retrieving initial documents. Later, we examine a health news schema for quality assessment…

    Submitted 11 December, 2021; originally announced December 2021.

    Comments: 6 pages; presented at the TREC 2021

  21. arXiv:2109.09232  [pdf, other]

    cs.CL cs.LG

    UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying Multilingual Check-worthy Claims

    Authors: Ipek Baris Schlicht, Angel Felipe Magnossão de Paula, Paolo Rosso

    Abstract: Identifying check-worthy claims is often the first step of automated fact-checking systems. Tackling this task in a multilingual setting has been understudied. Encoding inputs with multilingual text representations could be one approach to solve the multilingual check-worthiness detection. However, this approach could suffer if cultural bias exists within the communities on determining what is che…

    Submitted 19 September, 2021; originally announced September 2021.

    Comments: 11 pages, 2 figures. Link to the original paper: http://ceur-ws.org/Vol-2936/paper-36.pdf

    ACM Class: I.7; J.4

    Journal ref: published at CLEF 2021

  22. Studying Fake News Spreading, Polarisation Dynamics, and Manipulation by Bots: a Tale of Networks and Language

    Authors: Giancarlo Ruffo, Alfonso Semeraro, Anastasia Giachanou, Paolo Rosso

    Abstract: With the explosive growth of online social media, the ancient problem of information disorders interfering with news diffusion has surfaced with a renewed intensity threatening our democracies, public health, and news outlets' credibility. Therefore, thousands of scientific papers have been published in a relatively short period, making researchers of different disciplines struggle with an informa…

    Submitted 14 January, 2023; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: 43 pages, 9 figures

    ACM Class: A.1; J.4; G.2; K.4; I.2.7

    Journal ref: Computer Science Review, Volume 47, 2023, 100531, ISSN 1574-0137

  23. arXiv:2106.15281  [pdf, other]

    cs.CV cs.LG eess.IV

    On Board Volcanic Eruption Detection through CNNs and Satellite Multispectral Imagery

    Authors: Maria Pia Del Rosso, Alessandro Sebastianelli, Dario Spiller, Pierre Philippe Mathieu, Silvia Liberata Ullo

    Abstract: In recent years, the growth of Machine Learning (ML) algorithms has raised the number of studies including their applicability in a variety of different scenarios. Among all, one of the hardest ones is the aerospace, due to its peculiar physical requirements. In this context, a feasibility study and a first prototype for an Artificial Intelligence (AI) model to be deployed on board satellites are…

    Submitted 28 July, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

  24. arXiv:2106.12226  [pdf, other]

    cs.CV cs.AI eess.IV

    Spatio-Temporal SAR-Optical Data Fusion for Cloud Removal via a Deep Hierarchical Model

    Authors: Alessandro Sebastianelli, Artur Nowakowski, Erika Puglisi, Maria Pia Del Rosso, Jamila Mifdal, Fiora Pirri, Pierre Philippe Mathieu, Silvia Liberata Ullo

    Abstract: Cloud removal is a relevant topic in Remote Sensing as it fosters the usability of high-resolution optical images for Earth monitoring and study. Related techniques have been analyzed for years with a progressively clearer view of the appropriate methods to adopt, from multi-spectral to inpainting methods. Recent applications of deep generative models and sequence-to-sequence-based models have pro…

    Submitted 28 March, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

  25. arXiv:2106.11056  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV

    Paradigm selection for Data Fusion of SAR and Multispectral Sentinel data applied to Land-Cover Classification

    Authors: Alessandro Sebastianelli, Maria Pia Del Rosso, Pierre Philippe Mathieu, Silvia Liberata Ullo

    Abstract: Data fusion is a well-known technique, becoming more and more popular in the Artificial Intelligence for Earth Observation (AI4EO) domain mainly due to its ability of reinforcing AI4EO applications by combining multiple data sources and thus bringing better results. On the other hand, like other methods for satellite data analysis, data fusion itself is also benefiting and evolving thanks to the i…

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: This work has been submitted to the IEEE Geoscience and Remote Sensing Letters for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  26. arXiv:2104.09350  [pdf, other]

    cs.AI eess.IV

    A speckle filter for Sentinel-1 SAR Ground Range Detected data based on Residual Convolutional Neural Networks

    Authors: Alessandro Sebastianelli, Maria Pia Del Rosso, Silvia Liberata Ullo, Paolo Gamba

    Abstract: In recent years, machine learning (ML) algorithms have become widespread in all the fields of remote sensing (RS) and earth observation (EO). This has allowed the rapid development of new procedures to solve problems affecting these sectors. In this context, this work aims at presenting a novel method for filtering speckle noise from Sentinel-1 ground range detected (GRD) data by applying deep lea…

    Submitted 17 May, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2101.09810  [pdf, other]

    cs.CL

    FakeFlow: Fake News Detection by Modeling the Flow of Affective Information

    Authors: Bilal Ghanem, Simone Paolo Ponzetto, Paolo Rosso, Francisco Rangel

    Abstract: Fake news articles often stir the readers' attention by means of emotional appeals that arouse their feelings. Unlike in short news texts, authors of longer articles can exploit such affective factors to manipulate readers by adding exaggerations or fabricating events, in order to affect the readers' emotions. To capture this, we propose in this paper to model the flow of affective information in…

    Submitted 24 January, 2021; originally announced January 2021.

    Comments: 9 pages, 6 figures, EACL-2021

  28. arXiv:2101.07598  [pdf, other]

    stat.ML cs.LG

    Analysis and tuning of hierarchical topic models based on Renyi entropy approach

    Authors: Sergei Koltcov, Vera Ignatenko, Maxim Terpilovskii, Paolo Rosso

    Abstract: Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections that allows constructing a topical hierarchy representing levels of topical abstraction. However, tuning of parameters of hierarchical models, including the number of topics on each hierarchical level, remains a challenging task and an open issue. In this paper, we propose a R…

    Submitted 19 January, 2021; originally announced January 2021.

  29. arXiv:2011.05706  [pdf, ps, other]

    cs.CL

    Multilingual Irony Detection with Dependency Syntax and Neural Models

    Authors: Alessandra Teresa Cignarella, Valerio Basile, Manuela Sanguinetti, Cristina Bosco, Paolo Rosso, Farah Benamara

    Abstract: This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task in a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme. Three distinct experimental setti…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: long paper accepted at COLING 2020

  30. Classifier Combination Approach for Question Classification for Bengali Question Answering System

    Authors: Somnath Banerjee, Sudip Kumar Naskar, Paolo Rosso, Sivaji Bandyopadhyay

    Abstract: Question classification (QC) is a prime constituent of automated question answering system. The work presented here demonstrates that the combination of multiple models achieve better classification performance than those obtained with existing individual models for the question classification task in Bengali. We have exploited state-of-the-art multiple model combination techniques, i.e., ensemble…

    Submitted 6 September, 2020; v1 submitted 31 August, 2020; originally announced August 2020.

    Comments: 16 pages, to be published in Sadhana

    Journal ref: Sadhana, Springer, 2019

  31. arXiv:2008.13173  [pdf, other]

    cs.CL cs.AI

    LIMSI_UPV at SemEval-2020 Task 9: Recurrent Convolutional Neural Network for Code-mixed Sentiment Analysis

    Authors: Somnath Banerjee, Sahar Ghannay, Sophie Rosset, Anne Vilnat, Paolo Rosso

    Abstract: This paper describes the participation of LIMSI UPV team in SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text. The proposed approach competed in SentiMix Hindi-English subtask, that addresses the problem of predicting the sentiment of a given Hindi-English code-mixed tweet. We propose Recurrent Convolutional Neural Network that combines both the recurrent neural network and…

    Submitted 30 August, 2020; originally announced August 2020.

    Comments: To be published in the Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval-2020), Barcelona, Spain, Sep. Association for Computational Linguistics

  32. arXiv:2008.01578  [pdf, other]

    eess.IV

    Automatic Dataset Builder for Machine Learning Applications to Satellite Imagery

    Authors: Alessandro Sebastianelli, Maria Pia Del Rosso, Silvia Liberata Ullo

    Abstract: Nowadays the use of Machine Learning (ML) algorithms is spreading in the field of Remote Sensing, with applications ranging from detection and classification of land use and monitoring to the prediction of many natural or anthropic phenomena of interest. One main limit of their employment is related to the need for a huge amount of data for training the neural network, chosen for the specific appl…

    Submitted 4 August, 2020; originally announced August 2020.

  33. #Brexit: Leave or Remain? The Role of User's Community and Diachronic Evolution on Stance Detection

    Authors: Mirko Lai, Viviana Patti, Giancarlo Ruffo, Paolo Rosso

    Abstract: Interest has grown around the classification of stance that users assume within online debates in recent years. Stance has been usually addressed by considering users posts in isolation, while social studies highlight that social communities may contribute to influence users' opinion. Furthermore, stance should be studied in a diachronic perspective, since it could help to shed light on users' opi…

    Submitted 29 July, 2020; originally announced July 2020.

    Comments: To appear in Journal of Intelligent & Fuzzy Systems

  34. arXiv:2004.09501  [pdf, other]

    eess.IV

    Application of DInSAR Technique to High Coherence Satellite Images for Strategic Infrastructure Monitoring

    Authors: Tony De Corso, Luca Mignone, Alessandro Sebastianelli, Maria Pia Del Rosso, Claire Yost, Elena Ciampa, Marisa Pecce, Stefania Sica, Silvia Ullo

    Abstract: In this paper the authors present and validate a procedure, which intends to combine the latest state of the art models in bridge monitoring with freely available satellite data. Through the Differential SAR interferometry (DinSAR) technique, a dataset of displacements for the Morandi bridge in Genoa (Italy), before its collapse, has been created, by using images downloaded by the Copernicus Open-…

    Submitted 19 April, 2020; originally announced April 2020.

    Journal ref: IGARSS 2020 IEEE International Geoscience Remote Sensing Symposium

  35. arXiv:2002.02427  [pdf, ps, other]

    cs.CL cs.AI

    Irony Detection in a Multilingual Context

    Authors: Bilal Ghanem, Jihen Karoui, Farah Benamara, Paolo Rosso, Véronique Moriceau

    Abstract: This paper proposes the first multilingual (French, English and Arabic) and multicultural (Indo-European languages vs. less culturally close languages) irony detection system. We employ both feature-based models and neural architectures using monolingual word representation. We compare the performance of these systems with state-of-the-art systems to identify their capabilities. We show that these…

    Submitted 6 February, 2020; originally announced February 2020.

  36. arXiv:1910.14011  [pdf, other]

    cs.SE

    Stryker: Scaling Specification-Based Program Repair by Pruning Infeasible Mutants with SAT

    Authors: Luciano Zemín, Simón Gutiérrez Brida, Santiago Bermúdez, Santiago Perez De Rosso, Nazareno Aguirre, Ali Mili, Ali Jaoua, Marcelo F. Frias

    Abstract: Many techniques for automated program repair involve syntactic program transformations. Applying combinations of such transformations on faulty code yields fix candidates whose correctness must be determined. Exploring these combinations leads to an explosion on the number of generated fix candidates that severely limits the applicability of such fault repair techniques. This explosion is most tim…

    Submitted 30 October, 2019; originally announced October 2019.

    MSC Class: 68Q60

  37. arXiv:1910.06592  [pdf, other]

    cs.CL cs.SI

    FacTweet: Profiling Fake News Twitter Accounts

    Authors: Bilal Ghanem, Simone Paolo Ponzetto, Paolo Rosso

    Abstract: We present an approach to detect fake news in Twitter at the account level using a neural recurrent model and a variety of different semantic and stylistic features. Our method extracts a set of features from the timelines of news Twitter accounts by reading their posts as chunks, rather than dealing with each tweet independently. We show the experimental benefits of modeling latent stylistic sign…

    Submitted 15 October, 2019; originally announced October 2019.

    Comments: 6 pages

  38. arXiv:1910.01340  [pdf, other]

    cs.CL cs.LG cs.SI

    TexTrolls: Identifying Russian Trolls on Twitter from a Textual Perspective

    Authors: Bilal Ghanem, Davide Buscaldi, Paolo Rosso

    Abstract: The online new emerging suspicious users, that usually are called trolls, are one of the main sources of hate, fake, and deceptive online messages. Some agendas are utilizing these harmful users to spread incitement tweets, and as a consequence, the audience get deceived. The challenge in detecting such accounts is that they conceal their identities which make them disguised in social media, addin…

    Submitted 3 October, 2019; originally announced October 2019.

    Comments: 15 pages

  39. arXiv:1908.09951  [pdf, other]

    cs.CL cs.IR cs.SI

    An Emotional Analysis of False Information in Social Media and News Articles

    Authors: Bilal Ghanem, Paolo Rosso, Francisco Rangel

    Abstract: Fake news is risky since it has been created to manipulate the readers' opinions and beliefs. In this work, we compared the language of false news to the real one of real news from an emotional perspective, considering a set of false information types (propaganda, hoax, clickbait, and satire) from social media and online news articles sources. Our experiments showed that false information has diff…

    Submitted 26 August, 2019; originally announced August 2019.

  40. arXiv:1906.06151  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Landslide Geohazard Assessment With Convolutional Neural Networks Using Sentinel-2 Imagery Data

    Authors: Silvia L. Ullo, Maximillian S. Langenkamp, Tuomas P. Oikarinen, Maria P. Del Rosso, Alessandro Sebastianelli, Federica Piccirillo, Stefania Sica

    Abstract: In this paper, the authors aim to combine the latest state of the art models in image recognition with the best publicly available satellite images to create a system for landslide risk mitigation. We focus first on landslide detection and further propose a similar system to be used for prediction. Such models are valuable as they could easily be scaled up to provide data for hazard evaluation, as…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: 4 pages, 3 figures, 1 table, accepted to 2019 IEEE IGARSS Conference that will be held in Japan next July

  41. arXiv:1906.04836  [pdf, other]

    cs.CL

    Unmasking Bias in News

    Authors: Javier Sánchez-Junquera, Paolo Rosso, Manuel Montes-y-Gómez, Simone Paolo Ponzetto

    Abstract: We present experiments on detecting hyperpartisanship in news using a 'masking' method that allows us to assess the role of style vs. content for the task at hand. Our results corroborate previous research on this task in that topic related features yield better results than stylistic ones. We additionally show that competitive results can be achieved by simply including higher-length n-grams, whi…

    Submitted 11 June, 2019; originally announced June 2019.

  42. arXiv:1811.03091  [pdf]

    physics.ins-det physics.optics

    Low-dispersion low-loss dielectric gratings for efficient ultrafast laser pulse compression at high average powers

    Authors: David A. Alessi, Hoang T. Nguyen, Jerald A. Britten, Paul A. Rosso, Constantin Haefner

    Abstract: We have developed low-dispersion (1480 l/mm), resonance-free, diffraction gratings made of dielectric materials resistant to femtosecond laser damage $(SiO_{2}/HfO_{2})$. A 14 cm diameter sample was fabricated resulting in a mean diffraction efficiency of 99.1% at λ = 810 nm with 0.4% uniformity using equipment which can fabricate gratings up to 1m diagonal. The implementation of these gratings in…

    Submitted 6 November, 2018; originally announced November 2018.

  43. arXiv:1807.11584  [pdf, ps, other]

    cs.CL

    UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering

    Authors: Marc Franco-Salvador, Sudipta Kar, Thamar Solorio, Paolo Rosso

    Abstract: In this work we describe the system built for the three English subtasks of the SemEval 2016 Task 3 by the Department of Computer Science of the University of Houston (UH) and the Pattern Recognition and Human Language Technology (PRHLT) research center - Universitat Politècnica de València: UH-PRHLT. Our system represents instances by using both lexical and semantic-based similarity measures be…

    Submitted 30 July, 2018; originally announced July 2018.

    Comments: Top system for question-question similarity in SemEval 2016 Task 3

  44. Semantically-informed distance and similarity measures for paraphrase plagiarism identification

    Authors: Miguel A. Álvarez-Carmona, Marc Franco-Salvador, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor-Pineda

    Abstract: Paraphrase plagiarism identification represents a very complex task given that plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the relatedness of two given texts: a semantically-informed similarity measure and a semantically-informed edit distance. Both measures are able to extract semantic inform…

    Submitted 29 May, 2018; originally announced May 2018.

    Journal ref: Journal of Intelligent & Fuzzy Systems, vol. 34, no. 5, pp. 2983-2990, 2018

  45. arXiv:1801.06436  [pdf, other]

    cs.CL

    A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

    Authors: Goran Glavaš, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso

    Abstract: Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for predicting cross-lingual semantic similarity of short texts, however, make use of tools and resources (e.g., machine translation systems, syntactic parsers or named ent…

    Submitted 19 January, 2018; originally announced January 2018.

    Comments: Accepted for publication in Knowledge-Based Systems journal

  46. arXiv:1705.10754  [pdf, other]

    cs.CL

    A Low Dimensionality Representation for Language Variety Identification

    Authors: Francisco Rangel, Marc Franco-Salvador, Paolo Rosso

    Abstract: Language variety identification aims at labelling texts in a native language (e.g. Spanish, Portuguese, English) with its specific variation (e.g. Argentina, Chile, Mexico, Peru, Spain; Brazil, Portugal; UK, US). In this work we propose a low dimensionality representation (LDR) to address this task with five different varieties of Spanish: Argentina, Chile, Mexico, Peru and Spain. We compare our L…

    Submitted 30 May, 2017; originally announced May 2017.

    Journal ref: CICLing - Computational Linguistics and Intelligent Text Processing, 2016

  47. Friends and Enemies of Clinton and Trump: Using Context for Detecting Stance in Political Tweets

    Authors: Mirko Lai, Delia Irazú Hernández Farías, Viviana Patti, Paolo Rosso

    Abstract: Stance detection, the task of identifying the speaker's opinion towards a particular target, has attracted the attention of researchers. This paper describes a novel approach for detecting stance in Twitter. We define a set of features in order to consider the context surrounding a target of interest with the final aim of training a model for predicting the stance towards the mentioned targets. In…

    Submitted 26 February, 2017; originally announced February 2017.

    Comments: To appear in MICAI 2016 LNAI Proceedings

  48. arXiv:1402.3070  [pdf, other]

    cs.IR cs.LG stat.ML

    Squeezing bottlenecks: exploring the limits of autoencoder semantic representation capabilities

    Authors: Parth Gupta, Rafael E. Banchs, Paolo Rosso

    Abstract: We present a comprehensive study on the use of autoencoders for modelling text data, in which (differently from previous studies) we focus our attention on the following issues: i) we explore the suitability of two different models bDA and rsDA for constructing deep autoencoders for text data at the sentence level; ii) we propose and evaluate two novel metrics for better assessing the text-reconst…

    Submitted 13 February, 2014; originally announced February 2014.