Showing 1–34 of 34 results for author: Flores, J

Searching in archive cs.
  1. arXiv:2407.06198  [pdf, other]

    cs.SI

    Time-dependent Personalized PageRank for temporal networks: discrete and continuous scales

    Authors: David Aleja, Julio Flores, Eva Primo, Miguel Romance

    Abstract: In this paper we explore the PageRank of temporal networks on both discrete and continuous time scales in the presence of personalization vectors that vary over time. Also the underlying interplay between the discrete and continuous settings arising from discretization is highlighted. Additionally, localization results that set bounds to the estimated influence of the personalization vector on the…

    Submitted 20 June, 2024; originally announced July 2024.

  2. arXiv:2403.07966  [pdf, other]

    cs.LG physics.ao-ph

    Applying ranking techniques for estimating influence of Earth variables on temperature forecast error

    Authors: M. Julia Flores, Melissa Ruiz-Vásquez, Ana Bastos, René Orth

    Abstract: This paper describes how to analyze the influence of Earth system variables on the errors when providing temperature forecasts. The initial framework to get the data has been based on previous research work, which resulted in a very interesting discovery. However, the aforementioned study only worked on individual correlations of the variables with respect to the error. This research work is going…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 17 pages, 19 figures, 5 tables, results SIMD UCLM - BGC Jena collaboration, research stay

  3. arXiv:2403.05788  [pdf, other]

    cs.CL cs.AI

    On the Benefits of Fine-Grained Loss Truncation: A Case Study on Factuality in Summarization

    Authors: Lorenzo Jaime Yu Flores, Arman Cohan

    Abstract: Text summarization and simplification are among the most widely used applications of AI. However, models developed for such tasks are often prone to hallucination, which can result from training on unaligned data. One efficient approach to address this issue is Loss Truncation (LT) (Kang and Hashimoto, 2020), an approach to modify the standard log loss to adaptively remove noisy examples during tr…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: EACL 2024

  4. arXiv:2402.02633  [pdf, other]

    cs.CL cs.LG

    Predicting Machine Translation Performance on Low-Resource Languages: The Role of Domain Similarity

    Authors: Eric Khiu, Hasti Toossi, David Anugraha, Jinyu Liu, Jiaxu Li, Juan Armando Parra Flores, Leandro Acros Roman, A. Seza Doğruöz, En-Shiun Annie Lee

    Abstract: Fine-tuning and testing a multilingual large language model is expensive and challenging for low-resource languages (LRLs). While previous studies have predicted the performance of natural language processing (NLP) tasks using machine learning methods, they primarily focus on high-resource languages, overlooking LRLs and shifts across domains. Focusing on LRLs, we investigate three factors: the si…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 13 pages, 5 figures, accepted to EACL 2024, findings

  5. arXiv:2310.11191  [pdf, other]

    cs.CL cs.AI

    Medical Text Simplification: Optimizing for Readability with Unlikelihood Training and Reranked Beam Search Decoding

    Authors: Lorenzo Jaime Yu Flores, Heyuan Huang, Kejian Shi, Sophie Chheang, Arman Cohan

    Abstract: Text simplification has emerged as an increasingly useful application of AI for bridging the communication gap in specialized fields such as medicine, where the lexicon is often dominated by technical jargon and complex constructs. Despite notable progress, methods in medical simplification sometimes result in the generated text having lower quality and diversity. In this work, we explore ways to…

    Submitted 25 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Findings

  6. arXiv:2310.02062  [pdf, other]

    cs.CR cs.SE

    Gotta Catch 'em All: Aggregating CVSS Scores

    Authors: Angel Longueira-Romero, Jose Luis Flores, Rosa Iglesias, Iñaki Garitano

    Abstract: Security metrics are not standardized, but international proposals such as the Common Vulnerability Scoring System (CVSS) for quantifying the severity of known vulnerabilities are widely used. Many CVSS aggregation mechanisms have been proposed in the literature. Nevertheless, factors related to the context of the System Under Test (SUT) are not taken into account in the aggregation process; vulnera…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 6 pages, conference

    Journal ref: The Spanish Meeting on Cryptology and Information Security (RECSI), 2022

  7. arXiv:2305.19629  [pdf, other]

    cs.DB

    Measuring and Predicting the Quality of a Join for Data Discovery

    Authors: Sergi Nadal, Raquel Panadero, Javier Flores, Oscar Romero

    Abstract: We study the problem of discovering joinable datasets at scale. We approach the problem from a learning perspective relying on profiles. These are succinct representations that capture the underlying characteristics of the schemata and data values of datasets, which can be efficiently extracted in a distributed and parallel fashion. Profiles are then compared, to predict the quality of a join oper…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2012.00890

  8. Clustered Federated Learning Architecture for Network Anomaly Detection in Large Scale Heterogeneous IoT Networks

    Authors: Xabier Sáez-de-Cámara, Jose Luis Flores, Cristóbal Arellano, Aitor Urbieta, Urko Zurutuza

    Abstract: There is a growing trend of cyberattacks against Internet of Things (IoT) devices; moreover, the sophistication and motivation of those attacks are increasing. The vast scale of IoT, diverse hardware and software, and being typically placed in uncontrolled environments make traditional IT security mechanisms such as signature-based intrusion detection and prevention systems challenging to integrate…

    Submitted 27 July, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted for publication in Computers & Security

  9. arXiv:2302.02962  [pdf, other]

    cs.CL

    LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control

    Authors: Yilun Zhao, Zhenting Qi, Linyong Nan, Lorenzo Jaime Yu Flores, Dragomir Radev

    Abstract: Logical Table-to-Text (LT2T) generation is tasked with generating logically faithful sentences from tables. There currently exist two challenges in the field: 1) Faithfulness: how to generate sentences that are factually correct given the table content; 2) Diversity: how to generate multiple sentences that offer different perspectives on the table. This work proposes LoFT, which utilizes logic fo…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted at EACL 2023 as a short paper

  10. arXiv:2301.06985  [pdf, other]

    cs.CL physics.soc-ph

    Statistical analysis of word flow among five Indo-European languages

    Authors: Josué Ely Molina, Jorge Flores, Carlos Gershenson, Carlos Pineda

    Abstract: A recent increase in data availability has allowed the possibility to perform different statistical linguistic studies. Here we use the Google Books Ngram dataset to analyze word flow among English, French, German, Italian, and Spanish. We study what we define as "migrant words", a type of loanwords that do not change their spelling. We quantify migrant words from one language to another for dif…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: 13 pages

  11. arXiv:2211.15217  [pdf, other]

    cs.LG cs.RO

    AquaFeL-PSO: A Monitoring System for Water Resources using Autonomous Surface Vehicles based on Multimodal PSO and Federated Learning

    Authors: Micaela Jara Ten Kathen, Princy Johnson, Isabel Jurado Flores, Daniel Gutiérrez Reina

    Abstract: The preservation, monitoring, and control of water resources has been a major challenge in recent decades. Water resources must be constantly monitored to know the contamination levels of water. To meet this objective, this paper proposes a water monitoring system using autonomous surface vehicles, equipped with water quality sensors, based on a multimodal particle swarm optimization, and the fede…

    Submitted 28 November, 2022; originally announced November 2022.

    Report number: 00-00

  12. arXiv:2210.02675  [pdf, other]

    cs.CL

    Look Ma, Only 400 Samples! Revisiting the Effectiveness of Automatic N-Gram Rule Generation for Spelling Normalization in Filipino

    Authors: Lorenzo Jaime Yu Flores, Dragomir Radev

    Abstract: With 84.75 million Filipinos online, the ability for models to process online text is crucial for developing Filipino NLP applications. To this end, spelling correction is a crucial preprocessing step for downstream processing. However, the lack of data prevents the use of language models for this task. In this paper, we propose an N-Gram + Damerau Levenshtein distance model with automatic rule ex…

    Submitted 5 November, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: 4 pages, 1 figure, Presented at EMNLP 2022 Third Workshop on Simple and Efficient Natural Language Processing

  13. Gotham Testbed: a Reproducible IoT Testbed for Security Experiments and Dataset Generation

    Authors: Xabier Sáez-de-Cámara, Jose Luis Flores, Cristóbal Arellano, Aitor Urbieta, Urko Zurutuza

    Abstract: The growing adoption of the Internet of Things (IoT) has brought a significant increase in attacks targeting those devices. Machine learning (ML) methods have shown promising results for intrusion detection; however, the scarcity of IoT datasets remains a limiting factor in developing ML-based security systems for IoT scenarios. Static datasets get outdated due to evolving IoT architectures and th…

    Submitted 27 July, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: Accepted for publication in IEEE Transactions on Dependable and Secure Computing. Accepted version first online: Feb 22 2023

  14. arXiv:2206.13342  [pdf, other]

    cs.CV cs.CL cs.IR cs.LG

    Open Set Classification of Untranscribed Handwritten Documents

    Authors: José Ramón Prieto, Juan José Flores, Enrique Vidal, Alejandro H. Toselli, David Garrido, Carlos Alonso

    Abstract: Huge amounts of digital page images of important manuscripts are preserved in archives worldwide. The amounts are so large that it is generally unfeasible for archivists to adequately tag most of the documents with the required metadata so as to allow proper organization of the archives and effective exploration by scholars and the general public. The class or "typology" of a document is perhaps t…

    Submitted 20 June, 2022; originally announced June 2022.

  15. arXiv:2205.12467  [pdf, other]

    cs.CL

    R2D2: Robust Data-to-Text with Replacement Detection

    Authors: Linyong Nan, Lorenzo Jaime Yu Flores, Yilun Zhao, Yixin Liu, Luke Benson, Weijin Zou, Dragomir Radev

    Abstract: Unfaithful text generation is a common problem for text generation systems. In the case of Data-to-Text (D2T) systems, the factuality of the generated text is particularly crucial for any real-world applications. We introduce R2D2, a training framework that addresses unfaithful Data-to-Text generation by training a system both as a generator and a faithfulness discriminator with additional replace…

    Submitted 24 May, 2022; originally announced May 2022.

  16. arXiv:2201.00912  [pdf, other]

    cs.CL

    An Adversarial Benchmark for Fake News Detection Models

    Authors: Lorenzo Jaime Yu Flores, Yiding Hao

    Abstract: With the proliferation of online misinformation, fake news detection has gained importance in the artificial intelligence community. In this paper, we propose an adversarial benchmark that tests the ability of fake news detectors to reason about real-world facts. We formulate adversarial attacks that target three aspects of "understanding": compositional semantics, lexical relations, and sensitivi…

    Submitted 3 January, 2022; originally announced January 2022.

    Comments: 6 pages, 2 figures, Presented at AAAI 2022, Workshop on Adversarial Machine Learning and Beyond

    ACM Class: I.2.7

  17. A Novel Model for Vulnerability Analysis through Enhanced Directed Graphs and Quantitative Metrics

    Authors: Ángel Longueira-Romero, Rosa Iglesias, Jose Luis Flores, Iñaki Garitano

    Abstract: Industrial components are of high importance because they control critical infrastructures that form the lifeline of modern societies. However, the rapid evolution of industrial components, together with the new paradigm of Industry 4.0, and the new connectivity features that will be introduced by the 5G technology, all increase the likelihood of security incidents. These incidents are caused by t…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: 26 pages, 5 figures, 12 tables

    Journal ref: Sensors 2022, 22(6), 2126

  18. arXiv:2111.13425  [pdf, other]

    cs.CR

    Keep It Unbiased: A Comparison Between Estimation of Distribution Algorithms and Deep Learning for Human Interaction-Free Side-Channel Analysis

    Authors: Unai Rioja, Lejla Batina, Igor Armendariz, Jose Luis Flores

    Abstract: Evaluating side-channel analysis (SCA) security is a complex process, involving applying several techniques whose success depends on human engineering. Therefore, it is crucial to avoid a false sense of confidence provided by non-optimal (failing) attacks. Different alternatives have emerged lately trying to mitigate human dependency, among which deep learning (DL) attacks are the most studied tod…

    Submitted 26 November, 2021; originally announced November 2021.

  19. Statistical Properties of Rankings in Sports and Games

    Authors: José Antonio Morales, Jorge Flores, Carlos Gershenson, Carlos Pineda

    Abstract: Any collection can be ranked. Sports and games are common examples of ranked systems: players and teams are constantly ranked using different methods. The statistical properties of rankings have been studied for almost a century in a variety of fields. More recently, data availability has allowed us to study rank dynamics: how elements of a ranking change in time. Here, we study the rank distribut…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: 15 pages

    Journal ref: Advances in Complex Systems, 0:2150007, 2021

  20. arXiv:2110.03567  [pdf, other]

    cs.CL cs.IR

    GeSERA: General-domain Summary Evaluation by Relevance Analysis

    Authors: Jessica López Espejel, Gaël de Chalendar, Jorge Garcia Flores, Thierry Charnois, Ivan Vladimir Meza Ruiz

    Abstract: We present GeSERA, an open-source improved version of SERA for evaluating automatic extractive and abstractive summaries from the general domain. SERA is based on a search engine that compares candidate and reference summaries (called queries) against an information retrieval document base (called index). SERA was originally designed for the biomedical domain only, where it showed a better correla…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted in RANLP 2021 conference

  21. arXiv:2012.13225  [pdf, other]

    cs.CR

    Auto-tune POIs: Estimation of distribution algorithms for efficient side-channel analysis

    Authors: Unai Rioja, Lejla Batina, Jose Luis Flores, Igor Armendariz

    Abstract: Due to the constant increase and versatility of IoT devices that should keep sensitive information private, Side-Channel Analysis (SCA) attacks on embedded devices are gaining visibility in the industrial field. The integration and validation of countermeasures against SCA can be an expensive and cumbersome process, especially for the less experienced ones, and current certification procedures req…

    Submitted 20 January, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

  22. arXiv:2012.00890  [pdf, ps, other]

    cs.DB

    Scalable Data Discovery Using Profiles

    Authors: Javier Flores, Sergi Nadal, Oscar Romero

    Abstract: We study the problem of discovering joinable datasets at scale. This is, how to automatically discover pairs of attributes in a massive collection of independent, heterogeneous datasets that can be joined. Exact (e.g., based on distinct values) and hash-based (e.g., based on locality-sensitive hashing) techniques require indexing the entire dataset, which is unattainable at scale. To overcome this…

    Submitted 3 December, 2020; v1 submitted 1 December, 2020; originally announced December 2020.

  23. arXiv:2012.00215  [pdf, other]

    cs.SI cs.HC

    Audience and Streamer Participation at Scale on Twitch

    Authors: Claudia Flores-Saviaga, Jessica Hammer, Juan Pablo Flores, Joseph Seering, Stuart Reeves, Saiph Savage

    Abstract: Large-scale streaming platforms such as Twitch are becoming increasingly popular, but detailed audience-streamer interaction dynamics remain unexplored at scale. In this paper, we perform a mixed-methods study on a dataset with over 12 million audience chat messages and 45 hours of streaming video to understand audience participation and streamer performance on Twitch. We uncover five types of str…

    Submitted 30 November, 2020; originally announced December 2020.

  24. arXiv:2011.13563  [pdf, other]

    cs.CY

    Interpretable Poverty Mapping using Social Media Data, Satellite Images, and Geospatial Information

    Authors: Chiara Ledesma, Oshean Lee Garonita, Lorenzo Jaime Flores, Isabelle Tingzon, Danielle Dalisay

    Abstract: Access to accurate, granular, and up-to-date poverty data is essential for humanitarian organizations to identify vulnerable areas for poverty alleviation efforts. Recent works have shown success in combining computer vision and satellite imagery for poverty estimation; however, the cost of acquiring high-resolution images coupled with black box models can be a barrier to adoption for many develop…

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: Presented at NeurIPS 2020 Workshop on Machine Learning for the Developing World

    MSC Class: 68T99 ACM Class: K.4.2

  25. arXiv:1911.12618  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    Machine learning for music genre: multifaceted review and experimentation with audioset

    Authors: Jaime Ramírez, M. Julia Flores

    Abstract: Music genre classification is one of the sub-disciplines of music information retrieval (MIR) with growing popularity among researchers, mainly due to the already open challenges. Although research has been prolific in terms of number of published works, the topic still suffers from a problem in its foundations: there is no clear and formal definition of what genre is. Music categorizations are va…

    Submitted 28 November, 2019; originally announced November 2019.

  26. Fast Convolutional Dictionary Learning off the Grid

    Authors: Andrew H. Song, Francisco J. Flores, Demba Ba

    Abstract: Given a continuous-time signal that can be modeled as the superposition of localized, time-shifted events from multiple sources, the goal of Convolutional Dictionary Learning (CDL) is to identify the location of the events--by Convolutional Sparse Coding (CSC)--and learn the template for each source--by Convolutional Dictionary Update (CDU). In practice, because we observe samples of the continuou…

    Submitted 21 July, 2019; originally announced July 2019.

    Journal ref: IEEE Transactions on Signal Processing 2020

  27. Cellular morphogenesis of three-dimensional tensegrity structures

    Authors: Omar Aloui, Jessica Flores, David Orden, Landolf Rhode-Barbarigos

    Abstract: The topology and form finding of tensegrity structures have been studied extensively since the introduction of the tensegrity concept. However, most of these studies address topology and form separately, where the former represented a research focus of rigidity theory and graph theory, while the latter attracted the attention of structural engineers. In this paper, a biomimetic approach for the co…

    Submitted 15 February, 2019; originally announced February 2019.

    Comments: 31 pages, 17 figures

    Journal ref: Computer Methods in Applied Mechanics and Engineering. Volume 346, 1 April 2019, Pages 85-108

  28. arXiv:1901.02393  [pdf, other]

    cs.DS cs.LG

    Fair Algorithms for Clustering

    Authors: Suman K. Bera, Deeparnab Chakrabarty, Nicolas J. Flores, Maryam Negahbani

    Abstract: We study the problem of finding low-cost Fair Clusterings in data where each data point may belong to many protected groups. Our work significantly generalizes the seminal work of Chierichetti et al. (NIPS 2017) as follows. - We allow the user to specify the parameters that define fair representation. More precisely, these parameters define the maximum over- and minimum under-representation of a…

    Submitted 17 June, 2019; v1 submitted 8 January, 2019; originally announced January 2019.

  29. arXiv:1808.10837  [pdf, other]

    cs.SI physics.soc-ph

    Diversity, Topology, and the Risk of Node Re-identification in Labeled Social Graphs

    Authors: Sameera Horawalavithana, Clayton Gandy, Juan Arroyo Flores, John Skvoretz, Adriana Iamnitchi

    Abstract: Real network datasets provide significant benefits for understanding phenomena such as information diffusion or network evolution. Yet the privacy risks raised from sharing real graph datasets, even when stripped of user identity information, are significant. When nodes have associated attributes, the privacy risks increase. In this paper we quantitatively study the impact of binary node attribute…

    Submitted 31 August, 2018; originally announced August 2018.

    Journal ref: Applied Network Science 2019

  30. arXiv:1709.00927   

    cs.HC

    A Fuzzy Control System for Inductive Video Games

    Authors: Carlos Lara-Alvarez, Hugo Mitre-Hernandez, Juan Flores, Maria Fuentes

    Abstract: It has been shown that the emotional state of students has an important relationship with learning; for instance, engaged concentration is positively correlated with learning. This paper proposes the Inductive Control (IC) for educational games. Unlike conventional approaches that only modify the game level, the proposed technique also induces emotions in the player for supporting the learning pro…

    Submitted 15 April, 2018; v1 submitted 4 September, 2017; originally announced September 2017.

    Comments: It needs to be reviewed

  31. Rank diversity of languages: Generic behavior in computational linguistics

    Authors: Germinal Cocho, Jorge Flores, Carlos Gershenson, Carlos Pineda, Sergio Sánchez

    Abstract: Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation a…

    Submitted 14 May, 2015; originally announced May 2015.

    Journal ref: PLoS ONE 10(4): e0121898 (2015)

  32. arXiv:1305.7445  [pdf, ps, other]

    physics.soc-ph cs.SI

    Eigenvector centrality of nodes in multiplex networks

    Authors: Luis Sola, Miguel Romance, Regino Criado, Julio Flores, Alejandro Garcia del Amo, Stefano Boccaletti

    Abstract: We extend the concept of eigenvector centrality to multiplex networks, and introduce several alternative parameters that quantify the importance of nodes in a multi-layered networked system, including the definition of vectorial-type centralities. In addition, we rigorously show that, under reasonable conditions, such centrality measures exist and are unique. Computer experiments and simulations d…

    Submitted 4 September, 2013; v1 submitted 31 May, 2013; originally announced May 2013.

    Journal ref: Chaos 23, 033131 (2013)

  33. arXiv:1212.2456  [pdf]

    cs.AI

    Incremental Compilation of Bayesian networks

    Authors: Julia M. Flores, Jose A. Gamez, Kristian G. Olesen

    Abstract: Most methods of exact probability propagation in Bayesian networks do not carry out the inference directly over the network, but over a secondary structure known as a junction tree or a join tree (JT). The process of obtaining a JT is usually termed compilation. As compilation is usually viewed as a whole process, each time the network is modified, a new compilation process has to be carried…

    Submitted 19 October, 2012; originally announced December 2012.

    Comments: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

    Report number: UAI-P-2003-PG-233-240

  34. arXiv:1012.3252  [pdf, ps, other]

    physics.soc-ph cond-mat.stat-mech cs.SI math.CO

    A mathematical model for networks with structures in the mesoscale

    Authors: Regino Criado, Julio Flores, Alejandro García del Amo, Jesús Gómez-Gardeñes, Miguel Romance

    Abstract: The new concept of multilevel network is introduced in order to embody some topological properties of complex systems with structures in the mesoscale which are not completely captured by the classical models. This new model, which generalizes the hyper-network and hyper-structure models, fits perfectly with several real-life complex systems, including social and public transportation networks. We…

    Submitted 15 December, 2010; originally announced December 2010.

    Comments: 21 pages, 4 figures