Showing 1–26 of 26 results for author: Baeza-Yates, R

  1. arXiv:2406.15386  [pdf, other]

    cs.CY cs.AI

    U Can't Gen This? A Survey of Intellectual Property Protection Methods for Data in Generative AI

    Authors: Tanja Šarčević, Alicja Karlowicz, Rudolf Mayer, Ricardo Baeza-Yates, Andreas Rauber

    Abstract: Large Generative AI (GAI) models have the unparalleled ability to generate text, images, audio, and other forms of media that are increasingly indistinguishable from human-generated content. As these models often train on publicly available data, including copyrighted materials, art and other creative works, they inadvertently risk violating copyright and misappropriation of intellectual property…

    Submitted 22 April, 2024; originally announced June 2024.

  2. arXiv:2405.12312  [pdf, other]

    cs.LG cs.CY

    A Principled Approach for a New Bias Measure

    Authors: Bruno Scarone, Alfredo Viola, Ricardo Baeza-Yates

    Abstract: The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. The areas in which this is happening are diverse: healthcare, employment, finance, education, the legal system to name a few; and the associated negative side effects are being increasingly harmful for society. Negative data bias is one of those, which tends to…

    Submitted 20 May, 2024; originally announced May 2024.

  3. arXiv:2403.00148  [pdf, ps, other]

    cs.HC

    Implications of Regulations on the Use of AI and Generative AI for Human-Centered Responsible Artificial Intelligence

    Authors: Marios Constantinides, Mohammad Tahaei, Daniele Quercia, Simone Stumpf, Michael Madaio, Sean Kennedy, Lauren Wilcox, Jessica Vitak, Henriette Cramer, Edyta Bogucka, Ricardo Baeza-Yates, Ewa Luger, Jess Holbrook, Michael Muller, Ilana Golbin Blumenfeld, Giada Pistilli

    Abstract: With the upcoming AI regulations (e.g., EU AI Act) and rapid advancements in generative AI, new challenges emerge in the area of Human-Centered Responsible Artificial Intelligence (HCR-AI). As AI becomes more ubiquitous, questions around decision-making authority, human oversight, accountability, sustainability, and the ethical and legal responsibilities of AI and their creators become paramount.…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: 6 pages

  4. arXiv:2306.13723  [pdf, other]

    cs.AI

    Human-AI Coevolution

    Authors: Dino Pedreschi, Luca Pappalardo, Emanuele Ferragina, Ricardo Baeza-Yates, Albert-Laszlo Barabasi, Frank Dignum, Virginia Dignum, Tina Eliassi-Rad, Fosca Giannotti, Janos Kertesz, Alistair Knott, Yannis Ioannidis, Paul Lukowicz, Andrea Passarella, Alex Sandy Pentland, John Shawe-Taylor, Alessandro Vespignani

    Abstract: Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online pla…

    Submitted 3 May, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

  5. arXiv:2306.01650  [pdf, other]

    cs.LG

    Fair multilingual vandalism detection system for Wikipedia

    Authors: Mykola Trokhymovych, Muniza Aslam, Ai-Jou Chou, Ricardo Baeza-Yates, Diego Saez-Trumper

    Abstract: This paper presents a novel design of the system aimed at supporting the Wikipedia community in addressing vandalism on the platform. To achieve this, we collected a massive dataset of 47 languages, and applied advanced filtering and feature engineering techniques, including multilingual masked language modeling to build the training dataset from human-generated data. The performance of the system…

    Submitted 2 June, 2023; originally announced June 2023.

  6. arXiv:2303.15592  [pdf, other]

    cs.CY cs.LG

    Uncovering Bias in Personal Informatics

    Authors: Sofia Yfantidou, Pavlos Sermpezis, Athena Vakali, Ricardo Baeza-Yates

    Abstract: Personal informatics (PI) systems, powered by smartphones and wearables, enable people to lead healthier lifestyles by providing meaningful and actionable insights that break down barriers between users and their health information. Today, such systems are used by billions of users for monitoring not only physical activity and sleep but also vital signs and women's and heart health, among others.…

    Submitted 19 July, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Report number: Volume: 7 Number: 3, Article: 139

    Journal ref: IMWUT 2023

  7. Human-Centered Responsible Artificial Intelligence: Current & Future Trends

    Authors: Mohammad Tahaei, Marios Constantinides, Daniele Quercia, Sean Kennedy, Michael Muller, Simone Stumpf, Q. Vera Liao, Ricardo Baeza-Yates, Lora Aroyo, Jess Holbrook, Ewa Luger, Michael Madaio, Ilana Golbin Blumenfeld, Maria De-Arteaga, Jessica Vitak, Alexandra Olteanu

    Abstract: In recent years, the CHI community has seen significant growth in research on Human-Centered Responsible Artificial Intelligence. While different research communities may use different terminology to discuss similar topics, all of this work is ultimately aimed at developing AI that benefits humanity while being grounded in human rights and ethics, and reducing the potential harms of AI. In this sp…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: To appear in Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems

  8. arXiv:2203.04135  [pdf, other]

    cs.SI cs.CY

    Bots don't Vote, but They Surely Bother! A Study of Anomalous Accounts in a National Referendum

    Authors: Eduardo Graells-Garrido, Ricardo Baeza-Yates

    Abstract: The Web contains several social media platforms for discussion, exchange of ideas, and content publishing. These platforms are used by people, but also by distributed agents known as bots. Although bots have existed for decades, with many of them being benevolent, their influence in propagating and generating deceptive information in the last years has increased. Here we present a characterization…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: 5 pages, 9 figures

  9. arXiv:2011.05353  [pdf, other]

    cs.DS cs.SI

    Adaptive Community Search in Dynamic Networks

    Authors: Ioanna Tsalouchidou, Francesco Bonchi, Ricardo Baeza-Yates

    Abstract: Community search is a well-studied problem which, given a static graph and a query set of vertices, requires to find a cohesive (or dense) subgraph containing the query vertices. In this paper we study the problem of community search in temporal dynamic networks. We adapt to the temporal setting the notion of network inefficiency which is based on the pairwise shortest-path distance among a…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: IEEE BigData 2020

  10. Every Colour You Are: Stance Prediction and Turnaround in Controversial Issues

    Authors: Eduardo Graells-Garrido, Ricardo Baeza-Yates, Mounia Lalmas

    Abstract: Web platforms have allowed political manifestation and debate for decades. Technology changes have brought new opportunities for expression, and the availability of longitudinal data of these debates entice new questions regarding who participates, and who updates their opinion. The aim of this work is to provide a methodology to measure these phenomena, and to test this methodology on a specific…

    Submitted 19 May, 2020; originally announced May 2020.

    Comments: Accepted at WebSci'20

  11. Predicting risk of dyslexia with an online gamified test

    Authors: Luz Rello, Ricardo Baeza-Yates, Abdullah Ali, Jeffrey P. Bigham, Miquel Serra

    Abstract: Dyslexia is a specific learning disorder related to school failure. Detection is both crucial and challenging, especially in languages with transparent orthographies, such as Spanish. To make detecting dyslexia easier, we designed an online gamified test and a predictive machine learning model. In a study with more than 3,600 participants, our model correctly detected over 80% of the participants…

    Submitted 9 December, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

  12. arXiv:1807.07162  [pdf, other]

    cs.SI

    What kind of content are you prone to tweet? Multi-topic Preference Model for Tweeters

    Authors: Lorena Recalde, Ricardo Baeza-Yates

    Abstract: According to tastes, a person could show preference for a given category of content to a greater or lesser extent. However, quantifying people's amount of interest in a certain topic is a challenging task, especially considering the massive digital information they are exposed to. For example, in the context of Twitter, aligned with his/her preferences a user may tweet and retweet more about techn…

    Submitted 18 July, 2018; originally announced July 2018.

    Comments: 16 pages, 4 figures, Workshop on Social Aspects in Personalization and Search (SOAPS 2018), collocated with ECIR 2018, Apr 26, Grenoble, France

  13. arXiv:1711.02295  [pdf, other]

    cs.IR cs.CL cs.LG

    Quality-Efficiency Trade-offs in Machine Learning for Text Processing

    Authors: Ricardo Baeza-Yates, Zeinab Liaghat

    Abstract: Data mining, machine learning, and natural language processing are powerful techniques that can be used together to extract information from large texts. Depending on the task or problem at hand, there are many different approaches that can be used. The methods available are continuously being optimized, but not all these methods have been tested and compared in a set of problems that can be solve…

    Submitted 7 November, 2017; originally announced November 2017.

    Comments: Ten pages, long version of paper that will be presented at IEEE Big Data 2017 (8 pages)

  14. Detection of Trending Topic Communities: Bridging Content Creators and Distributors

    Authors: Lorena Recalde, David F. Nettleton, Ricardo Baeza-Yates, Ludovico Boratto

    Abstract: The rise of a trending topic on Twitter or Facebook leads to the temporal emergence of a set of users currently interested in that topic. Given the temporary nature of the links between these users, being able to dynamically identify communities of users related to this trending topic would allow for a rapid spread of information. Indeed, individual users inside a community might receive recommend…

    Submitted 27 July, 2017; originally announced July 2017.

    Comments: 9 pages, 4 figures, 2 tables, Hypertext 2017 conference

  15. FA*IR: A Fair Top-k Ranking Algorithm

    Authors: Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, Ricardo Baeza-Yates

    Abstract: In this work, we define and solve the Fair Top-k Ranking problem, in which we want to determine a subset of k candidates from a large pool of n >> k candidates, maximizing utility (i.e., select the "best" candidates) subject to group fairness criteria. Our ranked group fairness definition extends group fairness using the standard notion of protected groups and is based on ensuring that the proport…

    Submitted 2 July, 2018; v1 submitted 20 June, 2017; originally announced June 2017.

    Comments: In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM'17). This version corrects an error on Table 4

    ACM Class: H.3.3; J.1

  16. arXiv:1607.01869  [pdf, other]

    cs.IR cs.AI cs.CL

    Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising

    Authors: Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Fabrizio Silvestri, Ricardo Baeza-Yates, Andrew Feng, Erik Ordentlich, Lee Yang, Gavin Owens

    Abstract: Sponsored search represents a major source of revenue for web search engines. This popular advertising model brings a unique possibility for advertisers to target users' immediate intent communicated through a search query, usually by displaying their ads alongside organic search results for queries deemed relevant to their products or services. However, due to a large number of unique queries it…

    Submitted 6 July, 2016; originally announced July 2016.

    Comments: 10 pages, 4 figures, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy

    Journal ref: 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy

  17. arXiv:1604.06481  [pdf, other]

    cs.CV cs.HC

    Visual Congruent Ads for Image Search

    Authors: Yannis Kalantidis, Ayman Farahat, Lyndon Kennedy, Ricardo Baeza-Yates, David A. Shamma

    Abstract: The quality of user experience online is affected by the relevance and placement of advertisements. We propose a new system for selecting and displaying visual advertisements in image search result sets. Our method compares the visual similarity of candidate ads to the image search results and selects the most visually similar ad to be displayed. The method further selects an appropriate location…

    Submitted 21 April, 2016; originally announced April 2016.

  18. arXiv:1604.03044  [pdf, other]

    cs.CY cs.SI physics.soc-ph

    Wisdom of the Crowd or Wisdom of a Few? An Analysis of Users' Content Generation

    Authors: Ricardo Baeza-Yates, Diego Saez-Trumper

    Abstract: In this paper we analyze how user generated content (UGC) is created, challenging the well known wisdom of crowds concept. Although it is known that user activity in most settings follow a power law, that is, few people do a lot, while most do nothing, there are few studies that characterize well this activity. In our analysis of datasets from two different social networks, Facebook and Twit…

    Submitted 11 April, 2016; originally announced April 2016.

    ACM Class: H.2.8; J.4

    Journal ref: Proceedings of the 26th ACM Conference on Hypertext & Social Media, 2015

  19. arXiv:1601.02071  [pdf, other]

    cs.HC

    Sentiment Visualisation Widgets for Exploratory Search

    Authors: Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates

    Abstract: This paper proposes the usage of visualisation widgets for exploratory search with sentiment as a facet. Starting from specific design goals for depiction of ambivalence in sentiment, two visualization widgets were implemented: scatter plot and parallel coordinates. Those widgets were evaluated against a text baseline in a small-scale usability study with exploratory ta…

    Submitted 8 January, 2016; originally announced January 2016.

    Comments: Presented at the Social Personalization Workshop held jointly with ACM Hypertext 2014. 6 pages

    ACM Class: H.3.3; H.5.2

  20. Data Portraits and Intermediary Topics: Encouraging Exploration of Politically Diverse Profiles

    Authors: Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates

    Abstract: In micro-blogging platforms, people connect and interact with others. However, due to cognitive biases, they tend to interact with like-minded people and read agreeable information only. Many efforts to make people connect with those who think differently have not worked well. In this paper, we hypothesize, first, that previous approaches have not worked because they have been direct -- they have…

    Submitted 4 January, 2016; originally announced January 2016.

    Comments: 12 pages, 7 figures. To be presented at ACM Intelligent User Interfaces 2016

    ACM Class: H.4.3; H.5.2

  21. arXiv:1510.01920  [pdf, other]

    cs.SI cs.CY cs.HC

    Encouraging Diversity- and Representation-Awareness in Geographically Centralized Content

    Authors: Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates

    Abstract: In centralized countries, not only population, media and economic power are concentrated, but people give more attention to central locations. While this is not inherently bad, this behavior extends to micro-blogging platforms: central locations get more attention in terms of information flow. In this paper we study the effects of an information filtering algorithm that decentralizes content in su…

    Submitted 7 October, 2015; originally announced October 2015.

    Comments: 12 pages. Under review. Please contact authors before citing / distributing

    ACM Class: H.3.3; H.5.2

  22. arXiv:1506.00963  [pdf, other]

    cs.SI physics.soc-ph

    Finding Intermediary Topics Between People of Opposing Views: A Case Study

    Authors: Eduardo Graells-Garrido, Mounia Lalmas, Ricardo Baeza-Yates

    Abstract: In micro-blogging platforms, people can connect with others and have conversations on a wide variety of topics. However, because of homophily and selective exposure, users tend to connect with like-minded people and only read agreeable information. Motivated by this scenario, in this paper we study the diversity of intermediary topics, which are latent topics estimated from user generated content.…

    Submitted 30 July, 2015; v1 submitted 2 June, 2015; originally announced June 2015.

    Comments: 6 pages. Presented at the International Workshop on Social Personalisation & Search, SPS2015 (co-located with SIGIR 2015)

    ACM Class: H.3.4

  23. arXiv:1309.2679  [pdf]

    cs.SI

    Caracterizando la Web Chilena

    Authors: Eduardo Graells-Garrido, Ricardo Baeza-Yates

    Abstract: This article presents a characterization of the web space from Chile in 2007. The characterization shows distributions of sites and domains, analysis of document content and server configuration. In addition, the network structure of the chilean Web is analyzed, determining components based on hyperlink structure at the document and site levels. Original Abstract: En este artículo se muestra una…

    Submitted 10 September, 2013; originally announced September 2013.

    Comments: In Spanish. Published in "Revista Bits de Ciencia" vol. 2, 2009. Department of Computer Science, University of Chile. Available in http://www.dcc.uchile.cl/revista

  24. arXiv:1309.1890  [pdf, other]

    cs.SI physics.soc-ph

    Evolution of the Chilean Web: A Larger Study

    Authors: Eduardo Graells-Garrido, Ricardo Baeza-Yates

    Abstract: In this paper we extend our previous and only study on the dynamics of the Chilean Web. This new study doubles the time period and to the best of our knowledge is the only study of its type known about any country in the Web. The new results corroborate the trends found before, in particular the exponential growth of the Web, and reinforce the conclusion that the Web is more chaotic than we would…

    Submitted 7 September, 2013; originally announced September 2013.

    Comments: Presented at the Sixth Latin American Web Congress, 2008, Vila Velha, Espírito Santo, Brazil

  25. arXiv:1204.2712  [pdf, ps, other]

    cs.AI cs.HC cs.IR

    Learning to Rank Query Recommendations by Semantic Similarities

    Authors: Sumio Fujita, Georges Dupret, Ricardo Baeza-Yates

    Abstract: Logs of the interactions with a search engine show that users often reformulate their queries. Examining these reformulations shows that recommendations that precise the focus of a query are helpful, like those based on expansions of the original queries. But it also shows that queries that express some topical shift with respect to the original query can help user access more rapidly the informat…

    Submitted 12 April, 2012; originally announced April 2012.

    Comments: 2nd International Workshop on Usage Analysis and the Web of Data (USEWOD2012) in the 21st International World Wide Web Conference (WWW2012), Lyon, France, April 17th, 2012

    Report number: WWW2012USEWOD/2012/fuduba

    ACM Class: H.3.3; H.3.5

  26. arXiv:1006.5059  [pdf, ps, other]

    cs.IR

    Capacity Planning for Vertical Search Engines

    Authors: Claudine Badue, Jussara Almeida, Virgilio Almeida, Ricardo Baeza-Yates, Berthier Ribeiro-Neto, Artur Ziviani, Nivio Ziviani

    Abstract: Vertical search engines focus on specific slices of content, such as the Web of a single country or the document collection of a large corporation. Despite this, like general open web search engines, they are expensive to maintain, expensive to operate, and hard to design. Because of this, predicting the response time of a vertical search engine is usually done empirically through experimentation,…

    Submitted 25 June, 2010; originally announced June 2010.