
Showing 1–37 of 37 results for author: Saleiro, P

  1. arXiv:2405.05809

    cs.LG cs.AI cs.CY

    Aequitas Flow: Streamlining Fair ML Experimentation

    Authors: Sérgio Jesus, Pedro Saleiro, Inês Oliveira e Silva, Beatriz M. Jorge, Rita P. Ribeiro, João Gama, Pedro Bizarro, Rayid Ghani

    Abstract: Aequitas Flow is an open-source framework for end-to-end Fair Machine Learning (ML) experimentation in Python. This package fills the integration gaps that other Fair ML packages leave in supporting complete and accessible experimentation. It provides a pipeline for fairness-aware model training, hyperparameter optimization, and evaluation, enabling rapid and simple experiments and result analysis. Aimed at…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2403.06906

    cs.LG cs.AI

    Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints

    Authors: Jean V. Alves, Diogo Leitão, Sérgio Jesus, Marco O. P. Sampaio, Javier Liébana, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key aspects of real-world systems that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type 1 and type 2 errors have different costs; ii) requiring c…

    Submitted 21 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  3. arXiv:2401.08534

    cs.LG cs.AI cs.HC

    DiConStruct: Causal Concept-based Explanations through Black-Box Distillation

    Authors: Ricardo Moreira, Jacopo Bono, Mário Cardoso, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the performance of the predi…

    Submitted 26 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at the Conference on Causal Learning and Reasoning (CLeaR 2024, https://www.cclear.cc/2024). To be published in the Proceedings of Machine Learning Research (PMLR)

  4. arXiv:2312.13218

    cs.LG cs.AI

    FiFAR: A Fraud Detection Dataset for Learning to Defer

    Authors: Jean V. Alves, Diogo Leitão, Sérgio Jesus, Marco O. P. Sampaio, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Public dataset limitations have significantly hindered the development and benchmarking of learning to defer (L2D) algorithms, which aim to optimally combine human and AI capabilities in hybrid decision-making systems. In such systems, human availability and domain-specific concerns introduce difficulties, while obtaining human predictions for training and evaluation is costly. Financial fraud det…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: The public dataset and detailed synthetic expert information are available at: https://github.com/feedzai/fifar-dataset

  5. arXiv:2303.16963

    cs.LG cs.CY

    Fairness-Aware Data Valuation for Supervised Learning

    Authors: José Pombal, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Data valuation is an ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both performance and fairness of ML models. Thus, we propose Fairness-Aware Data valuatiOn (FADO), a data valuation framework tha…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: ICLR 2023 Workshop Trustworthy ML

  6. arXiv:2302.07444

    cs.LG cs.HC

    A Case Study on Designing Evaluations of ML Explanations with Simulated User Studies

    Authors: Ada Martin, Valerie Chen, Sérgio Jesus, Pedro Saleiro

    Abstract: When conducting user studies to ascertain the usefulness of model explanations in aiding human decision-making, it is important to use real-world use cases, data, and users. However, this process can be resource-intensive, allowing only a limited number of explanation methods to be evaluated. Simulated user evaluations (SimEvals), which use machine learning models as a proxy for human users, have…

    Submitted 20 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: 9 pages, 2 figures. Will appear in ICLR 2023's TrustML-(un)Limited workshop

  7. arXiv:2211.13358

    cs.LG

    Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation

    Authors: Sérgio Jesus, José Pombal, Duarte Alves, André Cruz, Pedro Saleiro, Rita P. Ribeiro, João Gama, Pedro Bizarro

    Abstract: Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase in publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes domains -- has been lagging behind. To bridge this…

    Submitted 28 November, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted at NeurIPS 2022. https://openreview.net/forum?id=UrAYT2QwOX8

  8. arXiv:2210.14360

    cs.LG cs.AI

    LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering

    Authors: Mário Cardoso, Pedro Saleiro, Pedro Bizarro

    Abstract: Anti-money laundering (AML) regulations require financial institutions to deploy AML systems based on a set of rules that, when triggered, form the basis of a suspicious alert to be assessed by human analysts. Reviewing these cases is a cumbersome and complex task that requires analysts to navigate a large network of financial interactions to validate suspicious movements. Furthermore, these syste…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted at ACM International Conference on AI in Finance 2022 (ICAIF'22)

  9. arXiv:2209.07850

    cs.LG cs.AI cs.CY

    FairGBM: Gradient Boosting with Fairness Constraints

    Authors: André F Cruz, Catarina Belém, Sérgio Jesus, João Bravo, Pedro Saleiro, Pedro Bizarro

    Abstract: Tabular data is prevalent in many high-stakes domains, such as financial services or public policy. Gradient Boosted Decision Trees (GBDT) are popular in these settings due to their scalability, performance, and low training cost. While fairness in these domains is a foremost concern, existing in-processing Fair ML methods are either incompatible with GBDT, or incur significant performance loss…

    Submitted 3 March, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

    Comments: Published as a conference paper at ICLR 2023

  10. arXiv:2207.06273

    cs.LG cs.CY q-fin.ST

    Understanding Unfairness in Fraud Detection through Model and Data Bias Interactions

    Authors: José Pombal, André F. Cruz, João Bravo, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: In recent years, machine learning algorithms have become ubiquitous in a multitude of high-stakes decision-making applications. The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society -- limiting their access to financial…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: KDD'22 Workshop on Machine Learning in Finance

  11. arXiv:2206.13503

    cs.LG cs.HC

    On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods

    Authors: Kasun Amarasinghe, Kit T. Rodolfa, Sérgio Jesus, Valerie Chen, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro, Ameet Talwalkar, Rayid Ghani

    Abstract: Most existing evaluations of explainable machine learning (ML) methods rely on simplifying assumptions or proxies that do not reflect real-world use cases; the handful of more robust evaluations on real-world settings have shortcomings in their design, resulting in limited conclusions of methods' real-world utility. In this work, we seek to bridge this gap by conducting a study that evaluates thre…

    Submitted 21 February, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

  12. arXiv:2206.13202

    cs.LG cs.AI cs.HC

    Human-AI Collaboration in Decision-Making: Beyond Learning to Defer

    Authors: Diogo Leitão, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Human-AI collaboration (HAIC) in decision-making aims to create synergistic teaming between human decision-makers and AI systems. Learning to defer (L2D) has been presented as a promising framework to determine who among humans and AI should make which decisions in order to optimize the performance and fairness of the combined system. Nevertheless, L2D entails several often infeasible requirements…

    Submitted 13 July, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: ICML 2022 Workshop on Human-Machine Collaboration and Teaming

  13. arXiv:2206.13183

    cs.LG

    Prisoners of Their Own Devices: How Models Induce Data Bias in Performative Prediction

    Authors: José Pombal, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society. Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction ones, in which most real-world use cases o…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: ICML 2022 Workshop on Responsible Decision Making in Dynamic Environments

  14. arXiv:2205.03601

    cs.LG cs.AI

    ConceptDistil: Model-Agnostic Distillation of Concept Explanations

    Authors: João Bento Sousa, Ricardo Moreira, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro

    Abstract: Concept-based explanations aim to fill the model interpretability gap for non-technical humans-in-the-loop. Previous work has focused on providing concepts for specific models (e.g., neural networks) or data types (e.g., images), either by trying to extract concepts from an already trained network or by training self-explainable models through multi-task learning. In this work, we propose ConceptDis…

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: ICLR 2022 PAIR2Struct Workshop

  15. arXiv:2104.12459

    cs.LG cs.AI

    Weakly Supervised Multi-task Learning for Concept-based Explainability

    Authors: Catarina Belém, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro

    Abstract: In ML-aided decision-making tasks, such as fraud detection or medical diagnosis, the human-in-the-loop, usually a domain-expert without technical ML knowledge, prefers high-level concept-based explanations instead of low-level explanations based on model features. To obtain faithful concept-based explanations, we leverage multi-task learning to train a neural network that jointly learns to predict…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: Accepted at ICLR 2021 Workshop on Weakly Supervised Learning (WeaSuL)

  16. Promoting Fairness through Hyperparameter Optimization

    Authors: André F. Cruz, Pedro Saleiro, Catarina Belém, Carlos Soares, Pedro Bizarro

    Abstract: Considerable research effort has been guided towards algorithmic fairness but real-world adoption of bias reduction techniques is still scarce. Existing methods are either metric- or model-specific, require access to sensitive attributes at inference time, or carry high development or deployment costs. This work explores the unfairness that emerges when optimizing ML models solely for predictive p…

    Submitted 11 October, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2010.03665

    Journal ref: 2021 IEEE International Conference on Data Mining (ICDM)

  17. How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations

    Authors: Sérgio Jesus, Catarina Belém, Vladimir Balayan, João Bento, Pedro Saleiro, Pedro Bizarro, João Gama

    Abstract: There have been several research works proposing new Explainable AI (XAI) methods designed to generate model explanations having specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessment, explanations might be chosen that, in fact, hur…

    Submitted 22 January, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

    Comments: Accepted at FAccT'21, the ACM Conference on Fairness, Accountability, and Transparency

  18. arXiv:2012.01932

    cs.LG cs.AI

    Teaching the Machine to Explain Itself using Domain Knowledge

    Authors: Vladimir Balayan, Pedro Saleiro, Catarina Belém, Ludwig Krippahl, Pedro Bizarro

    Abstract: Machine Learning (ML) has been increasingly used to aid humans to make better and faster decisions. However, non-technical humans-in-the-loop struggle to comprehend the rationale behind model predictions, hindering trust in algorithmic decision-making systems. Considerable research work on AI explainability attempts to win back trust in AI systems by developing explanation methods but there is sti…

    Submitted 27 November, 2020; originally announced December 2020.

    ACM Class: I.2

  19. TimeSHAP: Explaining Recurrent Models through Sequence Perturbations

    Authors: João Bento, Pedro Saleiro, André F. Cruz, Mário A. T. Figueiredo, Pedro Bizarro

    Abstract: Although recurrent neural networks (RNNs) are state-of-the-art in numerous sequential decision-making tasks, there has been little research on explaining their predictions. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes feature-, timestep-, and cell-level attributions. As sequences may b…

    Submitted 26 June, 2021; v1 submitted 30 November, 2020; originally announced December 2020.

    Comments: Accepted at KDD 2021

  20. arXiv:2010.03665

    cs.LG cs.AI

    A Bandit-Based Algorithm for Fairness-Aware Hyperparameter Optimization

    Authors: André F. Cruz, Pedro Saleiro, Catarina Belém, Carlos Soares, Pedro Bizarro

    Abstract: Considerable research effort has been guided towards algorithmic fairness but there is still no major breakthrough. In practice, an exhaustive search over all possible techniques and hyperparameters is needed to find optimal fairness-accuracy trade-offs. Hence, coupled with the lack of tools for ML practitioners, real-world adoption of bias reduction methods is still scarce. To address this, we pr…

    Submitted 22 October, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

  21. arXiv:1811.05577

    cs.LG cs.AI cs.CY

    Aequitas: A Bias and Fairness Audit Toolkit

    Authors: Pedro Saleiro, Benedict Kuester, Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T. Rodolfa, Rayid Ghani

    Abstract: Recent work has raised concerns about the risk of unintended bias in AI systems being used nowadays that can affect individuals unfairly based on race, gender or religion, among other possible characteristics. While a lot of bias metrics and fairness definitions have been proposed in recent years, there is no consensus on which metric/definition should be used and there are very few available resourc…

    Submitted 29 April, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

    Comments: Aequitas website: http://dsapp.uchicago.edu/aequitas

  22. arXiv:1810.05944

    cs.SI cs.CY

    Social Media Brand Engagement as a Proxy for E-commerce Activities: A Case Study of Sina Weibo and JD

    Authors: Weiqiang Lin, Pedro Saleiro, Natasa Milic-Frayling, Eugene Ch'ng

    Abstract: E-commerce platforms facilitate sales of products while product vendors engage in Social Media Activities (SMA) to drive E-commerce Platform Activities (EPA) of consumers, enticing them to search, browse and buy products. The frequency and timing of SMA are expected to affect levels of EPA, increasing the number of brand related queries, clickthrough, and purchase orders. This paper applies cross-…

    Submitted 13 October, 2018; originally announced October 2018.

    Comments: WI'18

  23. arXiv:1810.03235

    cs.IR

    Entity-Relationship Search over the Web

    Authors: Pedro Saleiro, Natasa Milic-Frayling, Eduarda Mendes Rodrigues, Carlos Soares

    Abstract: Entity-Relationship (E-R) Search is a complex case of Entity Search where the goal is to search for multiple unknown entities and relationships connecting them. We assume that an E-R query can be decomposed as a sequence of sub-queries each containing keywords related to a specific entity or relationship. We adopt a probabilistic formulation of the E-R search problem. When creating specific represe…

    Submitted 7 October, 2018; originally announced October 2018.

  24. arXiv:1801.07743

    cs.IR cs.AI

    Entity Retrieval and Text Mining for Online Reputation Monitoring

    Authors: Pedro Saleiro

    Abstract: Online Reputation Monitoring (ORM) is concerned with the use of computational tools to measure the reputation of entities online, such as politicians or companies. In practice, current ORM methods are constrained to the generation of data analytics reports, which aggregate statistics of popularity and sentiment on social media. We argue that this format is too restrictive as end users often like t…

    Submitted 23 January, 2018; originally announced January 2018.

    Comments: PhD Thesis

  25. arXiv:1709.01981

    cs.CY

    Characterizing Geo-located Tweets in Brazilian Megacities

    Authors: João Pereira, Arian Pasquali, Pedro Saleiro, Rosaldo Rossetti, Nélio Cacho

    Abstract: This work presents a framework for collecting, processing and mining geo-located tweets in order to extract meaningful and actionable knowledge in the context of smart cities. We collected and characterized more than 9M tweets from the two biggest cities in Brazil, Rio de Janeiro and São Paulo. We performed topic modeling using the Latent Dirichlet Allocation model to produce an unsupervised distr…

    Submitted 6 September, 2017; originally announced September 2017.

  26. arXiv:1709.00947

    cs.CL cs.LG

    Learning Word Embeddings from the Portuguese Twitter Stream: A Study of some Practical Aspects

    Authors: Pedro Saleiro, Luís Sarmento, Eduarda Mendes Rodrigues, Carlos Soares, Eugénio Oliveira

    Abstract: This paper describes a preliminary study for producing and distributing a large-scale database of embeddings from the Portuguese Twitter stream. We start by experimenting with a relatively small sample and focusing on three challenges: volume of training data, vocabulary size and intrinsic evaluation metrics. Using a single GPU, we were able to scale up vocabulary size from 2048 words embedded and…

    Submitted 4 September, 2017; originally announced September 2017.

  27. arXiv:1707.09075

    cs.IR

    Early Fusion Strategy for Entity-Relationship Retrieval

    Authors: Pedro Saleiro, Natasa Milic-Frayling, Eduarda Mendes Rodrigues, Carlos Soares

    Abstract: We address the task of entity-relationship (E-R) retrieval, i.e., given a query characterizing types of two or more entities and relationships between them, retrieve the relevant tuples of related entities. Answering E-R queries requires gathering and joining evidence from multiple unstructured documents. In this work, we consider entity and relationships of any type, i.e., characterized by context…

    Submitted 19 October, 2017; v1 submitted 27 July, 2017; originally announced July 2017.

    Comments: KG4IR (SIGIR workshop)

  28. arXiv:1706.05090

    cs.CY cs.SI

    Transportation in Social Media: an automatic classifier for travel-related tweets

    Authors: João Pereira, Arian Pasquali, Pedro Saleiro, Rosaldo Rossetti

    Abstract: In recent years, researchers in the field of intelligent transportation systems have made several efforts to extract valuable information from social media streams. However, collecting domain-specific data from any social media is a challenging task demanding appropriate and robust classification methods. In this work we focus on exploring geo-located tweets in order to create a travel-related tw…

    Submitted 15 June, 2017; originally announced June 2017.

  29. arXiv:1706.03960

    cs.IR

    RELink: A Research Framework and Test Collection for Entity-Relationship Retrieval

    Authors: Pedro Saleiro, Natasa Milic-Frayling, Eduarda Mendes Rodrigues, Carlos Soares

    Abstract: Improvements of entity-relationship (E-R) search techniques have been hampered by a lack of test collections, particularly for complex queries involving multiple entities and relationships. In this paper we describe a method for generating E-R test queries to support comprehensive E-R search experiments. Queries and relevance judgments are created from content that exists in a tabular form where c…

    Submitted 13 June, 2017; originally announced June 2017.

    Comments: SIGIR 17 (resource)

  30. arXiv:1704.05091

    cs.CL cs.IR

    FEUP at SemEval-2017 Task 5: Predicting Sentiment Polarity and Intensity with Financial Word Embeddings

    Authors: Pedro Saleiro, Eduarda Mendes Rodrigues, Carlos Soares, Eugénio Oliveira

    Abstract: This paper presents the approach developed at the Faculty of Engineering of the University of Porto to participate in SemEval 2017, Task 5: Fine-grained Sentiment Analysis on Financial Microblogs and News. The task consisted in predicting a real continuous variable from -1.0 to +1.0 representing the polarity and intensity of sentiment concerning companies/stocks mentioned in short texts. We modeled t…

    Submitted 17 April, 2017; originally announced April 2017.

  31. arXiv:1610.09894

    cs.CY cs.AI cs.SI

    Mining Social Media for Open Innovation in Transportation Systems

    Authors: Daniela Ulloa, Pedro Saleiro, Rosaldo J. F. Rossetti, Elis Regina Silva

    Abstract: This work proposes a novel framework for the development of new products and services in transportation through an open innovation approach based on automatic content analysis of social media data. The framework is able to extract users' comments from Online Social Networks (OSN), to process and analyze text through information extraction and sentiment analysis techniques to obtain relevant informa…

    Submitted 31 October, 2016; originally announced October 2016.

  32. arXiv:1607.03057

    cs.SI

    Learning from the News: Predicting Entity Popularity on Twitter

    Authors: Pedro Saleiro, Carlos Soares

    Abstract: In this work, we tackle the problem of predicting entity popularity on Twitter based on the news cycle. We apply a supervised learning approach and extract four types of features: (i) signal, (ii) textual, (iii) sentiment and (iv) semantic, which we use to predict whether the popularity of a given entity will be high or low in the following hours. We run several experiments on six different enti…

    Submitted 11 July, 2016; originally announced July 2016.

  33. arXiv:1607.00167

    cs.SI cs.CL cs.IR

    SentiBubbles: Topic Modeling and Sentiment Visualization of Entity-centric Tweets

    Authors: João Oliveira, Mike Pinto, Pedro Saleiro, Jorge Teixeira

    Abstract: Social Media users tend to mention entities when reacting to news events. The main purpose of this work is to create entity-centric aggregations of tweets on a daily basis. By applying topic modeling and sentiment analysis, we create data visualization insights about current events and people's reactions to those events from an entity-centric perspective.

    Submitted 23 January, 2018; v1 submitted 1 July, 2016; originally announced July 2016.

  34. arXiv:1606.05242

    cs.SI

    Sentiment Aggregate Functions for Political Opinion Polling using Microblog Streams

    Authors: Pedro Saleiro, Luís Gomes, Carlos Soares

    Abstract: The automatic content analysis of mass media in the social sciences has become necessary and possible with the rise of social media and computational power. One particularly promising avenue of research concerns the use of sentiment analysis in microblog streams. However, one of the main challenges consists in aggregating sentiment polarity in a timely fashion that can be fed to the prediction me…

    Submitted 16 June, 2016; originally announced June 2016.

  35. arXiv:1601.00855

    cs.IR

    TimeMachine: Entity-centric Search and Visualization of News Archives

    Authors: Pedro Saleiro, Jorge Teixeira, Carlos Soares, Eugénio Oliveira

    Abstract: We present a dynamic web tool that allows interactive search and visualization of large news archives using an entity-centric approach. Users are able to search entities using keyword phrases expressing news stories or events and the system retrieves the most relevant entities to the user query based on automatically extracted and indexed entity profiles. From the computational journalism perspect…

    Submitted 5 January, 2016; originally announced January 2016.

    Comments: Advances in Information Retrieval: 38th European Conference on IR Research, ECIR 2016, Padua, Italy, March 20-23, 2016

  36. arXiv:1511.09290

    cs.IR

    "Piaf" vs "Adele": classifying encyclopedic queries using automatically labeled training data

    Authors: Pedro Saleiro, Luís Sarmento

    Abstract: Encyclopedic queries express the intent of obtaining information typically available in encyclopedias, such as biographical, geographical or historical facts. In this paper, we train a classifier for detecting the encyclopedic intent of web queries. For training such a classifier, we automatically label training data from raw query logs. We use click-through data to select positive examples of enc…

    Submitted 30 November, 2015; originally announced November 2015.

    Comments: in Proceedings of the 10th Conference on Open Research Areas in Information Retrieval, 2013

  37. POPmine: Tracking Political Opinion on the Web

    Authors: Pedro Saleiro, Sílvio Amir, Mário J. Silva, Carlos Soares

    Abstract: The automatic content analysis of mass media in the social sciences has become necessary and possible with the rise of social media and computational power. One particularly promising avenue of research concerns the use of opinion mining. We design and implement the POPmine system which is able to collect texts from web-based conventional media (news items in mainstream media sites) and social me…

    Submitted 29 November, 2015; originally announced November 2015.

    Comments: 2015 IEEE International Conference on Computer and Information Technology, Ubiquitous Computing and Communications