
Showing 1–18 of 18 results for author: Cerqueira, V

Searching in archive cs.
  1. arXiv:2406.17008

    cs.LG stat.ML

    Meta-learning and Data Augmentation for Stress Testing Forecasting Models

    Authors: Ricardo Inácio, Vitor Cerqueira, Marília Barandas, Carlos Soares

    Abstract: The effectiveness of univariate forecasting models is often hampered by conditions that cause them stress. A model is considered to be under stress if it shows negative behaviour, such as higher-than-usual errors or increased uncertainty. Understanding the factors that cause stress to forecasting models is important to improve their reliability, transparency, and utility. This paper addresses th…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures, 3 tables

  2. arXiv:2406.16590

    stat.ML cs.LG

    Forecasting with Deep Learning: Beyond Average of Average of Average Performance

    Authors: Vitor Cerqueira, Luis Roque, Carlos Soares

    Abstract: Accurate evaluation of forecasting models is essential for ensuring reliable predictions. Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score, using metrics such as SMAPE. We hypothesize that averaging performance over all samples dilutes relevant information about the relative performance of models. Particularly, conditions in whi…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2405.11237

    stat.ML cs.LG

    Lag Selection for Univariate Time Series Forecasting using Deep Learning: An Empirical Study

    Authors: José Leites, Vitor Cerqueira, Carlos Soares

    Abstract: Most forecasting methods use recent past observations (lags) to model the future values of univariate time series. Selecting an adequate number of lags is important for training accurate forecasting models. Several approaches and heuristics have been devised to solve this task. However, there is no consensus about what the best approach is. Besides, lag selection procedures have been developed bas…

    Submitted 18 May, 2024; originally announced May 2024.

  4. arXiv:2404.18537

    cs.LG stat.ML

    Time Series Data Augmentation as an Imbalanced Learning Problem

    Authors: Vitor Cerqueira, Nuno Moniz, Ricardo Inácio, Carlos Soares

    Abstract: Recent state-of-the-art forecasting methods are trained on collections of time series. These methods, often referred to as global models, can capture common patterns in different time series to improve their generalization performance. However, they require large amounts of data that might not be readily available. Besides this, global models sometimes fail to capture relevant patterns unique to a…

    Submitted 29 April, 2024; originally announced April 2024.

  5. arXiv:2404.16918

    cs.LG stat.ML

    On-the-fly Data Augmentation for Forecasting with Deep Learning

    Authors: Vitor Cerqueira, Moisés Santos, Yassine Baghoussi, Carlos Soares

    Abstract: Deep learning approaches are increasingly used to tackle forecasting tasks. A key factor in the successful application of these methods is a large enough training sample size, which is not always available. In these scenarios, synthetic data generation techniques are usually applied to augment the dataset. Data augmentation is typically applied before fitting a model. However, these approaches cre…

    Submitted 25 April, 2024; originally announced April 2024.

  6. arXiv:2306.14563

    stat.ML cs.LG

    Multi-output Ensembles for Multi-step Forecasting

    Authors: Vitor Cerqueira, Luis Torgo

    Abstract: This paper studies the application of ensembles composed of multi-output models for multi-step ahead forecasting problems. Dynamic ensembles have been commonly used for forecasting. However, these are typically designed for one-step-ahead tasks. On the other hand, the literature regarding the application of dynamic ensembles for multi-step ahead forecasting is scarce. Moreover, it is not clear how…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: 19 pages, GitHub repository available

  7. arXiv:2206.09821

    stat.ML cs.LG

    Exceedance Probability Forecasting via Regression for Significant Wave Height Prediction

    Authors: Vitor Cerqueira, Luis Torgo

    Abstract: Significant wave height forecasting is a key problem in ocean data analytics. This task affects several maritime operations, such as managing the passage of vessels or estimating the energy production from waves. In this work, we focus on the prediction of extreme values of significant wave height that can cause coastal disasters. This task is framed as an exceedance probability forecasting proble…

    Submitted 6 May, 2024; v1 submitted 20 June, 2022; originally announced June 2022.

  8. arXiv:2205.02553

    cs.LG stat.ML

    Automated Imbalanced Classification via Layered Learning

    Authors: Vitor Cerqueira, Luis Torgo, Paula Branco, Colin Bellinger

    Abstract: In this paper we address imbalanced binary classification (IBC) tasks. Applying resampling strategies to balance the class distribution of training instances is a common approach to tackle these problems. Many state-of-the-art methods find instances of interest close to the decision boundary to drive the resampling process. However, under-sampling the majority class may potentially lead to importa…

    Submitted 30 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

  9. arXiv:2112.14806

    cs.LG

    AutoFITS: Automatic Feature Engineering for Irregular Time Series

    Authors: Pedro Costa, Vitor Cerqueira, João Vinagre

    Abstract: A time series represents a set of observations collected over time. Typically, these observations are captured with a uniform sampling frequency (e.g. daily). When data points are observed in uneven time intervals the time series is referred to as irregular or intermittent. In such scenarios, the most common solution is to reconstruct the time series to make it regular, thus removing its intermitt…

    Submitted 29 December, 2021; originally announced December 2021.

  10. arXiv:2104.01830

    stat.ML cs.LG

    Model Compression for Dynamic Forecast Combination

    Authors: Vitor Cerqueira, Luis Torgo, Carlos Soares, Albert Bifet

    Abstract: The predictive advantage of combining several different predictive models is widely accepted. Particularly in time series forecasting problems, this combination is often dynamic to cope with potential non-stationary sources of variation present in the data. Despite their superior predictive performance, ensemble methods entail two main limitations: high computational costs and lack of transparency…

    Submitted 5 April, 2021; originally announced April 2021.

  11. arXiv:2104.00584

    stat.ML cs.LG

    Model Selection for Time Series Forecasting: Empirical Analysis of Different Estimators

    Authors: Vitor Cerqueira, Luis Torgo, Carlos Soares

    Abstract: Evaluating predictive models is a crucial task in predictive analytics. This process is especially challenging with time series data where the observations show temporal dependencies. Several studies have analysed how different performance estimation methods compare with each other for approximating the true loss incurred by a given forecasting model. However, these studies do not address how the…

    Submitted 11 February, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

  12. arXiv:2103.00903

    cs.LG stat.ML

    STUDD: A Student-Teacher Method for Unsupervised Concept Drift Detection

    Authors: Vitor Cerqueira, Heitor Murilo Gomes, Albert Bifet, Luis Torgo

    Abstract: Concept drift detection is a crucial task in evolving data stream environments. Most state-of-the-art approaches designed to tackle this problem monitor the loss of predictive models. However, this approach falls short in many real-world scenarios, where the true labels are not readily available to compute the loss. In this context, there is increasing attention to approaches that perform conce…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: 23 pages, single column

  13. arXiv:2010.11595

    stat.ML cs.LG

    Early Anomaly Detection in Time Series: A Hierarchical Approach for Predicting Critical Health Episodes

    Authors: Vitor Cerqueira, Luis Torgo, Carlos Soares

    Abstract: The early detection of anomalous events in time series data is essential in many domains of application. In this paper we deal with critical health events, which represent a significant cause of mortality in intensive care units of hospitals. The timely prediction of these events is crucial for mitigating their consequences and improving healthcare. One of the most common approaches to tackle earl…

    Submitted 22 October, 2020; originally announced October 2020.

  14. arXiv:2010.07137

    stat.ML cs.LG

    VEST: Automatic Feature Engineering for Forecasting

    Authors: Vitor Cerqueira, Nuno Moniz, Carlos Soares

    Abstract: Time series forecasting is a challenging task with applications in a wide range of domains. Auto-regression is one of the most common approaches to address these problems. Accordingly, observations are modelled by multiple regression using their past lags as predictor variables. We investigate the extension of auto-regressive processes using statistics which summarise the recent past dynamics of t…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 25 pages, R code available

  15. arXiv:1909.13316

    stat.ML cs.LG

    Machine Learning vs Statistical Methods for Time Series Forecasting: Size Matters

    Authors: Vitor Cerqueira, Luis Torgo, Carlos Soares

    Abstract: Time series forecasting is one of the most active research topics. Machine learning methods have been increasingly adopted to solve these predictive tasks. However, in a recent work, these were shown to systematically present a lower predictive performance relative to simple statistical methods. In this work, we counter these results. We show that these are only valid under an extremely low sample…

    Submitted 29 September, 2019; originally announced September 2019.

    Comments: 9 pages

  16. Evaluating time series forecasting models: An empirical study on performance estimation methods

    Authors: Vitor Cerqueira, Luis Torgo, Igor Mozetic

    Abstract: Performance estimation aims at estimating the loss that a predictive model will incur on unseen data. These procedures are part of the pipeline in every machine learning project and are used for assessing the overall generalisation ability of predictive models. In this paper we address the application of these methods to time series forecasting tasks. For independent and identically distributed da…

    Submitted 28 May, 2019; originally announced May 2019.

    Journal ref: Machine Learning 109:1997-2028, 2020

  17. How to evaluate sentiment classifiers for Twitter time-ordered data?

    Authors: Igor Mozetič, Luis Torgo, Vitor Cerqueira, Jasmina Smailović

    Abstract: Social media are becoming an increasingly important source of information about the public mood regarding issues such as elections, Brexit, stock market, etc. In this paper we focus on sentiment classification of Twitter data. Construction of sentiment classifiers is a standard text mining task, but here we address the question of how to properly evaluate them as there is no settled way to do so.…

    Submitted 14 March, 2018; originally announced March 2018.

    Journal ref: PLoS ONE 13(3): e0194317, 2018

  18. arXiv:1706.09367

    stat.ML cs.LG

    autoBagging: Learning to Rank Bagging Workflows with Metalearning

    Authors: Fábio Pinto, Vítor Cerqueira, Carlos Soares, João Mendes-Moreira

    Abstract: Machine Learning (ML) has been successfully applied to a wide range of domains and applications. One of the techniques behind most of these successful applications is Ensemble Learning (EL), the field of ML that gave birth to methods such as Random Forests or Boosting. The complexity of applying these techniques, together with the market scarcity of ML experts, has created the need for systems that…

    Submitted 28 June, 2017; originally announced June 2017.