Search | arXiv e-print repository

Meta-learning and Data Augmentation for Stress Testing Forecasting Models

Authors: Ricardo Inácio, Vitor Cerqueira, Marília Barandas, Carlos Soares

Abstract: The effectiveness of univariate forecasting models is often hampered by conditions that cause them stress. A model is considered to be under stress if it shows a negative behaviour, such as higher-than-usual errors or increased uncertainty. Understanding the factors that cause stress to forecasting models is important to improve their reliability, transparency, and utility. This paper addresses this problem by contributing a novel framework called MAST (Meta-learning and data Augmentation for Stress Testing). The proposed approach aims to model and characterize stress in univariate time series forecasting models, focusing on conditions where they exhibit large errors. In particular, MAST is a meta-learning approach that predicts the probability that a given model will perform poorly on a given time series based on a set of statistical time series features. MAST also encompasses a novel data augmentation technique based on oversampling to improve the metadata concerning stress. We conducted experiments using three benchmark datasets that contain a total of 49,794 time series to validate the performance of MAST. The results suggest that the proposed approach is able to identify conditions that lead to large errors. The method and experiments are publicly available in a repository.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 16 pages, 5 figures, 3 tables

arXiv:2406.16590 [pdf, other]

Forecasting with Deep Learning: Beyond Average of Average of Average Performance

Authors: Vitor Cerqueira, Luis Roque, Carlos Soares

Abstract: Accurate evaluation of forecasting models is essential for ensuring reliable predictions. Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score, using metrics such as SMAPE. We hypothesize that averaging performance over all samples dilutes relevant information about the relative performance of models, particularly under conditions in which this relative performance differs from the overall accuracy. We address this limitation by proposing a novel framework for evaluating univariate time series forecasting models from multiple perspectives, such as one-step ahead forecasting versus multi-step ahead forecasting. We show the advantages of this framework by comparing a state-of-the-art deep learning approach with classical forecasting techniques. While classical methods (e.g. ARIMA) are long-standing approaches to forecasting, deep neural networks (e.g. NHITS) have recently shown state-of-the-art forecasting performance in benchmark datasets. We conducted extensive experiments that show NHITS generally performs best, but its superiority varies with forecasting conditions. For instance, concerning the forecasting horizon, NHITS only outperforms classical approaches for multi-step ahead forecasting. Another relevant insight is that, when dealing with anomalies, NHITS is outperformed by methods such as Theta. These findings highlight the importance of aspect-based model evaluation.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2405.11237 [pdf, other]

Lag Selection for Univariate Time Series Forecasting using Deep Learning: An Empirical Study

Authors: José Leites, Vitor Cerqueira, Carlos Soares

Abstract: Most forecasting methods use recent past observations (lags) to model the future values of univariate time series. Selecting an adequate number of lags is important for training accurate forecasting models. Several approaches and heuristics have been devised to solve this task. However, there is no consensus about what the best approach is. Besides, lag selection procedures have been developed based on local models and classical forecasting techniques such as ARIMA. We bridge this gap in the literature by carrying out an extensive empirical analysis of different lag selection methods. We focus on deep learning methods trained in a global approach, i.e., on datasets comprising multiple univariate time series. The experiments were carried out using three benchmark databases that contain a total of 2411 univariate time series. The results indicate that the lag size is a relevant parameter for accurate forecasts. In particular, excessively small or excessively large lag sizes have a considerable negative impact on forecasting performance. Cross-validation approaches show the best performance for lag selection, but this performance is comparable with simple heuristics.

Submitted 18 May, 2024; originally announced May 2024.
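
To make concrete what the "number of lags" in the lag selection entry above refers to, here is a minimal sketch that turns a univariate series into lagged input rows and one-step-ahead targets (a time-delay embedding). The function name `lag_embed` and the toy series are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def lag_embed(series: List[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a univariate series into (X, y) pairs for auto-regression.

    Each row of X holds `n_lags` consecutive past observations and the
    matching entry of y is the value that immediately follows them.
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be in [1, len(series) - 1]")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # the n_lags values before time t
        y.append(series[t])             # the value to predict at time t
    return X, y


# Toy series: with 2 lags, 5 observations yield 3 training rows.
X, y = lag_embed([1.0, 2.0, 3.0, 4.0, 5.0], n_lags=2)
```

A larger `n_lags` gives each row more history but leaves fewer rows per series, which is one reason the lag size can affect accuracy in either direction.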

arXiv:2404.18537 [pdf, other]

Time Series Data Augmentation as an Imbalanced Learning Problem

Authors: Vitor Cerqueira, Nuno Moniz, Ricardo Inácio, Carlos Soares

Abstract: Recent state-of-the-art forecasting methods are trained on collections of time series. These methods, often referred to as global models, can capture common patterns in different time series to improve their generalization performance. However, they require large amounts of data that might not be readily available. Besides this, global models sometimes fail to capture relevant patterns unique to a particular time series. In these cases, data augmentation can be useful to increase the sample size of time series datasets. The main contribution of this work is a novel method for generating univariate time series synthetic samples. Our approach stems from the insight that the observations concerning a particular time series of interest represent only a small fraction of all observations. In this context, we frame the problem of training a forecasting model as an imbalanced learning task. Oversampling strategies are popular approaches used to deal with the imbalance problem in machine learning. We use these techniques to create synthetic time series observations and improve the accuracy of forecasting models. We carried out experiments using 7 different databases that contain a total of 5502 univariate time series. We found that the proposed solution outperforms both a global and a local model, thus providing a better trade-off between these two approaches.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.16918 [pdf, other]

On-the-fly Data Augmentation for Forecasting with Deep Learning

Authors: Vitor Cerqueira, Moisés Santos, Yassine Baghoussi, Carlos Soares

Abstract: Deep learning approaches are increasingly used to tackle forecasting tasks. A key factor in the successful application of these methods is a large enough training sample size, which is not always available. In these scenarios, synthetic data generation techniques are usually applied to augment the dataset. Data augmentation is typically applied before fitting a model. However, these approaches create a single augmented dataset, potentially limiting their effectiveness. This work introduces OnDAT (On-the-fly Data Augmentation for Time series) to address this issue by applying data augmentation during training and validation. Contrary to traditional methods that create a single, static augmented dataset beforehand, OnDAT performs augmentation on-the-fly. By generating a new augmented dataset on each iteration, the model is exposed to constantly changing augmented data variations. We hypothesize this process enables a better exploration of the data space, which reduces the potential for overfitting and improves forecasting performance. We validated the proposed approach using a state-of-the-art deep learning forecasting method and 8 benchmark datasets containing a total of 75797 time series. The experiments suggest that OnDAT leads to better forecasting performance than a strategy that applies data augmentation before training as well as a strategy that does not involve data augmentation. The method and experiments are publicly available.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2306.14563 [pdf, other]

Multi-output Ensembles for Multi-step Forecasting

Authors: Vitor Cerqueira, Luis Torgo

Abstract: This paper studies the application of ensembles composed of multi-output models for multi-step ahead forecasting problems. Dynamic ensembles have been commonly used for forecasting. However, these are typically designed for one-step-ahead tasks. On the other hand, the literature regarding the application of dynamic ensembles for multi-step ahead forecasting is scarce. Moreover, it is not clear how the combination rule is applied across the forecasting horizon. We carried out extensive experiments to analyze the application of dynamic ensembles for multi-step forecasting. We resorted to a case study with 3568 time series and an ensemble of 30 multi-output models. We discovered that dynamic ensembles based on arbitrating and windowing present the best performance according to average rank. Moreover, as the horizon increases, most approaches struggle to outperform a static ensemble that assigns equal weights to all models. The experiments are publicly available in a repository.

Submitted 26 June, 2023; originally announced June 2023.

Comments: 19 pages, github repository available

arXiv:2306.12731 [pdf, ps, other]

Production of fully-heavy tetraquark states through the double parton scattering mechanism in $pp$ and $pA$ collisions

Authors: L. M. Abreu, F. Carvalho, J. V. C. Cerqueira, V. P. Goncalves

Abstract: The production of fully-heavy tetraquark states in proton-proton ($pp$) and proton-nucleus ($pA$) collisions at the center-of-mass energies of the Large Hadron Collider (LHC) and at the Future Circular Collider (FCC) is investigated considering that these states are produced through the double parton scattering mechanism. We estimate the cross sections for the $T_{4c}$, $T_{4b}$ and $T_{2b2c}$ states and present predictions for $pp$, $pCa$ and $pPb$ collisions considering the rapidity ranges covered by central and forward detectors. We demonstrate that the cross sections for $pA$ collisions are enhanced in comparison to the $pp$ predictions scaled by the atomic number. Moreover, our results indicate that a search for these exotic states is, in principle, feasible in the future runs of the LHC and FCC.

Submitted 22 June, 2023; originally announced June 2023.

Comments: 9 pages, 3 figures, 2 tables

arXiv:2206.09821 [pdf, other]

Exceedance Probability Forecasting via Regression for Significant Wave Height Prediction

Authors: Vitor Cerqueira, Luis Torgo

Abstract: Significant wave height forecasting is a key problem in ocean data analytics. This task affects several maritime operations, such as managing the passage of vessels or estimating the energy production from waves. In this work, we focus on the prediction of extreme values of significant wave height that can cause coastal disasters. This task is framed as an exceedance probability forecasting problem. Accordingly, we aim to estimate the probability that the significant wave height will exceed a predefined critical threshold. This problem is usually solved using a probabilistic binary classification model or an ensemble of forecasts. Instead, we propose a novel approach based on point forecasting. Computing both types of forecasts (binary probabilities and point forecasts) can be useful for decision-makers. While a probabilistic binary forecast streamlines information for end-users concerning exceedance events, the point forecasts can provide additional insights into the upcoming future dynamics. The procedure of the proposed solution works by assuming that the point forecasts follow a distribution with the location parameter equal to that forecast. Then, we convert these point forecasts into exceedance probability estimates using the cumulative distribution function. We carried out experiments using data from a smart buoy placed on the coast of Halifax, Canada. The results suggest that the proposed methodology is better than state-of-the-art approaches for exceedance probability forecasting.

Submitted 6 May, 2024; v1 submitted 20 June, 2022; originally announced June 2022.
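
The conversion from a point forecast to an exceedance probability described in the wave height entry above can be sketched as follows. The abstract does not name the distribution, so this sketch assumes a normal distribution centred on the point forecast; the `scale` parameter and the function name `exceedance_probability` are also assumptions made for illustration.

```python
import math


def exceedance_probability(point_forecast: float, threshold: float, scale: float) -> float:
    """Return P(Y > threshold), assuming Y ~ Normal(loc=point_forecast, scale=scale).

    The normal distribution and the `scale` argument are illustrative
    assumptions; only the location-equals-forecast idea comes from the abstract.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (threshold - point_forecast) / scale
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF at the threshold
    return 1.0 - cdf  # probability mass above the threshold


# A forecast of 4.0 m against a 5.0 m critical threshold, with scale 1.0.
p = exceedance_probability(point_forecast=4.0, threshold=5.0, scale=1.0)
```

When the forecast equals the threshold, this sketch returns exactly 0.5; the further the forecast sits below the threshold relative to `scale`, the smaller the probability.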

arXiv:2205.02553 [pdf, other]

Automated Imbalanced Classification via Layered Learning

Authors: Vitor Cerqueira, Luis Torgo, Paula Branco, Colin Bellinger

Abstract: In this paper we address imbalanced binary classification (IBC) tasks. Applying resampling strategies to balance the class distribution of training instances is a common approach to tackle these problems. Many state-of-the-art methods find instances of interest close to the decision boundary to drive the resampling process. However, under-sampling the majority class may potentially lead to important information loss. Over-sampling also may increase the chance of overfitting by propagating the information contained in instances from the minority class. The main contribution of our work is a new method called ICLL for tackling IBC tasks which is not based on resampling training observations. Instead, ICLL follows a layered learning paradigm to model the data in two stages. In the first layer, ICLL learns to distinguish cases close to the decision boundary from cases which are clearly from the majority class, where this dichotomy is defined using a hierarchical clustering analysis. In the subsequent layer, we use instances close to the decision boundary and instances from the minority class to solve the original predictive task. A second contribution of our work is the automatic definition of the layers which comprise the layered learning strategy using a hierarchical clustering model. This is a relevant discovery as this process is usually performed manually according to domain knowledge. We carried out extensive experiments using 100 benchmark data sets. The results show that the proposed method leads to a better performance relative to several state-of-the-art methods for IBC.

Submitted 30 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

arXiv:2112.14806 [pdf, other]

AutoFITS: Automatic Feature Engineering for Irregular Time Series

Authors: Pedro Costa, Vitor Cerqueira, João Vinagre

Abstract: A time series represents a set of observations collected over time. Typically, these observations are captured with a uniform sampling frequency (e.g. daily). When data points are observed in uneven time intervals the time series is referred to as irregular or intermittent. In such scenarios, the most common solution is to reconstruct the time series to make it regular, thus removing its intermittency. We hypothesise that, in irregular time series, the time at which each observation is collected may be helpful to summarise the dynamics of the data and improve forecasting performance. We study this idea by developing a novel automatic feature engineering framework, which focuses on extracting information from this point of view, i.e., when each instance is collected. We study how valuable this information is by integrating it in a time series forecasting workflow and investigate how it compares to or complements state-of-the-art methods for regular time series forecasting. In the end, we contribute by providing a novel framework that tackles feature engineering for time series from an angle previously vastly ignored. We show that our approach has the potential to further extract more information about time series that significantly improves forecasting performance.

Submitted 29 December, 2021; originally announced December 2021.

arXiv:2104.01830 [pdf, other]

Model Compression for Dynamic Forecast Combination

Authors: Vitor Cerqueira, Luis Torgo, Carlos Soares, Albert Bifet

Abstract: The predictive advantage of combining several different predictive models is widely accepted. Particularly in time series forecasting problems, this combination is often dynamic to cope with potential non-stationary sources of variation present in the data. Despite their superior predictive performance, ensemble methods entail two main limitations: high computational costs and lack of transparency. These issues often preclude the deployment of such approaches, in favour of simpler yet more efficient and reliable ones. In this paper, we leverage the idea of model compression to address this problem in time series forecasting tasks. Model compression approaches have been mostly unexplored for forecasting. Their application in time series is challenging due to the evolving nature of the data. Further, while the literature focuses on neural networks, we apply model compression to distinct types of methods. In an extensive set of experiments, we show that compressing dynamic forecasting ensembles into an individual model leads to a comparable predictive performance and a drastic reduction in computational costs. Further, the compressed individual model with best average rank is a rule-based regression model. Thus, model compression also leads to benefits in terms of model interpretability. The experiments carried out in this paper are fully reproducible.

Submitted 5 April, 2021; originally announced April 2021.

arXiv:2104.00584 [pdf, other]

Model Selection for Time Series Forecasting: Empirical Analysis of Different Estimators

Authors: Vitor Cerqueira, Luis Torgo, Carlos Soares

Abstract: Evaluating predictive models is a crucial task in predictive analytics. This process is especially challenging with time series data where the observations show temporal dependencies. Several studies have analysed how different performance estimation methods compare with each other for approximating the true loss incurred by a given forecasting model. However, these studies do not address how the estimators behave for model selection: the ability to select the best solution among a set of alternatives. We address this issue and compare a set of estimation methods for model selection in time series forecasting tasks. We attempt to answer two main questions: (i) how often is the best possible model selected by the estimators; and (ii) what is the performance loss when it is not. We empirically found that the accuracy of the estimators for selecting the best solution is low, and the overall forecasting performance loss associated with the model selection process ranges from 1.2% to 2.3%. We also discovered that some factors, such as the sample size, are important in the relative performance of the estimators.

Submitted 11 February, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

arXiv:2103.00903 [pdf, other]

STUDD: A Student-Teacher Method for Unsupervised Concept Drift Detection

Authors: Vitor Cerqueira, Heitor Murilo Gomes, Albert Bifet, Luis Torgo

Abstract: Concept drift detection is a crucial task in evolving data stream environments. Most state-of-the-art approaches designed to tackle this problem monitor the loss of predictive models. However, this approach falls short in many real-world scenarios, where the true labels are not readily available to compute the loss. In this context, there is increasing attention to approaches that perform concept drift detection in an unsupervised manner, i.e., without access to the true labels. We propose a novel approach to unsupervised concept drift detection based on a student-teacher learning paradigm. Essentially, we create an auxiliary model (student) to mimic the behaviour of the primary model (teacher). At run-time, our approach is to use the teacher for predicting new instances and monitoring the mimicking loss of the student for concept drift detection. In a set of experiments using 19 data streams, we show that the proposed approach can detect concept drift and present a competitive behaviour relative to state-of-the-art approaches.

Submitted 1 March, 2021; originally announced March 2021.

Comments: 23 pages, single column
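
The student-teacher idea in the STUDD entry above can be illustrated with a small sketch: the teacher labels incoming instances, and the rate at which the student disagrees with it (the mimicking loss) is monitored without any true labels. The fixed-threshold rule over a sliding window used here is a simplification swapped in for illustration, not the detector used in the paper; `first_drift_index`, the toy models, and the toy stream are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def first_drift_index(
    stream: Iterable[float],
    teacher: Callable[[float], int],
    student: Callable[[float], int],
    window: int = 50,
    threshold: float = 0.3,
) -> Optional[int]:
    """Return the first index where the student's recent disagreement
    rate with the teacher exceeds `threshold`, or None if it never does.
    """
    recent = deque(maxlen=window)  # 1 = student disagrees with teacher
    for i, x in enumerate(stream):
        recent.append(int(teacher(x) != student(x)))
        if len(recent) == window and sum(recent) / window > threshold:
            return i
    return None


# Hypothetical models: the student mimics the teacher only on [0, 1],
# the region both were fitted on.
teacher = lambda x: int(x > 0.5)
student = lambda x: int(0.5 < x <= 1.0)

# 100 in-range values, then the inputs shift to 1.5 at index 100.
stable = [(i % 10) / 10 for i in range(100)]
stream = stable + [1.5] * 100
drift_at = first_drift_index(stream, teacher, student)
```

No true labels are consulted at any point; only the two models' predictions are compared, which is what makes the scheme unsupervised.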

arXiv:2010.11595 [pdf, other]

Early Anomaly Detection in Time Series: A Hierarchical Approach for Predicting Critical Health Episodes

Authors: Vitor Cerqueira, Luis Torgo, Carlos Soares

Abstract: The early detection of anomalous events in time series data is essential in many domains of application. In this paper we deal with critical health events, which represent a significant cause of mortality in intensive care units of hospitals. The timely prediction of these events is crucial for mitigating their consequences and improving healthcare. One of the most common approaches to tackle early anomaly detection problems is standard classification methods. In this paper we propose a novel method that uses a layered learning architecture to address these tasks. One key contribution of our work is the idea of pre-conditional events, which denote arbitrary but computable relaxed versions of the event of interest. We leverage this idea to break the original problem into two hierarchical layers, which we hypothesize are easier to solve. The results suggest that the proposed approach leads to a better performance relative to state of the art approaches for critical health episode prediction.

Submitted 22 October, 2020; originally announced October 2020.

arXiv:2010.07137 [pdf, other]

VEST: Automatic Feature Engineering for Forecasting

Authors: Vitor Cerqueira, Nuno Moniz, Carlos Soares

Abstract: Time series forecasting is a challenging task with applications in a wide range of domains. Auto-regression is one of the most common approaches to address these problems. Accordingly, observations are modelled by multiple regression using their past lags as predictor variables. We investigate the extension of auto-regressive processes using statistics which summarise the recent past dynamics of time series. The result of our research is a novel framework called VEST, designed to perform feature engineering using univariate and numeric time series automatically. The proposed approach works in three main steps. First, recent observations are mapped onto different representations. Second, each representation is summarised by statistical functions. Finally, a filter is applied for feature selection. We discovered that combining the features generated by VEST with auto-regression significantly improves forecasting performance. We provide evidence using 90 time series with high sampling frequency. VEST is publicly available online.

Submitted 14 October, 2020; originally announced October 2020.

Comments: 25 pages, R code available
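
The three steps listed in the VEST entry above (map to representations, summarise, filter) can be sketched in a few lines. This is not the VEST implementation, which the entry notes is available as R code; the choice of representations (raw values and first differences), summaries (mean and standard deviation), and filter (dropping features that are constant across windows) are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def first_diff(window: List[float]) -> List[float]:
    """First differences of a window of observations."""
    return [b - a for a, b in zip(window, window[1:])]


# Step 1 candidates: representations of the recent observations (assumed).
REPRESENTATIONS = {"raw": lambda w: list(w), "diff": first_diff}
# Step 2 candidates: summary statistics of each representation (assumed).
SUMMARIES = {"mean": mean, "sd": pstdev}


def window_features(window: List[float]) -> Dict[str, float]:
    """Steps 1 and 2: summarise every representation of one window."""
    feats = {}
    for rname, rep in REPRESENTATIONS.items():
        transformed = rep(window)
        for sname, stat in SUMMARIES.items():
            feats[f"{rname}_{sname}"] = stat(transformed)
    return feats


def variance_filter(rows: List[Dict[str, float]], min_sd: float = 1e-8) -> List[Dict[str, float]]:
    """Step 3 (assumed filter): drop features that do not vary across windows."""
    keep = [k for k in rows[0] if pstdev([r[k] for r in rows]) > min_sd]
    return [{k: r[k] for k in keep} for r in rows]


# Three toy windows of recent observations.
windows = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
rows = variance_filter([window_features(w) for w in windows])
```

In this toy input every window is a straight line, so the standard deviation of the differences is zero everywhere and the filter removes that feature; the remaining features could then be appended to the lagged values used for auto-regression.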

arXiv:1910.01398 [pdf, other]

The effects of degrees of freedom estimation in the Asymmetric GARCH model with Student-t Innovations

Authors: T. C. O. Fonseca, V. S. Cerqueira, H. S. Migon, C. A. C. Torres

Abstract: This work investigates the effects of using the independent Jeffreys prior for the degrees of freedom parameter of a Student-t model in the asymmetric generalised autoregressive conditional heteroskedasticity (GARCH) model. To capture asymmetry in the reaction to past shocks, smooth transition models are assumed for the variance. We adopt the fully Bayesian approach for inference, prediction and model selection. We discuss problems related to the estimation of degrees of freedom in the Student-t model and propose a solution based on independent Jeffreys priors which correct problems in the likelihood function. A simulated study is presented to investigate how the estimation of model parameters in the Student-t GARCH model is affected by small sample sizes, prior distributions and misspecification regarding the sampling distribution. An application to the Dow Jones stock market data illustrates the usefulness of the asymmetric GARCH model with Student-t errors.

Submitted 3 October, 2019; originally announced October 2019.

arXiv:1909.13316 [pdf, other]

Machine Learning vs Statistical Methods for Time Series Forecasting: Size Matters

Authors: Vitor Cerqueira, Luis Torgo, Carlos Soares

Abstract: Time series forecasting is one of the most active research topics. Machine learning methods have been increasingly adopted to solve these predictive tasks. However, in a recent work, these were shown to systematically present a lower predictive performance relative to simple statistical methods. In this work, we counter these results. We show that these are only valid under an extremely low sample size. Using a learning curve method, our results suggest that machine learning methods improve their relative predictive performance as the sample size grows. The code to reproduce the experiments is available at https://github.com/vcerqueira/MLforForecasting.

Submitted 29 September, 2019; originally announced September 2019.

Comments: 9 pages

arXiv:1905.11744 [pdf, other]

doi 10.1007/s10994-020-05910-7

Evaluating time series forecasting models: An empirical study on performance estimation methods

Authors: Vitor Cerqueira, Luis Torgo, Igor Mozetic

Abstract: Performance estimation aims at estimating the loss that a predictive model will incur on unseen data. These procedures are part of the pipeline in every machine learning project and are used for assessing the overall generalisation ability of predictive models. In this paper we address the application of these methods to time series forecasting tasks. For independent and identically distributed data the most common approach is cross-validation. However, the dependency among observations in time series raises some caveats about the most appropriate way to estimate performance in this type of data and currently there is no settled way to do so. We compare different variants of cross-validation and of out-of-sample approaches using two case studies: One with 62 real-world time series and another with three synthetic time series. Results show noticeable differences in the performance estimation methods in the two scenarios. In particular, empirical experiments suggest that cross-validation approaches can be applied to stationary time series. However, in real-world scenarios, when different sources of non-stationary variation are at play, the most accurate estimates are produced by out-of-sample methods that preserve the temporal order of observations.

Submitted 28 May, 2019; originally announced May 2019.

Journal ref: Machine Learning 109:1997-2028, 2020

arXiv:1803.05160

doi 10.1371/journal.pone.0194317

How to evaluate sentiment classifiers for Twitter time-ordered data?

Authors: Igor Mozetič, Luis Torgo, Vitor Cerqueira, Jasmina Smailović

Abstract: Social media are becoming an increasingly important source of information about the public mood regarding issues such as elections, Brexit, the stock market, etc. In this paper we focus on sentiment classification of Twitter data. Construction of sentiment classifiers is a standard text mining task, but here we address the question of how to properly evaluate them, as there is no settled way to do so. Sentiment classes are ordered and unbalanced, and Twitter produces a stream of time-ordered data. The problem we address concerns the procedures used to obtain reliable estimates of performance measures, and whether the temporal ordering of the training and test data matters. We collected a large set of 1.5 million tweets in 13 European languages. We created 138 sentiment models and out-of-sample datasets, which are used as a gold standard for evaluations. The corresponding 138 in-sample datasets are used to empirically compare six different estimation procedures: three variants of cross-validation, and three variants of sequential validation (where the test set always follows the training set). We find no significant difference between the best cross-validation and sequential validation. However, we observe that all cross-validation variants tend to overestimate the performance, while the sequential methods tend to underestimate it. Standard cross-validation with random selection of examples is significantly worse than blocked cross-validation, and should not be used to evaluate classifiers in time-ordered data scenarios.

Submitted 14 March, 2018; originally announced March 2018.

Journal ref: PLoS ONE 13(3): e0194317, 2018
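
The two families of estimation procedures named in the abstract above can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce the six variants they compare; the function names `blocked_cv_splits` and `sequential_splits` and the equal-sized block layout are assumptions made for illustration. Blocked cross-validation keeps each fold contiguous in time without shuffling, while sequential validation only ever tests on data that follows the training data.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def blocked_cv_splits(n: int, k: int) -> Iterator[Split]:
    """Hypothetical helper: k contiguous, unshuffled folds over n time-ordered items.

    Each fold is used once as the test set; training data may lie
    both before and after the test block.
    """
    bounds = [i * n // k for i in range(k + 1)]
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        train = list(range(0, bounds[i])) + list(range(bounds[i + 1], n))
        yield train, test


def sequential_splits(n: int, k: int) -> Iterator[Split]:
    """Hypothetical helper: k splits in which the test block always follows the training data.

    The series is cut into k + 1 contiguous blocks; split i trains on
    the first i blocks and tests on the block that comes next.
    """
    bounds = [i * n // (k + 1) for i in range(k + 2)]
    for i in range(1, k + 1):
        train = list(range(0, bounds[i]))
        test = list(range(bounds[i], bounds[i + 1]))
        yield train, test


if __name__ == "__main__":
    for train, test in sequential_splits(10, 2):
        print("train:", train, "test:", test)
```

In the sequential variant every training index precedes every test index, which is the property the abstract describes; the blocked variant drops that guarantee but still avoids random selection of examples.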

arXiv:1706.09367

autoBagging: Learning to Rank Bagging Workflows with Metalearning

Authors: Fábio Pinto, Vítor Cerqueira, Carlos Soares, João Mendes-Moreira

Abstract: Machine Learning (ML) has been successfully applied to a wide range of domains and applications. One of the techniques behind most of these successful applications is Ensemble Learning (EL), the field of ML that gave birth to methods such as Random Forests or Boosting. The complexity of applying these techniques, together with the market scarcity of ML experts, has created the need for systems that enable a fast and easy drop-in replacement for ML libraries. Automated machine learning (autoML) is the field of ML that attempts to answer these needs. Typically, these systems rely on optimization techniques such as Bayesian optimization to lead the search for the best model. Our approach differs from these systems by making use of the most recent advances in metalearning and a learning to rank approach to learn from metadata. We propose autoBagging, an autoML system that automatically ranks 63 bagging workflows by exploiting past performance and dataset characterization. Results on 140 classification datasets from the OpenML platform show that autoBagging can yield better performance than the Average Rank method and achieve results that are not statistically different from an ideal model that systematically selects the best workflow for each dataset. For the purpose of reproducibility and generalizability, autoBagging is publicly available as an R package on CRAN.

Submitted 28 June, 2017; originally announced June 2017.

Showing 1–20 of 20 results for author: Cerqueira, V