-
Back to Basics: A Sanity Check on Modern Time Series Classification Algorithms
Authors:
Bhaskar Dhariyal,
Thach Le Nguyen,
Georgiana Ifrim
Abstract:
The state-of-the-art in time series classification has come a long way, from the 1NN-DTW algorithm to the ROCKET family of classifiers. However, in the current fast-paced development of new classifiers, taking a step back and performing simple baseline checks is essential. These checks are often overlooked, as researchers are focused on establishing new state-of-the-art results, develo** scalabl…
▽ More
The state-of-the-art in time series classification has come a long way, from the 1NN-DTW algorithm to the ROCKET family of classifiers. However, in the current fast-paced development of new classifiers, taking a step back and performing simple baseline checks is essential. These checks are often overlooked, as researchers are focused on establishing new state-of-the-art results, develo** scalable algorithms, and making models explainable. Nevertheless, there are many datasets that look like time series at first glance, but classic algorithms such as tabular methods with no time ordering may perform better on such problems. For example, for spectroscopy datasets, tabular methods tend to significantly outperform recent time series methods. In this study, we compare the performance of tabular models using classic machine learning approaches (e.g., Ridge, LDA, RandomForest) with the ROCKET family of classifiers (e.g., Rocket, MiniRocket, MultiRocket). Tabular models are simple and very efficient, while the ROCKET family of classifiers are more complex and have state-of-the-art accuracy and efficiency among recent time series classifiers. We find that tabular models outperform the ROCKET family of classifiers on approximately 19% of univariate and 28% of multivariate datasets in the UCR/UEA benchmark and achieve accuracy within 10 percentage points on about 50% of datasets. Our results suggest that it is important to consider simple tabular models as baselines when develo** time series classifiers. These models are very fast, can be as effective as more complex methods and may be easier to understand and deploy.
△ Less
Submitted 15 August, 2023;
originally announced August 2023.
-
Classification of cow diet based on milk mid infrared spectra: a data analysis competition at the "International workshop of spectroscopy and chemometrics 2022"
Authors:
Maria Frizzarin,
Giulio Visentin,
Alessandro Ferragina,
Elena Hayes,
Antonio Bevilacqua,
Bhaskar Dhariyal,
Katarina Domijan,
Hussain Khan,
Georgiana Ifrim,
Thach Le Nguyen,
Joe Meagher,
Laura Menchetti,
Ashish Singh,
Suzy Whoriskey,
Robert Williamson,
Martina Zappaterra,
Alessandro Casa
Abstract:
In April 2022, the Vistamilk SFI Research Centre organized the second edition of the "International Workshop on Spectroscopy and Chemometrics - Applications in Food and Agriculture". Within this event, a data challenge was organized among participants of the workshop. Such data competition aimed at develo** a prediction model to discriminate dairy cows' diet based on milk spectral information co…
▽ More
In April 2022, the Vistamilk SFI Research Centre organized the second edition of the "International Workshop on Spectroscopy and Chemometrics - Applications in Food and Agriculture". Within this event, a data challenge was organized among participants of the workshop. Such data competition aimed at develo** a prediction model to discriminate dairy cows' diet based on milk spectral information collected in the mid-infrared region. In fact, the development of an accurate and reliable discriminant model for dairy cows' diet can provide important authentication tools for dairy processors to guarantee product origin for dairy food manufacturers from grass-fed animals. Different statistical and machine learning modelling approaches have been employed during the workshop, with different pre-processing steps involved and different degree of complexity. The present paper aims to describe the statistical methods adopted by participants to develop such classification model.
△ Less
Submitted 10 October, 2022;
originally announced October 2022.
-
Scalable Classifier-Agnostic Channel Selection for Multivariate Time Series Classification
Authors:
Bhaskar Dhariyal,
Thach Le Nguyen,
Georgiana Ifrim
Abstract:
Accuracy is a key focus of current work in time series classification. However, speed and data reduction in many applications is equally important, especially when the data scale and storage requirements increase rapidly. Current MTSC algorithms need hundreds of compute hours to complete training and prediction. This is due to the nature of multivariate time series data, which grows with the numbe…
▽ More
Accuracy is a key focus of current work in time series classification. However, speed and data reduction in many applications is equally important, especially when the data scale and storage requirements increase rapidly. Current MTSC algorithms need hundreds of compute hours to complete training and prediction. This is due to the nature of multivariate time series data, which grows with the number of time series, their length and the number of channels. In many applications, not all the channels are useful for the classification task; hence we require methods that can efficiently select useful channels and thus save computational resources. We propose and evaluate two methods for channel selection. Our techniques work by representing each class by a prototype time series and performing channel selection based on the prototype distance between classes. The main hypothesis is that useful channels enable better separation between classes; hence, channels with the higher distance between class prototypes are more useful. On the UEA Multivariate Time Series Classification (MTSC) benchmark, we show that these techniques achieve significant data reduction and classifier speedup for similar levels of classification accuracy. Channel selection is applied as a pre-processing step before training state-of-the-art MTSC algorithms and saves about 70\% of computation time and data storage, with preserved accuracy. Furthermore, our methods enable even efficient classifiers, such as ROCKET, to achieve better accuracy than using no channel selection or forward channel selection. To further study the impact of our techniques, we present experiments on classifying synthetic multivariate time series datasets with more than 100 channels, as well as a real-world case study on a dataset with 50 channels. Our channel selection methods lead to significant data reduction with preserved or improved accuracy.
△ Less
Submitted 5 December, 2022; v1 submitted 18 June, 2022;
originally announced June 2022.
-
Mid infrared spectroscopy and milk quality traits: a data analysis competition at the "International Workshop on Spectroscopy and Chemometrics 2021"
Authors:
Maria Frizzarin,
Antonio Bevilacqua,
Bhaskar Dhariyal,
Katarina Domijan,
Federico Ferraccioli,
Elena Hayes,
Georgiana Ifrim,
Agnieszka Konkolewska,
Thach Le Nguyen,
Uche Mbaka,
Giovanna Ranzato,
Ashish Singh,
Marco Stefanucci,
Alessandro Casa
Abstract:
A chemometric data analysis challenge has been arranged during the first edition of the "International Workshop on Spectroscopy and Chemometrics", organized by the Vistamilk SFI Research Centre and held online in April 2021. The aim of the competition was to build a calibration model in order to predict milk quality traits exploiting the information contained in mid-infrared spectra only. Three di…
▽ More
A chemometric data analysis challenge has been arranged during the first edition of the "International Workshop on Spectroscopy and Chemometrics", organized by the Vistamilk SFI Research Centre and held online in April 2021. The aim of the competition was to build a calibration model in order to predict milk quality traits exploiting the information contained in mid-infrared spectra only. Three different traits have been provided, presenting heterogeneous degrees of prediction complexity thus possibly requiring trait-specific modelling choices. In this paper the different approaches adopted by the participants are outlined and the insights obtained from the analyses are critically discussed.
△ Less
Submitted 19 September, 2022; v1 submitted 5 July, 2021;
originally announced July 2021.