Search | arXiv e-print repository

Detach-ROCKET: Sequential feature selection for time series classification with random convolutional kernels

Authors: Gonzalo Uribarri, Federico Barone, Alessio Ansuini, Erik Fransén

Abstract: Time Series Classification (TSC) is essential in fields like medicine, environmental science, and finance, enabling tasks such as disease diagnosis, anomaly detection, and stock price analysis. While machine learning models like Recurrent Neural Networks and InceptionTime are successful in numerous applications, they can face scalability issues due to computational requirements. Recently, ROCKET has emerged as an efficient alternative, achieving state-of-the-art performance and simplifying training by utilizing a large number of randomly generated features from the time series data. However, many of these features are redundant or non-informative, increasing computational load and compromising generalization. Here we introduce Sequential Feature Detachment (SFD) to identify and prune non-essential features in ROCKET-based models, such as ROCKET, MiniRocket, and MultiRocket. SFD estimates feature importance using model coefficients and can handle large feature sets without complex hyperparameter tuning. Testing on the UCR archive shows that SFD can produce models with better test accuracy using only 10% of the original features. We named these pruned models Detach-ROCKET. We also present an end-to-end procedure for determining an optimal balance between the number of features and model accuracy. On the largest binary UCR dataset, Detach-ROCKET improves test accuracy by 0.6% while reducing features by 98.9%.
By enabling a significant reduction in model size without sacrificing accuracy, our methodology improves computational efficiency and contributes to model interpretability. We believe that Detach-ROCKET will be a valuable tool for researchers and practitioners working with time series data, who can find a user-friendly implementation of the model at https://github.com/gon-uri/detach_rocket.

Submitted 24 June, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: 18 pages, 5 figures, 3 tables

arXiv:2306.06081 [pdf, other]

Carefully Blending Adversarial Training and Purification Improves Adversarial Robustness

Authors: Emanuele Ballarin, Alessio Ansuini, Luca Bortolussi

Abstract: In this work, we propose a novel adversarial defence mechanism for image classification - CARSO - blending the paradigms of adversarial training and adversarial purification in a synergistic robustness-enhancing way. The method builds upon an adversarially-trained classifier, and learns to map its internal representation associated with a potentially perturbed input onto a distribution of tentative clean reconstructions. Multiple samples from such distribution are classified by the same adversarially-trained model, and an aggregation of its outputs finally constitutes the robust prediction of interest. Experimental evaluation by a well-established benchmark of strong adaptive attacks, across different image datasets, shows that CARSO is able to defend itself against adaptive end-to-end white-box attacks devised for stochastic defences. Paying a modest clean accuracy toll, our method improves by a significant margin the state-of-the-art for CIFAR-10, CIFAR-100, and TinyImageNet-200 $\ell_\infty$ robust classification accuracy against AutoAttack. Code and instructions to obtain pre-trained models are available at https://github.com/emaballarin/CARSO.

Submitted 23 May, 2024; v1 submitted 25 May, 2023; originally announced June 2023.

Comments: 21 pages, 1 figure, 15 tables

arXiv:2305.18353 [pdf, other]

Emergent representations in networks trained with the Forward-Forward algorithm

Authors: Niccolò Tosato, Lorenzo Basile, Emanuele Ballarin, Giuseppe de Alteriis, Alberto Cazzaniga, Alessio Ansuini

Abstract: The Backpropagation algorithm has often been criticised for its lack of biological realism. In an attempt to find a more biologically plausible alternative, the recently introduced Forward-Forward algorithm replaces the forward and backward passes of Backpropagation with two forward passes. In this work, we show that the internal representations obtained by the Forward-Forward algorithm can organise into category-specific ensembles exhibiting high sparsity - composed of a low number of active units. This situation is reminiscent of what has been observed in cortical sensory areas, where neuronal ensembles are suggested to serve as the functional building blocks for perception and action. Interestingly, while this sparse pattern does not typically arise in models trained with standard Backpropagation, it can emerge in networks trained with Backpropagation on the same objective proposed for the Forward-Forward algorithm. These results suggest that the learning procedure proposed by Forward-Forward may be superior to Backpropagation in modelling learning in the cortex, even when a backward pass is used.

Submitted 19 June, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2302.00294 [pdf, other]

The geometry of hidden representations of large transformer models

Authors: Lucrezia Valeriani, Diego Doimo, Francesca Cuturello, Alessandro Laio, Alessio Ansuini, Alberto Cazzaniga

Abstract: Large transformers are powerful architectures used for self-supervised data analysis across various data types, including protein sequences, images, and text. In these models, the semantic structure of the dataset emerges from a sequence of transformations between one representation and the next. We characterize the geometric and statistical properties of these representations and how they change as we move through the layers. By analyzing the intrinsic dimension (ID) and neighbor composition, we find that the representations evolve similarly in transformers trained on protein language tasks and image reconstruction tasks. In the first layers, the data manifold expands, becoming high-dimensional, and then contracts significantly in the intermediate layers. In the last part of the model, the ID remains approximately constant or forms a second shallow peak. We show that the semantic information of the dataset is better expressed at the end of the first peak, and this phenomenon can be observed across many models trained on diverse datasets. Based on our findings, we point out an explicit strategy to identify, without supervision, the layers that maximize semantic content: representations at intermediate layers corresponding to a relative minimum of the ID profile are more suitable for downstream learning tasks.

Submitted 30 October, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

arXiv:2202.05676 [pdf, other]

Deep artificial neural network for prediction of atrial fibrillation through the analysis of 12-leads standard ECG

Authors: A. Scagnetto, G. Barbati, I. Gandin, C. Cappelletto, G. Baj, A. Cazzaniga, F. Cuturello, A. Ansuini, L. Bortolussi, A. Di Lenarda

Abstract: Atrial Fibrillation (AF) is a cardiac arrhythmia which, despite often being asymptomatic, represents an important risk factor for stroke; being able to predict AF at the electrocardiogram exam would therefore have a great impact on actively targeting patients at high risk. In the present work we use Convolutional Neural Networks to analyze ECGs and predict Atrial Fibrillation starting from realistic datasets, i.e. considering fewer ECGs than other studies and extending the maximal distance between ECG and AF diagnosis. We achieved 75.5% (0.75) AUC, first by increasing our dataset size with a shifting technique and second by using the dilation parameter of the convolutional neural network. In addition we find that, contrary to what is commonly used by clinicians reporting AF at the exam, the most informative leads for the task of predicting AF are D1 and avR. Similarly, we find that the most important frequencies to check are in the range of 5-20 Hz. Finally, we develop a network able to handle the electrocardiographic signal together with the electronic health record, showing that integration between different sources of data is a profitable path. In fact, the 2.8% gain of this network brings us to a 78.6% (std 0.77) AUC. In future work we will investigate further both the integration of sources and the reason why we claim avR is the most informative lead.

Submitted 14 January, 2022; originally announced February 2022.

Comments: 10 pages, 2 figures, 5 tables

arXiv:2007.03506 [pdf, other]

Hierarchical nucleation in deep neural networks

Authors: Diego Doimo, Aldo Glielmo, Alessio Ansuini, Alessandro Laio

Abstract: Deep convolutional networks (DCNs) learn meaningful representations where data that share the same abstract characteristics are positioned closer and closer. Understanding these representations and how they are generated is of unquestioned practical and theoretical interest. In this work we study the evolution of the probability density of the ImageNet dataset across the hidden layers in some state-of-the-art DCNs. We find that the initial layers generate a unimodal probability density getting rid of any structure irrelevant for classification. In subsequent layers density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts. Density peaks corresponding to single categories appear only close to the output and via a very sharp transition which resembles the nucleation process of a heterogeneous liquid. This process leaves a footprint in the probability density of the output layer where the topography of the peaks allows reconstructing the semantic relationships of the categories.

Submitted 9 July, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

arXiv:1905.12784 [pdf, ps, other]

Intrinsic dimension of data representations in deep neural networks

Authors: Alessio Ansuini, Alessandro Laio, Jakob H. Macke, Davide Zoccolan

Abstract: Deep neural networks progressively transform their inputs across multiple processing layers. What are the geometrical properties of the representations learned by these networks? Here we study the intrinsic dimensionality (ID) of data-representations, i.e. the minimal number of parameters needed to describe a representation. We find that, in a trained network, the ID is orders of magnitude smaller than the number of units in each layer. Across layers, the ID first increases and then progressively decreases in the final layers. Remarkably, the ID of the last hidden layer predicts classification accuracy on the test set. These results can neither be found by linear dimensionality estimates (e.g., with principal component analysis), nor in representations that had been artificially linearized. They are neither found in untrained networks, nor in networks that are trained on randomized labels. This suggests that neural networks that can generalize are those that transform the data into low-dimensional, but not necessarily flat manifolds.

Submitted 28 October, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: NeurIPS 2019

arXiv:1812.02504 [pdf, ps, other]

Observing the Population Dynamics in GE by means of the Intrinsic Dimension

Authors: Eric Medvet, Alberto Bartoli, Alessio Ansuini, Fabiano Tarlao

Abstract: We explore the use of Intrinsic Dimension (ID) for gaining insights into how populations evolve in Evolutionary Algorithms. ID measures the minimum number of dimensions needed to accurately describe a dataset and its estimators are being used more and more in Machine Learning to cope with large datasets. We postulate that ID can provide information about the population which is complementary with respect to what (a simple measure of) diversity tells. We experimented with the application of ID to populations evolved with a recent variant of Grammatical Evolution. The preliminary results suggest that diversity and ID constitute two different points of view on the population dynamics.

Submitted 6 December, 2018; originally announced December 2018.

Comments: Evolutionary Machine Learning workshop at International Conference on Parallel Problem Solving from Nature (EML@PPSN), 2018, Coimbra (Portugal)

Showing 1–8 of 8 results for author: Ansuini, A