-
Effective Layer Pruning Through Similarity Metric Perspective
Authors:
Ian Pons,
Bruno Yamamoto,
Anna H. Reali Costa,
Artur Jordao
Abstract:
Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks. Such models, however, are restricted by a high computational overhead, limiting their applicability and hindering advancements in the field. Extensive research demonstrated that pruning structures from these models is a straightforward approach to reducing network complexity. In this direction,…
▽ More
Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks. Such models, however, are restricted by a high computational overhead, limiting their applicability and hindering advancements in the field. Extensive research demonstrated that pruning structures from these models is a straightforward approach to reducing network complexity. In this direction, most efforts focus on removing weights or filters. Studies have also been devoted to layer pruning as it promotes superior computational gains. However, layer pruning often hurts the network predictive ability (i.e., accuracy) at high compression rates. This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods. Our method estimates the relative importance of a layer using the Centered Kernel Alignment (CKA) metric, employed to measure the similarity between the representations of the unpruned model and a candidate layer for pruning. We confirm the effectiveness of our method on standard architectures and benchmarks, in which it outperforms existing layer-pruning strategies and other state-of-the-art pruning techniques. Particularly, we remove more than 75% of computation while improving predictive ability. At higher compression regimes, our method exhibits negligible accuracy drop, while other methods notably deteriorate model accuracy. Apart from these benefits, our pruned models exhibit robustness to adversarial and out-of-distribution samples.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
When Layers Play the Lottery, all Tickets Win at Initialization
Authors:
Artur Jordao,
George Correa de Araujo,
Helena de Almeida Maia,
Helio Pedrini
Abstract:
Pruning is a standard technique for reducing the computational cost of deep networks. Many advances in pruning leverage concepts from the Lottery Ticket Hypothesis (LTH). LTH reveals that inside a trained dense network exists sparse subnetworks (tickets) able to achieve similar accuracy (i.e., win the lottery - winning tickets). Pruning at initialization focuses on finding winning tickets without…
▽ More
Pruning is a standard technique for reducing the computational cost of deep networks. Many advances in pruning leverage concepts from the Lottery Ticket Hypothesis (LTH). LTH reveals that inside a trained dense network exists sparse subnetworks (tickets) able to achieve similar accuracy (i.e., win the lottery - winning tickets). Pruning at initialization focuses on finding winning tickets without training a dense network. Studies on these concepts share the trend that subnetworks come from weight or filter pruning. In this work, we investigate LTH and pruning at initialization from the lens of layer pruning. First, we confirm the existence of winning tickets when the pruning process removes layers. Leveraged by this observation, we propose to discover these winning tickets at initialization, eliminating the requirement of heavy computational resources for training the initial (over-parameterized) dense network. Extensive experiments show that our winning tickets notably speed up the training phase and reduce up to 51% of carbon emission, an important step towards democratization and green Artificial Intelligence. Beyond computational benefits, our winning tickets exhibit robustness against adversarial and out-of-distribution examples. Finally, we show that our subnetworks easily win the lottery at initialization while tickets from filter removal (the standard structured LTH) hardly become winning tickets.
△ Less
Submitted 19 March, 2024; v1 submitted 25 January, 2023;
originally announced January 2023.
-
On the Effect of Pruning on Adversarial Robustness
Authors:
Artur Jordao,
Helio Pedrini
Abstract:
Pruning is a well-known mechanism for reducing the computational cost of deep convolutional networks. However, studies have shown the potential of pruning as a form of regularization, which reduces overfitting and improves generalization. We demonstrate that this family of strategies provides additional benefits beyond computational performance and generalization. Our analyses reveal that pruning…
▽ More
Pruning is a well-known mechanism for reducing the computational cost of deep convolutional networks. However, studies have shown the potential of pruning as a form of regularization, which reduces overfitting and improves generalization. We demonstrate that this family of strategies provides additional benefits beyond computational performance and generalization. Our analyses reveal that pruning structures (filters and/or layers) from convolutional networks increase not only generalization but also robustness to adversarial images (natural images with content modified). Such achievements are possible since pruning reduces network capacity and provides regularization, which have been proven effective tools against adversarial images. In contrast to promising defense mechanisms that require training with adversarial images and careful regularization, we show that pruning obtains competitive results considering only natural images (e.g., the standard and low-cost training). We confirm these findings on several adversarial attacks and architectures; thus suggesting the potential of pruning as a novel defense mechanism against adversarial images.
△ Less
Submitted 24 November, 2021; v1 submitted 10 August, 2021;
originally announced August 2021.
-
Stage-Wise Neural Architecture Search
Authors:
Artur Jordao,
Fernando Akio,
Maiko Lie,
William Robson Schwartz
Abstract:
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications. These architectures consist of stages, which are sets of layers that operate on representations in the same resolution. It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network. However, the resulting…
▽ More
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications. These architectures consist of stages, which are sets of layers that operate on representations in the same resolution. It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network. However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time. Thus, significant human effort is necessary to evaluate different trade-offs between depth and performance. To handle this problem, recent works have proposed to automatically design high-performance architectures, mainly by means of neural architecture search (NAS). Current NAS strategies analyze a large set of possible candidate architectures and, hence, require vast computational resources and take many GPUs days. Motivated by this, we propose a NAS approach to efficiently design accurate and low-cost convolutional architectures and demonstrate that an efficient strategy for designing these architectures is to learn the depth stage-by-stage. For this purpose, our approach increases depth incrementally in each stage taking into account its importance, such that stages with low importance are kept shallow while stages with high importance become deeper. We conduct experiments on the CIFAR and different versions of ImageNet datasets, where we show that architectures discovered by our approach achieve better accuracy and efficiency than human-designed architectures. Additionally, we show that architectures discovered on CIFAR-10 can be successfully transferred to large datasets. Compared to previous NAS approaches, our method is substantially more efficient, as it evaluates one order of magnitude fewer models and yields architectures on par with the state-of-the-art.
△ Less
Submitted 19 October, 2020; v1 submitted 23 April, 2020;
originally announced April 2020.
-
Covariance-free Partial Least Squares: An Incremental Dimensionality Reduction Method
Authors:
Artur Jordao,
Maiko Lie,
Victor Hugo Cunha de Melo,
William Robson Schwartz
Abstract:
Dimensionality reduction plays an important role in computer vision problems since it reduces computational cost and is often capable of yielding more discriminative data representation. In this context, Partial Least Squares (PLS) has presented notable results in tasks such as image classification and neural network optimization. However, PLS is infeasible on large datasets, such as ImageNet, bec…
▽ More
Dimensionality reduction plays an important role in computer vision problems since it reduces computational cost and is often capable of yielding more discriminative data representation. In this context, Partial Least Squares (PLS) has presented notable results in tasks such as image classification and neural network optimization. However, PLS is infeasible on large datasets, such as ImageNet, because it requires all the data to be in memory in advance, which is often impractical due to hardware limitations. Additionally, this requirement prevents us from employing PLS on streaming applications where the data are being continuously generated. Motivated by this, we propose a novel incremental PLS, named Covariance-free Incremental Partial Least Squares (CIPLS), which learns a low-dimensional representation of the data using a single sample at a time. In contrast to other state-of-the-art approaches, instead of adopting a partially-discriminative or SGD-based model, we extend Nonlinear Iterative Partial Least Squares (NIPALS) -- the standard algorithm used to compute PLS -- for incremental processing. Among the advantages of this approach are the preservation of discriminative information across all components, the possibility of employing its score matrices for feature selection, and its computational efficiency. We validate CIPLS on face verification and image classification tasks, where it outperforms several other incremental dimensionality reduction techniques. In the context of feature selection, CIPLS achieves comparable results when compared to state-of-the-art techniques.
△ Less
Submitted 10 November, 2020; v1 submitted 5 October, 2019;
originally announced October 2019.
-
Pruning Deep Neural Networks using Partial Least Squares
Authors:
Artur Jordao,
Ricardo Kloss,
Fernando Yamada,
William Robson Schwartz
Abstract:
Modern pattern recognition methods are based on convolutional networks since they are able to learn complex patterns that benefit the classification. However, convolutional networks are computationally expensive and require a considerable amount of memory, which limits their deployment on low-power and resource-constrained systems. To handle these problems, recent approaches have proposed pruning…
▽ More
Modern pattern recognition methods are based on convolutional networks since they are able to learn complex patterns that benefit the classification. However, convolutional networks are computationally expensive and require a considerable amount of memory, which limits their deployment on low-power and resource-constrained systems. To handle these problems, recent approaches have proposed pruning strategies that find and remove unimportant neurons (i.e., filters) in these networks. Despite achieving remarkable results, existing pruning approaches are ineffective since the accuracy of the original network is degraded. In this work, we propose a novel approach to efficiently remove filters from convolutional networks. Our approach estimates the filter importance based on its relationship with the class label on a low-dimensional space. This relationship is computed using Partial Least Squares (PLS) and Variable Importance in Projection (VIP). Our method is able to reduce up to 67% of the floating point operations (FLOPs) without penalizing the network accuracy. With a negligible drop in accuracy, we can reduce up to 90% of FLOPs. Additionally, sometimes the method is even able to improve the accuracy compared to original, unpruned, network. We show that employing PLS+VIP as the criterion for detecting the filters to be removed is better than recent feature selection techniques, which have been employed by state-of-the-art pruning methods. Finally, we show that the proposed method achieves the highest FLOPs reduction and the smallest drop in accuracy when compared to state-of-the-art pruning approaches. Codes are available at: https://github.com/arturjordao/PruningNeuralNetworks
△ Less
Submitted 19 September, 2019; v1 submitted 17 October, 2018;
originally announced October 2018.
-
Human Activity Recognition Based on Wearable Sensor Data: A Standardization of the State-of-the-Art
Authors:
Artur Jordao,
Antonio C. Nazare Jr.,
Jessica Sena,
William Robson Schwartz
Abstract:
Human activity recognition based on wearable sensor data has been an attractive research topic due to its application in areas such as healthcare and smart environments. In this context, many works have presented remarkable results using accelerometer, gyroscope and magnetometer data to represent the activities categories. However, current studies do not consider important issues that lead to skew…
▽ More
Human activity recognition based on wearable sensor data has been an attractive research topic due to its application in areas such as healthcare and smart environments. In this context, many works have presented remarkable results using accelerometer, gyroscope and magnetometer data to represent the activities categories. However, current studies do not consider important issues that lead to skewed results, making it hard to assess the quality of sensor-based human activity recognition and preventing a direct comparison of previous works. These issues include the samples generation processes and the validation protocols used. We emphasize that in other research areas, such as image classification and object detection, these issues are already well-defined, which brings more efforts towards the application. Inspired by this, we conduct an extensive set of experiments that analyze different sample generation processes and validation protocols to indicate the vulnerable points in human activity recognition based on wearable sensor data. For this purpose, we implement and evaluate several top-performance methods, ranging from handcrafted-based approaches to convolutional neural networks. According to our study, most of the experimental evaluations that are currently employed are not adequate to perform the activity recognition in the context of wearable sensor data, in which the recognition accuracy drops considerably when compared to an appropriate evaluation approach. To the best of our knowledge, this is the first study that tackles essential issues that compromise the understanding of the performance in human activity recognition based on wearable sensor data.
△ Less
Submitted 1 February, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.
-
A Content-Based Late Fusion Approach Applied to Pedestrian Detection
Authors:
Jessica Sena,
Artur Jordao,
William Robson Schwartz
Abstract:
The variety of pedestrians detectors proposed in recent years has encouraged some works to fuse pedestrian detectors to achieve a more accurate detection. The intuition behind is to combine the detectors based on its spatial consensus. We propose a novel method called Content-Based Spatial Consensus (CSBC), which, in addition to relying on spatial consensus, considers the content of the detection…
▽ More
The variety of pedestrians detectors proposed in recent years has encouraged some works to fuse pedestrian detectors to achieve a more accurate detection. The intuition behind is to combine the detectors based on its spatial consensus. We propose a novel method called Content-Based Spatial Consensus (CSBC), which, in addition to relying on spatial consensus, considers the content of the detection windows to learn a weighted-fusion of pedestrian detectors. The result is a reduction in false alarms and an enhancement in the detection. In this work, we also demonstrate that there is small influence of the feature used to learn the contents of the windows of each detector, which enables our method to be efficient even employing simple features. The CSBC overcomes state-of-the-art fusion methods in the ETH dataset and in the Caltech dataset. Particularly, our method is more efficient since fewer detectors are necessary to achieve expressive results.
△ Less
Submitted 8 June, 2018;
originally announced June 2018.
-
Latent hypernet: Exploring all Layers from Convolutional Neural Networks
Authors:
Artur Jordao,
Ricardo Kloss,
William Robson Schwartz
Abstract:
Since Convolutional Neural Networks (ConvNets) are able to simultaneously learn features and classifiers to discriminate different categories of activities, recent works have employed ConvNets approaches to perform human activity recognition (HAR) based on wearable sensors, allowing the removal of expensive human work and expert knowledge. However, these approaches have their power of discriminati…
▽ More
Since Convolutional Neural Networks (ConvNets) are able to simultaneously learn features and classifiers to discriminate different categories of activities, recent works have employed ConvNets approaches to perform human activity recognition (HAR) based on wearable sensors, allowing the removal of expensive human work and expert knowledge. However, these approaches have their power of discrimination limited mainly by the large number of parameters that compose the network and the reduced number of samples available for training. Inspired by this, we propose an accurate and robust approach, referred to as Latent HyperNet (LHN). The LHN uses feature maps from early layers (hyper) and projects them, individually, onto a low dimensionality space (latent). Then, these latent features are concatenated and presented to a classifier. To demonstrate the robustness and accuracy of the LHN, we evaluate it using four different networks architectures in five publicly available HAR datasets based on wearable sensors, which vary in the sampling rate and number of activities. Our experiments demonstrate that the proposed LHN is able to produce rich information, improving the results regarding the original ConvNets. Furthermore, the method outperforms existing state-of-the-art methods.
△ Less
Submitted 16 November, 2018; v1 submitted 7 November, 2017;
originally announced November 2017.