-
Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels
Authors:
Zahra Babaiee,
Peyman M. Kiasari,
Daniela Rus,
Radu Grosu
Abstract:
Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures that surpass classical CNNs by a considerable margin in scalability and accuracy. This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers. Through an extensive analysis of millions of trained filters, with different sizes and from various models, we employed unsupervised clustering with autoencoders to categorize these filters. Astonishingly, the patterns converged into a few main clusters, each resembling the difference of Gaussian (DoG) functions and their first- and second-order derivatives. Notably, we were able to classify over 95% and 90% of the filters from state-of-the-art ConvNeXtV2 and ConvNeXt models, respectively. This finding is not merely a technological curiosity; it echoes the foundational models neuroscientists have long proposed for the vision systems of mammals. Our results thus deepen our understanding of the emergent properties of trained DS-CNNs and provide a bridge between artificial and biological visual processing systems. More broadly, they pave the way for more interpretable and biologically inspired neural network designs in the future.
Submitted 25 January, 2024;
originally announced January 2024.
-
Neural Echos: Depthwise Convolutional Filters Replicate Biological Receptive Fields
Authors:
Zahra Babaiee,
Peyman M. Kiasari,
Daniela Rus,
Radu Grosu
Abstract:
In this study, we present evidence suggesting that depthwise convolutional kernels effectively replicate the structural intricacies of the biological receptive fields observed in the mammalian retina. We provide analytics of trained kernels from various state-of-the-art models substantiating this evidence. Inspired by this intriguing discovery, we propose an initialization scheme based on the biological receptive fields. Experimental analysis on the ImageNet dataset with multiple CNN architectures featuring depthwise convolutions reveals a marked enhancement in the accuracy of the learned model when initialized with biologically derived weights. This underscores the potential for biologically inspired computational models to further our understanding of vision processing systems and to improve the efficacy of convolutional networks.
Submitted 18 January, 2024;
originally announced January 2024.
-
Prediction of Tourism Flow with Sparse Geolocation Data
Authors:
Julian Lemmel,
Zahra Babaiee,
Marvin Kleinlehner,
Ivan Majic,
Philipp Neubauer,
Johannes Scholz,
Radu Grosu,
Sophie A. Neubauer
Abstract:
Modern tourism in the 21st century is facing numerous challenges. Among these, the rapidly growing number of tourists visiting space-limited regions like historical cities, museums and bottlenecks such as bridges is one of the biggest. In this context, an accurate prediction of tourism volume and tourism flow within a certain area is critical for visitor management tasks such as sustainable treatment of the environment and prevention of overcrowding. Static flow control methods like conventional low-level controllers or limiting access to overcrowded venues have not yet solved the problem. In this paper, we empirically evaluate the performance of state-of-the-art deep-learning methods such as RNNs, GNNs, and Transformers as well as the classic statistical ARIMA method. Granular limited data supplied by a tourism region is extended by exogenous data such as geolocation trajectories of individual tourists, weather and holidays. In the field of visitor flow prediction with sparse data, we are thereby capable of increasing the accuracy of our predictions, incorporating modern input feature handling as well as mapping geolocation data on top of discrete POI data.
Submitted 28 August, 2023;
originally announced August 2023.
-
IB-U-Nets: Improving medical image segmentation tasks with 3D Inductive Biased kernels
Authors:
Shrajan Bhandary,
Zahra Babaiee,
Dejan Kostyszyn,
Tobias Fechter,
Constantinos Zamboglou,
Anca-Ligia Grosu,
Radu Grosu
Abstract:
Despite the success of convolutional neural networks for 3D medical-image segmentation, the architectures currently used are still not robust enough to the protocols of different scanners, and the variety of image properties they produce. Moreover, access to large-scale datasets with annotated regions of interest is scarce, and obtaining good results is thus difficult. To overcome these challenges, we introduce IB-U-Nets, a novel architecture with inductive bias, inspired by the visual processing in vertebrates. With the 3D U-Net as the base, we add two 3D residual components to the second encoder blocks. They provide an inductive bias, helping U-Nets to segment anatomical structures from 3D images with increased robustness and accuracy. We compared IB-U-Nets with state-of-the-art 3D U-Nets on multiple modalities and organs, such as the prostate and spleen, using the same training and testing pipeline, including data processing, augmentation and cross-validation. Our results demonstrate the superior robustness and accuracy of IB-U-Nets, especially on small datasets, as is typically the case in medical-image analysis. IB-U-Nets source code and models are publicly available.
Submitted 28 October, 2022;
originally announced October 2022.
-
Pruning by Active Attention Manipulation
Authors:
Zahra Babaiee,
Lucas Liebenwein,
Ramin Hasani,
Daniela Rus,
Radu Grosu
Abstract:
Filter pruning of a CNN is typically achieved by applying discrete masks on the CNN's filter weights or activation maps, post-training. Here, we present a new filter-importance-scoring concept named pruning by active attention manipulation (PAAM), which sparsifies the CNN's set of filters through a particular attention mechanism during training. PAAM learns analog filter scores from the filter weights by optimizing a cost function regularized by an additive term in the scores. As the filters are not independent, we use attention to dynamically learn their correlations. Moreover, by training the pruning scores of all layers simultaneously, PAAM can account for layer inter-dependencies, which is essential to finding a performant sparse sub-network. PAAM can also train and generate a pruned network from scratch in a straightforward, one-stage training process without requiring a pre-trained network. Finally, PAAM does not need layer-specific hyperparameters and pre-defined layer budgets, since it can implicitly determine the appropriate number of filters in each layer. Our experimental results on different network architectures suggest that PAAM outperforms state-of-the-art (SOTA) structured-pruning methods. On the CIFAR-10 dataset, without requiring a pre-trained baseline network, we obtain 1.02% and 1.19% accuracy gain and 52.3% and 54% parameter reduction, on ResNet56 and ResNet110, respectively. Similarly, on the ImageNet dataset, PAAM achieves 1.06% accuracy gain while pruning 51.1% of the parameters on ResNet50. For CIFAR-10, this is better than the SOTA with a margin of 9.5% and 6.6%, respectively, and on ImageNet with a margin of 11%.
Submitted 20 October, 2022;
originally announced October 2022.
-
Deep-Learning vs Regression: Prediction of Tourism Flow with Limited Data
Authors:
Julian Lemmel,
Zahra Babaiee,
Marvin Kleinlehner,
Ivan Majic,
Philipp Neubauer,
Johannes Scholz,
Radu Grosu,
Sophie A. Neubauer
Abstract:
Modern tourism in the 21st century is facing numerous challenges. One of these challenges is the rapidly growing number of tourists in space-limited regions such as historical city centers, museums or geographical bottlenecks like narrow valleys. In this context, an accurate prediction of tourism volume and tourism flow within a certain area is critical for visitor management tasks such as visitor flow control and prevention of overcrowding. Static flow control methods like limiting access to hotspots or using conventional low-level controllers have not yet solved the problem. In this paper, we empirically evaluate the performance of several state-of-the-art deep-learning methods in the field of visitor flow prediction with limited data by using available granular data supplied by a tourism region and comparing the results to ARIMA, a classical statistical method. Our results show that deep-learning models yield better predictions compared to the ARIMA method, while also featuring faster inference times and being able to incorporate additional input features.
Submitted 27 June, 2022;
originally announced June 2022.
-
Entangled Residual Mappings
Authors:
Mathias Lechner,
Ramin Hasani,
Zahra Babaiee,
Radu Grosu,
Daniela Rus,
Thomas A. Henzinger,
Sepp Hochreiter
Abstract:
Residual mappings have been shown to perform representation learning in the first layers and iterative feature refinement in higher layers. This interplay, combined with their stabilizing effect on the gradient norms, enables them to train very deep networks. In this paper, we take a step further and introduce entangled residual mappings to generalize the structure of the residual connections and evaluate their role in iterative learning representations. An entangled residual mapping replaces the identity skip connections with specialized entangled mappings such as orthogonal, sparse, and structural correlation matrices that share key attributes (eigenvalues, structure, and Jacobian norm) with identity mappings. We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks differently than attention-based models and recurrent neural networks. In general, we find that for CNNs and Vision Transformers entangled sparse mapping can help generalization while orthogonal mappings hurt performance. For recurrent networks, orthogonal residual mappings form an inductive bias for time-variant sequences, which degrades accuracy on time-invariant tasks.
Submitted 2 June, 2022;
originally announced June 2022.
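The idea in the abstract above, replacing the identity skip connection with another matrix, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's construction: the helper names (`matvec`, `cyclic_shift`, `residual_block`), the choice of a cyclic permutation as the orthogonal map, and the toy branch function `f`.

```python
import math

def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def identity(n):
    """Standard skip connection: the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def cyclic_shift(n):
    """Hypothetical orthogonal skip map: a permutation matrix that moves
    entry i of the input to position (i + 1) % n."""
    return [[1.0 if j == (i - 1) % n else 0.0 for j in range(n)] for i in range(n)]

def residual_block(x, f, skip):
    """Compute f(x) + skip @ x; with skip = identity this is a plain residual block."""
    fx = f(x)
    sx = matvec(skip, x)
    return [a + b for a, b in zip(fx, sx)]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

# Toy residual branch (assumption): scale the input by 0.1.
f = lambda x: [0.1 * v for v in x]
x = [1.0, 2.0, 3.0]

plain = residual_block(x, f, identity(3))          # identity skip
entangled = residual_block(x, f, cyclic_shift(3))  # orthogonal skip
```

Like the identity, the permutation preserves the norm of the skipped signal, but it mixes feature positions, which is one simple way to picture what "entangling" the skip path means.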
-
End-to-End Sensitivity-Based Filter Pruning
Authors:
Zahra Babaiee,
Lucas Liebenwein,
Ramin Hasani,
Daniela Rus,
Radu Grosu
Abstract:
In this paper, we present a novel sensitivity-based filter pruning algorithm (SbF-Pruner) to learn the importance scores of filters of each layer end-to-end. Our method learns the scores from the filter weights, enabling it to account for the correlations between the filters of each layer. Moreover, by training the pruning scores of all layers simultaneously, our method can account for layer interdependencies, which is essential to finding a performant sparse sub-network. Our proposed method can train and generate a pruned network from scratch in a straightforward, one-stage training process without requiring a pretrained network. Ultimately, we do not need layer-specific hyperparameters and pre-defined layer budgets, since SbF-Pruner can implicitly determine the appropriate number of channels in each layer. Our experimental results on different network architectures suggest that SbF-Pruner outperforms advanced pruning methods. Notably, on CIFAR-10, without requiring a pretrained baseline network, we obtain 1.02% and 1.19% accuracy gain on ResNet56 and ResNet110, compared to the baseline reported for state-of-the-art pruning algorithms. At the same time, SbF-Pruner reduces parameter count by 52.3% (for ResNet56) and 54% (for ResNet110), which is better than the state-of-the-art pruning algorithms by a margin of 9.5% and 6.6%.
Submitted 15 April, 2022;
originally announced April 2022.
-
3D-OOCS: Learning Prostate Segmentation with Inductive Bias
Authors:
Shrajan Bhandary,
Zahra Babaiee,
Dejan Kostyszyn,
Tobias Fechter,
Constantinos Zamboglou,
Anca-Ligia Grosu,
Radu Grosu
Abstract:
Despite the great success of convolutional neural networks (CNNs) in 3D medical image segmentation tasks, the methods currently in use are still not robust enough to the different protocols utilized by different scanners, and to the variety of image properties or artefacts they produce. To this end, we introduce OOCS-enhanced networks, a novel architecture inspired by the innate nature of visual processing in vertebrates. With different 3D U-Net variants as the base, we add two 3D residual components to the second encoder blocks: on and off center-surround (OOCS). They generalise the ganglion pathways in the retina to a 3D setting. The use of 2D-OOCS in any standard CNN complements the feedforward framework with sharp edge-detection inductive biases. The use of 3D-OOCS also helps 3D U-Nets to scrutinise and delineate anatomical structures present in 3D images with increased accuracy. We compared the state-of-the-art 3D U-Nets with their 3D-OOCS extensions and showed the superior accuracy and robustness of the latter in automatic prostate segmentation from 3D Magnetic Resonance Images (MRIs). For a fair comparison, we trained and tested all the investigated 3D U-Nets with the same pipeline, including automatic hyperparameter optimisation and data augmentation.
Submitted 20 April, 2022; v1 submitted 29 October, 2021;
originally announced October 2021.
-
On-Off Center-Surround Receptive Fields for Accurate and Robust Image Classification
Authors:
Zahra Babaiee,
Ramin Hasani,
Mathias Lechner,
Daniela Rus,
Radu Grosu
Abstract:
Robustness to variations in lighting conditions is a key objective for any deep vision system. To this end, our paper extends the receptive field of convolutional neural networks with two residual components, ubiquitous in the visual processing system of vertebrates: on-center and off-center pathways, with excitatory center and inhibitory surround; OOCS for short. The on-center pathway is excited by the presence of a light stimulus in its center but not in its surround, whereas the off-center one is excited by the absence of a light stimulus in its center but not in its surround. We design OOCS pathways via a difference of Gaussians, with their variance computed analytically from the size of the receptive fields. OOCS pathways complement each other in their response to light stimuli, thereby ensuring a strong edge-detection capability and, as a result, accurate and robust inference under challenging lighting conditions. We provide extensive empirical evidence showing that networks supplied with the OOCS edge representation gain accuracy and illumination-robustness compared to standard deep models.
Submitted 13 June, 2021;
originally announced June 2021.
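The difference-of-Gaussians construction mentioned in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the kernel size and the two standard deviations (`sigma_center`, `sigma_surround`) are arbitrary placeholder values, not the analytically derived variances the paper refers to, and the function names are hypothetical.

```python
import math

def gaussian_2d(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    c = (size - 1) / 2.0
    g = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in g)
    return [[v / total for v in row] for row in g]

def dog_kernels(size=5, sigma_center=0.8, sigma_surround=1.6):
    """Build on-center and off-center kernels as a difference of Gaussians.

    The sigma values are placeholders chosen for illustration only.
    """
    center = gaussian_2d(size, sigma_center)      # narrow Gaussian
    surround = gaussian_2d(size, sigma_surround)  # wide Gaussian
    # On-center: excitatory center, inhibitory surround.
    on = [[c - s for c, s in zip(rc, rs)] for rc, rs in zip(center, surround)]
    # Off-center: the sign-flipped counterpart.
    off = [[-v for v in row] for row in on]
    return on, off

on_kernel, off_kernel = dog_kernels()
```

Because both Gaussians are normalized to the same sum, each kernel sums to approximately zero, so a uniformly lit patch produces almost no response while a local contrast change does.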