Search | arXiv e-print repository

doi 10.1038/s41598-024-56609-x

Towards a universal mechanism for successful deep learning

Authors: Yuval Meir, Yarden Tzach, Shiri Hodassman, Ofek Tevet, Ido Kanter

Abstract: Recently, the underlying mechanism for successful deep learning (DL) was presented based on a quantitative method that measures the quality of a single filter in each layer of a DL model, particularly VGG-16 trained on CIFAR-10. This method exemplifies that each filter identifies small clusters of possible output labels, with additional noise selected as labels outside the clusters. This feature i… ▽ More Recently, the underlying mechanism for successful deep learning (DL) was presented based on a quantitative method that measures the quality of a single filter in each layer of a DL model, particularly VGG-16 trained on CIFAR-10. This method exemplifies that each filter identifies small clusters of possible output labels, with additional noise selected as labels outside the clusters. This feature is progressively sharpened with each layer, resulting in an enhanced signal-to-noise ratio (SNR), which leads to an increase in the accuracy of the DL network. In this study, this mechanism is verified for VGG-16 and EfficientNet-B0 trained on the CIFAR-100 and ImageNet datasets, and the main results are as follows. First, the accuracy and SNR progressively increase with the layers. Second, for a given deep architecture, the maximal error rate increases approximately linearly with the number of output labels. Third, similar trends were obtained for dataset labels in the range [3, 1,000], thus supporting the universality of this mechanism. Understanding the performance of a single filter and its dominating features paves the way to highly dilute the deep architecture without affecting its overall accuracy, and this can be achieved by applying the filter's cluster connections (AFCC). △ Less

Submitted 12 March, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: 31 pages,7 figures, 9 tables. arXiv admin note: text overlap with arXiv:2305.18078

Journal ref: Scientific Reports volume 14, Article number: 5881 (2024)

arXiv:2305.18078 [pdf]

The mechanism underlying successful deep learning

Authors: Yarden Tzach, Yuval Meir, Ofek Tevet, Ronit D. Gross, Shiri Hodassman, Roni Vardi, Ido Kanter

Abstract: Deep architectures consist of tens or hundreds of convolutional layers (CLs) that terminate with a few fully connected (FC) layers and an output layer representing the possible labels of a complex classification task. According to the existing deep learning (DL) rationale, the first CL reveals localized features from the raw data, whereas the subsequent layers progressively extract higher-level fe… ▽ More Deep architectures consist of tens or hundreds of convolutional layers (CLs) that terminate with a few fully connected (FC) layers and an output layer representing the possible labels of a complex classification task. According to the existing deep learning (DL) rationale, the first CL reveals localized features from the raw data, whereas the subsequent layers progressively extract higher-level features required for refined classification. This article presents an efficient three-phase procedure for quantifying the mechanism underlying successful DL. First, a deep architecture is trained to maximize the success rate (SR). Next, the weights of the first several CLs are fixed and only the concatenated new FC layer connected to the output is trained, resulting in SRs that progress with the layers. Finally, the trained FC weights are silenced, except for those emerging from a single filter, enabling the quantification of the functionality of this filter using a correlation matrix between input labels and averaged output fields, hence a well-defined set of quantifiable features is obtained. Each filter essentially selects a single output label independent of the input label, which seems to prevent high SRs; however, it counterintuitively identifies a small subset of possible output labels. This feature is an essential part of the underlying DL mechanism and is progressively sharpened with layers, resulting in enhanced signal-to-noise ratios and SRs. Quantitatively, this mechanism is exemplified by the VGG-16, VGG-6, and AVGG-16. The proposed mechanism underlying DL provides an accurate tool for identifying each filter's quality and is expected to direct additional procedures to improve the SR, computational complexity, and latency of DL. △ Less

Submitted 29 May, 2023; originally announced May 2023.

Comments: 33 pages, 8 figures

arXiv:2303.05800 [pdf]

doi 10.1038/S41598-023-40566-Y

Enhancing the accuracies by performing pooling decisions adjacent to the output layer

Authors: Yuval Meir, Yarden Tzach, Ronit D. Gross, Ofek Tevet, Roni Vardi, Ido Kanter

Abstract: Learning classification tasks of (2^nx2^n) inputs typically consist of \le n (2x2) max-pooling (MP) operators along the entire feedforward deep architecture. Here we show, using the CIFAR-10 database, that pooling decisions adjacent to the last convolutional layer significantly enhance accuracies. In particular, average accuracies of the advanced-VGG with m layers (A-VGGm) architectures are 0.936,… ▽ More Learning classification tasks of (2^nx2^n) inputs typically consist of \le n (2x2) max-pooling (MP) operators along the entire feedforward deep architecture. Here we show, using the CIFAR-10 database, that pooling decisions adjacent to the last convolutional layer significantly enhance accuracies. In particular, average accuracies of the advanced-VGG with m layers (A-VGGm) architectures are 0.936, 0.940, 0.954, 0.955, and 0.955 for m=6, 8, 14, 13, and 16, respectively. The results indicate A-VGG8s' accuracy is superior to VGG16s', and that the accuracies of A-VGG13 and A-VGG16 are equal, and comparable to that of Wide-ResNet16. In addition, replacing the three fully connected (FC) layers with one FC layer, A-VGG6 and A-VGG14, or with several linear activation FC layers, yielded similar accuracies. These significantly enhanced accuracies stem from training the most influential input-output routes, in comparison to the inferior routes selected following multiple MP decisions along the deep architecture. In addition, accuracies are sensitive to the order of the non-commutative MP and average pooling operators adjacent to the output layer, varying the number and location of training routes. The results call for the reexamination of previously proposed deep architectures and their accuracies by utilizing the proposed pooling strategy adjacent to the output layer. △ Less

Submitted 31 August, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: 29 pages, 3 figures, 1 table, and Supplementary Information

Journal ref: Sci Rep 13, 13385 (2023)

arXiv:2211.11378 [pdf]

doi 10.1038/s41598-023-27986-6

Learning on tree architectures outperforms a convolutional feedforward network

Authors: Yuval Meir, Itamar Ben-Noam, Yarden Tzach, Shiri Hodassman, Ido Kanter

Abstract: Advanced deep learning architectures consist of tens of fully connected and convolutional hidden layers, currently extended to hundreds, are far from their biological realization. Their implausible biological dynamics relies on changing a weight in a non-local manner, as the number of routes between an output unit and a weight is typically large, using the backpropagation technique. Here, a 3-laye… ▽ More Advanced deep learning architectures consist of tens of fully connected and convolutional hidden layers, currently extended to hundreds, are far from their biological realization. Their implausible biological dynamics relies on changing a weight in a non-local manner, as the number of routes between an output unit and a weight is typically large, using the backpropagation technique. Here, a 3-layer tree architecture inspired by experimental-based dendritic tree adaptations is developed and applied to the offline and online learning of the CIFAR-10 database. The proposed architecture outperforms the achievable success rates of the 5-layer convolutional LeNet. Moreover, the highly pruned tree backpropagation approach of the proposed architecture, where a single route connects an output unit and a weight, represents an efficient dendritic deep learning. △ Less

Submitted 30 January, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: 21 pages, 4 figures, 2 table

Journal ref: Sci Rep 13, 962 (2023)

arXiv:2211.11106 [pdf]

doi 10.1038/s41598-023-32559-8

Efficient shallow learning as an alternative to deep learning

Authors: Yuval Meir, Ofek Tevet, Yarden Tzach, Shiri Hodassman, Ronit D. Gross, Ido Kanter

Abstract: The realization of complex classification tasks requires training of deep learning (DL) architectures consisting of tens or even hundreds of convolutional and fully connected hidden layers, which is far from the reality of the human brain. According to the DL rationale, the first convolutional layer reveals localized patterns in the input and large-scale patterns in the following layers, until it… ▽ More The realization of complex classification tasks requires training of deep learning (DL) architectures consisting of tens or even hundreds of convolutional and fully connected hidden layers, which is far from the reality of the human brain. According to the DL rationale, the first convolutional layer reveals localized patterns in the input and large-scale patterns in the following layers, until it reliably characterizes a class of inputs. Here, we demonstrate that with a fixed ratio between the depths of the first and second convolutional layers, the error rates of the generalized shallow LeNet architecture, consisting of only five layers, decay as a power law with the number of filters in the first convolutional layer. The extrapolation of this power law indicates that the generalized LeNet can achieve small error rates that were previously obtained for the CIFAR-10 database using DL architectures. A power law with a similar exponent also characterizes the generalized VGG-16 architecture. However, this results in a significantly increased number of operations required to achieve a given error rate with respect to LeNet. This power law phenomenon governs various generalized LeNet and VGG-16 architectures, hinting at its universal behavior and suggesting a quantitative hierarchical time-space complexity among machine learning architectures. Additionally, the conservation law along the convolutional layers, which is the square-root of their size times their depth, is found to asymptotically minimize error rates. The efficient shallow learning that is demonstrated in this study calls for further quantitative examination using various databases and architectures and its accelerated implementation using future dedicated hardware developments. △ Less

Submitted 23 November, 2022; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: 26 pages, 4 figures (improved figures resolution)

Report number: https://www.nature.com/articles/s41598-023-32559-8

Journal ref: Sci Rep 13, 5423 (2023)

arXiv:2211.08430 [pdf]

doi 10.1038/s41598-020-76764-1

Power-law Scaling to Assist with Key Challenges in Artificial Intelligence

Authors: Yuval Meir, Shira Sardi, Shiri Hodassman, Karin Kisos, Itamar Ben-Noam, Amir Goldental, Ido Kanter

Abstract: Power-law scaling, a central concept in critical phenomena, is found to be useful in deep learning, where optimized test errors on handwritten digit examples converge as a power-law to zero with database size. For rapid decision making with one training epoch, each example is presented only once to the trained network, the power-law exponent increased with the number of hidden layers. For the larg… ▽ More Power-law scaling, a central concept in critical phenomena, is found to be useful in deep learning, where optimized test errors on handwritten digit examples converge as a power-law to zero with database size. For rapid decision making with one training epoch, each example is presented only once to the trained network, the power-law exponent increased with the number of hidden layers. For the largest dataset, the obtained test error was estimated to be in the proximity of state-of-the-art algorithms for large epoch numbers. Power-law scaling assists with key challenges found in current artificial intelligence applications and facilitates an a priori dataset size estimation to achieve a desired test accuracy. It establishes a benchmark for measuring training complexity and a quantitative hierarchy of machine learning tasks and algorithms. △ Less

Submitted 15 November, 2022; originally announced November 2022.

Comments: 30 pages, 5 figures

Journal ref: Sci Rep 10, 19628 (2020)

arXiv:2203.13028 [pdf]

doi 10.1038/s41598-022-20337-x

Brain inspired neuronal silencing mechanism to enable reliable sequence identification

Authors: Shiri Hodassman, Yuval Meir, Karin Kisos, Itamar Ben-Noam, Yael Tugendhaft, Amir Goldental, Roni Vardi, Ido Kanter

Abstract: Real-time sequence identification is a core use-case of artificial neural networks (ANNs), ranging from recognizing temporal events to identifying verification codes. Existing methods apply recurrent neural networks, which suffer from training difficulties; however, performing this function without feedback loops remains a challenge. Here, we present an experimental neuronal long-term plasticity m… ▽ More Real-time sequence identification is a core use-case of artificial neural networks (ANNs), ranging from recognizing temporal events to identifying verification codes. Existing methods apply recurrent neural networks, which suffer from training difficulties; however, performing this function without feedback loops remains a challenge. Here, we present an experimental neuronal long-term plasticity mechanism for high-precision feedforward sequence identification networks (ID-nets) without feedback loops, wherein input objects have a given order and timing. This mechanism temporarily silences neurons following their recent spiking activity. Therefore, transitory objects act on different dynamically created feedforward sub-networks. ID-nets are demonstrated to reliably identify 10 handwritten digit sequences, and are generalized to deep convolutional ANNs with continuous activation nodes trained on image sequences. Counterintuitively, their classification performance, even with a limited number of training examples, is high for sequences but low for individual objects. ID-nets are also implemented for writer-dependent recognition, and suggested as a cryptographic tool for encrypted authentication. The presented mechanism opens new horizons for advanced ANN algorithms. △ Less

Submitted 2 October, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: 38 pages, 11 figures

Journal ref: Sci Rep 12, 16003 (2022)

Showing 1–7 of 7 results for author: Meir, Y