arXiv:2011.08042 [pdf, other]

Mixing ADAM and SGD: a Combined Optimization Method

Authors: Nicola Landro, Ignazio Gallo, Riccardo La Grassa

Abstract: Optimization methods (optimizers) get special attention for the efficient training of neural networks in the field of deep learning. In the literature there are many papers that compare neural models trained with different optimizers. Each paper demonstrates that for a particular problem one optimizer is better than the others, but as the problem changes this type of result is no longer valid and we have to start from scratch. In our paper we propose combining two very different optimizers that, when used simultaneously, can outperform the single optimizers across very different problems. We propose a new optimizer called MAS (Mixing ADAM and SGD) that integrates SGD and ADAM simultaneously by weighing the contributions of both through the assignment of constant weights. Rather than trying to improve SGD or ADAM, we exploit both at the same time by taking the best of both. We have conducted several experiments on image and text document classification, using various CNNs, and we demonstrated by experiments that the proposed MAS optimizer produces better performance than the single SGD or ADAM optimizers. The source code and all the results of the experiments are available online at the following link: https://gitlab.com/nicolalandro/multi_optimizer

Submitted 16 November, 2020; originally announced November 2020.

arXiv:2009.08796 [pdf, other]

$σ^2$R Loss: a Weighted Loss by Multiplicative Factors using Sigmoidal Functions

Authors: Riccardo La Grassa, Ignazio Gallo, Nicola Landro

Abstract: In neural networks, the loss function represents the core of the learning process that leads the optimizer to an approximation of the optimal convergence error. Convolutional neural networks (CNN) use the loss function as a supervisory signal to train a deep model and contribute significantly to achieving the state of the art in some fields of artificial vision. Cross-entropy and center loss functions are commonly used to increase the discriminating power of learned functions and increase the generalization performance of the model. Center loss minimizes the intra-class variance and at the same time penalizes long distances between the deep features inside each class. However, the total error of the center loss will be heavily influenced by the majority of the instances and can lead to a freezing state in terms of intra-class variance. To address this, we introduce a new loss function called sigma squared reduction loss ($σ^2$R loss), which is regulated by a sigmoid function to inflate/deflate the error per instance and then continue to reduce the intra-class variance. Our loss has a clear intuition and geometric interpretation. Furthermore, we demonstrate by experiments the effectiveness of our proposal on several benchmark datasets, showing the intra-class variance reduction and improving on the results obtained with center loss and soft nearest neighbour functions.

Submitted 18 September, 2020; originally announced September 2020.

Comments: 9 pages

arXiv:2005.08622 [pdf, other]

Learn Class Hierarchy using Convolutional Neural Networks

Authors: Riccardo La Grassa, Ignazio Gallo, Nicola Landro

Abstract: A large amount of research on Convolutional Neural Networks has focused on flat classification in the multi-class domain. In the real world, many problems are naturally expressed as problems of hierarchical classification, in which the classes to be predicted are organized in a hierarchy of classes. In this paper, we propose a new architecture for hierarchical classification of images, introducing a stack of deep linear layers with cross-entropy loss functions and center loss combined. The proposed architecture can extend any neural network model and simultaneously optimizes loss functions to discover local hierarchical class relationships and a loss function to discover global information from the whole class hierarchy while penalizing class hierarchy violations. We experimentally show that our hierarchical classifier presents advantages over the traditional classification approaches finding application in computer vision tasks.

Submitted 18 May, 2020; originally announced May 2020.

Comments: 7 pages

arXiv:2005.00393 [pdf, other]

Can a powerful neural network be a teacher for a weaker neural network?

Authors: Nicola Landro, Ignazio Gallo, Riccardo La Grassa

Abstract: The transfer learning technique is widely used to learn in one context and apply that learning to another, i.e. the capacity to apply acquired knowledge and skills to new situations. But is it possible to transfer the learning from a deep neural network to a weaker neural network? Is it possible to improve the performance of a weak neural network using the knowledge acquired by a more powerful neural network? In this work, during the training process of a weak network, we add a loss function that minimizes the distance between the features previously learned by a strong neural network and the features that the weak network must try to learn. To demonstrate the effectiveness and robustness of our approach, we conducted a large number of experiments using three known datasets and demonstrated that a weak neural network can increase its performance if its learning process is driven by a more powerful neural network.

Submitted 7 May, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

arXiv:2004.02273 [pdf, other]

Dynamic Decision Boundary for One-class Classifiers applied to non-uniformly Sampled Data

Authors: Riccardo La Grassa, Ignazio Gallo, Nicola Landro

Abstract: A typical issue in Pattern Recognition is non-uniformly sampled data, which affects the general performance and capability of machine learning algorithms to make accurate predictions. Generally, data is considered non-uniformly sampled when there are too few samples in a specific area of the data space, leading to misclassification problems. This issue undermines the goal of one-class classifiers, decreasing their performance. In this paper, we propose a one-class classifier based on the minimum spanning tree with a dynamic decision boundary (OCdmst) to make good predictions even when the data is non-uniformly sampled. To prove the effectiveness and robustness of our approach, we compare it with the most recent one-class classifiers, reaching state-of-the-art results in most cases.

Submitted 5 April, 2020; originally announced April 2020.

Comments: 7 pages

arXiv:2003.13524 [pdf, other]

OCmst: One-class Novelty Detection using Convolutional Neural Network and Minimum Spanning Trees

Authors: Riccardo La Grassa, Ignazio Gallo, Nicola Landro

Abstract: We present a novel model called One Class Minimum Spanning Tree (OCmst) for the novelty detection problem that uses a Convolutional Neural Network (CNN) as a deep feature extractor and a graph-based model based on the Minimum Spanning Tree (MST). In a novelty detection scenario, the training data is not polluted by outliers (abnormal class) and the goal is to recognize whether a test instance belongs to the normal class or to the abnormal class. Our approach uses the deep features from the CNN to feed a pair of MSTs built starting from each test instance. To cut down the computational time, we use a parameter $γ$ to specify the size of the MSTs, starting from the neighbours of the test instance. To prove the effectiveness of the proposed approach, we conducted experiments on two publicly available datasets that are well known in the literature, and we achieved state-of-the-art results on the CIFAR10 dataset.

Submitted 30 March, 2020; originally announced March 2020.

Comments: 16 pages

arXiv:1909.05663 [pdf, other]

Picture What you Read

Authors: Ignazio Gallo, Shah Nawaz, Alessandro Calefati, Riccardo La Grassa, Nicola Landro

Abstract: Visualization refers to our ability to create an image in our head based on the text we read or the words we hear. It is one of the many skills that make reading comprehension possible. Convolutional Neural Networks (CNN) are an excellent tool for recognizing and classifying text documents. In addition, they can generate images conditioned on natural language. In this work, we utilize CNNs' capabilities to generate realistic images representative of the text illustrating the semantic concept. We conducted various experiments to highlight the capacity of the proposed model to generate representative images of the text descriptions used as its input.

Submitted 9 September, 2019; originally announced September 2019.

Comments: 7 pages, Dicta2019 conference

arXiv:1909.04078 [pdf, other]

A Classification Methodology based on Subspace Graphs Learning

Authors: Riccardo La Grassa, Ignazio Gallo, Alessandro Calefati, Dimitri Ognibene

Abstract: In this paper, we propose a design methodology for one-class classifiers using an ensemble-of-classifiers approach. The objective is to select the best structures created during the training phase using an ensemble of spanning trees. It takes the best classifier, partitioning the area near a pattern into $γ^{γ-2}$ sub-spaces and combining all possible spanning trees that can be created starting from $γ$ nodes. The proposed method leverages a supervised classification methodology and the concept of minimum distance. We evaluate our approach on well-known benchmark datasets, and the results obtained demonstrate that it achieves comparable and, in many cases, state-of-the-art results. Moreover, it obtains good performance even with unbalanced datasets.

Submitted 9 September, 2019; originally announced September 2019.

Comments: 8 pages, Dicta Conference

arXiv:1906.06090 [pdf, other]

Binary Classification using Pairs of Minimum Spanning Trees or N-ary Trees

Authors: Riccardo La Grassa, Ignazio Gallo, Alessandro Calefati, Dimitri Ognibene

Abstract: One-class classifiers are trained with samples from the target class only. Intuitively, their conservative modelling of the class description may benefit classical classification tasks where classes are difficult to separate due to overlapping and data imbalance. In this work, three methods are proposed which leverage the combination of one-class classifiers based on non-parametric models, N-ary Trees and Minimum Spanning Trees class descriptors (MST-CD), to tackle binary classification problems. The methods deal with the inconsistencies arising from combining multiple classifiers and with spurious connections that MST-CD creates in multi-modal class distributions. As shown by our tests on several datasets, the proposed approach is feasible and comparable with state-of-the-art algorithms.

Submitted 25 June, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

Showing 1–9 of 9 results for author: La Grassa, R