Search | arXiv e-print repository

Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models

Authors: Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti

Abstract: Large Language Models (LLMs) achieve impressive performance in a wide range of tasks, even though they are often trained with the sole objective of chatting fluently with users. Among other skills, LLMs show emergent abilities in mathematical reasoning benchmarks, which can be elicited with appropriate prompting methods. In this work, we systematically investigate the capabilities and limitations of popular open-source LLMs on different symbolic reasoning tasks. We evaluate three models of the Llama 2 family on two datasets that require solving mathematical formulas of varying degrees of difficulty. We test a generalist LLM (Llama 2 Chat) as well as two fine-tuned versions of Llama 2 (MAmmoTH and MetaMath) specifically designed to tackle mathematical problems. We observe that both increasing the scale of the model and fine-tuning it on relevant tasks lead to significant performance gains. Furthermore, using fine-grained evaluation measures, we find that such performance gains are mostly observed with mathematical formulas of low complexity, which nevertheless often remain challenging even for the largest fine-tuned models.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted at 33rd International Conference on Artificial Neural Networks (ICANN24)
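The abstract does not define its "fine-grained evaluation measures". One plausible reading is accuracy broken down by formula complexity. The minimal sketch below assumes each test item carries a hypothetical integer complexity score (for example, nesting depth) and a correctness flag; the toy data are illustrative, not results from the paper.

```python
from collections import defaultdict


def accuracy_by_complexity(records):
    """Group (complexity, is_correct) pairs and return accuracy per complexity level.

    The complexity score is a hypothetical stand-in (e.g. nesting depth);
    levels are returned in ascending order.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for complexity, is_correct in records:
        totals[complexity] += 1
        hits[complexity] += int(is_correct)
    return {c: hits[c] / totals[c] for c in sorted(totals)}


# Toy predictions, not results from the paper.
results = [(1, True), (1, True), (2, True), (2, False), (3, False), (3, False)]
print(accuracy_by_complexity(results))  # {1: 1.0, 2: 0.5, 3: 0.0}
```

A single aggregate accuracy would hide the pattern the abstract reports, namely that gains concentrate on low-complexity formulas.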

arXiv:2402.17407 [pdf, other]

A Neural Rewriting System to Solve Algorithmic Problems

Authors: Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti

Abstract: Modern neural network architectures still struggle to learn algorithmic procedures that require systematically applying compositional rules to solve out-of-distribution problem instances. In this work, we focus on formula simplification problems, a class of synthetic benchmarks used to study the systematic generalization capabilities of neural architectures. We propose a modular architecture designed to learn a general procedure for solving nested mathematical formulas relying only on a minimal set of training examples. Inspired by rewriting systems, a classic framework in symbolic artificial intelligence, we include in the architecture three specialized and interacting modules: the Selector, trained to identify solvable sub-expressions; the Solver, mapping sub-expressions to their values; and the Combiner, replacing sub-expressions in the original formula with the solution provided by the Solver. We benchmark our system against the Neural Data Router, a recent model specialized for systematic generalization, and a state-of-the-art large language model (GPT-4) probed with advanced prompting strategies. We demonstrate that our approach achieves a higher degree of out-of-distribution generalization compared to these alternative approaches on three different types of formula simplification problems, and we discuss its limitations by analyzing its failures.

Submitted 12 July, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Updated version (v2) accepted at the 27th European Conference on Artificial Intelligence (ECAI 24)
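In the paper the Selector, Solver and Combiner are neural modules. The sketch below replaces each one with a hand-coded rule purely to show how the three interact inside a rewriting loop. It assumes fully parenthesised integer formulas with binary `+`, `-` and `*`; this input format and all function names are illustrative assumptions, not the paper's.

```python
import re

# Innermost parenthesised span: "(" followed by no other parentheses, then ")".
INNERMOST = re.compile(r"\(([^()]*)\)")
# A single binary operation on (possibly negative) integers, e.g. "-2*6".
BINARY = re.compile(r"^(-?\d+)([+\-*])(-?\d+)$")


def selector(formula):
    """Return the (start, end) span of one solvable sub-expression, or None."""
    match = INNERMOST.search(formula)
    return match.span() if match else None


def solver(sub_expression):
    """Map a sub-expression such as '(3-5)' to its value as a string."""
    m = BINARY.match(sub_expression[1:-1])
    if m is None:
        raise ValueError(f"unsupported sub-expression: {sub_expression}")
    a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
    value = a + b if op == "+" else a - b if op == "-" else a * b
    return str(value)


def combiner(formula, span, value):
    """Replace the selected span of the formula with the Solver's output."""
    start, end = span
    return formula[:start] + value + formula[end:]


def rewrite(formula, max_steps=100):
    """Apply Selector -> Solver -> Combiner until no sub-expression is left."""
    formula = formula.replace(" ", "")
    for _ in range(max_steps):
        span = selector(formula)
        if span is None:
            return formula
        formula = combiner(formula, span, solver(formula[span[0]:span[1]]))
    raise RuntimeError("step budget exhausted")


print(rewrite("((3-5)*(2+4))"))  # -12
```

Because each step only ever handles one innermost sub-expression, the same three operations apply unchanged to formulas nested more deeply than any seen before, which is the kind of out-of-distribution generalization the abstract targets.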

arXiv:2402.17396 [pdf, other]

Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies

Authors: Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti

Abstract: Large Language Models (LLMs) have revolutionized the field of Natural Language Processing thanks to their ability to reuse knowledge acquired on massive text corpora on a wide variety of downstream tasks, with minimal (if any) tuning steps. At the same time, it has been repeatedly shown that LLMs lack systematic generalization, which would allow them to extrapolate the learned statistical regularities outside the training distribution. In this work, we offer a systematic benchmarking of GPT-4, one of the most advanced LLMs available, on three algorithmic tasks whose difficulty can be controlled with two parameters. We compare the performance of GPT-4 with that of its predecessor (GPT-3.5) and with a variant of the Transformer-Encoder architecture recently introduced to solve similar tasks, the Neural Data Router. We find that the deployment of advanced prompting techniques allows GPT-4 to reach superior accuracy on all tasks, demonstrating that state-of-the-art LLMs constitute a very strong baseline even in challenging tasks that require systematic generalization.

Submitted 11 July, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted at LREC-COLING 2024. Added acknowledgements
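The abstract states that each task's difficulty is controlled by two parameters but does not name them. Assuming, for illustration only, that they are the nesting depth and the number of operands per sub-expression (the function and parameter names are hypothetical), a generator for such formulas might look like this:

```python
import random


def make_formula(depth, num_operands, rng):
    """Generate a fully parenthesised arithmetic formula over single digits.

    depth: levels of nesting (0 yields a bare digit).
    num_operands: operands inside every parenthesised sub-expression.
    Both parameters are illustrative assumptions about how difficulty is set.
    """
    if depth == 0:
        return str(rng.randint(0, 9))
    operands = [make_formula(depth - 1, num_operands, rng) for _ in range(num_operands)]
    expr = operands[0]
    for operand in operands[1:]:
        expr += rng.choice("+-*") + operand
    return "(" + expr + ")"


rng = random.Random(0)  # fixed seed so the generated split is reproducible
print(make_formula(depth=2, num_operands=2, rng=rng))
```

Varying the two arguments independently yields a grid of difficulty levels, so a model can be trained on the easy corner and tested on harder cells.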

arXiv:2306.17249 [pdf, other]

A Hybrid System for Systematic Generalization in Simple Arithmetic Problems

Authors: Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti

Abstract: Solving symbolic reasoning problems that require compositionality and systematicity is considered one of the key ingredients of human intelligence. However, symbolic reasoning is still a great challenge for deep learning models, which often cannot generalize the reasoning pattern to out-of-distribution test cases. In this work, we propose a hybrid system capable of solving arithmetic problems that require compositional and systematic reasoning over sequences of symbols. The model acquires such a skill by learning appropriate substitution rules, which are applied iteratively to the input string until the expression is completely resolved. We show that the proposed system can accurately solve nested arithmetical expressions even when trained only on a subset including the simplest cases, significantly outperforming both a sequence-to-sequence model trained end-to-end and a state-of-the-art large language model.

Submitted 29 June, 2023; originally announced June 2023.

Comments: Accepted at NeSy 2023, 17th International Workshop on Neural-Symbolic Learning and Reasoning

ACM Class: I.1.1; I.5.1; I.2.6
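The hybrid system learns substitution rules and applies them to the input string until the expression is resolved. In the sketch below the rule table is written out by hand instead of learned, and arithmetic is done modulo 10 so that the rule set stays finite (300 rules over single digits). Both choices, and the names `RULES` and `resolve`, are illustrative assumptions rather than details confirmed by the abstract.

```python
import re

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

# Hypothetical rule table: each left-hand side "(a op b)" over single digits
# rewrites to its result modulo 10, so every result is again a single digit.
RULES = {
    f"({a}{op}{b})": str(fn(a, b) % 10)
    for a in range(10)
    for b in range(10)
    for op, fn in OPS.items()
}

# Locates a substring that has the shape of a rule's left-hand side.
LHS = re.compile(r"\(\d[+\-*]\d\)")


def resolve(expression, max_steps=100):
    """Apply substitution rules iteratively until a single digit remains."""
    for _ in range(max_steps):
        if expression.isdigit():
            return expression
        match = LHS.search(expression)
        if match is None:
            raise ValueError(f"no rule applies to {expression!r}")
        expression = expression[:match.start()] + RULES[match.group()] + expression[match.end():]
    raise RuntimeError("step budget exhausted")


print(resolve("((3-5)*(2+4))"))  # 8
```

Each application shortens the string, so the loop terminates; the point is that a finite, reusable rule set handles arbitrarily deep nesting.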

arXiv:2305.11699 [pdf, ps, other]

RGCVAE: Relational Graph Conditioned Variational Autoencoder for Molecule Design

Authors: Davide Rigoni, Nicolò Navarin, Alessandro Sperduti

Abstract: Identifying molecules that exhibit some pre-specified properties is a difficult problem to solve. In the last few years, deep generative models have been used for molecule generation. Deep Graph Variational Autoencoders are among the most powerful machine learning tools with which it is possible to address this problem. However, existing methods struggle to capture the true data distribution and tend to be computationally expensive. In this work, we propose RGCVAE, an efficient and effective Graph Variational Autoencoder based on: (i) an encoding network exploiting a new, powerful Relational Graph Isomorphism Network; (ii) a novel probabilistic decoding component. Compared to several state-of-the-art VAE methods on two widely adopted datasets, RGCVAE shows state-of-the-art molecule generation performance while being significantly faster to train.

Submitted 8 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.10913 [pdf, other]

Weakly-Supervised Visual-Textual Grounding with Semantic Prior Refinement

Authors: Davide Rigoni, Luca Parolari, Luciano Serafini, Alessandro Sperduti, Lamberto Ballan

Abstract: Using only image-sentence pairs, weakly-supervised visual-textual grounding aims to learn region-phrase correspondences of the respective entity mentions. Compared to the supervised approach, learning is more difficult since correspondences between bounding boxes and textual phrases are unavailable. In light of this, we propose the Semantic Prior Refinement Model (SPRM), whose predictions are obtained by combining the output of two main modules. The first, untrained module aims to return a rough alignment between textual phrases and bounding boxes. The second, trained module is composed of two sub-components that refine the rough alignment to improve the accuracy of the final phrase-bounding box alignments. The model is trained to maximize the multimodal similarity between an image and a sentence, while minimizing the multimodal similarity of the same sentence and a new unrelated image, carefully selected to help the most during training. Our approach shows state-of-the-art results on two popular datasets, Flickr30k Entities and ReferIt, shining especially on ReferIt with a 9.6% absolute improvement. Moreover, thanks to the untrained component, it reaches competitive performance using only a small fraction of training examples.

Submitted 26 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.
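The abstract describes a training signal that raises the similarity of a matching image-sentence pair while lowering it for a carefully chosen unrelated image. The sketch below shows one common way to realise that idea: a hinge (margin) loss with the hardest negative, that is, the unrelated image most similar to the sentence. The cosine similarity over pooled embedding vectors, the hinge form, the margin value and the hardest-negative rule are all assumptions; the abstract specifies none of them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def hardest_negative(sentence, candidate_images):
    """Pick the unrelated image most similar to the sentence (assumed selection rule)."""
    return max(candidate_images, key=lambda image: cosine(sentence, image))


def margin_loss(sentence, positive_image, negative_image, margin=0.2):
    """Hinge loss: the paired image should beat the negative by at least `margin`."""
    return max(0.0, margin - cosine(sentence, positive_image) + cosine(sentence, negative_image))


sentence = [1.0, 0.0]
positive = [1.0, 0.0]
negatives = [[0.0, 1.0], [1.0, 1.0]]
negative = hardest_negative(sentence, negatives)
print(negative, margin_loss(sentence, positive, negative))  # [1.0, 1.0] 0.0
```

With a larger margin (for example 0.5) the same triple yields a positive loss, because the hardest negative is then too close to the sentence.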

arXiv:2108.05308 [pdf, other]

doi 10.1145/3477314.3507047

A Better Loss for Visual-Textual Grounding

Authors: Davide Rigoni, Luciano Serafini, Alessandro Sperduti

Abstract: Given a textual phrase and an image, the visual grounding problem is the task of locating the content of the image referenced by the sentence. It is a challenging task that has several real-world applications in human-computer interaction, image-text reference resolution, and video-text reference resolution. In recent years, several works have addressed this problem by proposing increasingly large and complex models that try to capture visual-textual dependencies better than before. These models are typically composed of two main components that focus on how to learn useful multi-modal features for grounding and how to improve the predicted bounding box of the visual mention, respectively. Finding the right learning balance between these two sub-tasks is not easy, and the current models are not necessarily optimal with respect to this issue. In this work, we propose a loss function based on bounding-box class probabilities that: (i) improves the bounding-box selection; (ii) improves the bounding-box coordinate prediction. Our model, although using a simple multi-modal feature fusion component, is able to achieve a higher accuracy than state-of-the-art models on two widely adopted datasets, reaching a better learning balance between the two sub-tasks mentioned above.

Submitted 2 February, 2022; v1 submitted 11 August, 2021; originally announced August 2021.
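As a hedged illustration of a loss built on class probabilities over candidate boxes, the sketch below covers only the selection part: it turns each proposal's intersection over union (IoU) with the ground-truth box into a soft target distribution and scores the model's softmax over proposals with cross-entropy. The IoU-proportional targets are an assumption for illustration and not necessarily the paper's formulation; coordinate regression is not shown.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0


def soft_target_cross_entropy(proposal_logits, proposals, gt_box):
    """Cross-entropy between softmax(logits) and IoU-proportional targets (assumed)."""
    ious = [iou(p, gt_box) for p in proposals]
    total = sum(ious)
    targets = [x / total for x in ious] if total > 0 else [1 / len(ious)] * len(ious)
    m = max(proposal_logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(logit - m) for logit in proposal_logits))
    return -sum(t * (logit - log_z) for t, logit in zip(targets, proposal_logits))


gt = (0, 0, 2, 2)
proposals = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]
good = soft_target_cross_entropy([5.0, 0.0, 0.0], proposals, gt)
bad = soft_target_cross_entropy([0.0, 0.0, 5.0], proposals, gt)
print(good < bad)  # True
```

Soft targets give partial credit to proposals that overlap the ground truth, instead of treating every non-best box as equally wrong.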

arXiv:2106.05809 [pdf, other]

Simple Graph Convolutional Networks

Authors: Luca Pasa, Nicolò Navarin, Wolfgang Erb, Alessandro Sperduti

Abstract: Many neural networks for graphs are based on the graph convolution operator, proposed more than a decade ago. Since then, many alternative definitions have been proposed, which tend to add complexity (and non-linearity) to the model. In this paper, we follow the opposite direction by proposing simple graph convolution operators, which can be implemented in single-layer graph convolutional networks. We show that our convolution operators are more theoretically grounded than many proposals in the literature, and exhibit state-of-the-art predictive performance on the considered benchmark datasets.

Submitted 10 June, 2021; originally announced June 2021.
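The abstract does not spell out the proposed operators. As a hedged illustration of the general idea of a single-layer graph convolution, the sketch below concatenates node features propagated over 0 to k hops with a row-normalised adjacency matrix (self-loops added) and applies one linear readout. This is a generic multi-hop formulation, not necessarily the operators defined in the paper; all names are hypothetical.

```python
import itertools


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Add self-loops and row-normalise, so each node averages itself and its neighbours."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    result = []
    for row in with_loops:
        total = sum(row)
        result.append([v / total for v in row])
    return result


def multi_hop_features(adj, features, k):
    """Concatenate A_hat^0 X, A_hat^1 X, ..., A_hat^k X along the feature axis."""
    a_norm = normalized_adjacency(adj)
    blocks, current = [features], features
    for _ in range(k):
        current = matmul(a_norm, current)
        blocks.append(current)
    return [list(itertools.chain.from_iterable(block[i] for block in blocks))
            for i in range(len(features))]


def single_layer_gcn(adj, features, weights, k):
    """One linear layer on top of the propagated features: no hidden non-linearity."""
    return matmul(multi_hop_features(adj, features, k), weights)


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
x = [[1.0], [0.0], [0.0]]
print(multi_hop_features(path, x, k=1))
print(single_layer_gcn(path, x, [[1.0], [2.0]], k=1))
```

Since propagation involves no trainable parameters, the only learned weights sit in the final linear map, which keeps the model simple and its behaviour easy to analyse.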

arXiv:2104.09159 [pdf, other]

Conditional Variational Capsule Network for Open Set Recognition

Authors: Yunrui Guo, Guglielmo Camporese, Wenjing Yang, Alessandro Sperduti, Lamberto Ballan

Abstract: In open set recognition, a classifier has to detect classes that are unknown at training time. In order to recognize new categories, the classifier has to project the input samples of known classes into very compact and well-separated regions of the feature space, so that samples of unknown classes can be discriminated. Recently proposed Capsule Networks have been shown to outperform alternatives in many fields, particularly in image recognition; however, they have not yet been fully applied to open set recognition. In capsule networks, scalar neurons are replaced by capsule vectors or matrices, whose entries represent different properties of objects. In our proposal, during training, capsule features of the same known class are encouraged to match a pre-defined Gaussian, one for each class. To this end, we use the variational autoencoder framework, with a set of Gaussian priors as the approximation for the posterior distribution. In this way, we are able to control the compactness of the features of the same class around the center of the Gaussians, thus controlling the ability of the classifier to detect samples from unknown classes. We conducted several experiments and ablations of our model, obtaining state-of-the-art results on different datasets in the open set recognition and unknown detection tasks.

Submitted 17 August, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

Comments: Accepted to ICCV 2021
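Once the features of each known class are pulled towards their own Gaussian, unknown samples can be flagged by how far they fall from every class centre. The sketch below shows that decision rule under simplifying assumptions: Euclidean distance to fixed class means and a single hand-set threshold (the paper's actual scoring may differ, for example by using the Gaussians' variances).

```python
import math


def open_set_predict(feature, class_means, threshold):
    """Return the nearest known class, or 'unknown' if no mean lies within `threshold`."""
    best_label, best_dist = None, math.inf
    for label, mean in class_means.items():
        dist = math.dist(feature, mean)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else "unknown"


means = {"cat": (0.0, 0.0), "dog": (5.0, 5.0)}  # hypothetical class centres
print(open_set_predict((0.5, 0.2), means, threshold=1.0))  # cat
print(open_set_predict((2.5, 2.5), means, threshold=1.0))  # unknown
```

The tighter each class cluster is, the smaller the threshold can be without rejecting known samples, which is why controlling compactness matters for detecting unknowns.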

arXiv:2011.02886 [pdf, other]

Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization

Authors: Antonio Carta, Alessandro Sperduti, Davide Bacciu

Abstract: Training RNNs to learn long-term dependencies is difficult due to vanishing gradients. We explore an alternative solution based on explicit memorization using linear autoencoders for sequences, which make it possible to maximize short-term memory and can be computed with a closed-form solution, without backpropagation. We introduce an initialization scheme that pretrains the weights of a recurrent neural network to approximate the linear autoencoder of the input sequences, and we show how such pretraining can better support solving hard classification tasks with long sequences. We test our approach on sequential and permuted MNIST. We show that the proposed approach achieves a much lower reconstruction error for long sequences and a better gradient propagation during the finetuning phase.

Submitted 5 November, 2020; originally announced November 2020.

Comments: Accepted at NeurIPS 2020 workshop "Beyond Backpropagation: Novel Ideas for Training Neural Architectures"
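A linear autoencoder for sequences encodes a sequence with a linear recurrence m_t = A x_t + B m_{t-1} and decodes it by running that recurrence backwards. The paper obtains the weights in closed form; the sketch below instead hand-picks a shift-register solution (A writes the new scalar input into slot 0, B shifts older inputs down one slot), which reconstructs the last k inputs exactly when the memory has k slots. It illustrates the encode/decode structure only and is not the closed-form solution itself.

```python
def mat_vec(m, v):
    """Matrix-vector product with matrices stored as lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def shift_register(k):
    """Hand-picked A (k x 1) and B (k x k): A writes the new input, B shifts memory down."""
    a = [[1.0]] + [[0.0] for _ in range(k - 1)]
    b = [[1.0 if i == j + 1 else 0.0 for j in range(k)] for i in range(k)]
    return a, b


def encode(sequence, k):
    """Run the linear recurrence m_t = A x_t + B m_{t-1} over scalar inputs."""
    a, b = shift_register(k)
    memory = [0.0] * k
    for x in sequence:
        written = mat_vec(a, [x])
        shifted = mat_vec(b, memory)
        memory = [w + s for w, s in zip(written, shifted)]
    return memory


def decode(memory, length):
    """Read x_t from slot 0, then apply B^T (shift up) to recover m_{t-1}."""
    _, b = shift_register(len(memory))
    b_t = [list(col) for col in zip(*b)]
    outputs = []
    for _ in range(length):
        outputs.append(memory[0])
        memory = mat_vec(b_t, memory)
    return outputs[::-1]


print(decode(encode([1, 2, 3], k=3), length=3))  # [1.0, 2.0, 3.0]
```

With fewer slots than time steps, only the most recent k inputs survive; the closed-form solution in the paper instead compresses longer sequences optimally in the least-squares sense.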

arXiv:2009.00725 [pdf, other]

Conditional Constrained Graph Variational Autoencoders for Molecule Design

Authors: Davide Rigoni, Nicolò Navarin, Alessandro Sperduti

Abstract: In recent years, deep generative models for graphs have been used to generate new molecules. These models have produced good results, leading to several proposals in the literature. However, these models may have trouble learning some of the complex laws governing the chemical world. In this work, we explore the usage of the histogram of atom valences to drive the generation of molecules in such models. We present Conditional Constrained Graph Variational Autoencoder (CCGVAE), a model that implements this key idea in a state-of-the-art model, and shows improved results on several evaluation metrics on two commonly adopted datasets for molecule generation.

Submitted 1 September, 2020; originally announced September 2020.

arXiv:2008.12676 [pdf, ps, other]

doi 10.1088/1361-6587/abca7d

Validation of neutron emission and neutron energy spectrum calculations on MAST with DRESS

Authors: Andrea Sperduti, Iwona Klimek, Sean Conroy, Marco Cecconello, Marina Gorelenkova, Antti Snicker

Abstract: The recently developed Directional RElativistic Spectrum Simulator (DRESS) code has been validated for the first time against numerical calculations and experimental measurements performed on MAST. In this validation, the neutron emissivities and rates computed by DRESS are benchmarked against TRANSP/NUBEAM predictions, while the neutron energy spectra that DRESS computes from TRANSP/NUBEAM and ASCOT/BBNBI (in Gyro-Orbit (GO) mode) fast-ion distributions are validated against proton pulse height spectra (PHS) measured by the neutron flux monitor. Excellent agreement was found between DRESS and TRANSP/NUBEAM predictions of local and total neutron emission.

Submitted 28 August, 2020; originally announced August 2020.

arXiv:2008.09168 [pdf, ps, other]

A Systematic Assessment of Deep Learning Models for Molecule Generation

Authors: Davide Rigoni, Nicolò Navarin, Alessandro Sperduti

Abstract: In recent years the scientific community has devoted much effort to the development of deep learning models for the generation of new molecules with desirable properties (i.e. drugs). This has produced many proposals in the literature. However, a systematic comparison among the different VAE methods is still missing. For this reason, we propose an extensive testbed for the evaluation of generative models for drug discovery, and we present the results obtained by many of the models proposed in the literature.

Submitted 20 August, 2020; originally announced August 2020.

arXiv:2006.16800 [pdf, other]

Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory

Authors: Antonio Carta, Alessandro Sperduti, Davide Bacciu

Abstract: The effectiveness of recurrent neural networks can be largely influenced by their ability to store in their dynamic memory the information extracted from input sequences at different frequencies and timescales. Such a feature can be introduced into a neural architecture by an appropriate modularization of the dynamic memory. In this paper we propose a novel incrementally trained recurrent architecture explicitly targeting multi-scale learning. First, we show how to extend the architecture of a simple RNN by separating its hidden state into different modules, each subsampling the network hidden activations at different frequencies. Then, we discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies. Each new module works at a slower frequency than the previous ones and it is initialized to encode the subsampled sequence of hidden activations. Experimental results on synthetic and real-world datasets on speech recognition and handwritten characters show that the modular architecture and the incremental training algorithm improve the ability of recurrent neural networks to capture long-term dependencies.

Submitted 29 June, 2020; originally announced June 2020.

Comments: accepted @ ECML 2020. arXiv admin note: substantial text overlap with arXiv:2001.11771
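The abstract describes memory modules that subsample the hidden activations at progressively slower frequencies. The sketch below illustrates that schedule under two assumptions not stated in the abstract: module i ticks every 2**i steps, and each module keeps a single scalar state updated as state = decay * state + h_t on its own ticks (a stand-in for the trained module). Hidden activations are scalars for readability.

```python
def subsampled_streams(hidden_states, num_modules):
    """Module i sees every (2**i)-th hidden activation (assumed power-of-two periods)."""
    streams = []
    for i in range(num_modules):
        period = 2 ** i
        streams.append([h for t, h in enumerate(hidden_states, start=1) if t % period == 0])
    return streams


def run_modules(hidden_states, num_modules, decay=0.5):
    """Update each module's scalar state only on its own ticks: m = decay * m + h_t."""
    states = [0.0] * num_modules
    for t, h in enumerate(hidden_states, start=1):
        for i in range(num_modules):
            if t % (2 ** i) == 0:
                states[i] = decay * states[i] + h
    return states


hidden = [1.0, 2.0, 3.0, 4.0]
print(subsampled_streams(hidden, 3))  # [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0], [4.0]]
print(run_modules(hidden, 3))         # [6.125, 5.0, 4.0]
```

A slower module performs fewer updates over the same sequence, so its state decays less per unit of time and can retain information over longer horizons; adding such modules one at a time mirrors the incremental training described above.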

arXiv:2001.11771 [pdf, other]

Encoding-based Memory Modules for Recurrent Neural Networks

Authors: Antonio Carta, Alessandro Sperduti, Davide Bacciu

Abstract: Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study the memorization subtask from the point of view of the design and training of recurrent neural networks. We propose a new model, the Linear Memory Network, which features an encoding-based memorization component built with a linear autoencoder for sequences. We extend the memorization component with a modular memory that encodes the hidden state sequence at different sampling frequencies. Additionally, we provide a specialized training algorithm that initializes the memory to efficiently encode the hidden activations of the network. The experimental results on synthetic and real-world datasets show that specializing the training algorithm to train the memorization component always improves the final performance whenever the memorization of long sequences is necessary to solve the problem.

Submitted 31 January, 2020; originally announced January 2020.

Comments: preprint submitted at Elsevier Neural Networks

arXiv:1905.06147 [pdf, ps, other]

Embeddings and Representation Learning for Structured Data

Authors: Benjamin Paaßen, Claudio Gallicchio, Alessio Micheli, Alessandro Sperduti

Abstract: Performing machine learning on structured data is complicated by the fact that such data does not have vectorial form. Therefore, multiple approaches have emerged to construct vectorial representations of structured data, from kernel and distance approaches to recurrent, recursive, and convolutional neural networks. Recent years have seen heightened attention in this demanding field of research and several new approaches have emerged, such as metric learning on structured data, graph convolutional neural networks, and recurrent decoder networks for structured data. In this contribution, we provide a high-level overview of the state of the art in representation learning and embeddings for structured data across a wide range of machine learning fields.

Submitted 15 May, 2019; originally announced May 2019.

Comments: Oral presentation at the 27th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2019) in Bruges, Belgium, on April 24th, 2019

Journal ref: Proc. ESANN (2019), 85-94

arXiv:1811.10435 [pdf, other]

On Filter Size in Graph Convolutional Networks

Authors: Dinh Van Tran, Nicolò Navarin, Alessandro Sperduti

Abstract: Recently, many researchers have been focusing on the definition of neural networks for graphs. The basic component for many of these approaches remains the graph convolution idea proposed almost a decade ago. In this paper, we extend this basic component, following an intuition derived from the well-known convolutional filters over multi-dimensional tensors. In particular, we derive a simple, efficient and effective way to introduce a hyper-parameter on graph convolutions that influences the filter size, i.e. its receptive field over the considered graph. We show with experimental results on real-world graph datasets that the proposed graph convolutional filter improves the predictive performance of Deep Graph Convolutional Networks.

Submitted 23 November, 2018; originally announced November 2018.

Comments: arXiv admin note: text overlap with arXiv:1811.06930

Journal ref: IEEE Symposium on Deep Learning, 2018 Symposium Series on Computational Intelligence, 18 - 21 November, 2018, Bengaluru, India
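The abstract introduces a hyper-parameter controlling a graph convolution's filter size, described as its receptive field over the graph. Assuming, as in common k-hop formulations, that a filter of size k lets each node aggregate from nodes at most k hops away (an assumption, not necessarily the paper's exact definition), the breadth-first search below lists that receptive field.

```python
from collections import deque


def receptive_field(adjacency, node, k):
    """Return the set of nodes within k hops of `node`; k plays the role of filter size."""
    hops = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if hops[current] == k:
            continue  # do not expand beyond the filter's reach
        for neighbour in adjacency[current]:
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    return set(hops)


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # path graph 0-1-2-3-4
print(sorted(receptive_field(path, 0, 2)))  # [0, 1, 2]
```

Raising k widens each node's view without stacking more layers, which is the trade-off such a hyper-parameter exposes.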

arXiv:1811.06930 [pdf, other]

Pre-training Graph Neural Networks with Kernels

Authors: Nicolò Navarin, Dinh V. Tran, Alessandro Sperduti

Abstract: Many machine learning techniques have been proposed in the last few years to process data represented in graph-structured form. Graphs can be used to model several scenarios, from molecules and materials to RNA secondary structures. Several kernel functions have been defined on graphs that, coupled with kernelized learning algorithms, have shown state-of-the-art performance on many tasks. Recently, several definitions of Neural Networks for Graphs (GNNs) have been proposed, but their accuracy is not yet satisfying. In this paper, we propose a task-independent pre-training methodology that allows a GNN to learn the representation induced by state-of-the-art graph kernels. Then, the supervised learning phase fine-tunes this representation for the task at hand. The proposed technique is agnostic to the adopted GNN architecture and kernel function, and shows consistent improvements in the predictive performance of GNNs in our preliminary experimental results.

Submitted 16 November, 2018; originally announced November 2018.
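Learning "the representation induced by" a graph kernel can be read as making dot products of learned graph embeddings reproduce the kernel's values. The sketch below computes such an alignment objective, the mean squared gap between kernel entries and embedding dot products, under stated assumptions: a toy node-label histogram kernel stands in for the state-of-the-art kernels the paper uses, graphs are reduced to lists of node labels, and the embeddings are given directly instead of being produced and trained by a GNN.

```python
from collections import Counter


def label_histogram_kernel(graph_a, graph_b):
    """Toy graph kernel: dot product of node-label histograms (ignores edges)."""
    ha, hb = Counter(graph_a), Counter(graph_b)
    return sum(ha[label] * hb[label] for label in ha)


def kernel_alignment_loss(embeddings, graphs):
    """Mean squared difference between kernel values and embedding dot products."""
    n = len(graphs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            total += (label_histogram_kernel(graphs[i], graphs[j]) - dot) ** 2
    return total / (n * n)


graphs = [["C", "C", "O"], ["C", "O"]]  # node labels only, a simplifying assumption
print(kernel_alignment_loss([[2.0, 1.0], [1.0, 1.0]], graphs))  # 0.0
print(kernel_alignment_loss([[0.0, 0.0], [0.0, 0.0]], graphs))  # 11.75
```

Minimising this quantity over the GNN's parameters needs no task labels, which matches the task-independent pre-training phase; supervised fine-tuning then starts from the kernel-shaped embeddings.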

arXiv:1811.03356 [pdf, other]

Linear Memory Networks

Authors: Davide Bacciu, Antonio Carta, Alessandro Sperduti

Abstract: Recurrent neural networks can learn complex transduction problems that require maintaining and actively exploiting a memory of their inputs. Such models traditionally consider memory and input-output functionalities indissolubly entangled. We introduce a novel recurrent architecture based on the conceptual separation between the functional input-output transformation and the memory mechanism, showing how they can be implemented through different neural components. Building on this conceptualization, we introduce the Linear Memory Network, a recurrent model comprising a feedforward neural network, realizing the non-linear functional transformation, and a linear autoencoder for sequences, implementing the memory component. The resulting architecture can be efficiently trained by building on closed-form solutions to linear optimization problems. Further, by exploiting equivalence results between feedforward and recurrent neural networks, we devise a pretraining scheme for the proposed architecture. Experiments on polyphonic music datasets show competitive results against gated recurrent networks and other state-of-the-art models.

Submitted 8 November, 2018; originally announced November 2018.

arXiv:1802.08132 [pdf]

doi 10.1088/1748-0221/12/12/P12029

Results of the first user program on the Homogenous Thermal Neutron Source HOTNES (ENEA / INFN)

Authors: A. Sperduti, M. Angelone, R. Bedogni, G. Claps, E. Diociaiuti, C. Domingo, R. Donghia, S. Giovannella, J. M. Gomez-Ros, L. Irazola-Rosales, S. Loreti, V. Monti, S. Miscetti, F. Murtas, G. Pagano, M. Pillon, R. Pilotti, A. Pola, M. Romero-Expósito, F. Sánchez-Doblado, O. Sans-Planell, A. Scherillo, E. Soldani, M. Treccani, A. Pietropaolo

Abstract: The HOmogeneous Thermal NEutron Source (HOTNES) is a new type of thermal neutron irradiation assembly developed by the ENEA-INFN collaboration. The facility is fully characterized in terms of neutron field and dosimetric quantities, by both computational and experimental methods. This paper reports the results of the first "HOTNES users program", carried out in 2016, and covering a variety of thermal neutron active detectors such as scintillators, solid-state, single-crystal diamond and gaseous detectors.

Submitted 22 February, 2018; originally announced February 2018.

Journal ref: Journal of Instrumentation, Volume 12, December 2017

arXiv:1711.03822 [pdf, other]

LSTM Networks for Data-Aware Remaining Time Prediction of Business Process Instances

Authors: Nicolò Navarin, Beatrice Vincenzi, Mirko Polato, Alessandro Sperduti

Abstract: Predicting the completion time of business process instances would be a very helpful aid when managing processes under service level agreement constraints. The ability to know in advance the trend of running process instances would allow business managers to react in time, in order to prevent delays or undesirable situations. However, making such accurate forecasts is not easy: many factors may influence the time required to complete a process instance. In this paper, we propose an approach based on deep Recurrent Neural Networks (specifically LSTMs) that is able to exploit arbitrary information associated with single events, in order to produce an as-accurate-as-possible prediction of the completion time of running instances. Experiments on real-world datasets confirm the quality of our proposal.

Submitted 10 November, 2017; originally announced November 2017.

Comments: Article accepted for publication in 2017 IEEE Symposium on Deep Learning (IEEE DL'17) @ SSCI

arXiv:1602.07566 [pdf, other]

Time and Activity Sequence Prediction of Business Process Instances

Authors: Mirko Polato, Alessandro Sperduti, Andrea Burattin, Massimiliano de Leoni

Abstract: The ability to know in advance the trend of running process instances, with respect to different features, such as the expected completion time, would allow business managers to counteract undesired situations in a timely manner, in order to prevent losses. Therefore, the ability to accurately predict future features of running business process instances would be a very helpful aid when managing processes, especially under service level agreement constraints. However, making such accurate forecasts is not easy: many factors may influence the predicted features. Many approaches have been proposed to cope with this problem, but all of them assume that the underlying process is stationary. However, in real cases this assumption does not always hold. In this work we present new methods for predicting the remaining time of running cases. In particular, we propose a method, assuming process stationarity, which outperforms the state of the art, and two other methods which are able to make predictions even with non-stationary processes. We also describe an approach able to predict the full sequence of activities that a running case is going to take. All these methods are extensively evaluated on two real case studies.

Submitted 24 February, 2016; originally announced February 2016.

arXiv:1509.06589 [pdf, other]

doi 10.1007/978-3-319-12640-1_12

Graph Kernels exploiting Weisfeiler-Lehman Graph Isomorphism Test Extensions

Authors: Giovanni Da San Martino, Nicolò Navarin, Alessandro Sperduti

Abstract: In this paper we present a novel graph kernel framework inspired by the Weisfeiler-Lehman (WL) isomorphism tests. Any WL test comprises a relabelling phase of the nodes based on test-specific information extracted from the graph, for example the set of neighbours of a node. We define a novel relabelling and derive two kernels of the framework from it. The novel kernels are very fast to compute and achieve state-of-the-art results on five real-world datasets.
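The relabelling phase shared by WL tests can be illustrated with the classic one-dimensional WL step, in which a node's new label is derived from its own label and the sorted multiset of its neighbours' labels. This is a minimal sketch of that standard neighbourhood relabelling and of a WL-subtree-style kernel (dot product of label counts); it is not the novel relabelling proposed in the paper, and the function names and the `("wl", k)` compressed-label encoding are illustrative assumptions.

```python
from collections import Counter


def wl_relabel(adj, labels, table):
    """One WL iteration: map (label, sorted neighbour labels) to a compressed label.

    adj: dict node -> list of neighbour nodes
    labels: dict node -> current label
    table: dict shared across graphs, so equal signatures get equal new labels
    """
    new_labels = {}
    for node, neighbours in adj.items():
        signature = (labels[node], tuple(sorted(labels[n] for n in neighbours)))
        if signature not in table:
            table[signature] = ("wl", len(table))
        new_labels[node] = table[signature]
    return new_labels


def wl_features(adj, labels, iterations, table):
    """Count the labels produced at iterations 0..iterations."""
    features = Counter(labels.values())
    for _ in range(iterations):
        labels = wl_relabel(adj, labels, table)
        features.update(labels.values())
    return features


def wl_kernel(g1, g2, iterations=2):
    """Dot product of the label-count vectors of two (adj, labels) graphs."""
    table = {}
    f1 = wl_features(*g1, iterations, table)
    f2 = wl_features(*g2, iterations, table)
    return sum(count * f2[label] for label, count in f1.items())
```

Sharing `table` between the two graphs is the key design choice: it guarantees that identical neighbourhood signatures receive the same compressed label in both graphs, so matching labels count as shared features.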

Submitted 22 September, 2015; originally announced September 2015.

Journal ref: Neural Information Processing, Volume 8835 of the series Lecture Notes in Computer Science pp 93-100, 2014 Springer International Publishing

arXiv:1509.01116 [pdf, ps, other]

doi 10.1109/TNNLS.2017.2705694

A tree-based kernel for graphs with continuous attributes

Authors: Giovanni Da San Martino, Nicolò Navarin, Alessandro Sperduti

Abstract: The availability of graph data with node attributes that can be either discrete or real-valued is constantly increasing. While existing kernel methods are effective techniques for dealing with graphs having discrete node labels, their adaptation to non-discrete or continuous node attributes has been limited, mainly because of computational issues. Recently, a few kernels especially tailored for this domain, which trade predictive performance for computational efficiency, have been proposed. In this paper, we propose a graph kernel for complex and continuous node attributes, whose features are tree structures extracted from specific graph visits. The kernel keeps the same complexity as state-of-the-art kernels while implicitly using a larger feature space. We further present an approximated variant of the kernel which reduces its complexity significantly. Experimental results obtained on six real-world datasets show that the kernel is the best performing one on most of them. Moreover, in most cases the approximated version reaches performance comparable to current state-of-the-art kernels in terms of classification accuracy while greatly shortening the running times.

Submitted 20 December, 2016; v1 submitted 3 September, 2015; originally announced September 2015.

Comments: This work has been submitted to the IEEE Transactions on Neural Networks and Learning Systems for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:1507.03372 [pdf, other]

doi 10.1016/j.neucom.2015.12.110

Ordered Decompositional DAG Kernels Enhancements

Authors: Giovanni Da San Martino, Nicolò Navarin, Alessandro Sperduti

Abstract: In this paper, we show how the Ordered Decomposition DAGs (ODD) kernel framework, a framework that allows the definition of graph kernels from tree kernels, makes it easy to define new state-of-the-art graph kernels. Here we consider a fast graph kernel based on the Subtree kernel (ST), and we propose various enhancements to increase its expressiveness. The proposed DAG kernel has the same worst-case complexity as the one based on ST, but improved expressiveness due to an augmented set of features. Moreover, we propose a novel weighting scheme for the features, which can be applied to other kernels of the ODD framework. These improvements allow the proposed kernels to improve on the classification performance of the ST-based kernel on several real-world datasets, reaching state-of-the-art performance.

Submitted 28 December, 2015; v1 submitted 13 July, 2015; originally announced July 2015.

Comments: Paper accepted for publication in Neurocomputing

Journal ref: Neurocomputing, Volume 192, 5 June 2016, Pages 92-103

arXiv:1507.02186 [pdf, ps, other]

doi 10.1007/978-3-319-26561-2_33

Extending local features with contextual information in graph kernels

Authors: Nicolò Navarin, Alessandro Sperduti, Riccardo Tesselli

Abstract: Graph kernels are usually defined in terms of simpler kernels over local substructures of the original graphs. Different kernels consider different types of substructures. However, in some cases they have similar predictive performance, probably because the substructures can be interpreted as approximations of the subgraphs they induce. In this paper, we propose to associate with each feature a piece of information about the context in which the feature appears in the graph. A substructure appearing in two different graphs will match only if it appears with the same context in both graphs. We propose a kernel based on this idea that considers trees as substructures, and where the contexts are features too. The kernel is inspired by the framework in [6], even if it is not part of it. We give an efficient algorithm for computing the kernel and show promising results on real-world graph classification datasets.

Submitted 3 September, 2015; v1 submitted 8 July, 2015; originally announced July 2015.

Comments: To appear in ICONIP 2015

Report number: 9492, pp 271-279

Journal ref: Lecture Notes in Computer Science, Neural Information Processing, 22nd International Conference, ICONIP 2015, November 9-12, 2015, Proceedings, Part IV

arXiv:1507.02158 [pdf, other]

An Empirical Study on Budget-Aware Online Kernel Algorithms for Streams of Graphs

Authors: Giovanni Da San Martino, Nicolò Navarin, Alessandro Sperduti

Abstract: Kernel methods are considered an effective technique for online learning. Many approaches have been developed for compactly representing the dual solution of a kernel method when the problem imposes memory constraints. However, no work in the literature is specifically tailored to streams of graphs. Motivated by the fact that the feature space representation of many state-of-the-art graph kernels is relatively small and thus explicitly computable, we study whether executing kernel algorithms in the feature space can be more effective than the classical dual approach. We study three different algorithms and various strategies for managing the budget. The efficiency and efficacy of the proposed approaches are experimentally assessed on relatively large graph streams exhibiting concept drift. It turns out that, when strict memory budget constraints have to be enforced, working in feature space, given the current state of the art on graph kernels, is more than a viable alternative to dual approaches, both in terms of speed and classification performance.
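Working in feature space means the model stores one weight per explicit graph feature instead of a set of support graphs. As an illustration, here is a minimal sketch of a perceptron over sparse feature counts (for instance, WL label counts) with a hard cap on the number of stored weights. The smallest-magnitude eviction rule, the class name and its parameters are assumptions chosen for illustration, not necessarily one of the three algorithms or budget strategies evaluated in the paper.

```python
from collections import Counter
from typing import Dict, Hashable


class BudgetedFeaturePerceptron:
    """Hypothetical perceptron over explicit sparse graph features with a budget."""

    def __init__(self, budget: int):
        self.budget = budget  # maximum number of stored weights
        self.weights: Dict[Hashable, float] = {}

    def score(self, features: Counter) -> float:
        """Linear score computed directly in the explicit feature space."""
        return sum(self.weights.get(f, 0.0) * v for f, v in features.items())

    def update(self, features: Counter, label: int) -> bool:
        """One online step on an example with label in {-1, +1}; True on a mistake."""
        if label * self.score(features) > 0:
            return False
        for f, v in features.items():
            self.weights[f] = self.weights.get(f, 0.0) + label * v
        # Enforce the memory budget by discarding the smallest-magnitude weights.
        while len(self.weights) > self.budget:
            weakest = min(self.weights, key=lambda f: abs(self.weights[f]))
            del self.weights[weakest]
        return True
```

Because memory is bounded by the number of features rather than the number of stored examples, the budget is independent of how many graphs have been seen, which is what makes this style attractive for streams with concept drift.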

Submitted 20 July, 2016; v1 submitted 8 July, 2015; originally announced July 2015.

Comments: Author's version of the manuscript, to appear in Neurocomputing (ELSEVIER)

arXiv:1503.04957 [pdf, other]

doi 10.1016/j.eswa.2016.08.040

Conformance Checking Based on Multi-Perspective Declarative Process Models

Authors: Andrea Burattin, Fabrizio Maria Maggi, Alessandro Sperduti

Abstract: Process mining is a family of techniques that aim at analyzing business process execution data recorded in event logs. Conformance checking is a branch of this discipline embracing approaches for verifying whether the behavior of a process, as recorded in a log, is in line with some expected behaviors provided in the form of a process model. The majority of these approaches require the input process model to be procedural (e.g., a Petri net). However, in turbulent environments, characterized by high variability, the process behavior is less stable and predictable. In these environments, procedural process models are less suitable to describe a business process. Declarative specifications, working under an open world assumption, allow the modeler to express several possible execution paths as a compact set of constraints. Any process execution that does not contradict these constraints is allowed. One of the open challenges in the context of conformance checking with declarative models is the capability of supporting multi-perspective specifications. In this paper, we close this gap by providing a framework for conformance checking based on MP-Declare, a multi-perspective version of the declarative process modeling language Declare. The approach has been implemented in the process mining tool ProM and has been evaluated on three real-life case studies.
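The open-world semantics can be illustrated with a single Declare template, response(A, B): every occurrence of A must eventually be followed by B, and any trace that violates no activation is accepted. Below is a hypothetical, simplified checker for a multi-perspective variant in which an activation condition and a correlation condition over event attributes are passed as Python predicates. It is not the paper's MP-Declare framework or its ProM implementation, and the event dictionary layout and attribute names are assumptions.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]  # hypothetical layout, e.g. {"activity": "Submit", "amount": 500}


def check_response(
    trace: List[Event],
    activation: str,
    target: str,
    activation_cond: Callable[[Event], bool] = lambda e: True,
    correlation_cond: Callable[[Event, Event], bool] = lambda a, t: True,
) -> List[int]:
    """Return the indices of activations that are never fulfilled (violations).

    An event activates the constraint if it has the activation activity and
    satisfies activation_cond; it is fulfilled by any later event with the
    target activity that satisfies correlation_cond with respect to it.
    An empty result means the trace conforms to this constraint.
    """
    violations = []
    for i, event in enumerate(trace):
        if event["activity"] != activation or not activation_cond(event):
            continue
        fulfilled = any(
            later["activity"] == target and correlation_cond(event, later)
            for later in trace[i + 1:]
        )
        if not fulfilled:
            violations.append(i)
    return violations
```

Returning violating positions rather than a boolean keeps the result useful for diagnostics: a conformance report can point at the exact events that break a data-aware constraint.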

Submitted 17 March, 2015; originally announced March 2015.

arXiv:1212.6383 [pdf, other]

doi 10.1109/CEC.2014.6900341

Heuristics Miners for Streaming Event Data

Authors: Andrea Burattin, Alessandro Sperduti, Wil M. P. van der Aalst

Abstract: More and more business activities are performed using information systems. These systems produce such huge amounts of event data that existing systems are unable to store and process them. Moreover, few processes are in steady-state and due to changing circumstances processes evolve and systems need to adapt continuously. Since conventional process discovery algorithms have been defined for batch processing, it is difficult to apply them in such evolving environments. Existing algorithms cannot cope with streaming event data and tend to generate unreliable and obsolete results. In this paper, we discuss the peculiarities of dealing with streaming event data in the context of process mining. Subsequently, we present a general framework for defining process mining algorithms in settings where it is impossible to store all events over an extended period or where processes evolve while being analyzed. We show how the Heuristics Miner, one of the most effective process discovery algorithms for practical applications, can be modified using this framework. Different stream-aware versions of the Heuristics Miner are defined and implemented in ProM. Moreover, experimental results on artificial and real logs are reported.
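The core statistic the Heuristics Miner builds on is the dependency measure a ⇒ b = (|a>b| − |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts how often b directly follows a within a case (for self-loops, |a>a| / (|a>a| + 1)). The sketch below updates these directly-follows counts incrementally as (case, activity) events arrive. The bounded-memory rule used here, forgetting the least recently active case once `max_cases` is exceeded, is an assumption for illustration and is not one of the stream-aware variants defined in the paper.

```python
from collections import Counter, OrderedDict


class StreamingDependencySketch:
    """Hypothetical incremental directly-follows counter for an event stream."""

    def __init__(self, max_cases=1000):
        self.max_cases = max_cases
        self.last_activity = OrderedDict()  # case id -> last activity seen
        self.follows = Counter()            # (a, b) -> |a > b|

    def observe(self, case_id, activity):
        """Process one (case, activity) event as it arrives."""
        previous = self.last_activity.pop(case_id, None)
        if previous is not None:
            self.follows[(previous, activity)] += 1
        self.last_activity[case_id] = activity  # re-insert as most recent case
        if len(self.last_activity) > self.max_cases:
            self.last_activity.popitem(last=False)  # forget least recent case

    def dependency(self, a, b):
        """Heuristics Miner dependency measure a => b from the current counts."""
        ab = self.follows[(a, b)]
        if a == b:
            return ab / (ab + 1)
        ba = self.follows[(b, a)]
        return (ab - ba) / (ab + ba + 1)
```

Events from different cases may interleave freely; only the last activity per case has to be remembered, so memory grows with the number of open cases rather than with the length of the stream.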

Submitted 27 December, 2012; originally announced December 2012.

ACM Class: H.2.8; F.1.2

Showing 1–29 of 29 results for author: Sperduti, A