Search | arXiv e-print repository

Separation Power of Equivariant Neural Networks

Authors: Marco Pacini, Xiaowen Dong, Bruno Lepri, Gabriele Santin

Abstract: The separation power of a machine learning model refers to its capacity to distinguish distinct inputs, and it is often employed as a proxy for its expressivity. In this paper, we propose a theoretical framework to investigate the separation power of equivariant neural networks with point-wise activations. Using the proposed framework, we can derive an explicit description of inputs indistinguisha… ▽ More The separation power of a machine learning model refers to its capacity to distinguish distinct inputs, and it is often employed as a proxy for its expressivity. In this paper, we propose a theoretical framework to investigate the separation power of equivariant neural networks with point-wise activations. Using the proposed framework, we can derive an explicit description of inputs indistinguishable by a family of neural networks with given architecture, demonstrating that it remains unaffected by the choice of non-polynomial activation function employed. We are able to understand the role played by activation functions in separability. Indeed, we show that all non-polynomial activations, such as ReLU and sigmoid, are equivalent in terms of expressivity, and that they reach maximum discrimination capacity. We demonstrate how assessing the separation power of an equivariant neural network can be simplified to evaluating the separation power of minimal representations. We conclude by illustrating how these minimal components form a hierarchy in separation power. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 9 pages of main text, 2 figures

arXiv:2401.09235 [pdf, ps, other]

A Characterization Theorem for Equivariant Networks with Point-wise Activations

Authors: Marco Pacini, Xiaowen Dong, Bruno Lepri, Gabriele Santin

Abstract: Equivariant neural networks have shown improved performance, expressiveness and sample complexity on symmetrical domains. But for some specific symmetries, representations, and choice of coordinates, the most common point-wise activations, such as ReLU, are not equivariant, hence they cannot be employed in the design of equivariant neural networks. The theorem we present in this paper describes al… ▽ More Equivariant neural networks have shown improved performance, expressiveness and sample complexity on symmetrical domains. But for some specific symmetries, representations, and choice of coordinates, the most common point-wise activations, such as ReLU, are not equivariant, hence they cannot be employed in the design of equivariant neural networks. The theorem we present in this paper describes all possible combinations of finite-dimensional representations, choice of coordinates and point-wise activations to obtain an exactly equivariant layer, generalizing and strengthening existing characterizations. Notable cases of practical relevance are discussed as corollaries. Indeed, we prove that rotation-equivariant networks can only be invariant, as it happens for any network which is equivariant with respect to connected compact groups. Then, we discuss implications of our findings when applied to important instances of exactly equivariant networks. First, we completely characterize permutation equivariant networks such as Invariant Graph Networks with point-wise nonlinearities and their geometric counterparts, highlighting a plethora of models whose expressive power and performance are still unknown. Second, we show that feature spaces of disentangled steerable convolutional neural networks are trivial representations. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted at the 12th International Conference on Learning Representations (ICLR 2024)

arXiv:2309.08254 [pdf, other]

Autonomous and Human-Driven Vehicles Interacting in a Roundabout: A Quantitative and Qualitative Evaluation

Authors: Laura Ferrarotti, Massimiliano Luca, Gabriele Santin, Giorgio Previati, Gianpiero Mastinu, Massimiliano Gobbi, Elena Campi, Lorenzo Uccello, Antonino Albanese, Praveen Zalaya, Alessandro Roccasalva, Bruno Lepri

Abstract: Optimizing traffic dynamics in an evolving transportation landscape is crucial, particularly in scenarios where autonomous vehicles (AVs) with varying levels of autonomy coexist with human-driven cars. While optimizing Reinforcement Learning (RL) policies for such scenarios is becoming more and more common, little has been said about realistic evaluations of such trained policies. This paper prese… ▽ More Optimizing traffic dynamics in an evolving transportation landscape is crucial, particularly in scenarios where autonomous vehicles (AVs) with varying levels of autonomy coexist with human-driven cars. While optimizing Reinforcement Learning (RL) policies for such scenarios is becoming more and more common, little has been said about realistic evaluations of such trained policies. This paper presents an evaluation of the effects of AVs penetration among human drivers in a roundabout scenario, considering both quantitative and qualitative aspects. In particular, we learn a policy to minimize traffic jams (i.e., minimize the time to cross the scenario) and to minimize pollution in a roundabout in Milan, Italy. Through empirical analysis, we demonstrate that the presence of AVs} can reduce time and pollution levels. Furthermore, we qualitatively evaluate the learned policy using a cutting-edge cockpit to assess its performance in near-real-world conditions. To gauge the practicality and acceptability of the policy, we conduct evaluations with human participants using the simulator, focusing on a range of metrics like traffic smoothness and safety perception. In general, our findings show that human-driven vehicles benefit from optimizing AVs dynamics. Also, participants in the study highlight that the scenario with 80% AVs is perceived as safer than the scenario with 20%. The same result is obtained for traffic smoothness perception. △ Less

Submitted 23 February, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

arXiv:2302.01018 [pdf, other]

Graph Neural Networks for temporal graphs: State of the art, open challenges, and opportunities

Authors: Antonio Longa, Veronica Lachi, Gabriele Santin, Monica Bianchini, Bruno Lepri, Pietro Lio, Franco Scarselli, Andrea Passerini

Abstract: Graph Neural Networks (GNNs) have become the leading paradigm for learning on (static) graph-structured data. However, many real-world systems are dynamic in nature, since the graph and node/edge attributes change over time. In recent years, GNN-based models for temporal graphs have emerged as a promising area of research to extend the capabilities of GNNs. In this work, we provide the first compr… ▽ More Graph Neural Networks (GNNs) have become the leading paradigm for learning on (static) graph-structured data. However, many real-world systems are dynamic in nature, since the graph and node/edge attributes change over time. In recent years, GNN-based models for temporal graphs have emerged as a promising area of research to extend the capabilities of GNNs. In this work, we provide the first comprehensive overview of the current state-of-the-art of temporal GNN, introducing a rigorous formalization of learning settings and tasks and a novel taxonomy categorizing existing approaches in terms of how the temporal aspect is represented and processed. We conclude the survey with a discussion of the most relevant open challenges for the field, from both research and application perspectives. △ Less

Submitted 8 July, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2212.07658 [pdf, other]

doi 10.14658/pupj-drna-2022-4-5

Interpolation with the polynomial kernels

Authors: Giacomo Elefante, Wolfgang Erb, Francesco Marchetti, Emma Perracchione, Davide Poggiali, Gabriele Santin

Abstract: The polynomial kernels are widely used in machine learning and they are one of the default choices to develop kernel-based classification and regression models. However, they are rarely used and considered in numerical analysis due to their lack of strict positive definiteness. In particular they do not enjoy the usual property of unisolvency for arbitrary point sets, which is one of the key prope… ▽ More The polynomial kernels are widely used in machine learning and they are one of the default choices to develop kernel-based classification and regression models. However, they are rarely used and considered in numerical analysis due to their lack of strict positive definiteness. In particular they do not enjoy the usual property of unisolvency for arbitrary point sets, which is one of the key properties used to build kernel-based interpolation methods. This paper is devoted to establish some initial results for the study of these kernels, and their related interpolation algorithms, in the context of approximation theory. We will first prove necessary and sufficient conditions on point sets which guarantee the existence and uniqueness of an interpolant. We will then study the Reproducing Kernel Hilbert Spaces (or native spaces) of these kernels and their norms, and provide inclusion relations between spaces corresponding to different kernel parameters. With these spaces at hand, it will be further possible to derive generic error estimates which apply to sufficiently smooth functions, thus esca** the native space. Finally, we will show how to employ an efficient stable algorithm to these kernels to obtain accurate interpolants, and we will test them in some numerical experiment. After this analysis several computational and theoretical aspects remain open, and we will outline possible further research directions in a concluding section. This work builds some bridges between kernel and polynomial interpolation, two topics to which the authors, to different extents, have been introduced under the supervision or through the work of Stefano De Marchi. For this reason, they wish to dedicate this work to him in the occasion of his 60th birthday. △ Less

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2210.15304 [pdf, other]

Explaining the Explainers in Graph Neural Networks: a Comparative Study

Authors: Antonio Longa, Steve Azzolin, Gabriele Santin, Giulia Cencetti, Pietro Liò, Bruno Lepri, Andrea Passerini

Abstract: Following a fast initial breakthrough in graph based learning, Graph Neural Networks (GNNs) have reached a widespread application in many science and engineering fields, prompting the need for methods to understand their decision process. GNN explainers have started to emerge in recent years, with a multitude of methods both novel or adapted from other domains. To sort out this plethora of alter… ▽ More Following a fast initial breakthrough in graph based learning, Graph Neural Networks (GNNs) have reached a widespread application in many science and engineering fields, prompting the need for methods to understand their decision process. GNN explainers have started to emerge in recent years, with a multitude of methods both novel or adapted from other domains. To sort out this plethora of alternative approaches, several studies have benchmarked the performance of different explainers in terms of various explainability metrics. However, these earlier works make no attempts at providing insights into why different GNN architectures are more or less explainable, or which explainer should be preferred in a given setting. In this survey, we fill these gaps by devising a systematic experimental study, which tests ten explainers on eight representative architectures trained on six carefully designed graph and node classification datasets. With our results we provide key insights on the choice and applicability of GNN explainers, we isolate key components that make them usable and successful and provide recommendations on how to avoid common interpretation pitfalls. We conclude by highlighting open questions and directions of possible future research. △ Less

Submitted 1 July, 2024; v1 submitted 27 October, 2022; originally announced October 2022.

arXiv:2203.07802 [pdf, other]

doi 10.1109/ACCESS.2022.3196391

A Framework for Verifiable and Auditable Federated Anomaly Detection

Authors: Gabriele Santin, Inna Skarbovsky, Fabiana Fournier, Bruno Lepri

Abstract: Federated Leaning is an emerging approach to manage cooperation between a group of agents for the solution of Machine Learning tasks, with the goal of improving each agent's performance without disclosing any data. In this paper we present a novel algorithmic architecture that tackle this problem in the particular case of Anomaly Detection (or classification or rare events), a setting where typica… ▽ More Federated Leaning is an emerging approach to manage cooperation between a group of agents for the solution of Machine Learning tasks, with the goal of improving each agent's performance without disclosing any data. In this paper we present a novel algorithmic architecture that tackle this problem in the particular case of Anomaly Detection (or classification or rare events), a setting where typical applications often comprise data with sensible information, but where the scarcity of anomalous examples encourages collaboration. We show how Random Forests can be used as a tool for the development of accurate classifiers with an effective insight-sharing mechanism that does not break the data integrity. Moreover, we explain how the new architecture can be readily integrated in a blockchain infrastructure to ensure the verifiable and auditable execution of the algorithm. Furthermore, we discuss how this work may set the basis for a more general approach for the design of federated ensemble-learning methods beyond the specific task and architecture discussed in this paper. △ Less

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2203.05811 [pdf, ps, other]

Reprogramming FairGANs with Variational Auto-Encoders: A New Transfer Learning Model

Authors: Beatrice Nobile, Gabriele Santin, Bruno Lepri, Pierpaolo Brutti

Abstract: Fairness-aware GANs (FairGANs) exploit the mechanisms of Generative Adversarial Networks (GANs) to impose fairness on the generated data, freeing them from both disparate impact and disparate treatment. Given the model's advantages and performance, we introduce a novel learning framework to transfer a pre-trained FairGAN to other tasks. This reprogramming process has the goal of maintaining the Fa… ▽ More Fairness-aware GANs (FairGANs) exploit the mechanisms of Generative Adversarial Networks (GANs) to impose fairness on the generated data, freeing them from both disparate impact and disparate treatment. Given the model's advantages and performance, we introduce a novel learning framework to transfer a pre-trained FairGAN to other tasks. This reprogramming process has the goal of maintaining the FairGAN's main targets of data utility, classification utility, and data fairness, while widening its applicability and ease of use. In this paper we present the technical extensions required to adapt the original architecture to this new framework (and in particular the use of Variational Auto-Encoders), and discuss the benefits, trade-offs, and limitations of the new model. △ Less

Submitted 11 March, 2022; originally announced March 2022.

arXiv:2106.14750 [pdf, other]

doi 10.1140/epjds/s13688-022-00316-y

Measuring close proximity interactions in summer camps during the COVID-19 pandemic

Authors: E. Leoni, G. Cencetti, G. Santin, T. Istomin, D. Molteni, G. P. Picco, E. Farella, B. Lepri, A. M. Murphy

Abstract: Policy makers have implemented multiple non-pharmaceutical strategies to mitigate the COVID-19 worldwide crisis. Interventions had the aim of reducing close proximity interactions, which drive the spread of the disease. A deeper knowledge of human physical interactions has revealed necessary, especially in all settings involving children, whose education and gathering activities should be preserve… ▽ More Policy makers have implemented multiple non-pharmaceutical strategies to mitigate the COVID-19 worldwide crisis. Interventions had the aim of reducing close proximity interactions, which drive the spread of the disease. A deeper knowledge of human physical interactions has revealed necessary, especially in all settings involving children, whose education and gathering activities should be preserved. Despite their relevance, almost no data are available on close proximity contacts among children in schools or other educational settings during the pandemic. Contact data are usually gathered via Bluetooth, which nonetheless offers a low temporal and spatial resolution. Recently, ultra-wideband (UWB) radios emerged as a more accurate alternative that nonetheless exhibits a significantly higher energy consumption, limiting in-field studies. In this paper, we leverage a novel approach, embodied by the Janus system that combines these radios by exploiting their complementary benefits. The very accurate proximity data gathered in-field by Janus, once augmented with several metadata, unlocks unprecedented levels of information, enabling the development of novel multi-level risk analyses. By means of this technology, we have collected real contact data of children and educators in three summer camps during summer 2020 in the province of Trento, Italy. The wide variety of performed daily activities induced multiple individual behaviors, allowing a rich investigation of social environments from the contagion risk perspective. We consider risk based on duration and proximity of contacts and classify interactions according to different risk levels. We can then evaluate the summer camps' organization, observe the effect of partition in small groups, or social bubbles, and identify the organized activities that mitigate the riskier behaviors. [...] △ Less

Submitted 28 June, 2021; originally announced June 2021.

arXiv:2105.07228 [pdf, other]

Universality and Optimality of Structured Deep Kernel Networks

Authors: Tizian Wenzel, Gabriele Santin, Bernard Haasdonk

Abstract: Kernel based methods yield approximation models that are flexible, efficient and powerful. In particular, they utilize fixed feature maps of the data, being often associated to strong analytical results that prove their accuracy. On the other hand, the recent success of machine learning methods has been driven by deep neural networks (NNs). They achieve a significant accuracy on very high-dimensio… ▽ More Kernel based methods yield approximation models that are flexible, efficient and powerful. In particular, they utilize fixed feature maps of the data, being often associated to strong analytical results that prove their accuracy. On the other hand, the recent success of machine learning methods has been driven by deep neural networks (NNs). They achieve a significant accuracy on very high-dimensional data, in that they are able to learn also efficient data representations or data-based feature maps. In this paper, we leverage a recent deep kernel representer theorem to connect the two approaches and understand their interplay. In particular, we show that the use of special types of kernels yield models reminiscent of neural networks that are founded in the same theoretical framework of classical kernel methods, while enjoying many computational properties of deep neural networks. Especially the introduced Structured Deep Kernel Networks (SDKNs) can be viewed as neural networks with optimizable activation functions obeying a representer theorem. Analytic properties show their universal approximation properties in different asymptotic regimes of unbounded number of centers, width and depth. Especially in the case of unbounded depth, the constructions is asymptotically better than corresponding constructions for ReLU neural networks, which is made possible by the flexibility of kernel approximation △ Less

Submitted 15 May, 2021; originally announced May 2021.

arXiv:2103.13655 [pdf, other]

doi 10.1007/978-3-030-97549-4_47

Structured Deep Kernel Networks for Data-Driven Closure Terms of Turbulent Flows

Authors: Tizian Wenzel, Marius Kurz, Andrea Beck, Gabriele Santin, Bernard Haasdonk

Abstract: Standard kernel methods for machine learning usually struggle when dealing with large datasets. We review a recently introduced Structured Deep Kernel Network (SDKN) approach that is capable of dealing with high-dimensional and huge datasets - and enjoys typical standard machine learning approximation properties. We extend the SDKN to combine it with standard machine learning modules and compare i… ▽ More Standard kernel methods for machine learning usually struggle when dealing with large datasets. We review a recently introduced Structured Deep Kernel Network (SDKN) approach that is capable of dealing with high-dimensional and huge datasets - and enjoys typical standard machine learning approximation properties. We extend the SDKN to combine it with standard machine learning modules and compare it with Neural Networks on the scientific challenge of data-driven prediction of closure terms of turbulent flows. We show experimentally that the SDKNs are capable of dealing with large datasets and achieve near-perfect accuracy on the given application. △ Less

Submitted 25 March, 2021; originally announced March 2021.

arXiv:2103.01575 [pdf, other]

doi 10.1016/j.cam.2022.114951

Kernel-Based Models for Influence Maximization on Graphs based on Gaussian Process Variance Minimization

Authors: Salvatore Cuomo, Wolfgang Erb, Gabriele Santin

Abstract: The inference of novel knowledge, the discovery of hidden patterns, and the uncovering of insights from large amounts of data from a multitude of sources make Data Science (DS) to an art rather than just a mere scientific discipline. The study and design of mathematical models able to analyze information represents a central research topic in DS. In this work, we introduce and investigate a novel… ▽ More The inference of novel knowledge, the discovery of hidden patterns, and the uncovering of insights from large amounts of data from a multitude of sources make Data Science (DS) to an art rather than just a mere scientific discipline. The study and design of mathematical models able to analyze information represents a central research topic in DS. In this work, we introduce and investigate a novel model for influence maximization (IM) on graphs using ideas from kernel-based approximation, Gaussian process regression, and the minimization of a corresponding variance term. Data-driven approaches can be applied to determine proper kernels for this IM model and machine learning methodologies are adopted to tune the model parameters. Compared to stochastic models in this field that rely on costly Monte-Carlo simulations, our model allows for a simple and cost-efficient update strategy to compute optimal influencing nodes on a graph. In several numerical experiments, we show the properties and benefits of this new model. △ Less

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2012.00338 [pdf, ps, other]

doi 10.1016/j.physd.2021.133007

Kernel methods for center manifold approximation and a data-based version of the Center Manifold Theorem

Authors: Bernard Haasdonk, Boumediene Hamzi, Gabriele Santin, Dominik Wittwar

Abstract: For dynamical systems with a non hyperbolic equilibrium, it is possible to significantly simplify the study of stability by means of the center manifold theory. This theory allows to isolate the complicated asymptotic behavior of the system close to the equilibrium point and to obtain meaningful predictions of its behavior by analyzing a reduced order system on the so-called center manifold. Sin… ▽ More For dynamical systems with a non hyperbolic equilibrium, it is possible to significantly simplify the study of stability by means of the center manifold theory. This theory allows to isolate the complicated asymptotic behavior of the system close to the equilibrium point and to obtain meaningful predictions of its behavior by analyzing a reduced order system on the so-called center manifold. Since the center manifold is usually not known, good approximation methods are important as the center manifold theorem states that the stability properties of the origin of the reduced order system are the same as those of the origin of the full order system. In this work, we establish a data-based version of the center manifold theorem that works by considering an approximation in place of an exact manifold. Also the error between the approximated and the original reduced dynamics are quantified. We then use an apposite data-based kernel method to construct a suitable approximation of the manifold close to the equilibrium, which is compatible with our general error theory. The data are collected by repeated numerical simulation of the full system by means of a high-accuracy solver, which generates sets of discrete trajectories that are then used as a training set. The method is tested on different examples which show promising performance and good accuracy. △ Less

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2004.12670 [pdf, other]

doi 10.1007/978-3-030-55874-1_49

Biomechanical surrogate modelling using stabilized vectorial greedy kernel methods

Authors: Bernard Haasdonk, Tizian Wenzel, Gabriele Santin, Syn Schmitt

Abstract: Greedy kernel approximation algorithms are successful techniques for sparse and accurate data-based modelling and function approximation. Based on a recent idea of stabilization of such algorithms in the scalar output case, we here consider the vectorial extension built on VKOGA. We introduce the so called $γ$-restricted VKOGA, comment on analytical properties and present numerical evaluation on d… ▽ More Greedy kernel approximation algorithms are successful techniques for sparse and accurate data-based modelling and function approximation. Based on a recent idea of stabilization of such algorithms in the scalar output case, we here consider the vectorial extension built on VKOGA. We introduce the so called $γ$-restricted VKOGA, comment on analytical properties and present numerical evaluation on data from a clinically relevant application, the modelling of the human spine. The experiments show that the new stabilized algorithms result in improved accuracy and stability over the non-stabilized algorithms. △ Less

Submitted 28 April, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

Journal ref: Numerical Mathematics and Advanced Applications ENUMATH 2019

arXiv:1802.03064 [pdf, other]

doi 10.1007/s10596-018-9785-x

Comparison of data-driven uncertainty quantification methods for a carbon dioxide storage benchmark scenario

Authors: Markus Köppel, Fabian Franzelin, Ilja Kröker, Sergey Oladyshkin, Gabriele Santin, Dominik Wittwar, Andrea Barth, Bernard Haasdonk, Wolfgang Nowak, Dirk Pflüger, Christian Rohde

Abstract: A variety of methods is available to quantify uncertainties arising with\-in the modeling of flow and transport in carbon dioxide storage, but there is a lack of thorough comparisons. Usually, raw data from such storage sites can hardly be described by theoretical statistical distributions since only very limited data is available. Hence, exact information on distribution shapes for all uncertain… ▽ More A variety of methods is available to quantify uncertainties arising with\-in the modeling of flow and transport in carbon dioxide storage, but there is a lack of thorough comparisons. Usually, raw data from such storage sites can hardly be described by theoretical statistical distributions since only very limited data is available. Hence, exact information on distribution shapes for all uncertain parameters is very rare in realistic applications. We discuss and compare four different methods tested for data-driven uncertainty quantification based on a benchmark scenario of carbon dioxide storage. In the benchmark, for which we provide data and code, carbon dioxide is injected into a saline aquifer modeled by the nonlinear capillarity-free fractional flow formulation for two incompressible fluid phases, namely carbon dioxide and brine. To cover different aspects of uncertainty quantification, we incorporate various sources of uncertainty such as uncertainty of boundary conditions, of conceptual model definitions and of material properties. We consider recent versions of the following non-intrusive and intrusive uncertainty quantification methods: arbitary polynomial chaos, spatially adaptive sparse grids, kernel-based greedy interpolation and hybrid stochastic Galerkin. The performance of each approach is demonstrated assessing expectation value and standard deviation of the carbon dioxide saturation against a reference statistic based on Monte Carlo sampling. We compare the convergence of all methods reporting on accuracy with respect to the number of model runs and resolution. Finally we offer suggestions about the methods' advantages and disadvantages that can guide the modeler for uncertainty quantification in carbon dioxide storage and beyond. △ Less

Submitted 8 February, 2018; originally announced February 2018.

MSC Class: 65D05; 65D15; 65C20

Showing 1–15 of 15 results for author: Santin, G