Search | arXiv e-print repository

Cross-Architecture Auxiliary Feature Space Translation for Efficient Few-Shot Personalized Object Detection

Authors: Francesco Barbato, Umberto Michieli, Jijoong Moon, Pietro Zanuttigh, Mete Ozay

Abstract: Recent years have seen object detection robotic systems deployed in several personal devices (e.g., home robots and appliances). This has highlighted a challenge in their design, i.e., they cannot efficiently update their knowledge to distinguish between general classes and user-specific instances (e.g., a dog vs. user's dog). We refer to this challenging task as Instance-level Personalized Object Detection (IPOD). The personalization task requires many samples for model tuning and optimization in a centralized server, raising privacy concerns. An alternative is provided by approaches based on recent large-scale Foundation Models, but their compute costs preclude on-device applications. In our work we tackle both problems at the same time, designing a Few-Shot IPOD strategy called AuXFT. We introduce a conditional coarse-to-fine few-shot learner to refine the coarse predictions made by an efficient object detector, showing that using an off-the-shelf model leads to poor personalization due to neural collapse. Therefore, we introduce a Translator block that generates an auxiliary feature space where features generated by a self-supervised model (e.g., DINOv2) are distilled without impacting the performance of the detector. We validate AuXFT on three publicly available datasets and one in-house benchmark designed for the IPOD task, achieving remarkable gains in all considered scenarios with excellent time-complexity trade-off: AuXFT reaches a performance of 80% of its upper bound at just 32% of the inference time, 13% of VRAM and 19% of the model size.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted at IROS 2024, 8 pages, 4 figures, 6 tables

arXiv:2406.14563 [pdf, other]

Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

Authors: Hasan Abed Al Kader Hammoud, Umberto Michieli, Fabio Pizzati, Philip Torr, Adel Bibi, Bernard Ghanem, Mete Ozay

Abstract: Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model, retaining the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models. This work investigates the effects of model merging on alignment. We evaluate several popular model merging techniques, demonstrating that existing methods do not only transfer domain expertise but also propagate misalignment. We propose a simple two-step approach to address this problem: (i) generating synthetic safety and domain-specific data, and (ii) incorporating these generated data into the optimization process of existing data-aware model merging techniques. This allows us to treat alignment as a skill that can be maximized in the resulting merged LLM. Our experiments illustrate the effectiveness of integrating alignment-related data during merging, resulting in models that excel in both domain expertise and alignment.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.13355 [pdf, ps, other]

Linear codes in the folded Hamming distance and the quasi MDS property

Authors: Umberto Martínez-Peñas, Rubén Rodríguez-Ballesteros

Abstract: In this work, we study linear codes with the folded Hamming distance, or equivalently, codes with the classical Hamming distance that are linear over a subfield. This includes additive codes. We study MDS codes in this setting and define quasi MDS (QMDS) codes and dually QMDS codes, which attain a more relaxed variant of the classical Singleton bound. We provide several general results concerning these codes, including restriction, shortening, weight distributions, existence, density, geometric description and bounds on their lengths relative to their field sizes. We provide explicit examples and a binary construction with optimal lengths relative to their field sizes, which beats any MDS code.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12451 [pdf, ps, other]

Small maximal clusters are very unlikely in critical random graphs

Authors: Umberto De Ambroggio

Abstract: We describe a probabilistic methodology, based on random walk estimates, to obtain exponential upper bounds for the probability of observing unusually small maximal components in two classical (near-)critical random graph models. More specifically, we analyse the near-critical Erdős-Rényi model $\mathbb{G}(n,p)$ and the random graph $\mathbb{G}(n,d,p)$ obtained by performing near-critical $p$-bond percolation on a simple random $d$-regular graph and show that, for each one of these models, the probability that the size of a largest component is smaller than $n^{2/3}/A$ is at most of order $\exp(-A^{3/2})$. The exponent $3/2$ is known to be optimal for the near-critical $\mathbb{G}(n,p)$ random graph, whereas for the near-critical $\mathbb{G}(n,d,p)$ model the best known upper bound for the above probability was of order $A^{-3/5}$. As a secondary result we show, by means of an optimized version of the martingale method of Nachmias and Peres, that the above probability of observing an unusually small maximal component is at most of order $\exp(-A^{3/5})$ in two other critical models, namely a random intersection graph and the quantum random graph; these stretched-exponential bounds also improve upon the known (polynomial) bounds available for these two other critical models.

Submitted 18 June, 2024; originally announced June 2024.

MSC Class: 05C80; 60G50
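
The main and secondary bounds stated in this abstract can be written out as displayed inequalities. This is only a restatement of the abstract's wording: the symbol $\mathcal{C}_{\max}$ for a largest component and the positive constants $c$, $C$ (standing in for "at most of order") are notational assumptions, and the precise ranges of $n$ and $A$ are not given in the abstract.

```latex
% Main result, for the near-critical G(n,p) and G(n,d,p) models:
\[
  \mathbb{P}\!\left( |\mathcal{C}_{\max}| < \frac{n^{2/3}}{A} \right)
  \;\le\; C \exp\!\left( -c\, A^{3/2} \right).
\]
% Secondary result, for the random intersection graph and the quantum random graph:
\[
  \mathbb{P}\!\left( |\mathcal{C}_{\max}| < \frac{n^{2/3}}{A} \right)
  \;\le\; C \exp\!\left( -c\, A^{3/5} \right).
\]
```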

arXiv:2406.10031 [pdf, other]

Deep Learning Domain Adaptation to Understand Physico-Chemical Processes from Fluorescence Spectroscopy Small Datasets: Application to Ageing of Olive Oil

Authors: Umberto Michelucci, Francesca Venturini

Abstract: Fluorescence spectroscopy is a fundamental tool in life sciences and chemistry, widely used for applications such as environmental monitoring, food quality control, and biomedical diagnostics. However, analysis of spectroscopic data with deep learning, in particular of fluorescence excitation-emission matrices (EEMs), presents significant challenges due to the typically small and sparse datasets available. Furthermore, the analysis of EEMs is difficult due to their high dimensionality and overlapping spectral features. This study proposes a new approach that exploits domain adaptation with pretrained vision models, alongside a novel interpretability algorithm to address these challenges. Thanks to specialised feature engineering of the neural networks described in this work, we are now able to provide deeper insights into the physico-chemical processes underlying the data. The proposed approach is demonstrated through the analysis of the oxidation process in extra virgin olive oil (EVOO) during ageing, showing its effectiveness in predicting quality indicators and identifying the spectral bands, and thus the molecules involved in the process. This work describes a significantly innovative approach in the use of deep learning for spectroscopy, transforming it from a black box into a tool for understanding complex biological and chemical processes.

Submitted 22 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.07352 [pdf, other]

Stochastic Analysis of Homogeneous Wireless Networks Assisted by Intelligent Reflecting Surfaces

Authors: Ali H. Abdollahi Bafghi, Mahtab Mirmohseni, Masoumeh Nasiri-Kenari, Behrouz Maham, Umberto Spagnolini

Abstract: In this paper, we study the impact of the existence of multiple IRSs in a homogeneous wireless network, in which all BSs, users (U), and IRSs are spatially distributed by an independent homogeneous PPP, with density $\lambda_{\rm BS}\rm{[BS/m^2]}$, $\lambda_{\rm U}\rm{[U/m^2]}$, and $\lambda_{\rm IRS}\rm{[IRS/m^2]}$, respectively. We utilize a uniformly random serving strategy for BS and IRS to create stochastic symmetry in the network. We analyze the performance of the network and study the effect of the existence of the IRS on the network performance. To this end, for a typical user in the system, we derive analytical upper and lower bounds on the expectation of the power (second statistical moment) of the desired signal and the interference caused by BSs and other users. After that, we obtain analytical upper bounds on the decay of the probability of the power of the desired signal and the interference for the typical user (which results in a lower bound for the cumulative distribution function (CDF)). Moreover, we derive upper bounds on the decay of the probability of the capacity of one typical user, which results in a lower bound for the outage probability. In the numerical results, we observe that the numerically computed powers of the desired signal and the interference are near the derived lower bounds, and we show that increasing the parameter $\lambda_{\rm IRS}$ increases the powers of both the desired and interference signals. We also observe that increasing $\lambda_{\rm IRS}$ decreases the outage probability.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.17339 [pdf, other]

Physics-Informed Real NVP for Satellite Power System Fault Detection

Authors: Carlo Cena, Umberto Albertin, Mauro Martini, Silvia Bucci, Marcello Chiaberge

Abstract: The unique challenges posed by the space environment, characterized by extreme conditions and limited accessibility, raise the need for robust and reliable techniques to identify and prevent satellite faults. Fault detection methods in the space sector are required to ensure mission success and to protect valuable assets. In this context, this paper proposes an Artificial Intelligence (AI) based fault detection methodology and evaluates its performance on ADAPT (Advanced Diagnostics and Prognostics Testbed), an Electrical Power System (EPS) dataset, crafted in laboratory by NASA. Our study focuses on the application of a physics-informed (PI) real-valued non-volume preserving (Real NVP) model for fault detection in space systems. The efficacy of this method is systematically compared against other AI approaches such as Gated Recurrent Unit (GRU) and Autoencoder-based techniques. Results show that our physics-informed approach outperforms existing methods of fault detection, demonstrating its suitability for addressing the unique challenges of satellite EPS sub-system faults. Furthermore, we unveil the competitive advantage of physics-informed loss in AI models to address specific space needs, namely robustness, reliability, and power constraints, crucial for space exploration and satellite missions.

Submitted 27 May, 2024; originally announced May 2024.

Comments: Accepted at International Conference on Advanced Intelligent Mechatronics (AIM) 2024

arXiv:2405.14058 [pdf, other]

Formally Verifying Deep Reinforcement Learning Controllers with Lyapunov Barrier Certificates

Authors: Udayan Mandal, Guy Amir, Haoze Wu, Ieva Daukantas, Fletcher Lee Newell, Umberto J. Ravaioli, Baoluo Meng, Michael Durling, Milan Ganai, Tobey Shim, Guy Katz, Clark Barrett

Abstract: Deep reinforcement learning (DRL) is a powerful machine learning paradigm for generating agents that control autonomous systems. However, the "black box" nature of DRL agents limits their deployment in real-world safety-critical applications. A promising approach for providing strong guarantees on an agent's behavior is to use Neural Lyapunov Barrier (NLB) certificates, which are learned functions over the system whose properties indirectly imply that an agent behaves as desired. However, NLB-based certificates are typically difficult to learn and even more difficult to verify, especially for complex systems. In this work, we present a novel method for training and verifying NLB-based certificates for discrete-time systems. Specifically, we introduce a technique for certificate composition, which simplifies the verification of highly-complex systems by strategically designing a sequence of certificates. When jointly verified with neural network verification engines, these certificates provide a formal guarantee that a DRL agent both achieves its goals and avoids unsafe behavior. Furthermore, we introduce a technique for certificate filtering, which significantly simplifies the process of producing formally verified certificates. We demonstrate the merits of our approach with a case study on providing safety and liveness guarantees for a DRL-controlled spacecraft.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13473 [pdf, other]

Class-Conditional self-reward mechanism for improved Text-to-Image models

Authors: Safouane El Ghazouali, Arnaud Gucciardi, Umberto Michelucci

Abstract: Self-rewarding has emerged recently as a powerful tool in the field of Natural Language Processing (NLP), allowing language models to generate high-quality relevant responses by providing their own rewards during training. This innovative technique addresses the limitations of other methods that rely on human preferences. In this paper, we build upon the concept of self-rewarding models and introduce its vision equivalent for Text-to-Image generative AI models. This approach works by fine-tuning a diffusion model on a self-generated, self-judged dataset, making the fine-tuning more automated and with better data quality. The proposed mechanism makes use of other pre-trained models, such as vocabulary-based object detection and image captioning, and is conditioned by a set of objects for which the user might need to improve generated data quality. The approach has been implemented, fine-tuned and evaluated on Stable Diffusion and has led to a performance that has been evaluated to be at least 60% better than existing commercial and research Text-to-Image models. Additionally, the built self-rewarding mechanism allowed a fully automated generation of images, while increasing the visual quality of the generated images and following prompt instructions more efficiently. The code used in this work is freely available on https://github.com/safouaneelg/SRT2I.

Submitted 25 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.10847 [pdf, other]

Model Predictive Contouring Control for Vehicle Obstacle Avoidance at the Limit of Handling Using Torque Vectoring

Authors: Alberto Bertipaglia, Davide Tavernini, Umberto Montanaro, Mohsen Alirezaei, Riender Happee, Aldo Sorniotti, Barys Shyrokau

Abstract: This paper presents an original approach to vehicle obstacle avoidance. It involves the development of a nonlinear Model Predictive Contouring Control, which uses torque vectoring to stabilise and drive the vehicle in evasive manoeuvres at the limit of handling. The proposed algorithm combines motion planning, path tracking and vehicle stability objectives, prioritising collision avoidance in emergencies. The controller's prediction model is a nonlinear double-track vehicle model based on an extended Fiala tyre to capture the nonlinear coupled longitudinal and lateral dynamics. The controller computes the optimal steering angle and the longitudinal forces for each of the four wheels to minimise tracking error in safe situations and maximise the vehicle-to-obstacle distance in emergencies. Thanks to the optimisation of the longitudinal tyre forces, the proposed controller can produce an extra yaw moment, increasing the vehicle's lateral agility to avoid obstacles while keeping the vehicle stable. The optimal forces are constrained in the tyre friction circle so as not to exceed the tyre and vehicle capabilities. In a high-fidelity simulation environment, we demonstrate the benefits of torque vectoring, showing that our proposed approach is capable of successfully avoiding obstacles and keeping the vehicle stable while driving a double-lane change manoeuvre, in comparison to baselines lacking torque vectoring or collision avoidance prioritisation.

Submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted at IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Boston, USA, 2024

arXiv:2405.03543 [pdf, other]

Axiomatizing the Logic of Ordinary Discourse

Authors: Vitor Greati, Sérgio Marcelino, Umberto Rivieccio

Abstract: Most non-classical logics are subclassical, that is, every inference/theorem they validate is also valid classically. A notable exception is the three-valued propositional Logic of Ordinary Discourse (OL) proposed and extensively motivated by W. S. Cooper as a more adequate candidate for formalizing everyday reasoning (in English). OL challenges classical logic not only by rejecting some theses, but also by accepting non-classically valid principles, such as so-called Aristotle's and Boethius' theses. Formally, OL shows a number of unusual features - it is non-structural, connexive, paraconsistent and contradictory - making it all the more interesting for the mathematical logician. We present our recent findings on OL and its structural companion (that we call sOL). We introduce Hilbert-style multiple-conclusion calculi for OL and sOL that are both modular and analytic, and easily allow us to obtain single-conclusion axiomatizations. We prove that sOL is algebraizable and single out its equivalent semantics, which turns out to be a discriminator variety generated by a three-element algebra. Having observed that sOL can express the connectives of other three-valued logics, we prove that it is definitionally equivalent to an expansion of the three-valued logic J3 of D'Ottaviano and da Costa, itself an axiomatic extension of paraconsistent Nelson logic.

Submitted 6 May, 2024; originally announced May 2024.

MSC Class: 03B22 ACM Class: F.4.1

arXiv:2405.03452 [pdf]

doi 10.1098/rsta

Large Language Models (LLMs) as Agents for Augmented Democracy

Authors: Jairo Gudiño-Rosero, Umberto Grandi, César A. Hidalgo

Abstract: We explore the capabilities of an augmented democracy system built on off-the-shelf LLMs fine-tuned on data summarizing individual preferences across 67 policy proposals collected during the 2022 Brazilian presidential elections. We use a train-test cross-validation setup to estimate the accuracy with which the LLMs predict both a subject's individual political choices and the aggregate preferences of the full sample of participants. At the individual level, the accuracy of the out-of-sample predictions lies in the range 69%-76%, and the LLMs are significantly better at predicting the preferences of liberal and college-educated participants. At the population level, we aggregate preferences using an adaptation of the Borda score and compare the ranking of policy proposals obtained from a probabilistic sample of participants and from data augmented using LLMs. We find that the augmented data predicts the preferences of the full population of participants better than probabilistic samples alone when these represent less than 30% to 40% of the total population. These results indicate that LLMs are potentially useful for the construction of systems of augmented democracy.

Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 15 pages main manuscript with 3 figures. 12 pages of supplementary material
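
For readers unfamiliar with the aggregation step mentioned in this abstract, the following is a minimal sketch of the classic Borda count. It is not the authors' adaptation, which the abstract does not specify; the function names `borda_scores` and `borda_ranking`, the proposal labels, and the sample ballots are hypothetical.

```python
from collections import defaultdict


def borda_scores(rankings):
    """Classic Borda count (assumption: complete, strict rankings).

    With m proposals on a ballot, the proposal in 0-based position i
    earns m - 1 - i points; points are summed over all ballots.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for position, proposal in enumerate(ranking):
            scores[proposal] += m - 1 - position
    return dict(scores)


def borda_ranking(rankings):
    """Order proposals by descending Borda score, ties broken by label."""
    scores = borda_scores(rankings)
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical ballots: each list ranks proposals from most to least preferred.
ballots = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
]
print(borda_scores(ballots))
print(borda_ranking(ballots))
```

Comparing the ranking produced from a subsample of ballots with the ranking from all ballots is one simple way to picture the sample-versus-population comparison the abstract describes.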

arXiv:2404.10727 [pdf, other]

How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model

Authors: Umberto Tomasini, Matthieu Wyart

Abstract: Understanding what makes high-dimensional data learnable is a fundamental question in machine learning. On the one hand, it is believed that the success of deep learning lies in its ability to build a hierarchy of representations that become increasingly more abstract with depth, going from simple features like edges to more complex concepts. On the other hand, learning to be insensitive to invariances of the task, such as smooth transformations for image datasets, has been argued to be important for deep networks and it strongly correlates with their performance. In this work, we aim to explain this correlation and unify these two viewpoints. We show that by introducing sparsity to generative hierarchical models of data, the task acquires insensitivity to spatial transformations that are discrete versions of smooth transformations. In particular, we introduce the Sparse Random Hierarchy Model (SRHM), where we observe and rationalize that a hierarchical representation mirroring the hierarchical model is learnt precisely when such insensitivity is learnt, thereby explaining the strong correlation between the latter and performance. Moreover, we quantify how the sample complexity of CNNs learning the SRHM depends on both the sparsity and hierarchical structure of the task.

Submitted 2 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 9 pages, 6 figures

arXiv:2404.08245 [pdf, other]

A Distributed Approach for Persistent Homology Computation on a Large Scale

Authors: Riccardo Ceccaroni, Lorenzo Di Rocco, Umberto Ferraro Petrillo, Pierpaolo Brutti

Abstract: Persistent homology (PH) is a powerful mathematical method to automatically extract relevant insights from images, such as those obtained by high-resolution imaging devices like electron microscopes or new-generation telescopes. However, the application of this method comes at a very high computational cost, which is bound to grow further because new imaging devices generate an ever-growing amount of data. In this paper we present PixHomology, a novel algorithm for efficiently computing $0$-dimensional PH on 2D images, optimizing memory and processing time. By leveraging the Apache Spark framework, we also present a distributed version of our algorithm with several optimized variants, able to concurrently process large batches of astronomical images. Finally, we present the results of an experimental analysis showing that our algorithm and its distributed version are efficient in terms of required memory, execution time, and scalability, consistently outperforming existing state-of-the-art PH computation tools when used to process large datasets.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.05351 [pdf, other]

Semi-Supervised Novelty Detection for Precise Ultra-Wideband Error Signal Prediction

Authors: Umberto Albertin, Alessandro Navone, Mauro Martini, Marcello Chiaberge

Abstract: Ultra-Wideband (UWB) technology is an emerging low-cost solution for localization in a generic environment. However, UWB signals can be affected by signal reflections and non-line-of-sight (NLoS) conditions between anchors; hence, in a broader sense, the specific geometry of the environment and the disposition of obstructing elements in the map may drastically hinder the reliability of UWB for precise robot localization. This work aims to mitigate this problem by learning a map-specific characterization of the UWB quality signal with a fingerprint semi-supervised novelty detection methodology. An unsupervised autoencoder neural network is trained on nominal UWB map conditions, and then it is used to predict errors derived from the introduction of perturbing novelties in the environment. This work poses a step change in the understanding of UWB localization and its reliability in evolving environmental conditions. The resulting performance of the proposed method is demonstrated by fine-grained experiments obtained with a visual tracking ground truth.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05324 [pdf, other]

Back to the Future: GNN-based NO$_2$ Forecasting via Future Covariates

Authors: Antonio Giganti, Sara Mandelli, Paolo Bestagini, Umberto Giuriato, Alessandro D'Ausilio, Marco Marcon, Stefano Tubaro

Abstract: Due to the latest environmental concerns in keeping at bay contaminant emissions in urban areas, air pollution forecasting has been rising to the forefront for researchers around the world. When predicting pollutant concentrations, it is common to include the effects of environmental factors that influence these concentrations within an extended period, like traffic, meteorological conditions and geographical information. Most of the existing approaches exploit this information as past covariates, i.e., past exogenous variables that affected the pollutant but were not affected by it. In this paper, we present a novel forecasting methodology to predict NO$_2$ concentration via both past and future covariates. Future covariates are represented by weather forecasts and future calendar events, which are already known at prediction time. In particular, we deal with air quality observations in a city-wide network of ground monitoring stations, modeling the data structure and estimating the predictions with a Spatiotemporal Graph Neural Network (STGNN). We propose a conditioning block that embeds past and future covariates into the current observations. After extracting meaningful spatiotemporal representations, these are fused together and projected into the forecasting horizon to generate the final prediction. To the best of our knowledge, it is the first time that future covariates are included in time series predictions in a structured way. Remarkably, we find that conditioning on future weather information has a greater impact than considering past traffic conditions. We release our code implementation at https://github.com/polimi-ispl/MAGCRN.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 5 pages, 4 figures, 1 table, accepted at IEEE-IGARSS 2024

arXiv:2404.02877 [pdf, other]

FlightScope: A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery

Authors: Safouane El Ghazouali, Arnaud Gucciardi, Nicola Venturi, Michael Rueegsegger, Umberto Michelucci

Abstract: Object detection in remotely sensed satellite pictures is fundamental in many fields such as biophysical and environmental monitoring. While deep learning algorithms are constantly evolving, they have been mostly implemented and tested on popular ground-based photos. This paper critically evaluates and compares a suite of advanced object detection algorithms customized for the task of identifying aircraft within satellite imagery. Using the large HRPlanesV2 dataset, together with a rigorous validation with the GDIT dataset, this research encompasses an array of methodologies including YOLO versions 5 and 8, Faster RCNN, CenterNet, RetinaNet, RTMDet, and DETR, all trained from scratch. This exhaustive training and validation study reveals YOLOv5 as the preeminent model for the specific case of identifying airplanes from remote sensing data, showcasing high precision and adaptability across diverse imaging conditions. This research highlights the nuanced performance landscapes of these algorithms, with YOLOv5 emerging as a robust solution for aerial object detection, underlining its importance through superior mean average precision, Recall, and Intersection over Union scores. The findings described here underscore the fundamental role of algorithm selection aligned with the specific demands of satellite imagery analysis and extend a comprehensive framework to evaluate model efficacy. The benchmark toolkit and codes, available via https://github.com/toelt-llc/FlightScope_Bench, aim to further exploration and innovation in the realm of remote sensing object detection, paving the way for improved analytical methodologies in satellite imagery applications.

Submitted 1 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: 15 figures, 4 tables, comprehensive survey, comparative study

arXiv:2404.01397 [pdf, other]

Object-conditioned Bag of Instances for Few-Shot Personalized Instance Recognition

Authors: Umberto Michieli, Jijoong Moon, Daehyun Kim, Mete Ozay

Abstract: Nowadays, users demand increased personalization of vision systems to localize and identify personal instances of objects (e.g., my dog rather than dog) from a few-shot dataset only. Despite outstanding results of deep networks on classical label-abundant benchmarks (e.g., those of the latest YOLOv8 model for standard object detection), they struggle to maintain within-class variability to represent different instances rather than object categories only. We construct an Object-conditioned Bag of Instances (OBoI) based on multi-order statistics of extracted features, where generic object detection models are extended to search and identify personal instances from the OBoI's metric space, without need for backpropagation. By relying on multi-order statistics, OBoI achieves consistently superior accuracy in distinguishing different instances. In the results, we achieve 77.1% personal object recognition accuracy in case of 18 personal instances, showing about 12% relative gain over the state of the art.

Submitted 1 April, 2024; originally announced April 2024.

Comments: ICASSP 2024. Copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other

arXiv:2403.20234 [pdf, other]

Artificial Neural Networks-based Real-time Classification of ENG Signals for Implanted Nerve Interfaces

Authors: Antonio Coviello, Francesco Linsalata, Umberto Spagnolini, Maurizio Magarini

Abstract: Neuropathies are gaining higher relevance in clinical settings, as they risk permanently jeopardizing a person's life. To support the recovery of patients, the use of fully implanted devices is emerging as one of the most promising solutions. However, these devices, even if becoming an integral part of a fully complex neural nanonetwork system, pose numerous challenges. In this article, we address one of them, which consists of the classification of motor/sensory stimuli. The task is performed by exploring four different types of artificial neural networks (ANNs) to extract various sensory stimuli from the electroneurographic (ENG) signal measured in the sciatic nerve of rats. Different sizes of the data sets are considered to analyze the feasibility of the investigated ANNs for real-time classification through a comparison of their performance in terms of accuracy, F1-score, and prediction time. The design of the ANNs takes advantage of the modelling of the ENG signal as a multiple-input multiple-output (MIMO) system to describe the measures taken by state-of-the-art implanted nerve interfaces. These are based on the use of multi-contact cuff electrodes to achieve nanoscale spatial discrimination of the nerve activity. The MIMO ENG signal model is another contribution of this paper. Our results show that some ANNs are more suitable for real-time applications, being capable of achieving accuracies over $90\%$ for signal windows of $100$ and $200\,$ms with a low enough processing time to be effective for pathology recovery.

Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.14335 [pdf, other]

FFT-based Selection and Optimization of Statistics for Robust Recognition of Severely Corrupted Images

Authors: Elena Camuffo, Umberto Michieli, Jijoong Moon, Daehyun Kim, Mete Ozay

Abstract: Improving model robustness in case of corrupted images is among the key challenges to enable robust vision systems on smart devices, such as robotic agents. Particularly, robust test-time performance is imperative for most of the applications. This paper presents a novel approach to improve robustness of any classification model, especially on severely corrupted images. Our method (FROST) employs high-frequency features to detect input image corruption type, and select layer-wise feature normalization statistics. FROST provides the state-of-the-art results for different models and datasets, outperforming competitors on ImageNet-C by up to 37.1% relative gain, improving baseline of 40.9% mCE on severe corruptions.

Submitted 21 March, 2024; originally announced March 2024.

Comments: ICASSP 2024. Copyright 2024 IEEE.

arXiv:2403.12188 [pdf, other]

PETScML: Second-order solvers for training regression problems in Scientific Machine Learning

Authors: Stefano Zampini, Umberto Zerbinati, George Turkiyyah, David Keyes

Abstract: In recent years, we have witnessed the emergence of scientific machine learning as a data-driven tool for the analysis, by means of deep-learning techniques, of data produced by computational science and engineering applications. At the core of these methods is the supervised training algorithm to learn the neural network realization, a highly non-convex optimization problem that is usually solved using stochastic gradient methods. However, distinct from deep-learning practice, scientific machine-learning training problems feature a much larger volume of smooth data and better characterizations of the empirical risk functions, which make them suited for conventional solvers for unconstrained optimization. We introduce a lightweight software framework built on top of the Portable and Extensible Toolkit for Scientific computation to bridge the gap between deep-learning software and conventional solvers for unconstrained minimization. We empirically demonstrate the superior efficacy of a trust region method based on the Gauss-Newton approximation of the Hessian in improving the generalization errors arising from regression tasks when learning surrogate models for a wide range of scientific machine-learning techniques and test cases. All the conventional second-order solvers tested, including L-BFGS and inexact Newton with line-search, compare favorably, either in terms of cost or accuracy, with the adaptive first-order methods used to validate the surrogate models.

Submitted 18 March, 2024; originally announced March 2024.

MSC Class: 65K10; 68T07; 65M70; 65Y05 ACM Class: I.2.5; D.2.m; G.4; G.1.6; J.2

arXiv:2403.12059 [pdf, other]

A Thorough Analysis of Radio Resource Assignment for UAV-Enhanced Vehicular Sidelink Communications

Authors: Francesca Conserva, Francesco Linsalata, Marouan Mizmizi, Maurizio Magarini, Umberto Spagnolini, Roberto Verdone, Chiara Buratti

Abstract: The rapid expansion of connected and autonomous vehicles (CAVs) and the shift towards millimeter-wave (mmWave) frequencies offer unprecedented opportunities to enhance road safety and traffic efficiency. Sidelink communication, enabling direct Vehicle-to-Vehicle (V2V) communications, plays a pivotal role in this transformation. As communication technologies transition to higher frequencies, the associated increase in bandwidth comes at the cost of a severe path and penetration loss. In response to these challenges, we investigate a network configuration that deploys beamforming-capable Unmanned Aerial Vehicles (UAVs) as relay nodes. In this work, we present a comprehensive analytical framework with a groundbreaking performance metric, i.e., average access probability, that quantifies user satisfaction, considering factors across different protocol stack layers. Additionally, we introduce two Radio Resources Assignment (RRA) methods tailored for UAVs. These methods consider parameters such as resource availability, vehicle distribution, and latency requirements. Through our analytical approach, we optimize the average access probability by controlling UAV altitude based on traffic density. Our numerical findings validate the proposed model and strategy, which ensures that Quality of Service (QoS) standards are met in the domain of Vehicle-to-Anything (V2X) sidelink communications.

Submitted 19 January, 2024; originally announced March 2024.

arXiv:2403.10502 [pdf, other]

Belief Change based on Knowledge Measures

Authors: Umberto Straccia, Giovanni Casini

Abstract: Knowledge Measures (KMs) aim at quantifying the amount of knowledge/information that a knowledge base carries. On the other hand, Belief Change (BC) is the process of changing beliefs (in our case, in terms of contraction, expansion and revision) taking into account a new piece of knowledge, which possibly may be in contradiction with the current belief. We propose a new quantitative BC framework that is based on KMs by defining belief change operators that try to minimise, from an information-theoretic point of view, the surprise that the changed belief carries. To this end, we introduce the principle of minimal surprise. In particular, our contributions are (i) a general information-theoretic approach to KMs for which [1] is a special case; (ii) KM-based BC operators that satisfy the so-called AGM postulates; and (iii) a characterisation of any BC operator that satisfies the AGM postulates as a KM-based BC operator, i.e., any BC operator satisfying the AGM postulates can be encoded within our quantitative BC framework. We also introduce quantitative measures that account for the information loss of contraction, information gain of expansion and information change of revision. We also give a succinct look into the problem of iterated revision, which deals with the application of a sequence of revision operations in our framework, and also illustrate how one may build from our KM-based contraction operator also one not satisfying the (in)famous recovery postulate, by focusing on the so-called severe withdrawal model as an illustrative example.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 48 pages, 3 figures, preprint

arXiv:2403.07454 [pdf, other]

Fast, accurate and lightweight sequential simulation-based inference using Gaussian locally linear mappings

Authors: Henrik Häggström, Pedro L. C. Rodrigues, Geoffroy Oudoumanessah, Florence Forbes, Umberto Picchini

Abstract: Bayesian inference for complex models with an intractable likelihood can be tackled using algorithms performing many calls to computer simulators. These approaches are collectively known as "simulation-based inference" (SBI). Recent SBI methods have made use of neural networks (NN) to provide approximate, yet expressive constructs for the unavailable likelihood function and the posterior distribution. However, the trade-off between accuracy and computational demand leaves much space for improvement. In this work, we propose an alternative that provides both approximations to the likelihood and the posterior distribution, using structured mixtures of probability distributions. Our approach produces accurate posterior inference when compared to state-of-the-art NN-based SBI methods, even for multimodal posteriors, while exhibiting a much smaller computational footprint. We illustrate our results on several benchmark models from the SBI literature and on a biological model of the translation kinetics after mRNA transfection.

Submitted 22 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: 69 pages, 66 figures: new case study added (Biological model of the translation kinetics after mRNA transfection)

arXiv:2403.05372 [pdf, ps, other]

Limit Laws for Critical Dispersion on Complete Graphs

Authors: Umberto De Ambroggio, Tamás Makai, Konstantinos Panagiotou, Annika Steibel

Abstract: We consider a synchronous process of particles moving on the vertices of a graph $G$, introduced by Cooper, McDowell, Radzik, Rivera and Shiraga (2018). Initially, $M$ particles are placed on a vertex of $G$. In subsequent time steps, all particles that are located on a vertex inhabited by at least two particles jump independently to a neighbour chosen uniformly at random. The process ends at the first step when no vertex is inhabited by more than one particle; we call this (random) time step the dispersion time. In this work we study the case where $G$ is the complete graph on $n$ vertices and the number of particles is $M=n/2+αn^{1/2} + o(n^{1/2})$, $α\in \mathbb{R}$. This choice of $M$ corresponds to the critical window of the process, with respect to the dispersion time. We show that the dispersion time, if rescaled by $n^{-1/2}$, converges in $p$-th mean, as $n\rightarrow \infty$ and for any $p \in \mathbb{R}$, to a continuous and almost surely positive random variable $T_α$. We find that $T_α$ is the absorption time of a standard logistic branching process, thoroughly investigated by Lambert (2005), and we determine its expectation. In particular, in the middle of the critical window we show that $\mathbb{E}[T_0] = π^{3/2}/\sqrt{7}$, and furthermore we formulate explicit asymptotics when $|α|$ gets large that quantify the transition into and out of the critical window. We also study the (random) total number of jumps that are performed by the particles until the dispersion time is reached. In particular, we prove that it centers around $\frac{2}{7}n\ln n$ and that it has variations linear in $n$, whose distribution we can describe explicitly.

Submitted 24 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 37 pages, 2 figures

arXiv:2403.00175 [pdf, other]

doi 10.3390/s24092889

FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything

Authors: Safouane El Ghazouali, Youssef Mhirit, Ali Oukhrid, Umberto Michelucci, Hichem Nouira

Abstract: In the realm of computer vision, the integration of advanced techniques into the processing of RGB-D camera inputs poses a significant challenge, given the inherent complexities arising from diverse environmental conditions and varying object appearances. Therefore, this paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery. Traditional computer vision systems face limitations in simultaneously capturing precise object boundaries and achieving high-precision object detection on depth maps, as they are mainly proposed for RGB cameras. To address this challenge, FusionVision adopts an integrated approach by merging state-of-the-art object detection techniques with advanced instance segmentation methods. The integration of these components enables a holistic (unified analysis of information obtained from both color RGB and depth D channels) interpretation of RGB-D data, facilitating the extraction of comprehensive and accurate object information. The proposed FusionVision pipeline employs YOLO for identifying objects within the RGB image domain. Subsequently, FastSAM, an innovative semantic segmentation model, is applied to delineate object boundaries, yielding refined segmentation masks. The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation, enhancing overall precision in 3D object segmentation. The code and pre-trained models are publicly available at https://github.com/safouaneelg/FusionVision/.

Submitted 1 May, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: 14 pages, 9 figures, 1 table

Journal ref: Sensors 2024

arXiv:2402.18614 [pdf, other]

Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains

Authors: Hafiz Tiomoko Ali, Umberto Michieli, Ji Joong Moon, Daehyun Kim, Mete Ozay

Abstract: The recently discovered Neural collapse (NC) phenomenon states that the last-layer weights of Deep Neural Networks (DNNs) converge to the so-called Equiangular Tight Frame (ETF) simplex at the terminal phase of their training. This ETF geometry is equivalent to vanishing within-class variability of the last layer activations. Inspired by NC properties, we explore in this paper the transferability of DNN models trained with their last layer weight fixed according to ETF. This enforces class separation by eliminating class covariance information, effectively providing implicit regularization. We show that DNN models trained with such a fixed classifier significantly improve transfer performance, particularly on out-of-domain datasets. On a broad range of fine-grained image classification datasets, our approach outperforms i) baseline methods that do not perform any covariance regularization (up to 22%), as well as ii) methods that explicitly whiten covariance of activations throughout training (up to 19%). Our findings suggest that DNNs trained with fixed ETF classifiers offer a powerful mechanism for improving transfer learning across domains.

Submitted 28 February, 2024; originally announced February 2024.

Comments: ICASSP 2024. Copyright 2024 IEEE.

arXiv:2402.18449 [pdf, other]

HOP to the Next Tasks and Domains for Continual Learning in NLP

Authors: Umberto Michieli, Mete Ozay

Abstract: Continual Learning (CL) aims to learn a sequence of problems (i.e., tasks and domains) by transferring knowledge acquired on previous problems, whilst avoiding forgetting of past ones. Different from previous approaches which focused on CL for one NLP task or domain in a specific use-case, in this paper, we address a more general CL setting to learn from a sequence of problems in a unique framework. Our method, HOP, makes it possible to hop across tasks and domains by addressing the CL problem along three directions: (i) we employ a set of adapters to generalize a large pre-trained model to unseen problems, (ii) we compute high-order moments over the distribution of embedded representations to distinguish independent and correlated statistics across different tasks and domains, (iii) we process this enriched information with auxiliary heads specialized for each end problem. An extensive experimental campaign on 4 NLP applications, 5 benchmarks and 2 CL setups demonstrates the effectiveness of our HOP.

Submitted 28 February, 2024; originally announced February 2024.

Comments: AAAI 2024. Main + supplementary

arXiv:2402.18402 [pdf, other]

doi 10.1145/3625468.3647623

A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation

Authors: Francesco Barbato, Umberto Michieli, Mehmet Kerim Yucel, Pietro Zanuttigh, Mete Ozay

Abstract: In multimedia understanding tasks, corrupted samples pose a critical challenge, because when fed to machine learning models they lead to performance degradation. In the past, three groups of approaches have been proposed to handle noisy data: i) enhancer and denoiser modules to improve the quality of the noisy data, ii) data augmentation approaches, and iii) domain adaptation strategies. All the aforementioned approaches come with drawbacks that limit their applicability; the first has high computational costs and requires pairs of clean-corrupted data for training, while the others only allow deployment of the same task/network they were trained on (i.e., when upstream and downstream task/network are the same). In this paper, we propose SyMPIE to solve these shortcomings. To this end, we design a small, modular, and efficient (just 2GFLOPs to process a Full HD image) system to enhance input data for robust downstream multimedia understanding with minimal computational cost. Our SyMPIE is pre-trained on an upstream task/network that should not match the downstream ones and does not need paired clean-corrupted samples. Our key insight is that most input corruptions found in real-world tasks can be modeled through global operations on color channels of images or spatial filters with small kernels. We validate our approach on multiple datasets and tasks, such as image classification (on ImageNetC, ImageNetC-Bar, VizWiz, and a newly proposed mixed corruption benchmark named ImageNetC-mixed) and semantic segmentation (on Cityscapes, ACDC, and DarkZurich) with consistent improvements of about 5% relative accuracy gain across the board. The code of our approach and the new ImageNetC-mixed benchmark will be made available upon publication.

Submitted 29 February, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted at ACM MMSys'24. 10 pages, 7 figures, 8 tables

arXiv:2402.13918 [pdf, other]

BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery

Authors: Loddo Fabio, Dario Piga, Michelucci Umberto, El Ghazouali Safouane

Abstract: Satellites equipped with optical sensors capture high-resolution imagery, providing valuable insights into various environmental phenomena. In recent years, there has been a surge of research focused on addressing some challenges in remote sensing, ranging from water detection in diverse landscapes to the segmentation of mountainous and terrains. Ongoing investigations aim to enhance the precision and efficiency of satellite imagery analysis. In particular, there is a growing emphasis on developing methodologies for accurate detection of water bodies, snow and clouds, which is important for environmental monitoring, resource management, and disaster response. Within this context, this paper focuses on cloud segmentation from remote sensing imagery. Accurate remote sensing data analysis can be challenging due to the presence of clouds in optical sensor-based applications. The quality of resulting products such as applications and research is directly impacted by cloud detection, which plays a key role in the remote sensing data processing pipeline. This paper examines seven cutting-edge semantic segmentation and detection algorithms applied to cloud identification, conducting a benchmark analysis to evaluate their architectural approaches and identify the best-performing ones. To increase the model's adaptability, critical elements including the type of imagery and the number of spectral bands used during training are analyzed. Additionally, this research tries to produce machine learning algorithms that can perform cloud segmentation using only a few spectral bands, including RGB and RGBN-IR combinations. The model's flexibility for a variety of applications and user scenarios is assessed by using imagery from Sentinel-2 and Landsat-8 as datasets. This benchmark can be reproduced using the material from this GitHub link: https://github.com/toelt-llc/cloud_segmentation_comparative.

Submitted 1 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: Submitted to Expert Systems and Applications. Under license CC-BY-NC-ND

arXiv:2402.13768 [pdf, other]

Democratizing Uncertainty Quantification

Authors: Linus Seelinger, Anne Reinarz, Mikkel B. Lykkegaard, Robert Akers, Amal M. A. Alghamdi, David Aristoff, Wolfgang Bangerth, Jean Bénézech, Matteo Diez, Kurt Frey, John D. Jakeman, Jakob S. Jørgensen, Ki-Tae Kim, Massimiliano Martinelli, Matthew Parno, Riccardo Pellegrini, Noemi Petra, Nicolai A. B. Riis, Katherine Rosenfeld, Andrea Serani, Lorenzo Tamellini, Umberto Villa, Tim J. Dodwell, Robert Scheichl

Abstract: Uncertainty Quantification (UQ) is vital to safety-critical model-based analyses, but the widespread adoption of sophisticated UQ methods is limited by technical complexity. In this paper, we introduce UM-Bridge (the UQ and Modeling Bridge), a high-level abstraction and software protocol that facilitates universal interoperability of UQ software with simulation codes. It breaks down the technical complexity of advanced UQ applications and enables separation of concerns between experts. UM-Bridge democratizes UQ by allowing effective interdisciplinary collaboration, accelerating the development of advanced UQ methods, and making it easy to perform UQ analyses from prototype to High Performance Computing (HPC) scale. In addition, we present a library of ready-to-run UQ benchmark problems, all easily accessible through UM-Bridge. These benchmarks support UQ methodology research, enabling reproducible performance comparisons. We demonstrate UM-Bridge with several scientific applications, harnessing HPC resources even using UQ codes not designed with HPC support.

Submitted 9 May, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: Fix broken reference, add R. Akers as author for contribution to tritium benchmark

arXiv:2402.10427 [pdf, other]

Evaluating and Improving Continual Learning in Spoken Language Understanding

Authors: Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj

Abstract: Continual learning has emerged as an increasingly important challenge across various tasks, including Spoken Language Understanding (SLU). In SLU, its objective is to effectively handle the emergence of new concepts and evolving environments. The evaluation of continual learning algorithms typically involves assessing the model's stability, plasticity, and generalizability as fundamental aspects of standards. However, existing continual learning metrics primarily focus on only one or two of the properties. They neglect the overall performance across all tasks, and do not adequately disentangle the plasticity versus stability/generalizability trade-offs within the model. In this work, we propose an evaluation methodology that provides a unified evaluation on stability, plasticity, and generalizability in continual learning. By employing the proposed metric, we demonstrate how introducing various knowledge distillations can improve different aspects of these three properties of the SLU model. We further show that our proposed metric is more sensitive in capturing the impact of task ordering in continual learning, making it better suited for practical use-case scenarios.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.03084 [pdf, ps, other]

New constructions of MSRD codes

Authors: Umberto Martínez-Peñas

Abstract: In this work, we provide four methods for constructing new maximum sum-rank distance (MSRD) codes. The first method, a variant of cartesian products, allows faster decoding than known MSRD codes of the same parameters. The other three methods allow us to extend or modify existing MSRD codes in order to obtain new explicit MSRD codes for sets of matrix sizes (numbers of rows and columns in different blocks) that were not attainable by previous constructions. In this way, we show that MSRD codes exist (by giving explicit constructions) for new ranges of parameters, in particular with different numbers of rows and columns at different positions.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02606 [pdf, other]

doi 10.1080/11663081.2024.2336386

Nelson algebras, residuated lattices and rough sets: A survey

Authors: Jouni Järvinen, Sándor Radeleczki, Umberto Rivieccio

Abstract: Over the past 50 years, Nelson algebras have been extensively studied by distinguished scholars as the algebraic counterpart of Nelson's constructive logic with strong negation. Despite these studies, a comprehensive survey of the topic is currently lacking, and the theory of Nelson algebras remains largely unknown to most logicians. This paper aims to fill this gap by focussing on the essential developments in the field over the past two decades. Additionally, we explore generalisations of Nelson algebras, such as N4-lattices which correspond to the paraconsistent version of Nelson's logic, as well as their applications to other areas of interest to logicians, such as duality and rough set theory. A general representation theorem states that each Nelson algebra is isomorphic to a subalgebra of a rough set-based Nelson algebra induced by a quasiorder. Furthermore, a formula is a theorem of Nelson logic if and only if it is valid in every finite Nelson algebra induced by a quasiorder.

Submitted 2 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted for publication in Journal of Applied Non-Classical Logics. In this version of the manuscript, certain typographical errors have been rectified

Journal ref: Journal of Applied Non-Classical Logics (2024)

arXiv:2402.00828 [pdf, other]

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters

Authors: Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti

Abstract: Mixture of Experts (MoE) architectures have recently gained traction due to their ability to scale a model's capacity while keeping the computational cost affordable. Furthermore, they can be applied to both Transformers and State Space Models, the current state-of-the-art models in numerous fields. While MoE has been mostly investigated for the pre-training stage, its use in parameter-efficient transfer learning settings is under-explored. To narrow this gap, this paper attempts to demystify the use of MoE for parameter-efficient fine-tuning of Audio Spectrogram Transformers to audio and speech downstream tasks. Specifically, we propose Soft Mixture of Adapters (Soft-MoA). It exploits adapters as the experts and, leveraging the recent Soft MoE method, it relies on a soft assignment between the input tokens and experts to keep the computational time limited. Extensive experiments across 4 benchmarks demonstrate that Soft-MoA outperforms the single adapter method and performs on par with the dense MoA counterpart. We finally present ablation studies on key elements of Soft-MoA, showing for example that Soft-MoA achieves better scaling with more experts, as well as ensuring that all experts contribute to the computation of the output tokens, thus dispensing with the expert imbalance issue.

Submitted 4 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: Accepted at INTERSPEECH 2024. The code is publicly available at: https://github.com/umbertocappellazzo/PETL_AST

arXiv:2401.16863 [pdf, other]

Enabling the Digital Democratic Revival: A Research Program for Digital Democracy

Authors: Davide Grossi, Ulrike Hahn, Michael Mäs, Andreas Nitsche, Jan Behrens, Niclas Boehmer, Markus Brill, Ulle Endriss, Umberto Grandi, Adrian Haret, Jobst Heitzig, Nicolien Janssens, Catholijn M. Jonker, Marijn A. Keijzer, Axel Kistner, Martin Lackner, Alexandra Lieben, Anna Mikhaylovskaya, Pradeep K. Murukannaiah, Carlo Proietti, Manon Revel, Élise Rouméas, Ehud Shapiro, Gogulapati Sreedurga, Björn Swierczek, et al. (4 additional authors not shown)

Abstract: This white paper outlines a long-term scientific vision for the development of digital-democracy technology. We contend that if digital democracy is to meet the ambition of enabling a participatory renewal in our societies, then a comprehensive multi-methods research effort is required that could, over the years, support its development in a democratically principled, empirically and computationally informed way. The paper is co-authored by an international and interdisciplinary team of researchers and arose from the Lorentz Center Workshop on "Algorithmic Technology for Democracy" (Leiden, October 2022).

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.13573 [pdf, other]

Distributed matrix multiplication with straggler tolerance using algebraic function fields

Authors: Adrián Fidalgo-Díaz, Umberto Martínez-Peñas

Abstract: The problem of straggler mitigation in distributed matrix multiplication (DMM) is considered for a large number of worker nodes and a fixed small finite field. Polynomial codes and matdot codes are generalized by making use of algebraic function fields (i.e., algebraic functions over an algebraic curve) over a finite field. The construction of optimal solutions is translated to a combinatorial problem on the Weierstrass semigroups of the corresponding algebraic curves. Optimal or almost optimal solutions are provided. These have the same computational complexity per worker as classical polynomial and matdot codes, and their recovery thresholds are almost optimal in the asymptotic regime (growing number of workers and a fixed finite field).

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.12801 [pdf, other]

Deep Learning-based Target-To-User Association in Integrated Sensing and Communication Systems

Authors: Lorenzo Cazzella, Marouan Mizmizi, Dario Tagliaferri, Damiano Badini, Matteo Matteucci, Umberto Spagnolini

Abstract: In Integrated Sensing and Communication (ISAC) systems, matching the radar targets with communication user equipments (UEs) is functional to several communication tasks, such as proactive handover and beam prediction. In this paper, we consider a radar-assisted communication system where a base station (BS) is equipped with a multiple-input-multiple-output (MIMO) radar that has a double aim: (i) associate vehicular radar targets to vehicular equipments (VEs) in the communication beamspace and (ii) predict the beamforming vector for each VE from radar data. The proposed target-to-user (T2U) association consists of two stages. First, vehicular radar targets are detected from range-angle images, and, for each, a beamforming vector is estimated. Then, the inferred per-target beamforming vectors are matched with the ones utilized at the BS for communication to perform target-to-user (T2U) association. Joint multi-target detection and beam inference is obtained by modifying the you only look once (YOLO) model, which is trained over simulated range-angle radar images. Simulation results over different urban vehicular mobility scenarios show that the proposed T2U method provides a probability of correct association that increases with the size of the BS antenna array, highlighting the respective increase of the separability of the VEs in the beamspace.
Moreover, we show that the modified YOLO architecture can effectively perform both beam prediction and radar target detection, with similar performance in mean average precision on the latter over different antenna array sizes.

Submitted 10 January, 2024; originally announced January 2024.

arXiv:2401.11814 [pdf, ps, other]

Symbrain: A large-scale dataset of MRI images for neonatal brain symmetry analysis

Authors: Arnaud Gucciardi, Safouane El Ghazouali, Francesca Venturini, Vida Groznik, Umberto Michelucci

Abstract: This paper presents an annotated dataset of brain MRI images designed to advance the field of brain symmetry study. Magnetic resonance imaging (MRI) has gained interest in analyzing brain symmetry in neonatal infants, and challenges remain due to the vast size differences between fetal and adult brains. Classification methods for brain structural MRI use scales and visual cues to assess hemisphere symmetry, which can help diagnose neonatal patients by comparing hemispheres and anatomical regions of interest in the brain. Using the Developing Human Connectome Project dataset, this work presents a dataset comprising cerebral images extracted as slices across selected portions of interest for clinical evaluation. All the extracted images are annotated with the brain's midline. From the assumption that a decrease in symmetry is directly related to possible clinical pathologies, the dataset can contribute to a more precise diagnosis because it can be used to train deep learning models for anomaly detection in neonatal cerebral MRI from postnatal infant scans using computer vision. Such models learn to identify and classify anomalies by identifying potential asymmetrical patterns in medical MRI images. Furthermore, this dataset can contribute to the research and development of methods using the relative symmetry of the two brain hemispheres for crucial diagnosis and treatment planning.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 7 pages, 2 figures, Dataset Paper, Medical AI

arXiv:2401.05194 [pdf, other]

Modelling, Positioning, and Deep Reinforcement Learning Path Tracking Control of Scaled Robotic Vehicles: Design and Experimental Validation

Authors: Carmine Caponio, Pietro Stano, Raffaele Carli, Ignazio Olivieri, Daniele Ragone, Aldo Sorniotti, Umberto Montanaro

Abstract: Mobile robotic systems are becoming increasingly popular. These systems are used in various indoor applications, ranging from warehousing and manufacturing to test benches for assessment of advanced control strategies, such as artificial intelligence (AI)-based control solutions, just to name a few. Scaled robotic cars are commonly equipped with a hierarchical control architecture that includes tasks dedicated to vehicle state estimation and control. This paper covers both aspects by proposing (i) a federated extended Kalman filter (FEKF), and (ii) a novel deep reinforcement learning (DRL) path tracking controller trained via an expert demonstrator to expedite the learning phase and increase robustness to the simulation-to-reality gap. The paper also presents the formulation of a vehicle model along with an effective yet simple procedure for identifying its parameters. The experimentally validated model is used for (i) supporting the design of the FEKF and (ii) serving as a digital twin for training the proposed DRL-based path tracking algorithm. Experimental results confirm the ability of the FEKF to improve the estimate of the mobile robot's position. Furthermore, the effectiveness of the DRL path tracking strategy is experimentally tested along manoeuvres not considered during training, showing also the ability of the AI-based solution to outperform model-based control strategies and the demonstrator. The comparison with benchmarking controllers is quantitatively evaluated through a set of key performance indicators.

Submitted 10 January, 2024; originally announced January 2024.

Comments: Under review on IEEE Transactions

arXiv:2401.03274 [pdf, ps, other]

Generating proof systems for three-valued propositional logics

Authors: Vitor Greati, Giuseppe Greco, Sérgio Marcelino, Alessandra Palmigiano, Umberto Rivieccio

Abstract: In general, providing an axiomatization for an arbitrary logic is a task that may require some ingenuity. In the case of logics defined by a finite logical matrix (three-valued logics being a particularly simple example), the generation of suitable finite axiomatizations can be completely automatized, essentially by expressing the matrix tables via inference rules. In this chapter we illustrate how two formalisms, the 3-labelled calculi of Baaz, Fermüller and Zach and the multiple-conclusion (or Set-Set) Hilbert-style calculi of Shoesmith and Smiley, may be uniformly employed to axiomatize logics defined by a three-valued logical matrix. The generating procedure common to both formalisms can be described as follows: first (i) convert the matrix semantics into rule form (we refer to this step as the generating subprocedure) and then (ii) simplify the set of rules thus obtained, essentially relying on the defining properties of any Tarskian consequence relation (we refer to this step as the streamlining subprocedure). We illustrate through some examples that, if a minimal expressiveness assumption is met (namely, if the matrix defining the logic is monadic), then it is straightforward to define effective translations guaranteeing the equivalence between the 3-labelled and the Set-Set approach.

Submitted 6 January, 2024; originally announced January 2024.

MSC Class: 03B50 ACM Class: F.4.1

arXiv:2401.03265 [pdf, other]

doi 10.1007/s11225-023-10079-w

Finite Hilbert systems for Weak Kleene logics

Authors: Vitor Greati, Sérgio Marcelino, Umberto Rivieccio

Abstract: Multiple-conclusion Hilbert-style systems allow us to finitely axiomatize every logic defined by a finite matrix. Having obtained such axiomatizations for Paraconsistent Weak Kleene and Bochvar-Kleene logics, we modify them by replacing the multiple-conclusion rules with carefully selected single-conclusion ones. In this way we manage to introduce the first finite Hilbert-style single-conclusion axiomatizations for these logics.

Submitted 20 March, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

Comments: Corrections on Def.2, Def.3 (PWK system) and Rem. 4

MSC Class: 03B22 ACM Class: F.4.1

arXiv:2312.09654 [pdf, other]

The cost of artificial latency in the PBS context

Authors: Umberto Natale, Michael Moser

Abstract: We present a comprehensive analysis of the implications of artificial latency in the Proposer-Builder Separation framework on the Ethereum network. Focusing on the MEV-Boost auction system, we analyze how strategic latency manipulation affects Maximum Extractable Value yields and network integrity. Our findings reveal both increased profitability for node operators and significant systemic challenges, including heightened network inefficiencies and centralization risks. We empirically validate these insights with a pilot that Chorus One has been operating on Ethereum mainnet. We demonstrate the nuanced effects of latency on bid selection and validator dynamics. Ultimately, this research underscores the need for balanced strategies that optimize Maximum Extractable Value capture while preserving the Ethereum network's decentralization ethos.

Submitted 11 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2310.02952 [pdf, ps, other]

Some more theorems on structural entailment relations and non-deterministic semantics

Authors: Carlos Caleiro, Sérgio Marcelino, Umberto Rivieccio

Abstract: We extend classical work by Janusz Czelakowski on the closure properties of the class of matrix models of entailment relations - nowadays more commonly called multiple-conclusion logics - to the setting of non-deterministic matrices (Nmatrices), characterizing the Nmatrix models of an arbitrary logic through a generalization of the standard class operators to the non-deterministic setting. We highlight the main differences that appear in this more general setting, in particular: the possibility to obtain Nmatrix quotients using any compatible equivalence relation (not necessarily a congruence); the problem of determining when strict homomorphisms preserve the logic of a given Nmatrix; the fact that the operations of taking images and preimages cannot be swapped, which determines the exact sequence of operators that generates, from any complete semantics, the class of all Nmatrix models of a logic. Many results, on the other hand, generalize smoothly to the non-deterministic setting: we show for instance that a logic is finitely based if and only if both the class of its Nmatrix models and its complement are closed under ultraproducts. We conclude by mentioning possible developments in adapting the Abstract Algebraic Logic approach to logics induced by Nmatrices and the associated equational reasoning over non-deterministic algebras.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2310.02699 [pdf, other]

Continual Contrastive Spoken Language Understanding

Authors: Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj

Abstract: Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous computing resources. Unfortunately, these models struggle to retain their previously acquired knowledge when learning new tasks continually, and retraining from scratch is almost always impractical. In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning. Through a modified version of the standard supervised contrastive loss applied only to the rehearsal samples, COCONUT preserves the learned representations by pulling closer samples from the same class and pushing away the others. Moreover, we leverage a multimodal contrastive loss that helps the model learn more discriminative representations of the new data by aligning audio and text features. We also investigate different contrastive designs to combine the strengths of the contrastive loss with teacher-student architectures used for distillation. Experiments on two established SLU datasets reveal the effectiveness of our proposed approach and significant improvements over the baselines. We also show that COCONUT can be combined with methods that operate on the decoder side of the model, resulting in further metrics improvements.

Submitted 4 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Accepted to ACL Findings 2024

arXiv:2309.12377 [pdf, other]

Shedding Light on the Ageing of Extra Virgin Olive Oil: Probing the Impact of Temperature with Fluorescence Spectroscopy and Machine Learning Techniques

Authors: Francesca Venturini, Silvan Fluri, Manas Mejari, Michael Baumgartner, Dario Piga, Umberto Michelucci

Abstract: This work systematically investigates the oxidation of extra virgin olive oil (EVOO) under accelerated storage conditions with UV absorption and total fluorescence spectroscopy. With the large amount of data collected, it proposes a method to monitor the oil's quality based on machine learning applied to highly-aggregated data. EVOO is a high-quality vegetable oil that has earned a worldwide reputation for its numerous health benefits and excellent taste. Despite its outstanding quality, EVOO degrades over time owing to oxidation, which can affect both its health qualities and flavour. Therefore, it is highly relevant to quantify the effects of oxidation on EVOO and develop methods to assess it that can be easily implemented under field conditions, rather than in specialized laboratories. The following study demonstrates that fluorescence spectroscopy has the capability to monitor the effect of oxidation and assess the quality of EVOO, even when the data are highly aggregated. It shows that complex laboratory equipment is not necessary to exploit fluorescence spectroscopy using the proposed method and that cost-effective solutions, which can be used in-field by non-scientists, could provide an easily-accessible assessment of the quality of EVOO.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.10479 [pdf, other]

RECALL+: Adversarial Web-based Replay for Continual Learning in Semantic Segmentation

Authors: Chang Liu, Giulia Rizzoli, Francesco Barbato, Andrea Maracani, Marco Toldo, Umberto Michieli, Yi Niu, Pietro Zanuttigh

Abstract: Catastrophic forgetting of previous knowledge is a critical issue in continual learning typically handled through various regularization strategies. However, existing methods struggle especially when several incremental steps are performed. In this paper, we extend our previous approach (RECALL) and tackle forgetting by exploiting unsupervised web-crawled data to retrieve examples of old classes from online databases. In contrast to the original methodology, which did not incorporate an assessment of web-based data, the present work proposes two advanced techniques: an adversarial approach and an adaptive threshold strategy. These methods are utilized to meticulously choose samples from web data that exhibit strong statistical congruence with the no longer available training data. Furthermore, we improved the pseudo-labeling scheme to achieve a more accurate labeling of web data that also considers classes being learned in the current step. Experimental results show that this enhanced approach achieves remarkable results, particularly when the incremental scenario spans multiple steps.

Submitted 16 February, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.09546 [pdf, other]

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

Authors: George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti

Abstract: The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented by early-exit architectures, in which additional exit branches are appended to intermediate layers of the encoder. In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands. Previous research on early-exiting ASR models has relied on pre-trained self-supervised models, fine-tuned with an early-exit loss. In this paper, we undertake an experimental comparison between fine-tuning pre-trained backbones and training models from scratch with the early-exiting objective. Experiments conducted on public datasets reveal that early-exit models trained from scratch not only preserve performance when using fewer encoder layers but also exhibit enhanced task accuracy compared to single-exit or pre-trained models. Furthermore, we explore an exit selection strategy grounded in posterior probabilities as an alternative to the conventional frame-based entropy approach. Results provide insights into the training dynamics of early-exit architectures for ASR models, particularly the efficacy of training strategies and exit selection methods.

Submitted 22 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: Accepted at the ICASSP Workshop Self-supervision in Audio, Speech and Beyond 2024

arXiv:2309.06764 [pdf, other]

Adding an Implication to Logics of Perfect Paradefinite Algebras

Authors: Vitor Greati, Sérgio Marcelino, João Marcos, Umberto Rivieccio

Abstract: Perfect paradefinite algebras are De Morgan algebras expanded with an operation that allows for the full behavior of classical negation to be restored. They form a variety that is term-equivalent to the variety of involutive Stone algebras. Their associated multiple-conclusion (Set-Set) and single-conclusion (Set-Fmla) order-preserving logics are non-algebraizable self-extensional logics of formal inconsistency and undeterminedness determined by a six-valued matrix, studied in depth by Gomes et al. (2022) from both the algebraic and the proof-theoretical perspectives. In the present paper, we continue that study by investigating directions for conservatively expanding these logics with an implication connective (essentially, one that admits the deduction-detachment theorem). We first consider logics given by very simple and manageable non-deterministic semantics whose implication (in isolation) is classical. These, nevertheless, fail to be self-extensional. We then consider the implication realized by the relative pseudo-complement over the six-valued perfect paradefinite algebra. Our strategy is to expand the language of the latter algebra with this connective and study the (self-extensional) Set-Set and Set-Fmla order-preserving and top-assertional logics of the variety induced by the resulting algebra. We provide axiomatizations for such new variety and for such logics, drawing parallels with the class of symmetric Heyting algebras and with Moisil's 'symmetric modal logic'. For the Set-Set logic, in particular, the axiomatization we obtain is analytic. We close by studying interpolation properties for these logics and concluding that the new variety has the Maehara amalgamation property.

Submitted 6 April, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: New version after a round of peer reviewing, no critical changes

MSC Class: 03G10 (Primary) 03C05; 03B50; 03B70; 03B53; 03B22; 03B35; 03C40 (Secondary) ACM Class: F.4.1

arXiv:2308.07136 [pdf, other]

Pairing interacting protein sequences using masked language modeling

Authors: Umberto Lupo, Damiano Sgarbossa, Anne-Florence Bitbol

Abstract: Predicting which proteins interact together from amino-acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments, such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called DiffPALM that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids. We show that it captures inter-chain coevolution, while it was trained on single-chain data, which means that it can be used out-of-distribution. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer, without significantly deteriorating any of those we tested. It also achieves competitive performance with using orthology-based pairing.

Submitted 14 August, 2023; originally announced August 2023.

Comments: 33 pages, 14 figures, 2 tables

MSC Class: 68T07; 68T50; 92-08; 92B20 ACM Class: J.3; I.2.7

Showing 1–50 of 250 results for author: Umberto