Search | arXiv e-print repository

PAGURI: a user experience study of creative interaction with text-to-music models

Authors: Francesca Ronchini, Luca Comanducci, Gabriele Perego, Fabio Antonacci

Abstract: In recent years, text-to-music models have been the biggest breakthrough in automatic music generation. While they are unquestionably a showcase of technological progress, it is not clear yet how they can be realistically integrated into the artistic practice of musicians and music practitioners. This paper aims to address this question via Prompt Audio Generation User Research Investigation (PAGU… ▽ More In recent years, text-to-music models have been the biggest breakthrough in automatic music generation. While they are unquestionably a showcase of technological progress, it is not clear yet how they can be realistically integrated into the artistic practice of musicians and music practitioners. This paper aims to address this question via Prompt Audio Generation User Research Investigation (PAGURI), a user experience study where we leverage recent text-to-music developments to study how musicians and practitioners interact with these systems, evaluating their satisfaction levels. We developed an online tool through which users can generate music samples and/or apply recently proposed personalization techniques, based on fine-tuning, to make the text-to-music model generate sounds closer to their needs and preferences. Using questionnaires, we analyzed how participants interacted with the proposed tool, to understand the effectiveness of text-to-music models in enhancing users' creativity. Results show that even if the audio samples generated and their quality may not always meet user expectations, the majority of the participants would incorporate the tool in their creative process. Furthermore, they provided insights into potential enhancements for the system and its integration into their music practice. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2406.17354 [pdf, other]

On the correlation between Architectural Smells and Static Analysis Warnings

Authors: Matteo Esposito, Mikel Robredo, Francesca Arcelli Fontana, Valentina Lenarduzzi

Abstract: Background. Software quality assurance is essential during software development and maintenance. Static Analysis Tools (SAT) are widely used for assessing code quality. Architectural smells are becoming more daunting to address and evaluate among quality issues. Objective. We aim to understand the relationships between static analysis warnings (SAW) and architectural smells (AS) to guide develop… ▽ More Background. Software quality assurance is essential during software development and maintenance. Static Analysis Tools (SAT) are widely used for assessing code quality. Architectural smells are becoming more daunting to address and evaluate among quality issues. Objective. We aim to understand the relationships between static analysis warnings (SAW) and architectural smells (AS) to guide developers/maintainers in focusing their efforts on SAWs more prone to co-occurring with AS. Method. We performed an empirical study on 103 Java projects totaling 72 million LOC belonging to projects from a vast set of domains, and 785 SAW detected by four SAT, Checkstyle, Findbugs, PMD, SonarQube, and 4 architectural smells detected by ARCAN tool. We analyzed how SAWs influence AS presence. Finally, we proposed an AS remediation effort prioritization based on SAW severity and SAW proneness to specific ASs. Results. Our study reveals a moderate correlation between SAWs and ASs. Different combinations of SATs and SAWs significantly affect AS occurrence, with certain SAWs more likely to co-occur with specific ASs. Conversely, 33.79% of SAWs act as "healthy carriers", not associated with any ASs. Conclusion. Practitioners can ignore about a third of SAWs and focus on those most likely to be associated with ASs. Prioritizing AS remediation based on SAW severity or SAW proneness to specific ASs results in effective rankings like those based on AS severity. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16453 [pdf, other]

Learning in Wilson-Cowan model for metapopulation

Authors: Raffaele Marino, Lorenzo Buffoni, Lorenzo Chicchi, Francesca Di Patti, Diego Febbe, Lorenzo Giambagli, Duccio Fanelli

Abstract: The Wilson-Cowan model for metapopulation, a Neural Mass Network Model, treats different subcortical regions of the brain as connected nodes, with connections representing various types of structural, functional, or effective neuronal connectivity between these regions. Each region comprises interacting populations of excitatory and inhibitory cells, consistent with the standard Wilson-Cowan model… ▽ More The Wilson-Cowan model for metapopulation, a Neural Mass Network Model, treats different subcortical regions of the brain as connected nodes, with connections representing various types of structural, functional, or effective neuronal connectivity between these regions. Each region comprises interacting populations of excitatory and inhibitory cells, consistent with the standard Wilson-Cowan model. By incorporating stable attractors into such a metapopulation model's dynamics, we transform it into a learning algorithm capable of achieving high image and text classification accuracy. We test it on MNIST and Fashion MNIST, in combination with convolutional neural networks, on CIFAR-10 and TF-FLOWERS, and, in combination with a transformer architecture (BERT), on IMDB, always showing high classification accuracy. These numerical evaluations illustrate that minimal modifications to the Wilson-Cowan model for metapopulation can reveal unique and previously unobserved dynamics. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16063 [pdf, ps, other]

Optimal matching for sharing and linearity analysis

Authors: Gianluca Amato, Francesca Scozzari

Abstract: Static analysis of logic programs by abstract interpretation requires designing abstract operators which mimic the concrete ones, such as unification, renaming and projection. In the case of goal-driven analysis, where goal-dependent semantics are used, we also need a backward-unification operator, typically implemented through matching. In this paper we study the problem of deriving optimal abstr… ▽ More Static analysis of logic programs by abstract interpretation requires designing abstract operators which mimic the concrete ones, such as unification, renaming and projection. In the case of goal-driven analysis, where goal-dependent semantics are used, we also need a backward-unification operator, typically implemented through matching. In this paper we study the problem of deriving optimal abstract matching operators for sharing and linearity properties. We provide an optimal operator for matching in the domain ${\mathtt{ShLin}^ω}$, which can be easily instantiated to derive optimal operators for the domains ${\mathtt{ShLin}^{2}}$ by Andy King and the reduced product $\mathtt{Sharing} \times \mathtt{Lin}$. △ Less

Submitted 23 June, 2024; originally announced June 2024.

Comments: Under consideration in Theory and Practice of Logic Programming (TPLP). arXiv admin note: text overlap with arXiv:0710.0528

MSC Class: 68N17; 68N30 ACM Class: D.1.6; D.2.4

arXiv:2406.13724 [pdf, other]

Heterogeneous Graph Neural Networks with Post-hoc Explanations for Multi-modal and Explainable Land Use Inference

Authors: Xuehao Zhai, Junqi Jiang, Adam Dejl, Antonio Rago, Fangce Guo, Francesca Toni, Aruna Sivakumar

Abstract: Urban land use inference is a critically important task that aids in city planning and policy-making. Recently, the increased use of sensor and location technologies has facilitated the collection of multi-modal mobility data, offering valuable insights into daily activity patterns. Many studies have adopted advanced data-driven techniques to explore the potential of these multi-modal mobility dat… ▽ More Urban land use inference is a critically important task that aids in city planning and policy-making. Recently, the increased use of sensor and location technologies has facilitated the collection of multi-modal mobility data, offering valuable insights into daily activity patterns. Many studies have adopted advanced data-driven techniques to explore the potential of these multi-modal mobility data in land use inference. However, existing studies often process samples independently, ignoring the spatial correlations among neighbouring objects and heterogeneity among different services. Furthermore, the inherently low interpretability of complex deep learning methods poses a significant barrier in urban planning, where transparency and extrapolability are crucial for making long-term policy decisions. To overcome these challenges, we introduce an explainable framework for inferring land use that synergises heterogeneous graph neural networks (HGNs) with Explainable AI techniques, enhancing both accuracy and explainability. The empirical experiments demonstrate that the proposed HGNs significantly outperform baseline graph neural networks for all six land-use indicators, especially in terms of 'office' and 'sustenance'. As explanations, we consider feature attribution and counterfactual explanations. The analysis of feature attribution explanations shows that the symmetrical nature of the `residence' and 'work' categories predicted by the framework aligns well with the commuter's 'work' and 'recreation' activities in London. The analysis of the counterfactual explanations reveals that variations in node features and types are primarily responsible for the differences observed between the predicted land use distribution and the ideal mixed state. These analyses demonstrate that the proposed HGNs can suitably support urban stakeholders in their urban planning and policy-making. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.10868 [pdf, other]

Analyzing Key Neurons in Large Language Models

Authors: Lihu Chen, Adam Dejl, Francesca Toni

Abstract: Large Language Models (LLMs) possess vast amounts of knowledge within their parameters, prompting research into methods for locating and editing this knowledge. Previous investigations have primarily focused on fill-in-the-blank tasks and locating entity-related usually single-token facts) information in relatively small-scale language models. However, several key questions remain unanswered: (1)… ▽ More Large Language Models (LLMs) possess vast amounts of knowledge within their parameters, prompting research into methods for locating and editing this knowledge. Previous investigations have primarily focused on fill-in-the-blank tasks and locating entity-related usually single-token facts) information in relatively small-scale language models. However, several key questions remain unanswered: (1) How can we effectively locate query-relevant neurons in contemporary autoregressive LLMs, such as LLaMA and Mistral? (2) How can we address the challenge of long-form text generation? (3) Are there localized knowledge regions in LLMs? In this study, we introduce Neuron Attribution-Inverse Cluster Attribution (NA-ICA), a novel architecture-agnostic framework capable of identifying key neurons in LLMs. NA-ICA allows for the examination of long-form answers beyond single tokens by employing the proxy task of multi-choice question answering. To evaluate the effectiveness of our detected key neurons, we construct two multi-choice QA datasets spanning diverse domains and languages. Empirical evaluations demonstrate that NA-ICA outperforms baseline methods significantly. Moreover, analysis of neuron distributions reveals the presence of visible localized regions, particularly within different domains. Finally, we demonstrate the potential applications of our detected key neurons in knowledge editing and neuron-based prediction. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 13 pages

arXiv:2406.10214 [pdf, other]

Universal randomised signatures for generative time series modelling

Authors: Francesca Biagini, Lukas Gonon, Niklas Walter

Abstract: Randomised signature has been proposed as a flexible and easily implementable alternative to the well-established path signature. In this article, we employ randomised signature to introduce a generative model for financial time series data in the spirit of reservoir computing. Specifically, we propose a novel Wasserstein-type distance based on discrete-time randomised signatures. This metric on t… ▽ More Randomised signature has been proposed as a flexible and easily implementable alternative to the well-established path signature. In this article, we employ randomised signature to introduce a generative model for financial time series data in the spirit of reservoir computing. Specifically, we propose a novel Wasserstein-type distance based on discrete-time randomised signatures. This metric on the space of probability measures captures the distance between (conditional) distributions. Its use is justified by our novel universal approximation results for randomised signatures on the space of continuous functions taking the underlying path as an input. We then use our metric as the loss function in a non-adversarial generator model for synthetic time series data based on a reservoir neural stochastic differential equation. We compare the results of our model to benchmarks from the existing literature. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 33 pages

arXiv:2406.10031 [pdf, other]

Deep Learning Domain Adaptation to Understand Physico-Chemical Processes from Fluorescence Spectroscopy Small Datasets: Application to Ageing of Olive Oil

Authors: Umberto Michelucci, Francesca Venturini

Abstract: Fluorescence spectroscopy is a fundamental tool in life sciences and chemistry, widely used for applications such as environmental monitoring, food quality control, and biomedical diagnostics. However, analysis of spectroscopic data with deep learning, in particular of fluorescence excitation-emission matrices (EEMs), presents significant challenges due to the typically small and sparse datasets a… ▽ More Fluorescence spectroscopy is a fundamental tool in life sciences and chemistry, widely used for applications such as environmental monitoring, food quality control, and biomedical diagnostics. However, analysis of spectroscopic data with deep learning, in particular of fluorescence excitation-emission matrices (EEMs), presents significant challenges due to the typically small and sparse datasets available. Furthermore, the analysis of EEMs is difficult due to their high dimensionality and overlap** spectral features. This study proposes a new approach that exploits domain adaptation with pretrained vision models, alongside a novel interpretability algorithm to address these challenges. Thanks to specialised feature engineering of the neural networks described in this work, we are now able to provide deeper insights into the physico-chemical processes underlying the data. The proposed approach is demonstrated through the analysis of the oxidation process in extra virgin olive oil (EVOO) during ageing, showing its effectiveness in predicting quality indicators and identifying the spectral bands, and thus the molecules involved in the process. This work describes a significantly innovative approach in the use of deep learning for spectroscopy, transforming it from a black box into a tool for understanding complex biological and chemical processes. △ Less

Submitted 22 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09140 [pdf, other]

Investigating the translation capabilities of Large Language Models trained on parallel data only

Authors: Javier García Gilabert, Carlos Escolano, Aleix Sant Savall, Francesca De Luca Fornaciari, Audrey Mash, Xixian Liao, Maite Melero

Abstract: In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this… ▽ More In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce PLUME (Parallel Language Model), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Utilizing this set of models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the impact of the different elements of the prompt, and their cross-lingual representation space. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: We release our code at: https://github.com/projecte-aina/Plume

arXiv:2406.09078 [pdf, other]

ONNX-to-Hardware Design Flow for Adaptive Neural-Network Inference on FPGAs

Authors: Federico Manca, Francesco Ratto, Francesca Palumbo

Abstract: The challenges involved in executing neural networks (NNs) at the edge include providing diversity, flexibility, and sustainability. That implies, for instance, supporting evolving applications and algorithms energy-efficiently. Using hardware or software accelerators can deliver fast and efficient computation of the NNs, while flexibility can be exploited to support long-term adaptivity. Nonethel… ▽ More The challenges involved in executing neural networks (NNs) at the edge include providing diversity, flexibility, and sustainability. That implies, for instance, supporting evolving applications and algorithms energy-efficiently. Using hardware or software accelerators can deliver fast and efficient computation of the NNs, while flexibility can be exploited to support long-term adaptivity. Nonetheless, handcrafting an NN for a specific device, despite the possibility of leading to an optimal solution, takes time and experience, and that's why frameworks for hardware accelerators are being developed. This work, starting from a preliminary semi-integrated ONNX-to-hardware toolchain [21], focuses on enabling approximate computing leveraging the distinctive ability of the original toolchain to favor adaptivity. The goal is to allow lightweight adaptable NN inference on FPGAs at the edge. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Proceedings of the XXIV International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), June 29 - July 4, 2024. arXiv admin note: text overlap with arXiv:2309.13321

arXiv:2406.07141 [pdf, other]

Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention

Authors: Avinash Kori, Francesco Locatello, Ainkaran Santhirasekaram, Francesca Toni, Ben Glocker, Fabio De Sousa Ribeiro

Abstract: Learning modular object-centric representations is crucial for systematic generalization. Existing methods show promising object-binding capabilities empirically, but theoretical identifiability guarantees remain relatively underdeveloped. Understanding when object-centric representations can theoretically be identified is crucial for scaling slot-based methods to high-dimensional images with corr… ▽ More Learning modular object-centric representations is crucial for systematic generalization. Existing methods show promising object-binding capabilities empirically, but theoretical identifiability guarantees remain relatively underdeveloped. Understanding when object-centric representations can theoretically be identified is crucial for scaling slot-based methods to high-dimensional images with correctness guarantees. To that end, we propose a probabilistic slot-attention algorithm that imposes an aggregate mixture prior over object-centric slot representations, thereby providing slot identifiability guarantees without supervision, up to an equivalence relation. We provide empirical verification of our theoretical identifiability result using both simple 2-dimensional data and high-resolution imaging datasets. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.18586 [pdf, ps, other]

A Verifiable Computing Scheme for Encrypted Control Systems

Authors: Francesca Stabile, Walter Lucia, Amr Youssef, Giuseppe Franze

Abstract: The proliferation of cloud computing technologies has paved the way for deploying networked encrypted control systems, offering high performance, remote accessibility and privacy. However, in scenarios where the control algorithms run on third-party cloud service providers, the control logic might be changed by a malicious agent on the cloud. Consequently, it is imperative to verify the correctnes… ▽ More The proliferation of cloud computing technologies has paved the way for deploying networked encrypted control systems, offering high performance, remote accessibility and privacy. However, in scenarios where the control algorithms run on third-party cloud service providers, the control logic might be changed by a malicious agent on the cloud. Consequently, it is imperative to verify the correctness of the control signals received from the cloud. Traditional verification methods, like zero-knowledge proof techniques, are computationally demanding in both proof generation and verification, may require several rounds of interactions between the prover and verifier and, consequently, are inapplicable in realtime control system applications. In this paper, we present a novel computationally inexpensive verifiable computing solution inspired by the probabilistic cut-and-choose approach. The proposed scheme allows the plant's actuator to validate the computations accomplished by the encrypted cloud-based networked controller without compromising the control scheme's performance. We showcase the effectiveness and real-time applicability of the proposed verifiable computation scheme using a remotely controlled Khepera IV differential-drive robot. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: Preprint of the manuscript submitted to the IEEE Control Systems Letters (L-CSS)

arXiv:2405.16570 [pdf, other]

ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling

Authors: Francesca Babiloni, Alexandros Lattas, Jiankang Deng, Stefanos Zafeiriou

Abstract: We propose ID-to-3D, a method to generate identity- and text-guided 3D human heads with disentangled expressions, starting from even a single casually captured in-the-wild image of a subject. The foundation of our approach is anchored in compositionality, alongside the use of task-specific 2D diffusion models as priors for optimization. First, we extend a foundational model with a lightweight expr… ▽ More We propose ID-to-3D, a method to generate identity- and text-guided 3D human heads with disentangled expressions, starting from even a single casually captured in-the-wild image of a subject. The foundation of our approach is anchored in compositionality, alongside the use of task-specific 2D diffusion models as priors for optimization. First, we extend a foundational model with a lightweight expression-aware and ID-aware architecture, and create 2D priors for geometry and texture generation, via fine-tuning only 0.2% of its available training parameters. Then, we jointly leverage a neural parametric representation for the expressions of each subject and a multi-stage generation of highly detailed geometry and albedo texture. This combination of strong face identity embeddings and our neural representation enables accurate reconstruction of not only facial features but also accessories and hair and can be meshed to provide render-ready assets for gaming and telepresence. Our results achieve an unprecedented level of identity-consistent and high-quality texture and geometry generation, generalizing to a ``world'' of unseen 3D identities, without relying on large 3D captured datasets of human assets. △ Less

Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: Explore our 3D results at: https://idto3d.github.io ; fixed broken url to project page

arXiv:2405.15926 [pdf, other]

Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers

Authors: Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, Haim Sompolinsky

Abstract: Despite the remarkable empirical performance of Transformers, their theoretical understanding remains elusive. Here, we consider a deep multi-head self-attention network, that is closely related to Transformers yet analytically tractable. We develop a statistical mechanics theory of Bayesian learning in this model, deriving exact equations for the network's predictor statistics under the finite-wi… ▽ More Despite the remarkable empirical performance of Transformers, their theoretical understanding remains elusive. Here, we consider a deep multi-head self-attention network, that is closely related to Transformers yet analytically tractable. We develop a statistical mechanics theory of Bayesian learning in this model, deriving exact equations for the network's predictor statistics under the finite-width thermodynamic limit, i.e., $N,P\rightarrow\infty$, $P/N=\mathcal{O}(1)$, where $N$ is the network width and $P$ is the number of training examples. Our theory shows that the predictor statistics are expressed as a sum of independent kernels, each one pairing different 'attention paths', defined as information pathways through different attention heads across layers. The kernels are weighted according to a 'task-relevant kernel combination' mechanism that aligns the total kernel with the task labels. As a consequence, this interplay between attention paths enhances generalization performance. Experiments confirm our findings on both synthetic and real-world sequence classification tasks. Finally, our theory explicitly relates the kernel combination mechanism to properties of the learned weights, allowing for a qualitative transfer of its insights to models trained via gradient descent. As an illustration, we demonstrate an efficient size reduction of the network, by pruning those attention heads that are deemed less relevant by our theory. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15429 [pdf, other]

E(n) Equivariant Topological Neural Networks

Authors: Claudio Battiloro, Ege Karaismailoğlu, Mauricio Tec, George Dasoulas, Michelle Audirac, Francesca Dominici

Abstract: Graph neural networks excel at modeling pairwise interactions, but they cannot flexibly accommodate higher-order interactions and features. Topological deep learning (TDL) has emerged recently as a promising tool for addressing this issue. TDL enables the principled modeling of arbitrary multi-way, hierarchical higher-order interactions by operating on combinatorial topological spaces, such as sim… ▽ More Graph neural networks excel at modeling pairwise interactions, but they cannot flexibly accommodate higher-order interactions and features. Topological deep learning (TDL) has emerged recently as a promising tool for addressing this issue. TDL enables the principled modeling of arbitrary multi-way, hierarchical higher-order interactions by operating on combinatorial topological spaces, such as simplicial or cell complexes, instead of graphs. However, little is known about how to leverage geometric features such as positions and velocities for TDL. This paper introduces E(n)-Equivariant Topological Neural Networks (ETNNs), which are E(n)-equivariant message-passing networks operating on combinatorial complexes, formal objects unifying graphs, hypergraphs, simplicial, path, and cell complexes. ETNNs incorporate geometric node features while respecting rotation and translation equivariance. Moreover, ETNNs are natively ready for settings with heterogeneous interactions. We provide a theoretical analysis to show the improved expressiveness of ETNNs over architectures for geometric graphs. We also show how several E(n) equivariant variants of TDL models can be directly derived from our framework. The broad applicability of ETNNs is demonstrated through two tasks of vastly different nature: i) molecular property prediction on the QM9 benchmark and ii) land-use regression for hyper-local estimation of air pollution with multi-resolution irregular geospatial data. The experiment results indicate that ETNNs are an effective tool for learning from diverse types of richly structured data, highlighting the benefits of principled geometric inductive bias. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: 32 pages, 11 figures, 8 tables

arXiv:2405.11250 [pdf, other]

Argumentative Causal Discovery

Authors: Fabrizio Russo, Anna Rapberger, Francesca Toni

Abstract: Causal discovery amounts to unearthing causal relationships amongst features in data. It is a crucial companion to causal inference, necessary to build scientific knowledge without resorting to expensive or impossible randomised control trials. In this paper, we explore how reasoning with symbolic representations can support causal discovery. Specifically, we deploy assumption-based argumentation… ▽ More Causal discovery amounts to unearthing causal relationships amongst features in data. It is a crucial companion to causal inference, necessary to build scientific knowledge without resorting to expensive or impossible randomised control trials. In this paper, we explore how reasoning with symbolic representations can support causal discovery. Specifically, we deploy assumption-based argumentation (ABA), a well-established and powerful knowledge representation formalism, in combination with causality theories, to learn graphs which reflect causal dependencies in the data. We prove that our method exhibits desirable properties, notably that, under natural conditions, it can retrieve ground-truth causal graphs. We also conduct experiments with an implementation of our method in answer set programming (ASP) on four datasets from standard benchmarks in causal discovery, showing that our method compares well against established baselines. △ Less

Submitted 25 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.10729 [pdf, other]

Contestable AI needs Computational Argumentation

Authors: Francesco Leofante, Hamed Ayoobi, Adam Dejl, Gabriel Freedman, Deniz Gorur, Junqi Jiang, Guilherme Paulino-Passos, Antonio Rago, Anna Rapberger, Fabrizio Russo, Xiang Yin, Dekai Zhang, Francesca Toni

Abstract: AI has become pervasive in recent years, but state-of-the-art approaches predominantly neglect the need for AI systems to be contestable. Instead, contestability is advocated by AI guidelines (e.g. by the OECD) and regulation of automated decision-making (e.g. GDPR). In this position paper we explore how contestability can be achieved computationally in and for AI. We argue that contestable AI req… ▽ More AI has become pervasive in recent years, but state-of-the-art approaches predominantly neglect the need for AI systems to be contestable. Instead, contestability is advocated by AI guidelines (e.g. by the OECD) and regulation of automated decision-making (e.g. GDPR). In this position paper we explore how contestability can be achieved computationally in and for AI. We argue that contestable AI requires dynamic (human-machine and/or machine-machine) explainability and decision-making processes, whereby machines can (i) interact with humans and/or other machines to progressively explain their outputs and/or their reasoning as well as assess grounds for contestation provided by these humans and/or other machines, and (ii) revise their decision-making processes to redress any issues successfully raised during contestation. Given that much of the current AI landscape is tailored to static AIs, the need to accommodate contestability will require a radical rethinking, that, we argue, computational argumentation is ideally suited to support. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.10579 [pdf, other]

A Hard Nut to Crack: Idiom Detection with Conversational Large Language Models

Authors: Francesca De Luca Fornaciari, Begoña Altuna, Itziar Gonzalez-Dios, Maite Melero

Abstract: In this work, we explore idiomatic language processing with Large Language Models (LLMs). We introduce the Idiomatic language Test Suite IdioTS, a new dataset of difficult examples specifically designed by language experts to assess the capabilities of LLMs to process figurative language at sentence level. We propose a comprehensive evaluation methodology based on an idiom detection task, where LL… ▽ More In this work, we explore idiomatic language processing with Large Language Models (LLMs). We introduce the Idiomatic language Test Suite IdioTS, a new dataset of difficult examples specifically designed by language experts to assess the capabilities of LLMs to process figurative language at sentence level. We propose a comprehensive evaluation methodology based on an idiom detection task, where LLMs are prompted with detecting an idiomatic expression in a given English sentence. We present a thorough automatic and manual evaluation of the results and an extensive error analysis. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.09415 [pdf, ps, other]

On the Correspondence of Non-flat Assumption-based Argumentation and Logic Programming with Negation as Failure in the Head

Authors: Anna Rapberger, Markus Ulbricht, Francesca Toni

Abstract: The relation between (a fragment of) assumption-based argumentation (ABA) and logic programs (LPs) under stable model semantics is well-studied. However, for obtaining this relation, the ABA framework needs to be restricted to being flat, i.e., a fragment where the (defeasible) assumptions can never be entailed, only assumed to be true or false. Here, we remove this restriction and show a correspo… ▽ More The relation between (a fragment of) assumption-based argumentation (ABA) and logic programs (LPs) under stable model semantics is well-studied. However, for obtaining this relation, the ABA framework needs to be restricted to being flat, i.e., a fragment where the (defeasible) assumptions can never be entailed, only assumed to be true or false. Here, we remove this restriction and show a correspondence between non-flat ABA and LPs with negation as failure in their head. We then extend this result to so-called set-stable ABA semantics, originally defined for the fragment of non-flat ABA called bipolar ABA. We showcase how to define set-stable semantics for LPs with negation as failure in their head and show the correspondence to set-stable ABA semantics. △ Less

Submitted 24 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.07888 [pdf, ps, other]

The fermionic massless modular Hamiltonian

Authors: Francesca La Piana, Gerardo Morsella

Abstract: We provide an explicit expression for the modular hamiltonian of the von Neumann algebras associated to the unit double cone for the (fermionic) quantum field theories of the 2-component Weyl (helicity 1/2) field, and of the 4-component massless Dirac and Majorana fields. To this end, we represent the one particle spaces of these theories in terms of solutions of the corresponding wave equations,… ▽ More We provide an explicit expression for the modular hamiltonian of the von Neumann algebras associated to the unit double cone for the (fermionic) quantum field theories of the 2-component Weyl (helicity 1/2) field, and of the 4-component massless Dirac and Majorana fields. To this end, we represent the one particle spaces of these theories in terms of solutions of the corresponding wave equations, and obtain the action of the modular group on them. As an application, we compute the relative entropy between the vacuum of the massless Majorana field and one particle states associated to waves with Cauchy data localized in the spatial unit ball. △ Less

Submitted 13 May, 2024; originally announced May 2024.

Comments: 22 pages, no figures

MSC Class: 81P45; 94A17; 46L60; 81T05; 81V74

arXiv:2405.06851 [pdf, other]

Nonlinear classification of neural manifolds with contextual information

Authors: Francesca Mignacco, Chi-Ning Chou, SueYeon Chung

Abstract: Understanding how neural systems efficiently process information through distributed representations is a fundamental challenge at the interface of neuroscience and machine learning. Recent approaches analyze the statistical and geometrical attributes of neural representations as population-level mechanistic descriptors of task implementation. In particular, manifold capacity has emerged as a prom… ▽ More Understanding how neural systems efficiently process information through distributed representations is a fundamental challenge at the interface of neuroscience and machine learning. Recent approaches analyze the statistical and geometrical attributes of neural representations as population-level mechanistic descriptors of task implementation. In particular, manifold capacity has emerged as a promising framework linking population geometry to the separability of neural manifolds. However, this metric has been limited to linear readouts. Here, we propose a theoretical framework that overcomes this limitation by leveraging contextual input information. We derive an exact formula for the context-dependent capacity that depends on manifold geometry and context correlations, and validate it on synthetic and real data. Our framework's increased expressivity captures representation untanglement in deep networks at early stages of the layer hierarchy, previously inaccessible to analysis. As context-dependent nonlinearity is ubiquitous in neural systems, our data-driven and theoretically grounded approach promises to elucidate context-dependent computation across scales, datasets, and models. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: 5 pages, 5 figures

arXiv:2405.04079 [pdf, other]

Leveraging swarm capabilities to assist other systems

Authors: Miquel Kegeleirs, David Garzón Ramos, Guillermo Legarda Herranz, Ilyes Gharbi, Jeanne Szpirer, Ken Hasselmann, Lorenzo Garattoni, Gianpiero Francesca, Mauro Birattari

Abstract: Most studies in swarm robotics treat the swarm as an isolated system of interest. We argue that the prevailing view of swarms as self-sufficient, independent systems limits the scope of potential applications for swarm robotics. A robot swarm could act as a support in an heterogeneous system comprising other robots and/or human operators, in particular by quickly providing access to a large amount… ▽ More Most studies in swarm robotics treat the swarm as an isolated system of interest. We argue that the prevailing view of swarms as self-sufficient, independent systems limits the scope of potential applications for swarm robotics. A robot swarm could act as a support in an heterogeneous system comprising other robots and/or human operators, in particular by quickly providing access to a large amount of data acquired in large unknown environments. Tasks such as target identification & tracking, scouting, or monitoring/surveillance could benefit from this approach. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: Presented at the "Breaking swarm stereotypes" workshop at ICRA 2024

arXiv:2405.02079 [pdf, other]

Argumentative Large Language Models for Explainable and Contestable Decision-Making

Authors: Gabriel Freedman, Adam Dejl, Deniz Gorur, Xiang Yin, Antonio Rago, Francesca Toni

Abstract: The diversity of knowledge encoded in large language models (LLMs) and their ability to apply this knowledge zero-shot in a range of settings makes them a promising candidate for use in decision-making. However, they are currently limited by their inability to reliably provide outputs which are explainable and contestable. In this paper, we attempt to reconcile these strengths and weaknesses by in… ▽ More The diversity of knowledge encoded in large language models (LLMs) and their ability to apply this knowledge zero-shot in a range of settings makes them a promising candidate for use in decision-making. However, they are currently limited by their inability to reliably provide outputs which are explainable and contestable. In this paper, we attempt to reconcile these strengths and weaknesses by introducing a method for supplementing LLMs with argumentative reasoning. Concretely, we introduce argumentative LLMs, a method utilising LLMs to construct argumentation frameworks, which then serve as the basis for formal reasoning in decision-making. The interpretable nature of these argumentation frameworks and formal reasoning means that any decision made by the supplemented LLM may be naturally explained to, and contested by, humans. We demonstrate the effectiveness of argumentative LLMs experimentally in the decision-making task of claim verification. We obtain results that are competitive with, and in some cases surpass, comparable state-of-the-art techniques. △ Less

Submitted 3 May, 2024; originally announced May 2024.

Comments: 19 pages, 17 figures

ACM Class: I.2.7

arXiv:2405.00268 [pdf, other]

A biased random-key genetic algorithm with variable mutants to solve a vehicle routing problem

Authors: Paola Festa, Francesca Guerriero, Mauricio G. C. Resende, Edoardo Scalzo

Abstract: The paper explores the Biased Random-Key Genetic Algorithm (BRKGA) in the domain of logistics and vehicle routing. Specifically, the application of the algorithm is contextualized within the framework of the Vehicle Routing Problem with Occasional Drivers and Time Window (VRPODTW) that represents a critical challenge in contemporary delivery systems. Within this context, BRKGA emerges as an innova… ▽ More The paper explores the Biased Random-Key Genetic Algorithm (BRKGA) in the domain of logistics and vehicle routing. Specifically, the application of the algorithm is contextualized within the framework of the Vehicle Routing Problem with Occasional Drivers and Time Window (VRPODTW) that represents a critical challenge in contemporary delivery systems. Within this context, BRKGA emerges as an innovative solution approach to optimize routing plans, balancing cost-efficiency with operational constraints. This research introduces a new BRKGA, characterized by a variable mutant population which can vary from generation to generation, named BRKGA-VM. This novel variant was tested to solve a VRPODTW. For this purpose, an innovative specific decoder procedure was proposed and implemented. Furthermore, a hybridization of the algorithm with a Variable Neighborhood Descent (VND) algorithm has also been considered, showing an improvement of problem-solving capabilities. Computational results show a better performances in term of effectiveness over a previous version of BRKGA, denoted as MP. The improved performance of BRKGA-VM is evident from its ability to optimize solutions across a wide range of scenarios, with significant improvements observed for each type of instance considered. The analysis also reveals that VM achieves preset goals more quickly compared to MP, thanks to the increased variability induced in the mutant population which facilitates the exploration of new regions of the solution space. Furthermore, the integration of VND has shown an additional positive impact on the quality of the solutions found. △ Less

Submitted 30 April, 2024; originally announced May 2024.

Comments: 25 pages, 9 figures

MSC Class: 90; 68 ACM Class: F.2.2; G.2.3

arXiv:2404.19586 [pdf, other]

AI techniques for near real-time monitoring of contaminants in coastal waters on board future Phisat-2 mission

Authors: Francesca Razzano, Pietro Di Stasio, Francesco Mauro, Gabriele Meoni, Marco Esposito, Gilda Schirinzi, Silvia L. Ullo

Abstract: Differently from conventional procedures, the proposed solution advocates for a groundbreaking paradigm in water quality monitoring through the integration of satellite Remote Sensing (RS) data, Artificial Intelligence (AI) techniques, and onboard processing. The objective is to offer nearly real-time detection of contaminants in coastal waters addressing a significant gap in the existing literatu… ▽ More Differently from conventional procedures, the proposed solution advocates for a groundbreaking paradigm in water quality monitoring through the integration of satellite Remote Sensing (RS) data, Artificial Intelligence (AI) techniques, and onboard processing. The objective is to offer nearly real-time detection of contaminants in coastal waters addressing a significant gap in the existing literature. Moreover, the expected outcomes include substantial advancements in environmental monitoring, public health protection, and resource conservation. The specific focus of our study is on the estimation of Turbidity and pH parameters, for their implications on human and aquatic health. Nevertheless, the designed framework can be extended to include other parameters of interest in the water environment and beyond. Originating from our participation in the European Space Agency (ESA) OrbitalAI Challenge, this article describes the distinctive opportunities and issues for the contaminants monitoring on the Phisat-2 mission. The specific characteristics of this mission, with the tools made available, will be presented, with the methodology proposed by the authors for the onboard monitoring of water contaminants in near real-time. Preliminary promising results are discussed and in progress and future work introduced. △ Less

Submitted 30 April, 2024; originally announced April 2024.

Comments: 11 pages, 9 figures, submitted to IEEE JSTARS

arXiv:2404.18867 [pdf, other]

Feminist Interaction Techniques: Deterring Non-Consensual Screenshots with Interaction Techniques

Authors: Li Qiwei, Francesca Lameiro, Shefali Patel, Cristi-Isaula-Reyes, Eytan Adar, Eric Gilbert, Sarita Schoenebeck

Abstract: Non-consensual Intimate Media (NCIM) refers to the distribution of sexual or intimate content without consent. NCIM is common and causes significant emotional, financial, and reputational harm. We developed Hands-Off, an interaction technique for messaging applications that deters non-consensual screenshots. Hands-Off requires recipients to perform a hand gesture in the air, above the device, to u… ▽ More Non-consensual Intimate Media (NCIM) refers to the distribution of sexual or intimate content without consent. NCIM is common and causes significant emotional, financial, and reputational harm. We developed Hands-Off, an interaction technique for messaging applications that deters non-consensual screenshots. Hands-Off requires recipients to perform a hand gesture in the air, above the device, to unlock media -- which makes simultaneous screenshotting difficult. A lab study shows that Hands-Off gestures are easy to perform and reduce non-consensual screenshots by 67 percent. We conclude by generalizing this approach and introduce the idea of Feminist Interaction Techniques (FIT), interaction techniques that encode feminist values and speak to societal problems, and reflect on FIT's opportunities and limitations. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.17925 [pdf, other]

Accurate and fast anomaly detection in industrial processes and IoT environments

Authors: Simone Tonini, Andrea Vandin, Francesca Chiaromonte, Daniele Licari, Fernando Barsacchi

Abstract: We present a novel, simple and widely applicable semi-supervised procedure for anomaly detection in industrial and IoT environments, SAnD (Simple Anomaly Detection). SAnD comprises 5 steps, each leveraging well-known statistical tools, namely; smoothing filters, variance inflation factors, the Mahalanobis distance, threshold selection algorithms and feature importance techniques. To our knowledge,… ▽ More We present a novel, simple and widely applicable semi-supervised procedure for anomaly detection in industrial and IoT environments, SAnD (Simple Anomaly Detection). SAnD comprises 5 steps, each leveraging well-known statistical tools, namely; smoothing filters, variance inflation factors, the Mahalanobis distance, threshold selection algorithms and feature importance techniques. To our knowledge, SAnD is the first procedure that integrates these tools to identify anomalies and help decipher their putative causes. We show how each step contributes to tackling technical challenges that practitioners face when detecting anomalies in industrial contexts, where signals can be highly multicollinear, have unknown distributions, and intertwine short-lived noise with the long(er)-lived actual anomalies. The development of SAnD was motivated by a concrete case study from our industrial partner, which we use here to show its effectiveness. We also evaluate the performance of SAnD by comparing it with a selection of semi-supervised methods on public datasets from the literature on anomaly detection. We conclude that SAnD is effective, broadly applicable, and outperforms existing approaches in both anomaly detection and runtime. △ Less

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.14304 [pdf, other]

Explaining Arguments' Strength: Unveiling the Role of Attacks and Supports (Technical Report)

Authors: Xiang Yin, Potyka Nico, Francesca Toni

Abstract: Quantitatively explaining the strength of arguments under gradual semantics has recently received increasing attention. Specifically, several works in the literature provide quantitative explanations by computing the attribution scores of arguments. These works disregard the importance of attacks and supports, even though they play an essential role when explaining arguments' strength. In this pap… ▽ More Quantitatively explaining the strength of arguments under gradual semantics has recently received increasing attention. Specifically, several works in the literature provide quantitative explanations by computing the attribution scores of arguments. These works disregard the importance of attacks and supports, even though they play an essential role when explaining arguments' strength. In this paper, we propose a novel theory of Relation Attribution Explanations (RAEs), adapting Shapley values from game theory to offer fine-grained insights into the role of attacks and supports in quantitative bipolar argumentation towards obtaining the arguments' strength. We show that RAEs satisfy several desirable properties. We also propose a probabilistic algorithm to approximate RAEs efficiently. Finally, we show the application value of RAEs in fraud detection and large language models case studies. △ Less

Submitted 10 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: This paper has been accepted at IJCAI 2024 (the 33rd International Joint Conference on Artificial Intelligence)

arXiv:2404.13736 [pdf, other]

Interval Abstractions for Robust Counterfactual Explanations

Authors: Junqi Jiang, Francesco Leofante, Antonio Rago, Francesca Toni

Abstract: Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research, providing recourse recommendations for users affected by the decisions of machine learning models. However, when slight changes occur in the parameters of the underlying model, CEs found by existing methods often become invalid for the updated models. The literature lacks a way to certify deterministic r… ▽ More Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research, providing recourse recommendations for users affected by the decisions of machine learning models. However, when slight changes occur in the parameters of the underlying model, CEs found by existing methods often become invalid for the updated models. The literature lacks a way to certify deterministic robustness guarantees for CEs under model changes, in that existing methods to improve CEs' robustness are heuristic, and the robustness performances are evaluated empirically using only a limited number of retrained models. To bridge this gap, we propose a novel interval abstraction technique for parametric machine learning models, which allows us to obtain provable robustness guarantees of CEs under the possibly infinite set of plausible model changes $Δ$. We formalise our robustness notion as the $Δ$-robustness for CEs, in both binary and multi-class classification settings. We formulate procedures to verify $Δ$-robustness based on Mixed Integer Linear Programming, using which we further propose two algorithms to generate CEs that are $Δ$-robust. In an extensive empirical study, we demonstrate how our approach can be used in practice by discussing two strategies for determining the appropriate hyperparameter in our method, and we quantitatively benchmark the CEs generated by eleven methods, highlighting the effectiveness of our algorithms in finding robust CEs. △ Less

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.11431 [pdf, other]

Instantiations and Computational Aspects of Non-Flat Assumption-based Argumentation

Authors: Tuomo Lehtonen, Anna Rapberger, Francesca Toni, Markus Ulbricht, Johannes P. Wallner

Abstract: Most existing computational tools for assumption-based argumentation (ABA) focus on so-called flat frameworks, disregarding the more general case. In this paper, we study an instantiation-based approach for reasoning in possibly non-flat ABA. We make use of a semantics-preserving translation between ABA and bipolar argumentation frameworks (BAFs). By utilizing compilability theory, we establish th… ▽ More Most existing computational tools for assumption-based argumentation (ABA) focus on so-called flat frameworks, disregarding the more general case. In this paper, we study an instantiation-based approach for reasoning in possibly non-flat ABA. We make use of a semantics-preserving translation between ABA and bipolar argumentation frameworks (BAFs). By utilizing compilability theory, we establish that the constructed BAFs will in general be of exponential size. In order to keep the number of arguments and computational cost low, we present three ways of identifying redundant arguments. Moreover, we identify fragments of ABA which admit a poly-sized instantiation. We propose two algorithmic approaches for reasoning in possibly non-flat ABA. The first approach utilizes the BAF instantiation while the second works directly without constructing arguments. An empirical evaluation shows that the former outperforms the latter on many instances, reflecting the lower complexity of BAF reasoning. This result is in contrast to flat ABA, where direct approaches dominate instantiation-based approaches. △ Less

Submitted 24 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10454 [pdf, other]

A Computer Vision-Based Quality Assessment Technique for the automatic control of consumables for analytical laboratories

Authors: Meriam Zribi, Paolo Pagliuca, Francesca Pitolli

Abstract: The rapid growth of the Industry 4.0 paradigm is increasing the pressure to develop effective automated monitoring systems. Artificial Intelligence (AI) is a convenient tool to improve the efficiency of industrial processes while reducing errors and waste. In fact, it allows the use of real-time data to increase the effectiveness of monitoring systems, minimize errors, make the production process… ▽ More The rapid growth of the Industry 4.0 paradigm is increasing the pressure to develop effective automated monitoring systems. Artificial Intelligence (AI) is a convenient tool to improve the efficiency of industrial processes while reducing errors and waste. In fact, it allows the use of real-time data to increase the effectiveness of monitoring systems, minimize errors, make the production process more sustainable, and save costs. In this paper, a novel automatic monitoring system is proposed in the context of production process of plastic consumables used in analysis laboratories, with the aim to increase the effectiveness of the control process currently performed by a human operator. In particular, we considered the problem of classifying the presence or absence of a transparent anticoagulant substance inside test tubes. Specifically, a hand-designed deep network model is used and compared with some state-of-the-art models for its ability to categorize different images of vials that can be either filled with the anticoagulant or empty. Collected results indicate that the proposed approach is competitive with state-of-the-art models in terms of accuracy. Furthermore, we increased the complexity of the task by training the models on the ability to discriminate not only the presence or absence of the anticoagulant inside the vial, but also the size of the test tube. The analysis performed in the latter scenario confirms the competitiveness of our approach. Moreover, our model is remarkably superior in terms of its generalization ability and requires significantly fewer resources. These results suggest the possibility of successfully implementing such a model in the production process of a plastic consumables company. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: 31 pages, 13 figures, 10 tables

arXiv:2404.07692 [pdf, other]

SWI-FEED: Smart Water IoT Framework for Evaluation of Energy and Data in Massive Scenarios

Authors: Antonino Pagano, Domenico Garlisi, Fabrizio Giuliano, Tiziana Cattai, Francesca Cuomo

Abstract: This paper presents a comprehensive framework designed to facilitate the widespread deployment of the Internet of Things (IoT) for enhanced monitoring and optimization of Water Distribution Systems (WDSs). The framework aims to investigate the utilization of massive IoT in monitoring and optimizing WDSs, with a particular focus on leakage detection, energy consumption and wireless network performa… ▽ More This paper presents a comprehensive framework designed to facilitate the widespread deployment of the Internet of Things (IoT) for enhanced monitoring and optimization of Water Distribution Systems (WDSs). The framework aims to investigate the utilization of massive IoT in monitoring and optimizing WDSs, with a particular focus on leakage detection, energy consumption and wireless network performance assessment in real-world water networks. The framework integrates simulation environments at both the application level (using EPANET) and the radio level (using NS-3) within the LoRaWAN network. The paper culminates with a practical use case, alongside evaluation results concerning power consumption in a large-scale LoRaWAN network and strategies for optimal gateway positioning. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: This is a pre-print paper submitted to IFIP/IEEE Networking 2024 - The International Federation for Information Processing (IFIP) Networking 2024 Conference (NETWORKING 2024) for consideration

arXiv:2404.07006 [pdf]

doi 10.2426/aibstudi-13301

Linked open data per la valorizzazione di collezioni culturali: il dataset mythLOD

Authors: Valentina Pasqual, Francesca Tomasi

Abstract: The formal representation of cultural metadata has always been a challenge, considering both the heterogeneity of cultural objects and the need to document the interpretive act exercised by experts. This article provides an overview of the revalorization of the digital collection Mythologiae in Linked Open Data format. The research aims to explore the data of a collection of artworks (Mythologiae)… ▽ More The formal representation of cultural metadata has always been a challenge, considering both the heterogeneity of cultural objects and the need to document the interpretive act exercised by experts. This article provides an overview of the revalorization of the digital collection Mythologiae in Linked Open Data format. The research aims to explore the data of a collection of artworks (Mythologiae) by promoting the potential of the Semantic Web, focusing particularly on the formal representation of the association of cultural objects with literary sources, as realized by experts, also documenting their interpretations. The workflow consisted of defining the data model, cleaning and disambiguating the data, converting it (from tabular structure to graph), and conducting testing activities (particularly expert domain review of the dataset through competency questions and data visualizations). The result is the mythLOD platform, which presents the dataset and detailed research documentation. Additionally, the platform hosts two data visualization spaces (the online catalogue and a data storytelling experiment on the case study of the Aeneid) that enrich the project documentation as user-friendly test units for the dataset and constitute an additional project documentation tool and exploration of the collection. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: in Italian language

Journal ref: AIB Studi 62-1 (2022) 149-168

arXiv:2404.05133 [pdf, other]

EcoVerse: An Annotated Twitter Dataset for Eco-Relevance Classification, Environmental Impact Analysis, and Stance Detection

Authors: Francesca Grasso, Stefano Locci, Giovanni Siragusa, Luigi Di Caro

Abstract: Anthropogenic ecological crisis constitutes a significant challenge that all within the academy must urgently face, including the Natural Language Processing (NLP) community. While recent years have seen increasing work revolving around climate-centric discourse, crucial environmental and ecological topics outside of climate change remain largely unaddressed, despite their prominent importance. Ma… ▽ More Anthropogenic ecological crisis constitutes a significant challenge that all within the academy must urgently face, including the Natural Language Processing (NLP) community. While recent years have seen increasing work revolving around climate-centric discourse, crucial environmental and ecological topics outside of climate change remain largely unaddressed, despite their prominent importance. Mainstream NLP tasks, such as sentiment analysis, dominate the scene, but there remains an untouched space in the literature involving the analysis of environmental impacts of certain events and practices. To address this gap, this paper presents EcoVerse, an annotated English Twitter dataset of 3,023 tweets spanning a wide spectrum of environmental topics. We propose a three-level annotation scheme designed for Eco-Relevance Classification, Stance Detection, and introducing an original approach for Environmental Impact Analysis. We detail the data collection, filtering, and labeling process that led to the creation of the dataset. Remarkable Inter-Annotator Agreement indicates that the annotation scheme produces consistent annotations of high quality. Subsequent classification experiments using BERT-based models, including ClimateBERT, are presented. These yield encouraging results, while also indicating room for a model specifically tailored for environmental texts. The dataset is made freely available to stimulate further research. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04638 [pdf, other]

Designing for Complementarity: A Conceptual Framework to Go Beyond the Current Paradigm of Using XAI in Healthcare

Authors: Elisa Rubegni, Omran Ayoub, Stefania Maria Rita Rizzo, Marco Barbero, Guenda Bernegger, Francesca Faraci, Francesca Mangili, Emiliano Soldini, Pierpaolo Trimboli, Alessandro Facchini

Abstract: The widespread use of Artificial Intelligence-based tools in the healthcare sector raises many ethical and legal problems, one of the main reasons being their black-box nature and therefore the seemingly opacity and inscrutability of their characteristics and decision-making process. Literature extensively discusses how this can lead to phenomena of over-reliance and under-reliance, ultimately lim… ▽ More The widespread use of Artificial Intelligence-based tools in the healthcare sector raises many ethical and legal problems, one of the main reasons being their black-box nature and therefore the seemingly opacity and inscrutability of their characteristics and decision-making process. Literature extensively discusses how this can lead to phenomena of over-reliance and under-reliance, ultimately limiting the adoption of AI. We addressed these issues by building a theoretical framework based on three concepts: Feature Importance, Counterexample Explanations, and Similar-Case Explanations. Grounded in the literature, the model was deployed within a case study in which, using a participatory design approach, we designed and developed a high-fidelity prototype. Through the co-design and development of the prototype and the underlying model, we advanced the knowledge on how to design AI-based systems for enabling complementarity in the decision-making process in the healthcare domain. Our work aims at contributing to the current discourse on designing AI systems to support clinicians' decision-making processes. △ Less

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.01903 [pdf, other]

Activation Steering for Robust Type Prediction in CodeLLMs

Authors: Francesca Lucchetti, Arjun Guha

Abstract: Contemporary LLMs pretrained on code are capable of succeeding at a wide variety of programming tasks. However, their performance is very sensitive to syntactic features, such as the names of variables and types, the structure of code, and presence of type hints. We contribute an inference-time technique to make CodeLLMs more robust to syntactic distractors that are semantically irrelevant. Our me… ▽ More Contemporary LLMs pretrained on code are capable of succeeding at a wide variety of programming tasks. However, their performance is very sensitive to syntactic features, such as the names of variables and types, the structure of code, and presence of type hints. We contribute an inference-time technique to make CodeLLMs more robust to syntactic distractors that are semantically irrelevant. Our methodology relies on activation steering, which involves editing internal model activations to steer the model towards the correct prediction. We contribute a novel way to construct steering vectors by taking inspiration from mutation testing, which constructs minimal semantics-breaking code edits. In contrast, we construct steering vectors from semantics-preserving code edits. We apply our approach to the task of type prediction for the gradually typed languages Python and TypeScript. This approach corrects up to 90% of type mispredictions. Finally, we show that steering vectors calculated from Python activations reliably correct type mispredictions in TypeScript, and vice versa. This result suggests that LLMs may be learning to transfer knowledge of types across programming languages. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: 16 pages, 7 figures

arXiv:2404.00184 [pdf, other]

Word Ladders: A Mobile Application for Semantic Data Collection

Authors: Marianna Marcella Bolognesi, Claudia Collacciani, Andrea Ferrari, Francesca Genovese, Tommaso Lamarra, Adele Loia, Giulia Rambelli, Andrea Amelio Ravelli, Caterina Villani

Abstract: Word Ladders is a free mobile application for Android and iOS, developed for collecting linguistic data, specifically lists of words related to each other through semantic relations of categorical inclusion, within the Abstraction project (ERC-2021-STG-101039777). We hereby provide an overview of Word Ladders, explaining its game logic, motivation and expected results and applications to nlp tasks… ▽ More Word Ladders is a free mobile application for Android and iOS, developed for collecting linguistic data, specifically lists of words related to each other through semantic relations of categorical inclusion, within the Abstraction project (ERC-2021-STG-101039777). We hereby provide an overview of Word Ladders, explaining its game logic, motivation and expected results and applications to nlp tasks as well as to the investigation of cognitive scientific open questions △ Less

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.20322 [pdf, other]

Towards a Framework for Evaluating Explanations in Automated Fact Verification

Authors: Neema Kotonya, Francesca Toni

Abstract: As deep neural models in NLP become more complex, and as a consequence opaque, the necessity to interpret them becomes greater. A burgeoning interest has emerged in rationalizing explanations to provide short and coherent justifications for predictions. In this position paper, we advocate for a formal framework for key concepts and properties about rationalizing explanations to support their evalu… ▽ More As deep neural models in NLP become more complex, and as a consequence opaque, the necessity to interpret them becomes greater. A burgeoning interest has emerged in rationalizing explanations to provide short and coherent justifications for predictions. In this position paper, we advocate for a formal framework for key concepts and properties about rationalizing explanations to support their evaluation systematically. We also outline one such formal framework, tailored to rationalizing explanations of increasingly complex structures, from free-form explanations to deductive explanations, to argumentative explanations (with the richest structure). Focusing on the automated fact verification task, we provide illustrations of the use and usefulness of our formalization for evaluating explanations, tailored to their varying structures. △ Less

Submitted 19 May, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted at LREC-COLING 2024; Updated Author Affiliation

arXiv:2403.19475 [pdf, other]

A theoretical framework for the design and analysis of computational thinking problems in education

Authors: Giorgia Adorni, Alberto Piatti, Engin Bumbacher, Lucio Negrini, Francesco Mondada, Dorit Assaf, Francesca Mangili, Luca Gambardella

Abstract: The field of computational thinking education has grown in recent years as researchers and educators have sought to develop and assess students' computational thinking abilities. While much of the research in this area has focused on defining computational thinking, the competencies it involves and how to assess them in teaching and learning contexts, this work takes a different approach. We provi… ▽ More The field of computational thinking education has grown in recent years as researchers and educators have sought to develop and assess students' computational thinking abilities. While much of the research in this area has focused on defining computational thinking, the competencies it involves and how to assess them in teaching and learning contexts, this work takes a different approach. We provide a more situated perspective on computational thinking, focusing on the types of problems that require computational thinking skills to be solved and the features that support these processes. We develop a framework for analysing existing computational thinking problems in an educational context. We conduct a comprehensive literature review to identify prototypical activities from areas where computational thinking is typically pursued in education. We identify the main components and characteristics of these activities, along with their influence on activating computational thinking competencies. The framework provides a catalogue of computational thinking skills that can be used to understand the relationship between problem features and competencies activated. This study contributes to the field of computational thinking education by offering a tool for evaluating and revising existing problems to activate specific skills and for assisting in designing new problems that target the development of particular competencies. The results of this study may be of interest to researchers and educators working in computational thinking education. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19037 [pdf, other]

Women are less comfortable expressing opinions online than men and report heightened fears for safety: Surveying gender differences in experiences of online harms

Authors: Francesca Stevens, Florence E. Enock, Tvesha Sippy, Jonathan Bright, Miranda Cross, Pica Johansson, Judy Wajcman, Helen Z. Margetts

Abstract: Online harms, such as hate speech, trolling and self-harm promotion, continue to be widespread. While some work suggests women are disproportionately affected, other studies find mixed evidence for gender differences in experiences with content of this kind. Using a nationally representative survey of UK adults (N=1992), we examine exposure to a variety of harms, fears surrounding being targeted,… ▽ More Online harms, such as hate speech, trolling and self-harm promotion, continue to be widespread. While some work suggests women are disproportionately affected, other studies find mixed evidence for gender differences in experiences with content of this kind. Using a nationally representative survey of UK adults (N=1992), we examine exposure to a variety of harms, fears surrounding being targeted, the psychological impact of online experiences, the use of safety tools to protect against harm, and comfort with various forms of online participation across men and women. We find that while men and women see harmful content online to a roughly similar extent, women are more at risk than men of being targeted by harms including online misogyny, cyberstalking and cyberflashing. Women are significantly more fearful of being targeted by harms overall, and report greater negative psychological impact as a result of particular experiences. Perhaps in an attempt to mitigate risk, women report higher use of a range of safety tools and less comfort with several forms of online participation, with just 23% of women comfortable expressing political views online compared to 40% of men. We also find direct associations between fears surrounding harms and comfort with online behaviours. For example, fear of being trolled significantly decreases comfort expressing opinions, and fear of being targeted by misogyny significantly decreases comfort sharing photos. Our results are important because with much public discourse happening online, we must ensure all members of society feel safe and able to participate in online spaces. △ Less

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.17864 [pdf, other]

Synthetic training set generation using text-to-audio models for environmental sound classification

Authors: Francesca Ronchini, Luca Comanducci, Fabio Antonacci

Abstract: In recent years, text-to-audio models have revolutionized the field of automatic audio generation. This paper investigates their application in generating synthetic datasets for training data-driven models. Specifically, this study analyzes the performance of two environmental sound classification systems trained with data generated from text-to-audio models. We considered three scenarios: a) augm… ▽ More In recent years, text-to-audio models have revolutionized the field of automatic audio generation. This paper investigates their application in generating synthetic datasets for training data-driven models. Specifically, this study analyzes the performance of two environmental sound classification systems trained with data generated from text-to-audio models. We considered three scenarios: a) augmenting the training dataset with data generated by text-to-audio models; b) using a mixed training dataset combining real and synthetic text-driven generated data; and c) using a training dataset composed entirely of synthetic audio. In all cases, the performance of the classification models was tested on real data. Results indicate that text-to-audio models are effective for dataset augmentation, with consistent performance when replacing a subset of the recorded dataset. However, the performance of the audio recognition models drops when relying entirely on generated audio. △ Less

Submitted 6 July, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.15871 [pdf, ps, other]

On the complexity and approximability of Bounded access Lempel Ziv coding

Authors: Ferdinando Cicalese, Francesca Ugazio

Abstract: We study the complexity of constructing an optimal parsing $\varphi$ of a string ${\bf s} = s_1 \dots s_n$ under the constraint that given a position $p$ in the original text, and the LZ76-like (Lempel Ziv 76) encoding of $T$ based on $\varphi$, it is possible to identify/decompress the character $s_p$ by performing at most $c$ accesses to the LZ encoding, for a given integer $c.$ We refer to such… ▽ More We study the complexity of constructing an optimal parsing $\varphi$ of a string ${\bf s} = s_1 \dots s_n$ under the constraint that given a position $p$ in the original text, and the LZ76-like (Lempel Ziv 76) encoding of $T$ based on $\varphi$, it is possible to identify/decompress the character $s_p$ by performing at most $c$ accesses to the LZ encoding, for a given integer $c.$ We refer to such a parsing $\varphi$ as a $c$-bounded access LZ parsing or $c$-BLZ parsing of ${\bf s}.$ We show that for any constant $c$ the problem of computing the optimal $c$-BLZ parsing of a string, i.e., the one with the minimum number of phrases, is NP-hard and also APX hard, i.e., no PTAS can exist under the standard complexity assumption $P \neq NP.$ We also study the ratio between the sizes of an optimal $c$-BLZ parsing of a string ${\bf s}$ and an optimal LZ76 parsing of ${\bf s}$ (which can be greedily computed in polynomial time). △ Less

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.14023 [pdf]

A system capable of verifiably and privately screening global DNA synthesis

Authors: Carsten Baum, Jens Berlips, Walther Chen, Hongrui Cui, Ivan Damgard, Jiangbin Dong, Kevin M. Esvelt, Mingyu Gao, Dana Gretton, Leonard Foner, Martin Kysel, Kaiyi Zhang, Juanru Li, Xiang Li, Omer Paneth, Ronald L. Rivest, Francesca Sage-Ling, Adi Shamir, Yue Shen, Meicen Sun, Vinod Vaikuntanathan, Lynn Van Hauwe, Theia Vogel, Benjamin Weinstein-Raun, Yun Wang , et al. (5 additional authors not shown)

Abstract: Printing custom DNA sequences is essential to scientific and biomedical research, but the technology can be used to manufacture plagues as well as cures. Just as ink printers recognize and reject attempts to counterfeit money, DNA synthesizers and assemblers should deny unauthorized requests to make viral DNA that could be used to ignite a pandemic. There are three complications. First, we don't n… ▽ More Printing custom DNA sequences is essential to scientific and biomedical research, but the technology can be used to manufacture plagues as well as cures. Just as ink printers recognize and reject attempts to counterfeit money, DNA synthesizers and assemblers should deny unauthorized requests to make viral DNA that could be used to ignite a pandemic. There are three complications. First, we don't need to quickly update printers to deal with newly discovered currencies, whereas we regularly learn of new viruses and other biological threats. Second, anti-counterfeiting specifications on a local printer can't be extracted and misused by malicious actors, unlike information on biological threats. Finally, any screening must keep the inspected DNA sequences private, as they may constitute valuable trade secrets. Here we describe SecureDNA, a free, privacy-preserving, and fully automated system capable of verifiably screening all DNA synthesis orders of 30+ base pairs against an up-to-date database of hazards, and its operational performance and specificity when applied to 67 million base pairs of DNA synthesized by providers in the United States, Europe, and China. △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: Main text 10 pages, 4 figures. 5 supplementary figures. Total 21 pages. Direct correspondence to: Ivan B. Damgard ([email protected]), Andrew C. Yao ([email protected]), Kevin M. Esvelt ([email protected])

arXiv:2403.12059 [pdf, other]

A Thorough Analysis of Radio Resource Assignment for UAV-Enhanced Vehicular Sidelink Communications

Authors: Francesca Conserva, Francesco Linsalata, Marouan Mizmizi, Maurizio Magarini, Umberto Spagnolini, Roberto Verdone, Chiara Buratti

Abstract: The rapid expansion of connected and autonomous vehicles (CAVs) and the shift towards millimiter-wave (mmWave) frequencies offer unprecedented opportunities to enhance road safety and traffic efficiency. Sidelink communication, enabling direct Vehicle-to-Vehicle (V2V) communications, play a pivotal role in this transformation. As communication technologies transit to higher frequencies, the associ… ▽ More The rapid expansion of connected and autonomous vehicles (CAVs) and the shift towards millimiter-wave (mmWave) frequencies offer unprecedented opportunities to enhance road safety and traffic efficiency. Sidelink communication, enabling direct Vehicle-to-Vehicle (V2V) communications, play a pivotal role in this transformation. As communication technologies transit to higher frequencies, the associated increase in bandwidth comes at the cost of a severe path and penetration loss. In response to these challenges, we investigate a network configuration that deploys beamforming-capable Unmanned Aerial Vehicles (UAVs) as relay nodes. In this work, we present a comprehensive analytical framework with a groundbreaking performance metric, i.e. average access probability, that quantifies user satisfaction, considering factors across different protocol stack layers. Additionally, we introduce two Radio Resources Assignment (RRA) methods tailored for UAVs. These methods consider parameters such as resource availability, vehicle distribution, and latency requirements. Through our analytical approach, we optimize the average access probability by controlling UAV altitude based on traffic density. Our numerical findings validate the proposed model and strategy, which ensures that Quality of Service (QoS) standards are met in the domain of Vehicle-to-Anything (V2X) sidelink communications. △ Less

Submitted 19 January, 2024; originally announced March 2024.

arXiv:2403.10158 [pdf, other]

Functional Graph Convolutional Networks: A unified multi-task and multi-modal learning framework to facilitate health and social-care insights

Authors: Tobia Boschi, Francesca Bonin, Rodrigo Ordonez-Hurtado, Cécile Rousseau, Alessandra Pascale, John Dinsmore

Abstract: This paper introduces a novel Functional Graph Convolutional Network (funGCN) framework that combines Functional Data Analysis and Graph Convolutional Networks to address the complexities of multi-task and multi-modal learning in digital health and longitudinal studies. With the growing importance of health solutions to improve health care and social support, ensure healthy lives, and promote well… ▽ More This paper introduces a novel Functional Graph Convolutional Network (funGCN) framework that combines Functional Data Analysis and Graph Convolutional Networks to address the complexities of multi-task and multi-modal learning in digital health and longitudinal studies. With the growing importance of health solutions to improve health care and social support, ensure healthy lives, and promote well-being at all ages, funGCN offers a unified approach to handle multivariate longitudinal data for multiple entities and ensures interpretability even with small sample sizes. Key innovations include task-specific embedding components that manage different data types, the ability to perform classification, regression, and forecasting, and the creation of a knowledge graph for insightful data interpretation. The efficacy of funGCN is validated through simulation experiments and a real-data application. △ Less

Submitted 27 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.08750 [pdf, ps, other]

Neural reproducing kernel Banach spaces and representer theorems for deep networks

Authors: Francesca Bartolucci, Ernesto De Vito, Lorenzo Rosasco, Stefano Vigogna

Abstract: Studying the function spaces defined by neural networks helps to understand the corresponding learning models and their inductive bias. While in some limits neural networks correspond to function spaces that are reproducing kernel Hilbert spaces, these regimes do not capture the properties of the networks used in practice. In contrast, in this paper we show that deep neural networks define suitabl… ▽ More Studying the function spaces defined by neural networks helps to understand the corresponding learning models and their inductive bias. While in some limits neural networks correspond to function spaces that are reproducing kernel Hilbert spaces, these regimes do not capture the properties of the networks used in practice. In contrast, in this paper we show that deep neural networks define suitable reproducing kernel Banach spaces. These spaces are equipped with norms that enforce a form of sparsity, enabling them to adapt to potential latent structures within the input data and their representations. In particular, leveraging the theory of reproducing kernel Banach spaces, combined with variational results, we derive representer theorems that justify the finite architectures commonly employed in applications. Our study extends analogous results for shallow networks and can be seen as a step towards considering more practically plausible neural architectures. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.03037 [pdf, other]

A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives

Authors: Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro, Giuseppe Averta

Abstract: Human comprehension of a video stream is naturally broad: in a few instants, we are able to understand what is happening, the relevance and relationship of objects, and forecast what will follow in the near future, everything all at once. We believe that - to effectively transfer such an holistic perception to intelligent machines - an important role is played by learning to correlate concepts and… ▽ More Human comprehension of a video stream is naturally broad: in a few instants, we are able to understand what is happening, the relevance and relationship of objects, and forecast what will follow in the near future, everything all at once. We believe that - to effectively transfer such an holistic perception to intelligent machines - an important role is played by learning to correlate concepts and to abstract knowledge coming from different tasks, to synergistically exploit them when learning novel skills. To accomplish this, we seek for a unified approach to video understanding which combines shared temporal modelling of human actions with minimal overhead, to support multiple downstream tasks and enable cooperation when learning novel skills. We then propose EgoPack, a solution that creates a collection of task perspectives that can be carried across downstream tasks and used as a potential source of additional insights, as a backpack of skills that a robot can carry around and use when needed. We demonstrate the effectiveness and efficiency of our approach on four Ego4D benchmarks, outperforming current state-of-the-art methods. △ Less

Submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024. Project webpage at https://sapeirone.github.io/EgoPack

arXiv:2403.01537 [pdf, other]

Mixed Strategy Nash Equilibrium for Crowd Navigation

Authors: Muchen Sun, Francesca Baldini, Katie Hughes, Peter Trautman, Todd Murphey

Abstract: Robots navigating in crowded areas should negotiate free space with humans rather than fully controlling collision avoidance, as this can lead to freezing behavior. Game theory provides a framework for the robot to reason about potential cooperation from humans for collision avoidance during path planning. In particular, the mixed strategy Nash equilibrium captures the negotiation behavior under u… ▽ More Robots navigating in crowded areas should negotiate free space with humans rather than fully controlling collision avoidance, as this can lead to freezing behavior. Game theory provides a framework for the robot to reason about potential cooperation from humans for collision avoidance during path planning. In particular, the mixed strategy Nash equilibrium captures the negotiation behavior under uncertainty, making it well suited for crowd navigation. However, computing the mixed strategy Nash equilibrium is often prohibitively expensive for real-time decision-making. In this paper, we propose an iterative Bayesian update scheme over probability distributions of trajectories. The algorithm simultaneously generates a stochastic plan for the robot and probabilistic predictions of other pedestrians' paths. We prove that the proposed algorithm is equivalent to solving a mixed strategy game for crowd navigation, and the algorithm guarantees the recovery of the global Nash equilibrium of the game. We name our algorithm Bayes' Rule Nash Equilibrium (BRNE) and develop a real-time model prediction crowd navigation framework. Since BRNE is not solving a general-purpose mixed strategy Nash equilibrium but a tailored formula specifically for crowd navigation, it can compute the solution in real-time on a low-power embedded computer. We evaluate BRNE in both simulated environments and real-world pedestrian datasets. BRNE consistently outperforms non-learning and learning-based methods regarding safety and navigation efficiency. It also reaches human-level crowd navigation performance in the pedestrian dataset benchmark. Lastly, we demonstrate the practicality of our algorithm with real humans on an untethered quadruped robot with fully onboard perception and computation. △ Less

Submitted 17 June, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2403.01470 [pdf, other]

Is in-domain data beneficial in transfer learning for landmarks detection in x-ray images?

Authors: Roberto Di Via, Matteo Santacesaria, Francesca Odone, Vito Paolo Pastore

Abstract: In recent years, deep learning has emerged as a promising technique for medical image analysis. However, this application domain is likely to suffer from a limited availability of large public datasets and annotations. A common solution to these challenges in deep learning is the usage of a transfer learning framework, typically with a fine-tuning protocol, where a large-scale source dataset is us… ▽ More In recent years, deep learning has emerged as a promising technique for medical image analysis. However, this application domain is likely to suffer from a limited availability of large public datasets and annotations. A common solution to these challenges in deep learning is the usage of a transfer learning framework, typically with a fine-tuning protocol, where a large-scale source dataset is used to pre-train a model, further fine-tuned on the target dataset. In this paper, we present a systematic study analyzing whether the usage of small-scale in-domain x-ray image datasets may provide any improvement for landmark detection over models pre-trained on large natural image datasets only. We focus on the multi-landmark localization task for three datasets, including chest, head, and hand x-ray images. Our results show that using in-domain source datasets brings marginal or no benefit with respect to an ImageNet out-of-domain pre-training. Our findings can provide an indication for the development of robust landmark detection systems in medical images when no large annotated dataset is available. △ Less

Submitted 3 March, 2024; originally announced March 2024.

Comments: Accepted to ISBI 2024

arXiv:2402.19422 [pdf, other]

PEM: Prototype-based Efficient MaskFormer for Image Segmentation

Authors: Niccolò Cavagnero, Gabriele Rosi, Claudia Cuttano, Francesca Pistilli, Marco Ciccone, Giuseppe Averta, Fabio Cermelli

Abstract: Recent transformer-based architectures have shown impressive results in the field of image segmentation. Thanks to their flexibility, they obtain outstanding performance in multiple segmentation tasks, such as semantic and panoptic, under a single unified framework. To achieve such impressive performance, these architectures employ intensive operations and require substantial computational resourc… ▽ More Recent transformer-based architectures have shown impressive results in the field of image segmentation. Thanks to their flexibility, they obtain outstanding performance in multiple segmentation tasks, such as semantic and panoptic, under a single unified framework. To achieve such impressive performance, these architectures employ intensive operations and require substantial computational resources, which are often not available, especially on edge devices. To fill this gap, we propose Prototype-based Efficient MaskFormer (PEM), an efficient transformer-based architecture that can operate in multiple segmentation tasks. PEM proposes a novel prototype-based cross-attention which leverages the redundancy of visual features to restrict the computation and improve the efficiency without harming the performance. In addition, PEM introduces an efficient multi-scale feature pyramid network, capable of extracting features that have high semantic content in an efficient way, thanks to the combination of deformable convolutions and context-based self-modulation. We benchmark the proposed PEM architecture on two tasks, semantic and panoptic segmentation, evaluated on two different datasets, Cityscapes and ADE20K. PEM demonstrates outstanding performance on every task and dataset, outperforming task-specific architectures while being comparable and even better than computationally-expensive baselines. △ Less

Submitted 6 May, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: CVPR 2024. Project page: https://niccolocavagnero.github.io/PEM

Showing 1–50 of 612 results for author: Francesca