Search | arXiv e-print repository

Left-Linear Rewriting in Adhesive Categories

Authors: Paolo Baldan, Davide Castelnovo, Andrea Corradini, Fabio Gadducci

Abstract: When can two sequential steps performed by a computing device be considered (causally) independent? This is a relevant question for concurrent and distributed systems, since independence means that they could be executed in any order, and potentially in parallel. Equivalences identifying rewriting sequences which differ only for independent steps are at the core of the theory of concurrency of man… ▽ More When can two sequential steps performed by a computing device be considered (causally) independent? This is a relevant question for concurrent and distributed systems, since independence means that they could be executed in any order, and potentially in parallel. Equivalences identifying rewriting sequences which differ only for independent steps are at the core of the theory of concurrency of many formalisms. We investigate the issue in the context of the double pushout approach to rewriting in the general setting of adhesive categories. While a consolidated theory exists for linear rules,which can consume, preserve and generate entities, this paper focuses on left-linear rules which may also "merge" parts of the state. This is an apparently minimal, yet technically hard enhancement,since a standard characterisation of independence that - in the linear case - allows one to derive a number of properties, essential in the development of a theory of concurrency, no longer holds. The paper performs an in-depth study of the notion of independence for left-linear rules: it introduces a novel characterisation of independence, identifies well-behaved classes of left-linear rewriting systems,and provides some fundamental results including a Church-Rosser property and the existence of canonical equivalence proofs for concurrent computations. These results properly extends the class of formalisms that can be modelled in the adhesive framework △ Less

Submitted 8 July, 2024; originally announced July 2024.

ACM Class: F.4.1

arXiv:2407.06093 [pdf, other]

Artificial Intuition: Efficient Classification of Scientific Abstracts

Authors: Harsh Sakhrani, Naseela Pervez, Anirudh Ravi Kumar, Fred Morstatter, Alexandra Graddy Reed, Andrea Belz

Abstract: It is desirable to coarsely classify short scientific texts, such as grant or publication abstracts, for strategic insight or research portfolio management. These texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation. Yet this task is remarkably difficult to automate because of brevity and the absence of context. To address this gap, we h… ▽ More It is desirable to coarsely classify short scientific texts, such as grant or publication abstracts, for strategic insight or research portfolio management. These texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation. Yet this task is remarkably difficult to automate because of brevity and the absence of context. To address this gap, we have developed a novel approach to generate and appropriately assign coarse domain-specific labels. We show that a Large Language Model (LLM) can provide metadata essential to the task, in a process akin to the augmentation of supplemental knowledge representing human intuition, and propose a workflow. As a pilot study, we use a corpus of award abstracts from the National Aeronautics and Space Administration (NASA). We develop new assessment tools in concert with established performance metrics. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05684 [pdf, other]

Multi-Fidelity Bayesian Neural Network for Uncertainty Quantification in Transonic Aerodynamic Loads

Authors: Andrea Vaiuso, Gabriele Immordino, Marcello Righi, Andrea Da Ronch

Abstract: Multi-fidelity models are becoming more prevalent in engineering, particularly in aerospace, as they combine both the computational efficiency of low-fidelity models with the high accuracy of higher-fidelity simulations. Various state-of-the-art techniques exist for fusing data from different fidelity sources, including Co-Kriging and transfer learning in neural networks. This paper aims to implem… ▽ More Multi-fidelity models are becoming more prevalent in engineering, particularly in aerospace, as they combine both the computational efficiency of low-fidelity models with the high accuracy of higher-fidelity simulations. Various state-of-the-art techniques exist for fusing data from different fidelity sources, including Co-Kriging and transfer learning in neural networks. This paper aims to implement a multi-fidelity Bayesian neural network model that applies transfer learning to fuse data generated by models at different fidelities. Bayesian neural networks use probability distributions over network weights, enabling them to provide predictions along with estimates of their confidence. This approach harnesses the predictive and data fusion capabilities of neural networks while also quantifying uncertainty. The results demonstrate that the multi-fidelity Bayesian model outperforms the state-of-the-art Co-Kriging in terms of overall accuracy and robustness on unseen data. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05460 [pdf, other]

Basins of Attraction in Two-Player Random Ordinal Potential Games

Authors: Andrea Collevecchio, Hlafo Alfie Mimun, Matteo Quattropani, Marco Scarsini

Abstract: We consider the class of two-person ordinal potential games where each player has the same number of actions $K$. Each game in this class admits at least one pure Nash equilibrium and the best-response dynamics converges to one of these pure Nash equilibria; which one depends on the starting point. So, each pure Nash equilibrium has a basin of attraction. We pick uniformly at random one game fro… ▽ More We consider the class of two-person ordinal potential games where each player has the same number of actions $K$. Each game in this class admits at least one pure Nash equilibrium and the best-response dynamics converges to one of these pure Nash equilibria; which one depends on the starting point. So, each pure Nash equilibrium has a basin of attraction. We pick uniformly at random one game from this class and we study the joint distribution of the sizes of the basins of attraction. We provide an asymptotic exact value for the expected basin of attraction of each pure Nash equilibrium, when the number of actions $K$ goes to infinity. △ Less

Submitted 7 July, 2024; originally announced July 2024.

Comments: 24 pages, 2 figures

MSC Class: Primary 91A14; 91A05. Secondary: 91A26; 60C05

arXiv:2407.05335 [pdf, other]

Understanding and Addressing Gender Bias in Expert Finding Task

Authors: Maddalena Amendola, Carlos Castillo, Andrea Passarella, Raffaele Perego

Abstract: The Expert Finding (EF) task is critical in community Question&Answer (CQ&A) platforms, significantly enhancing user engagement by improving answer quality and reducing response times. However, biases, especially gender biases, have been identified in these platforms. This study investigates gender bias in state-of-the-art EF models and explores methods to mitigate it. Utilizing a comprehensive da… ▽ More The Expert Finding (EF) task is critical in community Question&Answer (CQ&A) platforms, significantly enhancing user engagement by improving answer quality and reducing response times. However, biases, especially gender biases, have been identified in these platforms. This study investigates gender bias in state-of-the-art EF models and explores methods to mitigate it. Utilizing a comprehensive dataset from StackOverflow, the largest community in the StackExchange network, we conduct extensive experiments to analyze how EF models' candidate identification processes influence gender representation. Our findings reveal that models relying on reputation metrics and activity levels disproportionately favor male users, who are more active on the platform. This bias results in the underrepresentation of female experts in the ranking process. We propose adjustments to EF models that incorporate a more balanced preprocessing strategy and leverage content-based and social network-based information, with the aim to provide a fairer representation of genders among identified experts. Our analysis shows that integrating these methods can significantly enhance gender balance without compromising model accuracy. To the best of our knowledge, this study is the first to focus on detecting and mitigating gender bias in EF methods. △ Less

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05120 [pdf, other]

doi 10.1109/OCEANSLimerick52467.2023.10244729

Wireless teleoperation of HSURF artificial fish in complex paths

Authors: Saverio Iacoponi, Nikita Mankovskii, Mohammed El Hanbaly, Andrea Infanti, Shamma Alhajeri, Federico Renda, Cesare Stefanini, Giulia De Masi

Abstract: In this paper we show the application of the new robotic multi-platform system HSURF to a specific use case of teleoperation, aimed at monitoring and inspection. The HSURF system, consists of 3 different kinds of platforms: floater, sinker and robotic fishes. The collaborative control of the 3 platforms allows a remotely based operator to control the fish in order to visit and inspect several targ… ▽ More In this paper we show the application of the new robotic multi-platform system HSURF to a specific use case of teleoperation, aimed at monitoring and inspection. The HSURF system, consists of 3 different kinds of platforms: floater, sinker and robotic fishes. The collaborative control of the 3 platforms allows a remotely based operator to control the fish in order to visit and inspect several targets underwater following a complex trajectory. A shared autonomy solution shows to be the most suitable, in order to minimize the effect of limited bandwidth and relevant delay intrinsic to acoustic communications. The control architecture is described and preliminary results of the acoustically teleoperated visits of multiple targets in a testing pool are provided. △ Less

Submitted 6 July, 2024; originally announced July 2024.

Journal ref: OCEANS 2023 - Limerick, Ireland, 2023, pp. 1-5

arXiv:2407.05035 [pdf, other]

Equitable Congestion Pricing under the Markovian Traffic Model: An Application to Bogota

Authors: Alfredo Torrico, Natthawut Boonsiriphatthanajaroen, Nikhil Garg, Andrea Lodi, Hugo Mainguy

Abstract: Congestion pricing is used to raise revenues and reduce traffic and pollution. However, people have heterogeneous spatial demand patterns and willingness (or ability) to pay tolls, and so pricing may have substantial equity implications. We develop a data-driven approach to design congestion pricing given policymakers' equity and efficiency objectives. First, algorithmically, we extend the Markovi… ▽ More Congestion pricing is used to raise revenues and reduce traffic and pollution. However, people have heterogeneous spatial demand patterns and willingness (or ability) to pay tolls, and so pricing may have substantial equity implications. We develop a data-driven approach to design congestion pricing given policymakers' equity and efficiency objectives. First, algorithmically, we extend the Markovian traffic equilibrium setting introduced by Baillon & Cominetti (2008) to model heterogeneous populations and incorporate prices and outside options such as public transit. Second, we empirically evaluate various pricing schemes using data collected by an industry partner in the city of Bogota, one of the most congested cities in the world. We find that pricing personalized to each economic stratum can be substantially more efficient and equitable than uniform pricing; however, non-personalized but area-based pricing can recover much of the gap. △ Less

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.05027 [pdf, other]

Listen-While-Talking: Toward dApp-based Real-Time Spectrum Sharing in O-RAN

Authors: Rajeev Gangula, Andrea Lacava, Michele Polese, Salvatore D'Oro, Leonardo Bonati, Florian Kaltenberger, Pedram Johari, Tommaso Melodia

Abstract: This demo paper presents a dApp-based real-time spectrum sharing scenario where a 5th generation (5G) base station implementing the NR stack adapts its transmission and reception strategies based on the incumbent priority users in the Citizen Broadband Radio Service (CBRS) band. The dApp is responsible for obtaining relevant measurements from the Next Generation Node Base (gNB), running the spectr… ▽ More This demo paper presents a dApp-based real-time spectrum sharing scenario where a 5th generation (5G) base station implementing the NR stack adapts its transmission and reception strategies based on the incumbent priority users in the Citizen Broadband Radio Service (CBRS) band. The dApp is responsible for obtaining relevant measurements from the Next Generation Node Base (gNB), running the spectrum sensing inference, and configuring the gNB with a control action upon detecting the primary incumbent user transmissions. This approach is built on dApps, which extend the O-RAN framework to the real-time and user plane domains. Thus, it avoids the need of dedicated Spectrum Access Systems (SASs) in the CBRS band. The demonstration setup is based on the open-source 5G OpenAirInterface (OAI) framework, where we have implemented a dApp interfaced with a gNB and communicating with a Commercial Off-the-Shelf (COTS) User Equipment (UE) in an over-the-air wireless environment. When an incumbent user has active transmission, the dApp will detect and inform the primary user presence to the gNB. The dApps will also enforce a control policy that adapts the scheduling and transmission policy of the Radio Access Network (RAN). This demo provides valuable insights into the potential of using dApp-based spectrum sensing with O-RAN architecture in next generation cellular networks. △ Less

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04400 [pdf, other]

Hard-Attention Gates with Gradient Routing for Endoscopic Image Computing

Authors: Giorgio Roffo, Carlo Biffi, Pietro Salvagnini, Andrea Cherubini

Abstract: To address overfitting and enhance model generalization in gastroenterological polyp size assessment, our study introduces Feature-Selection Gates (FSG) or Hard-Attention Gates (HAG) alongside Gradient Routing (GR) for dynamic feature selection. This technique aims to boost Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) by promoting sparse connectivity, thereby reducing overfi… ▽ More To address overfitting and enhance model generalization in gastroenterological polyp size assessment, our study introduces Feature-Selection Gates (FSG) or Hard-Attention Gates (HAG) alongside Gradient Routing (GR) for dynamic feature selection. This technique aims to boost Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) by promoting sparse connectivity, thereby reducing overfitting and enhancing generalization. HAG achieves this through sparsification with learnable weights, serving as a regularization strategy. GR further refines this process by optimizing HAG parameters via dual forward passes, independently from the main model, to improve feature re-weighting. Our evaluation spanned multiple datasets, including CIFAR-100 for a broad impact assessment and specialized endoscopic datasets (REAL-Colon, Misawa, and SUN) focusing on polyp size estimation, covering over 200 polyps in more than 370,000 frames. The findings indicate that our HAG-enhanced networks substantially enhance performance in both binary and triclass classification tasks related to polyp sizing. Specifically, CNNs experienced an F1 Score improvement to 87.8% in binary classification, while in triclass classification, the ViT-T model reached an F1 Score of 76.5%, outperforming traditional CNNs and ViT-T models. To facilitate further research, we are releasing our codebase, which includes implementations for CNNs, multistream CNNs, ViT, and HAG-augmented variants. This resource aims to standardize the use of endoscopic datasets, providing public training-validation-testing splits for reliable and comparable research in gastroenterological polyp size estimation. The codebase is available at github.com/cosmoimd/feature-selection-gates. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: Attention Gates, Hard-Attention Gates, Gradient Routing, Feature Selection Gates, Endoscopy, Medical Image Processing, Computer Vision

Journal ref: In Proceedings of the 27th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2024), 2024

arXiv:2407.04287 [pdf, other]

MARS: Paying more attention to visual attributes for text-based person search

Authors: Alex Ergasti, Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi, Andrea Prati

Abstract: Text-based person search (TBPS) is a problem that gained significant interest within the research community. The task is that of retrieving one or more images of a specific individual based on a textual description. The multi-modal nature of the task requires learning representations that bridge text and image data within a shared latent space. Existing TBPS systems face two major challenges. One… ▽ More Text-based person search (TBPS) is a problem that gained significant interest within the research community. The task is that of retrieving one or more images of a specific individual based on a textual description. The multi-modal nature of the task requires learning representations that bridge text and image data within a shared latent space. Existing TBPS systems face two major challenges. One is defined as inter-identity noise that is due to the inherent vagueness and imprecision of text descriptions and it indicates how descriptions of visual attributes can be generally associated to different people; the other is the intra-identity variations, which are all those nuisances e.g. pose, illumination, that can alter the visual appearance of the same textual attributes for a given subject. To address these issues, this paper presents a novel TBPS architecture named MARS (Mae-Attribute-Relation-Sensitive), which enhances current state-of-the-art models by introducing two key components: a Visual Reconstruction Loss and an Attribute Loss. The former employs a Masked AutoEncoder trained to reconstruct randomly masked image patches with the aid of the textual description. In doing so the model is encouraged to learn more expressive representations and textual-visual relations in the latent space. The Attribute Loss, instead, balances the contribution of different types of attributes, defined as adjective-noun chunks of text. This loss ensures that every attribute is taken into consideration in the person retrieval process. Extensive experiments on three commonly used datasets, namely CUHK-PEDES, ICFG-PEDES, and RSTPReid, report performance improvements, with significant gains in the mean Average Precision (mAP) metric w.r.t. the current state of the art. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04076 [pdf, other]

Natively neuromorphic LMU architecture for encoding-free SNN-based HAR on commercial edge devices

Authors: Vittorio Fra, Benedetto Leto, Andrea Pignata, Enrico Macii, Gianvito Urgese

Abstract: Neuromorphic models take inspiration from the human brain by adopting bio-plausible neuron models to build alternatives to traditional Machine Learning (ML) and Deep Learning (DL) solutions. The scarce availability of dedicated hardware able to actualize the emulation of brain-inspired computation, which is otherwise only simulated, yet still hinders the wide adoption of neuromorphic computing for… ▽ More Neuromorphic models take inspiration from the human brain by adopting bio-plausible neuron models to build alternatives to traditional Machine Learning (ML) and Deep Learning (DL) solutions. The scarce availability of dedicated hardware able to actualize the emulation of brain-inspired computation, which is otherwise only simulated, yet still hinders the wide adoption of neuromorphic computing for edge devices and embedded systems. With this premise, we adopt the perspective of neuromorphic computing for conventional hardware and we present the L2MU, a natively neuromorphic Legendre Memory Unit (LMU) which entirely relies on Leaky Integrate-and-Fire (LIF) neurons. Specifically, the original recurrent architecture of LMU has been redesigned by modelling every constituent element with neural populations made of LIF or Current-Based (CuBa) LIF neurons. To couple neuromorphic computing and off-the-shelf edge devices, we equipped the L2MU with an input module for the conversion of real values into spikes, which makes it an encoding-free implementation of a Recurrent Spiking Neural Network (RSNN) able to directly work with raw sensor signals on non-dedicated hardware. As a use case to validate our network, we selected the task of Human Activity Recognition (HAR). We benchmarked our L2MU on smartwatch signals from hand-oriented activities, deploying it on three different commercial edge devices in compressed versions too. The reported results remark the possibility of considering neuromorphic models not only in an exclusive relationship with dedicated hardware but also as a suitable choice to work with common sensors and devices. △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: Paper accepted for the 33rd International Conference on Artificial Neural Networks (ICANN 2024)

arXiv:2407.04018 [pdf, other]

Leveraging Topic Specificity and Social Relationships for Expert Finding in Community Question Answering Platforms

Authors: Maddalena Amendola, Andrea Passarella, Raffaele Perego

Abstract: Online Community Question Answering (CQA) platforms have become indispensable tools for users seeking expert solutions to their technical queries. The effectiveness of these platforms relies on their ability to identify and direct questions to the most knowledgeable users within the community, a process known as Expert Finding (EF). EF accuracy is crucial for increasing user engagement and the rel… ▽ More Online Community Question Answering (CQA) platforms have become indispensable tools for users seeking expert solutions to their technical queries. The effectiveness of these platforms relies on their ability to identify and direct questions to the most knowledgeable users within the community, a process known as Expert Finding (EF). EF accuracy is crucial for increasing user engagement and the reliability of provided answers. Despite recent advancements in EF methodologies, blending the diverse information sources available on CQA platforms for effective expert identification remains challenging. In this paper, we present TUEF, a Topic-oriented User-Interaction model for Expert Finding, which aims to fully and transparently leverage the heterogeneous information available within online question-answering communities. TUEF integrates content and social data by constructing a multi-layer graph that maps out user relationships based on their answering patterns on specific topics. By combining these sources of information, TUEF identifies the most relevant and knowledgeable users for any given question and ranks them using learning-to-rank techniques. Our findings indicate that TUEF's topic-oriented model significantly enhances performance, particularly in large communities discussing well-defined topics. Additionally, we show that the interpretable learning-to-rank algorithm integrated into TUEF offers transparency and explainability with minimal performance trade-offs. The exhaustive experiments conducted on six different CQA communities of Stack Exchange show that TUEF outperforms all competitors with a minimum performance boost of 42.42% in P@1, 32.73% in NDCG@3, 21.76% in R@5, and 29.81% in MRR, excelling in both the evaluation approaches present in the previous literature. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03521 [pdf, other]

Algorithmic Collusion And The Minimum Price Markov Game

Authors: Igor Sadoune, Marcelin Joanis, Andrea Lodi

Abstract: This paper introduces the Minimum Price Markov Game (MPMG), a dynamic variant of the Prisoner's Dilemma. The MPMG serves as a theoretical model and reasonable approximation of real-world first-price sealed-bid public auctions that follow the minimum price rule. The goal is to provide researchers and practitioners with a framework to study market fairness and regulation in both digitized and non-di… ▽ More This paper introduces the Minimum Price Markov Game (MPMG), a dynamic variant of the Prisoner's Dilemma. The MPMG serves as a theoretical model and reasonable approximation of real-world first-price sealed-bid public auctions that follow the minimum price rule. The goal is to provide researchers and practitioners with a framework to study market fairness and regulation in both digitized and non-digitized public procurement processes, amidst growing concerns about algorithmic collusion in online markets. We demonstrate, using multi-agent reinforcement learning-driven artificial agents, that algorithmic tacit coordination is difficult to achieve in the MPMG when cooperation is not explicitly engineered. Paradoxically, our results highlight the robustness of the minimum price rule in an auction environment, but also show that it is not impervious to full-scale algorithmic collusion. These findings contribute to the ongoing debates about algorithmic pricing and its implications. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03132 [pdf, other]

Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech

Authors: Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Noeth, Bjoern Heismann, Andreas Maier, Seung Hee Yang

Abstract: This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task le… ▽ More This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task learning setup, with the end-to-end goal of taking raw speech as input and estimating the corresponding articulatory movements, phoneme sequence, and phoneme alignment. While both proposed approaches share these same requirements, they differ in their way of achieving phoneme-related predictions: one is based on frame classification, the other on a two-staged training procedure and forced alignment. We reach competitive performance of 0.73 mean correlation for the AAI task and achieve up to approximately 87% frame overlap compared to a state-of-the-art text-dependent phoneme force aligner. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: to be published in Interspeech 2024 proceedings

arXiv:2407.02599 [pdf, other]

Meta 3D Gen

Authors: Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi

Abstract: We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously gener… ▽ More We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously generated (or artist-created) 3D shapes using additional textual inputs provided by the user. 3DGen integrates key technical components, Meta 3D AssetGen and Meta 3D TextureGen, that we developed for text-to-3D and text-to-texture generation, respectively. By combining their strengths, 3DGen represents 3D objects simultaneously in three ways: in view space, in volumetric space, and in UV (or texture) space. The integration of these two techniques achieves a win rate of 68% with respect to the single-stage model. We compare 3DGen to numerous industry baselines, and show that it outperforms them in terms of prompt fidelity and visual quality for complex textual prompts, while being significantly faster. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02445 [pdf, other]

Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials

Authors: Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny

Abstract: We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control. Compared to works that bake shading in the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen generates first several views of the object with factored s… ▽ More We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control. Compared to works that bake shading in the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen generates first several views of the object with factored shaded and albedo appearance channels, and then reconstructs colours, metalness and roughness in 3D, using a deferred shading loss for efficient supervision. It also uses a sign-distance function to represent 3D shape more reliably and introduces a corresponding loss for direct shape supervision. This is implemented using fused kernels for high memory efficiency. After mesh extraction, a texture refinement transformer operating in UV space significantly improves sharpness and details. AssetGen achieves 17% improvement in Chamfer Distance and 40% in LPIPS over the best concurrent work for few-view reconstruction, and a human preference of 72% over the best industry competitors of comparable speed, including those that support PBR. Project page with generated assets: https://assetgen.github.io △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Project Page: https://assetgen.github.io

arXiv:2407.02430 [pdf, other]

Meta 3D TextureGen: Fast and Consistent Texture Generation for 3D Objects

Authors: Raphael Bensadoun, Yanir Kleiman, Idan Azuri, Omri Harosh, Andrea Vedaldi, Natalia Neverova, Oran Gafni

Abstract: The recent availability and adaptability of text-to-image models has sparked a new era in many related domains that benefit from the learned text priors as well as high-quality and fast generation capabilities, one of which is texture generation for 3D objects. Although recent texture generation methods achieve impressive results by using text-to-image networks, the combination of global consisten… ▽ More The recent availability and adaptability of text-to-image models has sparked a new era in many related domains that benefit from the learned text priors as well as high-quality and fast generation capabilities, one of which is texture generation for 3D objects. Although recent texture generation methods achieve impressive results by using text-to-image networks, the combination of global consistency, quality, and speed, which is crucial for advancing texture generation to real-world applications, remains elusive. To that end, we introduce Meta 3D TextureGen: a new feedforward method comprised of two sequential networks aimed at generating high-quality and globally consistent textures for arbitrary geometries of any complexity degree in less than 20 seconds. Our method achieves state-of-the-art results in quality and speed by conditioning a text-to-image model on 3D semantics in 2D space and fusing them into a complete and high-resolution UV texture map, as demonstrated by extensive qualitative and quantitative evaluations. In addition, we introduce a texture enhancement network that is capable of up-scaling any texture by an arbitrary ratio, producing 4k pixel resolution textures. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01409 [pdf, other]

Dynamic Few-Shot Learning for Knowledge Graph Question Answering

Authors: Jacopo D'Abramo, Andrea Zugarini, Paolo Torroni

Abstract: Large language models present opportunities for innovative Question Answering over Knowledge Graphs (KGQA). However, they are not inherently designed for query generation. To bridge this gap, solutions have been proposed that rely on fine-tuning or ad-hoc architectures, achieving good results but limited out-of-domain distribution generalization. In this study, we introduce a novel approach called… ▽ More Large language models present opportunities for innovative Question Answering over Knowledge Graphs (KGQA). However, they are not inherently designed for query generation. To bridge this gap, solutions have been proposed that rely on fine-tuning or ad-hoc architectures, achieving good results but limited out-of-domain distribution generalization. In this study, we introduce a novel approach called Dynamic Few-Shot Learning (DFSL). DFSL integrates the efficiency of in-context learning and semantic similarity and provides a generally applicable solution for KGQA with state-of-the-art performance. We run an extensive evaluation across multiple benchmark datasets and architecture configurations. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01405 [pdf, other]

Social Isolation, Digital Connection: COVID-19's Impact on Twitter Ego Networks

Authors: Kamer Cekini, Elisabetta Biondi, Chiara Boldrini, Andrea Passarella, Marco Conti

Abstract: One of the most impactful measures to fight the COVID-19 pandemic in its early first years was the lockdown, implemented by governments to reduce physical contact among people and minimize opportunities for the virus to spread. As people were compelled to limit their physical interactions and stay at home, they turned to online social platforms to alleviate feelings of loneliness. Ego networks rep… ▽ More One of the most impactful measures to fight the COVID-19 pandemic in its early first years was the lockdown, implemented by governments to reduce physical contact among people and minimize opportunities for the virus to spread. As people were compelled to limit their physical interactions and stay at home, they turned to online social platforms to alleviate feelings of loneliness. Ego networks represent how people organize their relationships due to human cognitive constraints that impose limits on meaningful interactions among people. Physical contacts were disrupted during the lockdown, causing socialization to shift entirely online, leading to a shift in socialization into online platforms. Our research aimed to investigate the impact of lockdown measures on online ego network structures potentially caused by the increase of cognitive expenses in online social networks. In particular, we examined a large Twitter dataset of users, covering 7 years of their activities. We found that during the lockdown, there was an increase in network sizes and a richer structure in social circles, with relationships becoming more intimate. Moreover, we observe that, after the lockdown measures were relaxed, these features returned to their pre-lockdown values. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: Work supported by SoBigData.it (N. IR0000013), ICSC (N. CN00000013), FAIR (N. PE00000013)

arXiv:2407.01293 [pdf, other]

Applying the Ego Network Model to Cross-Target Stance Detection

Authors: Jack Tacchi, Parisa Jamadi Khiabani, Arkaitz Zubiaga, Chiara Boldrini, Andrea Passarella

Abstract: Understanding human interactions and social structures is an incredibly important task, especially in such an interconnected world. One task that facilitates this is Stance Detection, which predicts the opinion or attitude of a text towards a target entity. Traditionally, this has often been done mainly via the use of text-based approaches, however, recent work has produced a model (CT-TN) that le… ▽ More Understanding human interactions and social structures is an incredibly important task, especially in such an interconnected world. One task that facilitates this is Stance Detection, which predicts the opinion or attitude of a text towards a target entity. Traditionally, this has often been done mainly via the use of text-based approaches, however, recent work has produced a model (CT-TN) that leverages information about a user's social network to help predict their stance, outperforming certain cross-target text-based approaches. Unfortunately, the data required for such graph-based approaches is not always available. This paper proposes two novel tools for Stance Detection: the Ego Network Model (ENM) and the Signed Ego Network Model (SENM). These models are founded in anthropological and psychological studies and have been used within the context of social network analysis and related tasks (e.g., link prediction). Stance Detection predictions obtained using these features achieve a level of accuracy similar to the graph-based features used by CT-TN while requiring less and more easily obtainable data. In addition to this, the performances of the inner and outer circles of the ENM, representing stronger and weaker social ties, respectively are compared. Surprisingly, the outer circles, which contain more numerous but less intimate connections, are more useful for predicting stance. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted at ASONAM 2024

arXiv:2407.01272 [pdf, other]

Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER

Authors: Andrew Zamai, Andrea Zugarini, Leonardo Rigutini, Marco Ernandes, Marco Maggini

Abstract: Recently, several specialized instruction-tuned Large Language Models (LLMs) for Named Entity Recognition (NER) have emerged. Compared to traditional NER approaches, these models have strong generalization capabilities. Existing LLMs mainly focus on zero-shot NER in out-of-domain distributions, being fine-tuned on an extensive number of entity classes that often highly or completely overlap with t… ▽ More Recently, several specialized instruction-tuned Large Language Models (LLMs) for Named Entity Recognition (NER) have emerged. Compared to traditional NER approaches, these models have strong generalization capabilities. Existing LLMs mainly focus on zero-shot NER in out-of-domain distributions, being fine-tuned on an extensive number of entity classes that often highly or completely overlap with test sets. In this work instead, we propose SLIMER, an approach designed to tackle never-seen-before named entity tags by instructing the model on fewer examples, and by leveraging a prompt enriched with definition and guidelines. Experiments demonstrate that definition and guidelines yield better performance, faster and more robust learning, particularly when labelling unseen Named Entities. Furthermore, SLIMER performs comparably to state-of-the-art approaches in out-of-domain zero-shot NER, while being trained on a reduced tag set. △ Less

Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00890 [pdf, other]

Macroeconomic Forecasting with Large Language Models

Authors: Andrea Carriero, Davide Pettenuzzo, Shubhranshu Shekhar

Abstract: This paper presents a comparative analysis evaluating the accuracy of Large Language Models (LLMs) against traditional macro time series forecasting approaches. In recent times, LLMs have surged in popularity for forecasting due to their ability to capture intricate patterns in data and quickly adapt across very different domains. However, their effectiveness in forecasting macroeconomic time seri… ▽ More This paper presents a comparative analysis evaluating the accuracy of Large Language Models (LLMs) against traditional macro time series forecasting approaches. In recent times, LLMs have surged in popularity for forecasting due to their ability to capture intricate patterns in data and quickly adapt across very different domains. However, their effectiveness in forecasting macroeconomic time series data compared to conventional methods remains an area of interest. To address this, we conduct a rigorous evaluation of LLMs against traditional macro forecasting methods, using as common ground the FRED-MD database. Our findings provide valuable insights into the strengths and limitations of LLMs in forecasting macroeconomic time series, shedding light on their applicability in real-world scenarios △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00585 [pdf, other]

Your Car Tells Me Where You Drove: A Novel Path Inference Attack via CAN Bus and OBD-II Data

Authors: Tommaso Bianchi, Alessandro Brighente, Mauro Conti, Andrea Valori

Abstract: Despite its well-known security issues, the Controller Area Network (CAN) is still the main technology for in-vehicle communications. Attackers posing as diagnostic services or accessing the CAN bus can threaten the drivers' location privacy to know the exact location at a certain point in time or to infer the visited areas. This represents a serious threat to users' privacy, but also an advantage… ▽ More Despite its well-known security issues, the Controller Area Network (CAN) is still the main technology for in-vehicle communications. Attackers posing as diagnostic services or accessing the CAN bus can threaten the drivers' location privacy to know the exact location at a certain point in time or to infer the visited areas. This represents a serious threat to users' privacy, but also an advantage for police investigations to gather location-based evidence. In this paper, we present On Path Diagnostic - Intrusion \& Inference (OPD-II), a novel path inference attack leveraging a physical car model and a map matching algorithm to infer the path driven by a car based on CAN bus data. Differently from available attacks, our approach only requires the attacker to know the initial location and heading of the victim's car and is not limited by the availability of training data, road configurations, or the need to access other victim's devices (e.g., smartphones). We implement our attack on a set of four different cars and a total number of 41 tracks in different road and traffic scenarios. We achieve an average of 95% accuracy on reconstructing the coordinates of the recorded path by leveraging a dynamic map-matching algorithm that outperforms the 75% and 89% accuracy values of other proposals while removing their set of assumptions. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.20080 [pdf, other]

AI for Extreme Event Modeling and Understanding: Methodologies and Challenges

Authors: Gustau Camps-Valls, Miguel-Ángel Fernández-Torres, Kai-Hendrik Cohrs, Adrian Höhl, Andrea Castelletti, Aytac Pacal, Claire Robin, Francesco Martinuzzi, Ioannis Papoutsis, Ioannis Prapas, Jorge Pérez-Aracil, Katja Weigel, Maria Gonzalez-Calabuig, Markus Reichstein, Martin Rabel, Matteo Giuliani, Miguel Mahecha, Oana-Iuliana Popescu, Oscar J. Pellicer-Valero, Said Ouala, Sancho Salcedo-Sanz, Sebastian Sippel, Spyros Kondylatos, Tamara Happé, Tristan Williams

Abstract: In recent years, artificial intelligence (AI) has deeply impacted various fields, including Earth system sciences. Here, AI improved weather forecasting, model emulation, parameter estimation, and the prediction of extreme events. However, the latter comes with specific challenges, such as develo** accurate predictors from noisy, heterogeneous and limited annotated data. This paper reviews how A… ▽ More In recent years, artificial intelligence (AI) has deeply impacted various fields, including Earth system sciences. Here, AI improved weather forecasting, model emulation, parameter estimation, and the prediction of extreme events. However, the latter comes with specific challenges, such as develo** accurate predictors from noisy, heterogeneous and limited annotated data. This paper reviews how AI is being used to analyze extreme events (like floods, droughts, wildfires and heatwaves), highlighting the importance of creating accurate, transparent, and reliable AI models. We discuss the hurdles of dealing with limited data, integrating information in real-time, deploying models, and making them understandable, all crucial for gaining the trust of stakeholders and meeting regulatory needs. We provide an overview of how AI can help identify and explain extreme events more effectively, improving disaster response and communication. We emphasize the need for collaboration across different fields to create AI solutions that are practical, understandable, and trustworthy for analyzing and predicting extreme events. Such collaborative efforts aim to enhance disaster readiness and disaster risk reduction. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.20055 [pdf, other]

SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting

Authors: Sara Sabour, Lily Goli, George Kopanas, Mark Matthews, Dmitry Lagun, Leonidas Guibas, Alec Jacobson, David J. Fleet, Andrea Tagliasacchi

Abstract: 3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds, making it suitable for real-time applications.However, current methods require highly controlled environments (no moving people or wind-blown elements, and consistent lighting) to meet the inter-view consistency assumption of 3DGS. This makes reconstruction of real-world c… ▽ More 3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds, making it suitable for real-time applications.However, current methods require highly controlled environments (no moving people or wind-blown elements, and consistent lighting) to meet the inter-view consistency assumption of 3DGS. This makes reconstruction of real-world captures problematic. We present SpotlessSplats, an approach that leverages pre-trained and general-purpose features coupled with robust optimization to effectively ignore transient distractors. Our method achieves state-of-the-art reconstruction quality both visually and quantitatively, on casual captures. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19867 [pdf, other]

Sampled Datasets Risk Substantial Bias in the Identification of Political Polarization on Social Media

Authors: Gabriele Di Bona, Emma Fraxanet, Björn Komander, Andrea Lo Sasso, Virginia Morini, Antoine Vendeville, Max Falkenberg, Alessandro Galeazzi

Abstract: Following recent policy changes by X (Twitter) and other social media platforms, user interaction data has become increasingly difficult to access. These restrictions are impeding robust research pertaining to social and political phenomena online, which is critical due to the profound impact social media platforms may have on our societies. Here, we investigate the reliability of polarization mea… ▽ More Following recent policy changes by X (Twitter) and other social media platforms, user interaction data has become increasingly difficult to access. These restrictions are impeding robust research pertaining to social and political phenomena online, which is critical due to the profound impact social media platforms may have on our societies. Here, we investigate the reliability of polarization measures obtained from different samples of social media data by studying the structural polarization of the Polish political debate on Twitter over a 24-hour period. First, we show that the political discussion on Twitter is only a small subset of the wider Twitter discussion. Second, we find that large samples can be representative of the whole political discussion on a platform, but small samples consistently fail to accurately reflect the true structure of polarization online. Finally, we demonstrate that keyword-based samples can be representative if keywords are selected with great care, but that poorly selected keywords can result in substantial political bias in the sampled data. Our findings demonstrate that it is not possible to measure polarization in a reliable way with small, sampled datasets, highlighting why the current lack of research data is so problematic, and providing insight into the practical implementation of the European Union's Digital Service Act which aims to improve researchers' access to social media data. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19537 [pdf, other]

Handling Ontology Gaps in Semantic Parsing

Authors: Andrea Bacciu, Marco Damonte, Marco Basaldella, Emilio Monti

Abstract: The majority of Neural Semantic Parsing (NSP) models are developed with the assumption that there are no concepts outside the ones such models can represent with their target symbols (closed-world assumption). This assumption leads to generate hallucinated outputs rather than admitting their lack of knowledge. Hallucinations can lead to wrong or potentially offensive responses to users. Hence, a m… ▽ More The majority of Neural Semantic Parsing (NSP) models are developed with the assumption that there are no concepts outside the ones such models can represent with their target symbols (closed-world assumption). This assumption leads to generate hallucinated outputs rather than admitting their lack of knowledge. Hallucinations can lead to wrong or potentially offensive responses to users. Hence, a mechanism to prevent this behavior is crucial to build trusted NSP-based Question Answering agents. To that end, we propose the Hallucination Simulation Framework (HSF), a general setting for stimulating and analyzing NSP model hallucinations. The framework can be applied to any NSP task with a closed-ontology. Using the proposed framework and KQA Pro as the benchmark dataset, we assess state-of-the-art techniques for hallucination detection. We then present a novel hallucination detection strategy that exploits the computational graph of the NSP model to detect the NSP hallucinations in the presence of ontology gaps, out-of-domain utterances, and to recognize NSP errors, improving the F1-Score respectively by ~21, ~24% and ~1%. This is the first work in closed-ontology NSP that addresses the problem of recognizing ontology gaps. We release our code and checkpoints at https://github.com/amazon-science/handling-ontology-gaps-in-semantic-parsing. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.19189 [pdf, other]

BISeizuRe: BERT-Inspired Seizure Data Representation to Improve Epilepsy Monitoring

Authors: Luca Benfenati, Thorir Mar Ingolfsson, Andrea Cossettini, Daniele Jahier Pagliari, Alessio Burrello, Luca Benini

Abstract: This study presents a novel approach for EEG-based seizure detection leveraging a BERT-based model. The model, BENDR, undergoes a two-phase training process. Initially, it is pre-trained on the extensive Temple University Hospital EEG Corpus (TUEG), a 1.5 TB dataset comprising over 10,000 subjects, to extract common EEG data patterns. Subsequently, the model is fine-tuned on the CHB-MIT Scalp EEG… ▽ More This study presents a novel approach for EEG-based seizure detection leveraging a BERT-based model. The model, BENDR, undergoes a two-phase training process. Initially, it is pre-trained on the extensive Temple University Hospital EEG Corpus (TUEG), a 1.5 TB dataset comprising over 10,000 subjects, to extract common EEG data patterns. Subsequently, the model is fine-tuned on the CHB-MIT Scalp EEG Database, consisting of 664 EEG recordings from 24 pediatric patients, of which 198 contain seizure events. Key contributions include optimizing fine-tuning on the CHB-MIT dataset, where the impact of model architecture, pre-processing, and post-processing techniques are thoroughly examined to enhance sensitivity and reduce false positives per hour (FP/h). We also explored custom training strategies to ascertain the most effective setup. The model undergoes a novel second pre-training phase before subject-specific fine-tuning, enhancing its generalization capabilities. The optimized model demonstrates substantial performance enhancements, achieving as low as 0.23 FP/h, 2.5$\times$ lower than the baseline model, with a lower but still acceptable sensitivity rate, showcasing the effectiveness of applying a BERT-based approach on EEG-based seizure detection. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: 4 pages, 2 tables, 2 figures

arXiv:2406.19035 [pdf, other]

SD-BLS: Privacy Preserving Selective Disclosure of Verifiable Credentials with Unlinkable Threshold Revocation

Authors: Denis Roio, Rebecca Selvaggini, Andrea D'Intino

Abstract: It is of critical importance to design digital identity systems that ensure the privacy of citizens as well as protecting them from issuer corruption. We aim to solve this issue and propose a method for selective disclosure and privacy preserving revocation of digital credentials, using the unique homomorphic characteristics of second order Elliptic Curves and Boneh-Lynn-Shacham (BLS) signatures.… ▽ More It is of critical importance to design digital identity systems that ensure the privacy of citizens as well as protecting them from issuer corruption. We aim to solve this issue and propose a method for selective disclosure and privacy preserving revocation of digital credentials, using the unique homomorphic characteristics of second order Elliptic Curves and Boneh-Lynn-Shacham (BLS) signatures. Our approach ensures that users can selectively reveal credentials signed by a certain issuer, which can be interactively revoked by a quorum of other agreeing issuers without revealing the identity of users. Our goal is to protect users from issuer corruption by requiring collective agreement among multiple revocation issuers. △ Less

Submitted 3 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 7 pages, 3 figures, accepted for publication by the IEEE International Conference on Blockchain (2024) to be held in Copenhagen

arXiv:2406.18371 [pdf, other]

Building multiscale models with PhysiBoSS, an agent-based modeling tool

Authors: Marco Ruscone, Andrea Checcoli, Randy Heiland, Emmanuel Barillot, Paul Macklin, Laurence Calzone, Vincent Noël

Abstract: Multiscale models provide a unique tool for studying complex processes that study events occurring at different scales across space and time. In the context of biological systems, such models can simulate mechanisms happening at the intracellular level such as signaling, and at the extracellular level where cells communicate and coordinate with other cells. They aim to understand the impact of gen… ▽ More Multiscale models provide a unique tool for studying complex processes that study events occurring at different scales across space and time. In the context of biological systems, such models can simulate mechanisms happening at the intracellular level such as signaling, and at the extracellular level where cells communicate and coordinate with other cells. They aim to understand the impact of genetic or environmental deregulation observed in complex diseases, describe the interplay between a pathological tissue and the immune system, and suggest strategies to revert the diseased phenotypes. The construction of these multiscale models remains a very complex task, including the choice of the components to consider, the level of details of the processes to simulate, or the fitting of the parameters to the data. One additional difficulty is the expert knowledge needed to program these models in languages such as C++ or Python, which may discourage the participation of non-experts. Simplifying this process through structured description formalisms -- coupled with a graphical interface -- is crucial in making modeling more accessible to the broader scientific community, as well as streamlining the process for advanced users. This article introduces three examples of multiscale models which rely on the framework PhysiBoSS, an add-on of PhysiCell that includes intracellular descriptions as continuous time Boolean models to the agent-based approach. The article demonstrates how to easily construct such models, relying on PhysiCell Studio, the PhysiCell Graphical User Interface. A step-by-step tutorial is provided as a Supplementary Material and all models are provided at: https://physiboss.github.io/tutorial/. △ Less