-
Left-Linear Rewriting in Adhesive Categories
Authors:
Paolo Baldan,
Davide Castelnovo,
Andrea Corradini,
Fabio Gadducci
Abstract:
When can two sequential steps performed by a computing device be considered (causally) independent? This is a relevant question for concurrent and distributed systems, since independence means that they could be executed in any order, and potentially in parallel. Equivalences identifying rewriting sequences which differ only for independent steps are at the core of the theory of concurrency of man…
▽ More
When can two sequential steps performed by a computing device be considered (causally) independent? This is a relevant question for concurrent and distributed systems, since independence means that they could be executed in any order, and potentially in parallel. Equivalences identifying rewriting sequences which differ only for independent steps are at the core of the theory of concurrency of many formalisms. We investigate the issue in the context of the double pushout approach to rewriting in the general setting of adhesive categories. While a consolidated theory exists for linear rules,which can consume, preserve and generate entities, this paper focuses on left-linear rules which may also "merge" parts of the state. This is an apparently minimal, yet technically hard enhancement,since a standard characterisation of independence that - in the linear case - allows one to derive a number of properties, essential in the development of a theory of concurrency, no longer holds. The paper performs an in-depth study of the notion of independence for left-linear rules: it introduces a novel characterisation of independence, identifies well-behaved classes of left-linear rewriting systems,and provides some fundamental results including a Church-Rosser property and the existence of canonical equivalence proofs for concurrent computations. These results properly extends the class of formalisms that can be modelled in the adhesive framework
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Artificial Intuition: Efficient Classification of Scientific Abstracts
Authors:
Harsh Sakhrani,
Naseela Pervez,
Anirudh Ravi Kumar,
Fred Morstatter,
Alexandra Graddy Reed,
Andrea Belz
Abstract:
It is desirable to coarsely classify short scientific texts, such as grant or publication abstracts, for strategic insight or research portfolio management. These texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation. Yet this task is remarkably difficult to automate because of brevity and the absence of context. To address this gap, we h…
▽ More
It is desirable to coarsely classify short scientific texts, such as grant or publication abstracts, for strategic insight or research portfolio management. These texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation. Yet this task is remarkably difficult to automate because of brevity and the absence of context. To address this gap, we have developed a novel approach to generate and appropriately assign coarse domain-specific labels. We show that a Large Language Model (LLM) can provide metadata essential to the task, in a process akin to the augmentation of supplemental knowledge representing human intuition, and propose a workflow. As a pilot study, we use a corpus of award abstracts from the National Aeronautics and Space Administration (NASA). We develop new assessment tools in concert with established performance metrics.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Multi-Fidelity Bayesian Neural Network for Uncertainty Quantification in Transonic Aerodynamic Loads
Authors:
Andrea Vaiuso,
Gabriele Immordino,
Marcello Righi,
Andrea Da Ronch
Abstract:
Multi-fidelity models are becoming more prevalent in engineering, particularly in aerospace, as they combine both the computational efficiency of low-fidelity models with the high accuracy of higher-fidelity simulations. Various state-of-the-art techniques exist for fusing data from different fidelity sources, including Co-Kriging and transfer learning in neural networks. This paper aims to implem…
▽ More
Multi-fidelity models are becoming more prevalent in engineering, particularly in aerospace, as they combine both the computational efficiency of low-fidelity models with the high accuracy of higher-fidelity simulations. Various state-of-the-art techniques exist for fusing data from different fidelity sources, including Co-Kriging and transfer learning in neural networks. This paper aims to implement a multi-fidelity Bayesian neural network model that applies transfer learning to fuse data generated by models at different fidelities. Bayesian neural networks use probability distributions over network weights, enabling them to provide predictions along with estimates of their confidence. This approach harnesses the predictive and data fusion capabilities of neural networks while also quantifying uncertainty. The results demonstrate that the multi-fidelity Bayesian model outperforms the state-of-the-art Co-Kriging in terms of overall accuracy and robustness on unseen data.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Basins of Attraction in Two-Player Random Ordinal Potential Games
Authors:
Andrea Collevecchio,
Hlafo Alfie Mimun,
Matteo Quattropani,
Marco Scarsini
Abstract:
We consider the class of two-person ordinal potential games where each player has the same number of actions $K$. Each game in this class admits at least one pure Nash equilibrium and the best-response dynamics converges to one of these pure Nash equilibria; which one depends on the starting point. So, each pure Nash equilibrium has a basin of attraction.
We pick uniformly at random one game fro…
▽ More
We consider the class of two-person ordinal potential games where each player has the same number of actions $K$. Each game in this class admits at least one pure Nash equilibrium and the best-response dynamics converges to one of these pure Nash equilibria; which one depends on the starting point. So, each pure Nash equilibrium has a basin of attraction.
We pick uniformly at random one game from this class and we study the joint distribution of the sizes of the basins of attraction. We provide an asymptotic exact value for the expected basin of attraction of each pure Nash equilibrium, when the number of actions $K$ goes to infinity.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Understanding and Addressing Gender Bias in Expert Finding Task
Authors:
Maddalena Amendola,
Carlos Castillo,
Andrea Passarella,
Raffaele Perego
Abstract:
The Expert Finding (EF) task is critical in community Question&Answer (CQ&A) platforms, significantly enhancing user engagement by improving answer quality and reducing response times. However, biases, especially gender biases, have been identified in these platforms. This study investigates gender bias in state-of-the-art EF models and explores methods to mitigate it. Utilizing a comprehensive da…
▽ More
The Expert Finding (EF) task is critical in community Question&Answer (CQ&A) platforms, significantly enhancing user engagement by improving answer quality and reducing response times. However, biases, especially gender biases, have been identified in these platforms. This study investigates gender bias in state-of-the-art EF models and explores methods to mitigate it. Utilizing a comprehensive dataset from StackOverflow, the largest community in the StackExchange network, we conduct extensive experiments to analyze how EF models' candidate identification processes influence gender representation. Our findings reveal that models relying on reputation metrics and activity levels disproportionately favor male users, who are more active on the platform. This bias results in the underrepresentation of female experts in the ranking process. We propose adjustments to EF models that incorporate a more balanced preprocessing strategy and leverage content-based and social network-based information, with the aim to provide a fairer representation of genders among identified experts. Our analysis shows that integrating these methods can significantly enhance gender balance without compromising model accuracy. To the best of our knowledge, this study is the first to focus on detecting and mitigating gender bias in EF methods.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Wireless teleoperation of HSURF artificial fish in complex paths
Authors:
Saverio Iacoponi,
Nikita Mankovskii,
Mohammed El Hanbaly,
Andrea Infanti,
Shamma Alhajeri,
Federico Renda,
Cesare Stefanini,
Giulia De Masi
Abstract:
In this paper we show the application of the new robotic multi-platform system HSURF to a specific use case of teleoperation, aimed at monitoring and inspection. The HSURF system, consists of 3 different kinds of platforms: floater, sinker and robotic fishes. The collaborative control of the 3 platforms allows a remotely based operator to control the fish in order to visit and inspect several targ…
▽ More
In this paper we show the application of the new robotic multi-platform system HSURF to a specific use case of teleoperation, aimed at monitoring and inspection. The HSURF system, consists of 3 different kinds of platforms: floater, sinker and robotic fishes. The collaborative control of the 3 platforms allows a remotely based operator to control the fish in order to visit and inspect several targets underwater following a complex trajectory. A shared autonomy solution shows to be the most suitable, in order to minimize the effect of limited bandwidth and relevant delay intrinsic to acoustic communications. The control architecture is described and preliminary results of the acoustically teleoperated visits of multiple targets in a testing pool are provided.
△ Less
Submitted 6 July, 2024;
originally announced July 2024.
-
Equitable Congestion Pricing under the Markovian Traffic Model: An Application to Bogota
Authors:
Alfredo Torrico,
Natthawut Boonsiriphatthanajaroen,
Nikhil Garg,
Andrea Lodi,
Hugo Mainguy
Abstract:
Congestion pricing is used to raise revenues and reduce traffic and pollution. However, people have heterogeneous spatial demand patterns and willingness (or ability) to pay tolls, and so pricing may have substantial equity implications. We develop a data-driven approach to design congestion pricing given policymakers' equity and efficiency objectives. First, algorithmically, we extend the Markovi…
▽ More
Congestion pricing is used to raise revenues and reduce traffic and pollution. However, people have heterogeneous spatial demand patterns and willingness (or ability) to pay tolls, and so pricing may have substantial equity implications. We develop a data-driven approach to design congestion pricing given policymakers' equity and efficiency objectives. First, algorithmically, we extend the Markovian traffic equilibrium setting introduced by Baillon & Cominetti (2008) to model heterogeneous populations and incorporate prices and outside options such as public transit. Second, we empirically evaluate various pricing schemes using data collected by an industry partner in the city of Bogota, one of the most congested cities in the world. We find that pricing personalized to each economic stratum can be substantially more efficient and equitable than uniform pricing; however, non-personalized but area-based pricing can recover much of the gap.
△ Less
Submitted 6 July, 2024;
originally announced July 2024.
-
Listen-While-Talking: Toward dApp-based Real-Time Spectrum Sharing in O-RAN
Authors:
Rajeev Gangula,
Andrea Lacava,
Michele Polese,
Salvatore D'Oro,
Leonardo Bonati,
Florian Kaltenberger,
Pedram Johari,
Tommaso Melodia
Abstract:
This demo paper presents a dApp-based real-time spectrum sharing scenario where a 5th generation (5G) base station implementing the NR stack adapts its transmission and reception strategies based on the incumbent priority users in the Citizen Broadband Radio Service (CBRS) band. The dApp is responsible for obtaining relevant measurements from the Next Generation Node Base (gNB), running the spectr…
▽ More
This demo paper presents a dApp-based real-time spectrum sharing scenario where a 5th generation (5G) base station implementing the NR stack adapts its transmission and reception strategies based on the incumbent priority users in the Citizen Broadband Radio Service (CBRS) band. The dApp is responsible for obtaining relevant measurements from the Next Generation Node Base (gNB), running the spectrum sensing inference, and configuring the gNB with a control action upon detecting the primary incumbent user transmissions. This approach is built on dApps, which extend the O-RAN framework to the real-time and user plane domains. Thus, it avoids the need of dedicated Spectrum Access Systems (SASs) in the CBRS band. The demonstration setup is based on the open-source 5G OpenAirInterface (OAI) framework, where we have implemented a dApp interfaced with a gNB and communicating with a Commercial Off-the-Shelf (COTS) User Equipment (UE) in an over-the-air wireless environment. When an incumbent user has active transmission, the dApp will detect and inform the primary user presence to the gNB. The dApps will also enforce a control policy that adapts the scheduling and transmission policy of the Radio Access Network (RAN). This demo provides valuable insights into the potential of using dApp-based spectrum sensing with O-RAN architecture in next generation cellular networks.
△ Less
Submitted 6 July, 2024;
originally announced July 2024.
-
Hard-Attention Gates with Gradient Routing for Endoscopic Image Computing
Authors:
Giorgio Roffo,
Carlo Biffi,
Pietro Salvagnini,
Andrea Cherubini
Abstract:
To address overfitting and enhance model generalization in gastroenterological polyp size assessment, our study introduces Feature-Selection Gates (FSG) or Hard-Attention Gates (HAG) alongside Gradient Routing (GR) for dynamic feature selection. This technique aims to boost Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) by promoting sparse connectivity, thereby reducing overfi…
▽ More
To address overfitting and enhance model generalization in gastroenterological polyp size assessment, our study introduces Feature-Selection Gates (FSG) or Hard-Attention Gates (HAG) alongside Gradient Routing (GR) for dynamic feature selection. This technique aims to boost Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) by promoting sparse connectivity, thereby reducing overfitting and enhancing generalization. HAG achieves this through sparsification with learnable weights, serving as a regularization strategy. GR further refines this process by optimizing HAG parameters via dual forward passes, independently from the main model, to improve feature re-weighting. Our evaluation spanned multiple datasets, including CIFAR-100 for a broad impact assessment and specialized endoscopic datasets (REAL-Colon, Misawa, and SUN) focusing on polyp size estimation, covering over 200 polyps in more than 370,000 frames. The findings indicate that our HAG-enhanced networks substantially enhance performance in both binary and triclass classification tasks related to polyp sizing. Specifically, CNNs experienced an F1 Score improvement to 87.8% in binary classification, while in triclass classification, the ViT-T model reached an F1 Score of 76.5%, outperforming traditional CNNs and ViT-T models. To facilitate further research, we are releasing our codebase, which includes implementations for CNNs, multistream CNNs, ViT, and HAG-augmented variants. This resource aims to standardize the use of endoscopic datasets, providing public training-validation-testing splits for reliable and comparable research in gastroenterological polyp size estimation. The codebase is available at github.com/cosmoimd/feature-selection-gates.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
MARS: Paying more attention to visual attributes for text-based person search
Authors:
Alex Ergasti,
Tomaso Fontanini,
Claudio Ferrari,
Massimo Bertozzi,
Andrea Prati
Abstract:
Text-based person search (TBPS) is a problem that gained significant interest within the research community. The task is that of retrieving one or more images of a specific individual based on a textual description. The multi-modal nature of the task requires learning representations that bridge text and image data within a shared latent space. Existing TBPS systems face two major challenges. One…
▽ More
Text-based person search (TBPS) is a problem that gained significant interest within the research community. The task is that of retrieving one or more images of a specific individual based on a textual description. The multi-modal nature of the task requires learning representations that bridge text and image data within a shared latent space. Existing TBPS systems face two major challenges. One is defined as inter-identity noise that is due to the inherent vagueness and imprecision of text descriptions and it indicates how descriptions of visual attributes can be generally associated to different people; the other is the intra-identity variations, which are all those nuisances e.g. pose, illumination, that can alter the visual appearance of the same textual attributes for a given subject. To address these issues, this paper presents a novel TBPS architecture named MARS (Mae-Attribute-Relation-Sensitive), which enhances current state-of-the-art models by introducing two key components: a Visual Reconstruction Loss and an Attribute Loss. The former employs a Masked AutoEncoder trained to reconstruct randomly masked image patches with the aid of the textual description. In doing so the model is encouraged to learn more expressive representations and textual-visual relations in the latent space. The Attribute Loss, instead, balances the contribution of different types of attributes, defined as adjective-noun chunks of text. This loss ensures that every attribute is taken into consideration in the person retrieval process. Extensive experiments on three commonly used datasets, namely CUHK-PEDES, ICFG-PEDES, and RSTPReid, report performance improvements, with significant gains in the mean Average Precision (mAP) metric w.r.t. the current state of the art.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
Natively neuromorphic LMU architecture for encoding-free SNN-based HAR on commercial edge devices
Authors:
Vittorio Fra,
Benedetto Leto,
Andrea Pignata,
Enrico Macii,
Gianvito Urgese
Abstract:
Neuromorphic models take inspiration from the human brain by adopting bio-plausible neuron models to build alternatives to traditional Machine Learning (ML) and Deep Learning (DL) solutions. The scarce availability of dedicated hardware able to actualize the emulation of brain-inspired computation, which is otherwise only simulated, yet still hinders the wide adoption of neuromorphic computing for…
▽ More
Neuromorphic models take inspiration from the human brain by adopting bio-plausible neuron models to build alternatives to traditional Machine Learning (ML) and Deep Learning (DL) solutions. The scarce availability of dedicated hardware able to actualize the emulation of brain-inspired computation, which is otherwise only simulated, yet still hinders the wide adoption of neuromorphic computing for edge devices and embedded systems. With this premise, we adopt the perspective of neuromorphic computing for conventional hardware and we present the L2MU, a natively neuromorphic Legendre Memory Unit (LMU) which entirely relies on Leaky Integrate-and-Fire (LIF) neurons. Specifically, the original recurrent architecture of LMU has been redesigned by modelling every constituent element with neural populations made of LIF or Current-Based (CuBa) LIF neurons. To couple neuromorphic computing and off-the-shelf edge devices, we equipped the L2MU with an input module for the conversion of real values into spikes, which makes it an encoding-free implementation of a Recurrent Spiking Neural Network (RSNN) able to directly work with raw sensor signals on non-dedicated hardware. As a use case to validate our network, we selected the task of Human Activity Recognition (HAR). We benchmarked our L2MU on smartwatch signals from hand-oriented activities, deploying it on three different commercial edge devices in compressed versions too. The reported results remark the possibility of considering neuromorphic models not only in an exclusive relationship with dedicated hardware but also as a suitable choice to work with common sensors and devices.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Leveraging Topic Specificity and Social Relationships for Expert Finding in Community Question Answering Platforms
Authors:
Maddalena Amendola,
Andrea Passarella,
Raffaele Perego
Abstract:
Online Community Question Answering (CQA) platforms have become indispensable tools for users seeking expert solutions to their technical queries. The effectiveness of these platforms relies on their ability to identify and direct questions to the most knowledgeable users within the community, a process known as Expert Finding (EF). EF accuracy is crucial for increasing user engagement and the rel…
▽ More
Online Community Question Answering (CQA) platforms have become indispensable tools for users seeking expert solutions to their technical queries. The effectiveness of these platforms relies on their ability to identify and direct questions to the most knowledgeable users within the community, a process known as Expert Finding (EF). EF accuracy is crucial for increasing user engagement and the reliability of provided answers. Despite recent advancements in EF methodologies, blending the diverse information sources available on CQA platforms for effective expert identification remains challenging. In this paper, we present TUEF, a Topic-oriented User-Interaction model for Expert Finding, which aims to fully and transparently leverage the heterogeneous information available within online question-answering communities. TUEF integrates content and social data by constructing a multi-layer graph that maps out user relationships based on their answering patterns on specific topics. By combining these sources of information, TUEF identifies the most relevant and knowledgeable users for any given question and ranks them using learning-to-rank techniques. Our findings indicate that TUEF's topic-oriented model significantly enhances performance, particularly in large communities discussing well-defined topics. Additionally, we show that the interpretable learning-to-rank algorithm integrated into TUEF offers transparency and explainability with minimal performance trade-offs. The exhaustive experiments conducted on six different CQA communities of Stack Exchange show that TUEF outperforms all competitors with a minimum performance boost of 42.42% in P@1, 32.73% in NDCG@3, 21.76% in R@5, and 29.81% in MRR, excelling in both the evaluation approaches present in the previous literature.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Algorithmic Collusion And The Minimum Price Markov Game
Authors:
Igor Sadoune,
Marcelin Joanis,
Andrea Lodi
Abstract:
This paper introduces the Minimum Price Markov Game (MPMG), a dynamic variant of the Prisoner's Dilemma. The MPMG serves as a theoretical model and reasonable approximation of real-world first-price sealed-bid public auctions that follow the minimum price rule. The goal is to provide researchers and practitioners with a framework to study market fairness and regulation in both digitized and non-di…
▽ More
This paper introduces the Minimum Price Markov Game (MPMG), a dynamic variant of the Prisoner's Dilemma. The MPMG serves as a theoretical model and reasonable approximation of real-world first-price sealed-bid public auctions that follow the minimum price rule. The goal is to provide researchers and practitioners with a framework to study market fairness and regulation in both digitized and non-digitized public procurement processes, amidst growing concerns about algorithmic collusion in online markets. We demonstrate, using multi-agent reinforcement learning-driven artificial agents, that algorithmic tacit coordination is difficult to achieve in the MPMG when cooperation is not explicitly engineered. Paradoxically, our results highlight the robustness of the minimum price rule in an auction environment, but also show that it is not impervious to full-scale algorithmic collusion. These findings contribute to the ongoing debates about algorithmic pricing and its implications.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
Authors:
Tobias Weise,
Philipp Klumpp,
Kubilay Can Demir,
Paula Andrea Pérez-Toro,
Maria Schuster,
Elmar Noeth,
Bjoern Heismann,
Andreas Maier,
Seung Hee Yang
Abstract:
This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task le…
▽ More
This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task learning setup, with the end-to-end goal of taking raw speech as input and estimating the corresponding articulatory movements, phoneme sequence, and phoneme alignment. While both proposed approaches share these same requirements, they differ in their way of achieving phoneme-related predictions: one is based on frame classification, the other on a two-staged training procedure and forced alignment. We reach competitive performance of 0.73 mean correlation for the AAI task and achieve up to approximately 87% frame overlap compared to a state-of-the-art text-dependent phoneme force aligner.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Meta 3D Gen
Authors:
Raphael Bensadoun,
Tom Monnier,
Yanir Kleiman,
Filippos Kokkinos,
Yawar Siddiqui,
Mahendra Kariya,
Omri Harosh,
Roman Shapovalov,
Benjamin Graham,
Emilien Garreau,
Animesh Karnewar,
Ang Cao,
Idan Azuri,
Iurii Makarov,
Eric-Tuan Le,
Antoine Toisoul,
David Novotny,
Oran Gafni,
Natalia Neverova,
Andrea Vedaldi
Abstract:
We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously gener…
▽ More
We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously generated (or artist-created) 3D shapes using additional textual inputs provided by the user. 3DGen integrates key technical components, Meta 3D AssetGen and Meta 3D TextureGen, that we developed for text-to-3D and text-to-texture generation, respectively. By combining their strengths, 3DGen represents 3D objects simultaneously in three ways: in view space, in volumetric space, and in UV (or texture) space. The integration of these two techniques achieves a win rate of 68% with respect to the single-stage model. We compare 3DGen to numerous industry baselines, and show that it outperforms them in terms of prompt fidelity and visual quality for complex textual prompts, while being significantly faster.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials
Authors:
Yawar Siddiqui,
Tom Monnier,
Filippos Kokkinos,
Mahendra Kariya,
Yanir Kleiman,
Emilien Garreau,
Oran Gafni,
Natalia Neverova,
Andrea Vedaldi,
Roman Shapovalov,
David Novotny
Abstract:
We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control. Compared to works that bake shading in the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen generates first several views of the object with factored s…
▽ More
We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control. Compared to works that bake shading in the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen generates first several views of the object with factored shaded and albedo appearance channels, and then reconstructs colours, metalness and roughness in 3D, using a deferred shading loss for efficient supervision. It also uses a sign-distance function to represent 3D shape more reliably and introduces a corresponding loss for direct shape supervision. This is implemented using fused kernels for high memory efficiency. After mesh extraction, a texture refinement transformer operating in UV space significantly improves sharpness and details. AssetGen achieves 17% improvement in Chamfer Distance and 40% in LPIPS over the best concurrent work for few-view reconstruction, and a human preference of 72% over the best industry competitors of comparable speed, including those that support PBR. Project page with generated assets: https://assetgen.github.io
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
Meta 3D TextureGen: Fast and Consistent Texture Generation for 3D Objects
Authors:
Raphael Bensadoun,
Yanir Kleiman,
Idan Azuri,
Omri Harosh,
Andrea Vedaldi,
Natalia Neverova,
Oran Gafni
Abstract:
The recent availability and adaptability of text-to-image models has sparked a new era in many related domains that benefit from the learned text priors as well as high-quality and fast generation capabilities, one of which is texture generation for 3D objects. Although recent texture generation methods achieve impressive results by using text-to-image networks, the combination of global consisten…
▽ More
The recent availability and adaptability of text-to-image models has sparked a new era in many related domains that benefit from the learned text priors as well as high-quality and fast generation capabilities, one of which is texture generation for 3D objects. Although recent texture generation methods achieve impressive results by using text-to-image networks, the combination of global consistency, quality, and speed, which is crucial for advancing texture generation to real-world applications, remains elusive. To that end, we introduce Meta 3D TextureGen: a new feedforward method comprised of two sequential networks aimed at generating high-quality and globally consistent textures for arbitrary geometries of any complexity degree in less than 20 seconds. Our method achieves state-of-the-art results in quality and speed by conditioning a text-to-image model on 3D semantics in 2D space and fusing them into a complete and high-resolution UV texture map, as demonstrated by extensive qualitative and quantitative evaluations. In addition, we introduce a texture enhancement network that is capable of up-scaling any texture by an arbitrary ratio, producing 4k pixel resolution textures.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
Dynamic Few-Shot Learning for Knowledge Graph Question Answering
Authors:
Jacopo D'Abramo,
Andrea Zugarini,
Paolo Torroni
Abstract:
Large language models present opportunities for innovative Question Answering over Knowledge Graphs (KGQA). However, they are not inherently designed for query generation. To bridge this gap, solutions have been proposed that rely on fine-tuning or ad-hoc architectures, achieving good results but limited out-of-domain distribution generalization. In this study, we introduce a novel approach called…
▽ More
Large language models present opportunities for innovative Question Answering over Knowledge Graphs (KGQA). However, they are not inherently designed for query generation. To bridge this gap, solutions have been proposed that rely on fine-tuning or ad-hoc architectures, achieving good results but limited out-of-domain distribution generalization. In this study, we introduce a novel approach called Dynamic Few-Shot Learning (DFSL). DFSL integrates the efficiency of in-context learning and semantic similarity and provides a generally applicable solution for KGQA with state-of-the-art performance. We run an extensive evaluation across multiple benchmark datasets and architecture configurations.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Social Isolation, Digital Connection: COVID-19's Impact on Twitter Ego Networks
Authors:
Kamer Cekini,
Elisabetta Biondi,
Chiara Boldrini,
Andrea Passarella,
Marco Conti
Abstract:
One of the most impactful measures to fight the COVID-19 pandemic in its early first years was the lockdown, implemented by governments to reduce physical contact among people and minimize opportunities for the virus to spread. As people were compelled to limit their physical interactions and stay at home, they turned to online social platforms to alleviate feelings of loneliness. Ego networks rep…
▽ More
One of the most impactful measures to fight the COVID-19 pandemic in its early first years was the lockdown, implemented by governments to reduce physical contact among people and minimize opportunities for the virus to spread. As people were compelled to limit their physical interactions and stay at home, they turned to online social platforms to alleviate feelings of loneliness. Ego networks represent how people organize their relationships due to human cognitive constraints that impose limits on meaningful interactions among people. Physical contacts were disrupted during the lockdown, causing socialization to shift entirely online, leading to a shift in socialization into online platforms. Our research aimed to investigate the impact of lockdown measures on online ego network structures potentially caused by the increase of cognitive expenses in online social networks. In particular, we examined a large Twitter dataset of users, covering 7 years of their activities. We found that during the lockdown, there was an increase in network sizes and a richer structure in social circles, with relationships becoming more intimate. Moreover, we observe that, after the lockdown measures were relaxed, these features returned to their pre-lockdown values.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Applying the Ego Network Model to Cross-Target Stance Detection
Authors:
Jack Tacchi,
Parisa Jamadi Khiabani,
Arkaitz Zubiaga,
Chiara Boldrini,
Andrea Passarella
Abstract:
Understanding human interactions and social structures is an incredibly important task, especially in such an interconnected world. One task that facilitates this is Stance Detection, which predicts the opinion or attitude of a text towards a target entity. Traditionally, this has often been done mainly via the use of text-based approaches, however, recent work has produced a model (CT-TN) that le…
▽ More
Understanding human interactions and social structures is an incredibly important task, especially in such an interconnected world. One task that facilitates this is Stance Detection, which predicts the opinion or attitude of a text towards a target entity. Traditionally, this has often been done mainly via the use of text-based approaches, however, recent work has produced a model (CT-TN) that leverages information about a user's social network to help predict their stance, outperforming certain cross-target text-based approaches. Unfortunately, the data required for such graph-based approaches is not always available. This paper proposes two novel tools for Stance Detection: the Ego Network Model (ENM) and the Signed Ego Network Model (SENM). These models are founded in anthropological and psychological studies and have been used within the context of social network analysis and related tasks (e.g., link prediction). Stance Detection predictions obtained using these features achieve a level of accuracy similar to the graph-based features used by CT-TN while requiring less and more easily obtainable data. In addition to this, the performances of the inner and outer circles of the ENM, representing stronger and weaker social ties, respectively are compared. Surprisingly, the outer circles, which contain more numerous but less intimate connections, are more useful for predicting stance.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER
Authors:
Andrew Zamai,
Andrea Zugarini,
Leonardo Rigutini,
Marco Ernandes,
Marco Maggini
Abstract:
Recently, several specialized instruction-tuned Large Language Models (LLMs) for Named Entity Recognition (NER) have emerged. Compared to traditional NER approaches, these models have strong generalization capabilities. Existing LLMs mainly focus on zero-shot NER in out-of-domain distributions, being fine-tuned on an extensive number of entity classes that often highly or completely overlap with t…
▽ More
Recently, several specialized instruction-tuned Large Language Models (LLMs) for Named Entity Recognition (NER) have emerged. Compared to traditional NER approaches, these models have strong generalization capabilities. Existing LLMs mainly focus on zero-shot NER in out-of-domain distributions, being fine-tuned on an extensive number of entity classes that often highly or completely overlap with test sets. In this work instead, we propose SLIMER, an approach designed to tackle never-seen-before named entity tags by instructing the model on fewer examples, and by leveraging a prompt enriched with definition and guidelines. Experiments demonstrate that definition and guidelines yield better performance, faster and more robust learning, particularly when labelling unseen Named Entities. Furthermore, SLIMER performs comparably to state-of-the-art approaches in out-of-domain zero-shot NER, while being trained on a reduced tag set.
△ Less
Submitted 2 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Macroeconomic Forecasting with Large Language Models
Authors:
Andrea Carriero,
Davide Pettenuzzo,
Shubhranshu Shekhar
Abstract:
This paper presents a comparative analysis evaluating the accuracy of Large Language Models (LLMs) against traditional macro time series forecasting approaches. In recent times, LLMs have surged in popularity for forecasting due to their ability to capture intricate patterns in data and quickly adapt across very different domains. However, their effectiveness in forecasting macroeconomic time seri…
▽ More
This paper presents a comparative analysis evaluating the accuracy of Large Language Models (LLMs) against traditional macro time series forecasting approaches. In recent times, LLMs have surged in popularity for forecasting due to their ability to capture intricate patterns in data and quickly adapt across very different domains. However, their effectiveness in forecasting macroeconomic time series data compared to conventional methods remains an area of interest. To address this, we conduct a rigorous evaluation of LLMs against traditional macro forecasting methods, using as common ground the FRED-MD database. Our findings provide valuable insights into the strengths and limitations of LLMs in forecasting macroeconomic time series, shedding light on their applicability in real-world scenarios
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
Your Car Tells Me Where You Drove: A Novel Path Inference Attack via CAN Bus and OBD-II Data
Authors:
Tommaso Bianchi,
Alessandro Brighente,
Mauro Conti,
Andrea Valori
Abstract:
Despite its well-known security issues, the Controller Area Network (CAN) is still the main technology for in-vehicle communications. Attackers posing as diagnostic services or accessing the CAN bus can threaten the drivers' location privacy to know the exact location at a certain point in time or to infer the visited areas. This represents a serious threat to users' privacy, but also an advantage…
▽ More
Despite its well-known security issues, the Controller Area Network (CAN) is still the main technology for in-vehicle communications. Attackers posing as diagnostic services or accessing the CAN bus can threaten the drivers' location privacy to know the exact location at a certain point in time or to infer the visited areas. This represents a serious threat to users' privacy, but also an advantage for police investigations to gather location-based evidence. In this paper, we present On Path Diagnostic - Intrusion \& Inference (OPD-II), a novel path inference attack leveraging a physical car model and a map matching algorithm to infer the path driven by a car based on CAN bus data. Differently from available attacks, our approach only requires the attacker to know the initial location and heading of the victim's car and is not limited by the availability of training data, road configurations, or the need to access other victim's devices (e.g., smartphones). We implement our attack on a set of four different cars and a total number of 41 tracks in different road and traffic scenarios. We achieve an average of 95% accuracy on reconstructing the coordinates of the recorded path by leveraging a dynamic map-matching algorithm that outperforms the 75% and 89% accuracy values of other proposals while removing their set of assumptions.
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
AI for Extreme Event Modeling and Understanding: Methodologies and Challenges
Authors:
Gustau Camps-Valls,
Miguel-Ángel Fernández-Torres,
Kai-Hendrik Cohrs,
Adrian Höhl,
Andrea Castelletti,
Aytac Pacal,
Claire Robin,
Francesco Martinuzzi,
Ioannis Papoutsis,
Ioannis Prapas,
Jorge Pérez-Aracil,
Katja Weigel,
Maria Gonzalez-Calabuig,
Markus Reichstein,
Martin Rabel,
Matteo Giuliani,
Miguel Mahecha,
Oana-Iuliana Popescu,
Oscar J. Pellicer-Valero,
Said Ouala,
Sancho Salcedo-Sanz,
Sebastian Sippel,
Spyros Kondylatos,
Tamara Happé,
Tristan Williams
Abstract:
In recent years, artificial intelligence (AI) has deeply impacted various fields, including Earth system sciences. Here, AI improved weather forecasting, model emulation, parameter estimation, and the prediction of extreme events. However, the latter comes with specific challenges, such as develo** accurate predictors from noisy, heterogeneous and limited annotated data. This paper reviews how A…
▽ More
In recent years, artificial intelligence (AI) has deeply impacted various fields, including Earth system sciences. Here, AI improved weather forecasting, model emulation, parameter estimation, and the prediction of extreme events. However, the latter comes with specific challenges, such as develo** accurate predictors from noisy, heterogeneous and limited annotated data. This paper reviews how AI is being used to analyze extreme events (like floods, droughts, wildfires and heatwaves), highlighting the importance of creating accurate, transparent, and reliable AI models. We discuss the hurdles of dealing with limited data, integrating information in real-time, deploying models, and making them understandable, all crucial for gaining the trust of stakeholders and meeting regulatory needs. We provide an overview of how AI can help identify and explain extreme events more effectively, improving disaster response and communication. We emphasize the need for collaboration across different fields to create AI solutions that are practical, understandable, and trustworthy for analyzing and predicting extreme events. Such collaborative efforts aim to enhance disaster readiness and disaster risk reduction.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting
Authors:
Sara Sabour,
Lily Goli,
George Kopanas,
Mark Matthews,
Dmitry Lagun,
Leonidas Guibas,
Alec Jacobson,
David J. Fleet,
Andrea Tagliasacchi
Abstract:
3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds, making it suitable for real-time applications.However, current methods require highly controlled environments (no moving people or wind-blown elements, and consistent lighting) to meet the inter-view consistency assumption of 3DGS. This makes reconstruction of real-world c…
▽ More
3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds, making it suitable for real-time applications.However, current methods require highly controlled environments (no moving people or wind-blown elements, and consistent lighting) to meet the inter-view consistency assumption of 3DGS. This makes reconstruction of real-world captures problematic. We present SpotlessSplats, an approach that leverages pre-trained and general-purpose features coupled with robust optimization to effectively ignore transient distractors. Our method achieves state-of-the-art reconstruction quality both visually and quantitatively, on casual captures.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Sampled Datasets Risk Substantial Bias in the Identification of Political Polarization on Social Media
Authors:
Gabriele Di Bona,
Emma Fraxanet,
Björn Komander,
Andrea Lo Sasso,
Virginia Morini,
Antoine Vendeville,
Max Falkenberg,
Alessandro Galeazzi
Abstract:
Following recent policy changes by X (Twitter) and other social media platforms, user interaction data has become increasingly difficult to access. These restrictions are impeding robust research pertaining to social and political phenomena online, which is critical due to the profound impact social media platforms may have on our societies. Here, we investigate the reliability of polarization mea…
▽ More
Following recent policy changes by X (Twitter) and other social media platforms, user interaction data has become increasingly difficult to access. These restrictions are impeding robust research pertaining to social and political phenomena online, which is critical due to the profound impact social media platforms may have on our societies. Here, we investigate the reliability of polarization measures obtained from different samples of social media data by studying the structural polarization of the Polish political debate on Twitter over a 24-hour period. First, we show that the political discussion on Twitter is only a small subset of the wider Twitter discussion. Second, we find that large samples can be representative of the whole political discussion on a platform, but small samples consistently fail to accurately reflect the true structure of polarization online. Finally, we demonstrate that keyword-based samples can be representative if keywords are selected with great care, but that poorly selected keywords can result in substantial political bias in the sampled data. Our findings demonstrate that it is not possible to measure polarization in a reliable way with small, sampled datasets, highlighting why the current lack of research data is so problematic, and providing insight into the practical implementation of the European Union's Digital Service Act which aims to improve researchers' access to social media data.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Handling Ontology Gaps in Semantic Parsing
Authors:
Andrea Bacciu,
Marco Damonte,
Marco Basaldella,
Emilio Monti
Abstract:
The majority of Neural Semantic Parsing (NSP) models are developed with the assumption that there are no concepts outside the ones such models can represent with their target symbols (closed-world assumption). This assumption leads to generate hallucinated outputs rather than admitting their lack of knowledge. Hallucinations can lead to wrong or potentially offensive responses to users. Hence, a m…
▽ More
The majority of Neural Semantic Parsing (NSP) models are developed with the assumption that there are no concepts outside the ones such models can represent with their target symbols (closed-world assumption). This assumption leads to generate hallucinated outputs rather than admitting their lack of knowledge. Hallucinations can lead to wrong or potentially offensive responses to users. Hence, a mechanism to prevent this behavior is crucial to build trusted NSP-based Question Answering agents. To that end, we propose the Hallucination Simulation Framework (HSF), a general setting for stimulating and analyzing NSP model hallucinations. The framework can be applied to any NSP task with a closed-ontology. Using the proposed framework and KQA Pro as the benchmark dataset, we assess state-of-the-art techniques for hallucination detection. We then present a novel hallucination detection strategy that exploits the computational graph of the NSP model to detect the NSP hallucinations in the presence of ontology gaps, out-of-domain utterances, and to recognize NSP errors, improving the F1-Score respectively by ~21, ~24% and ~1%. This is the first work in closed-ontology NSP that addresses the problem of recognizing ontology gaps. We release our code and checkpoints at https://github.com/amazon-science/handling-ontology-gaps-in-semantic-parsing.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
BISeizuRe: BERT-Inspired Seizure Data Representation to Improve Epilepsy Monitoring
Authors:
Luca Benfenati,
Thorir Mar Ingolfsson,
Andrea Cossettini,
Daniele Jahier Pagliari,
Alessio Burrello,
Luca Benini
Abstract:
This study presents a novel approach for EEG-based seizure detection leveraging a BERT-based model. The model, BENDR, undergoes a two-phase training process. Initially, it is pre-trained on the extensive Temple University Hospital EEG Corpus (TUEG), a 1.5 TB dataset comprising over 10,000 subjects, to extract common EEG data patterns. Subsequently, the model is fine-tuned on the CHB-MIT Scalp EEG…
▽ More
This study presents a novel approach for EEG-based seizure detection leveraging a BERT-based model. The model, BENDR, undergoes a two-phase training process. Initially, it is pre-trained on the extensive Temple University Hospital EEG Corpus (TUEG), a 1.5 TB dataset comprising over 10,000 subjects, to extract common EEG data patterns. Subsequently, the model is fine-tuned on the CHB-MIT Scalp EEG Database, consisting of 664 EEG recordings from 24 pediatric patients, of which 198 contain seizure events. Key contributions include optimizing fine-tuning on the CHB-MIT dataset, where the impact of model architecture, pre-processing, and post-processing techniques are thoroughly examined to enhance sensitivity and reduce false positives per hour (FP/h). We also explored custom training strategies to ascertain the most effective setup. The model undergoes a novel second pre-training phase before subject-specific fine-tuning, enhancing its generalization capabilities. The optimized model demonstrates substantial performance enhancements, achieving as low as 0.23 FP/h, 2.5$\times$ lower than the baseline model, with a lower but still acceptable sensitivity rate, showcasing the effectiveness of applying a BERT-based approach on EEG-based seizure detection.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
SD-BLS: Privacy Preserving Selective Disclosure of Verifiable Credentials with Unlinkable Threshold Revocation
Authors:
Denis Roio,
Rebecca Selvaggini,
Andrea D'Intino
Abstract:
It is of critical importance to design digital identity systems that ensure the privacy of citizens as well as protecting them from issuer corruption. We aim to solve this issue and propose a method for selective disclosure and privacy preserving revocation of digital credentials, using the unique homomorphic characteristics of second order Elliptic Curves and Boneh-Lynn-Shacham (BLS) signatures.…
▽ More
It is of critical importance to design digital identity systems that ensure the privacy of citizens as well as protecting them from issuer corruption. We aim to solve this issue and propose a method for selective disclosure and privacy preserving revocation of digital credentials, using the unique homomorphic characteristics of second order Elliptic Curves and Boneh-Lynn-Shacham (BLS) signatures. Our approach ensures that users can selectively reveal credentials signed by a certain issuer, which can be interactively revoked by a quorum of other agreeing issuers without revealing the identity of users. Our goal is to protect users from issuer corruption by requiring collective agreement among multiple revocation issuers.
△ Less
Submitted 3 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Building multiscale models with PhysiBoSS, an agent-based modeling tool
Authors:
Marco Ruscone,
Andrea Checcoli,
Randy Heiland,
Emmanuel Barillot,
Paul Macklin,
Laurence Calzone,
Vincent Noël
Abstract:
Multiscale models provide a unique tool for studying complex processes that study events occurring at different scales across space and time. In the context of biological systems, such models can simulate mechanisms happening at the intracellular level such as signaling, and at the extracellular level where cells communicate and coordinate with other cells. They aim to understand the impact of gen…
▽ More
Multiscale models provide a unique tool for studying complex processes that study events occurring at different scales across space and time. In the context of biological systems, such models can simulate mechanisms happening at the intracellular level such as signaling, and at the extracellular level where cells communicate and coordinate with other cells. They aim to understand the impact of genetic or environmental deregulation observed in complex diseases, describe the interplay between a pathological tissue and the immune system, and suggest strategies to revert the diseased phenotypes. The construction of these multiscale models remains a very complex task, including the choice of the components to consider, the level of details of the processes to simulate, or the fitting of the parameters to the data. One additional difficulty is the expert knowledge needed to program these models in languages such as C++ or Python, which may discourage the participation of non-experts. Simplifying this process through structured description formalisms -- coupled with a graphical interface -- is crucial in making modeling more accessible to the broader scientific community, as well as streamlining the process for advanced users. This article introduces three examples of multiscale models which rely on the framework PhysiBoSS, an add-on of PhysiCell that includes intracellular descriptions as continuous time Boolean models to the agent-based approach. The article demonstrates how to easily construct such models, relying on PhysiCell Studio, the PhysiCell Graphical User Interface. A step-by-step tutorial is provided as a Supplementary Material and all models are provided at: https://physiboss.github.io/tutorial/.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Kolmogorov-Arnold Graph Neural Networks
Authors:
Gianluca De Carlo,
Andrea Mastropietro,
Aris Anagnostopoulos
Abstract:
Graph neural networks (GNNs) excel in learning from network-like data but often lack interpretability, making their application challenging in domains requiring transparent decision-making. We propose the Graph Kolmogorov-Arnold Network (GKAN), a novel GNN model leveraging spline-based activation functions on edges to enhance both accuracy and interpretability. Our experiments on five benchmark da…
▽ More
Graph neural networks (GNNs) excel in learning from network-like data but often lack interpretability, making their application challenging in domains requiring transparent decision-making. We propose the Graph Kolmogorov-Arnold Network (GKAN), a novel GNN model leveraging spline-based activation functions on edges to enhance both accuracy and interpretability. Our experiments on five benchmark datasets demonstrate that GKAN outperforms state-of-the-art GNN models in node classification, link prediction, and graph classification tasks. In addition to the improved accuracy, GKAN's design inherently provides clear insights into the model's decision-making process, eliminating the need for post-hoc explainability techniques. This paper discusses the methodology, performance, and interpretability of GKAN, highlighting its potential for applications in domains where interpretability is crucial.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
LLCoach: Generating Robot Soccer Plans using Multi-Role Large Language Models
Authors:
Michele Brienza,
Emanuele Musumeci,
Vincenzo Suriani,
Daniele Affinita,
Andrea Pennisi,
Daniele Nardi,
Domenico Daniele Bloisi
Abstract:
The deployment of robots into human scenarios necessitates advanced planning strategies, particularly when we ask robots to operate in dynamic, unstructured environments. RoboCup offers the chance to deploy robots in one of those scenarios, a human-shaped game represented by a soccer match. In such scenarios, robots must operate using predefined behaviors that can fail in unpredictable conditions.…
▽ More
The deployment of robots into human scenarios necessitates advanced planning strategies, particularly when we ask robots to operate in dynamic, unstructured environments. RoboCup offers the chance to deploy robots in one of those scenarios, a human-shaped game represented by a soccer match. In such scenarios, robots must operate using predefined behaviors that can fail in unpredictable conditions. This paper introduces a novel application of Large Language Models (LLMs) to address the challenge of generating actionable plans in such settings, specifically within the context of the RoboCup Standard Platform League (SPL) competitions where robots are required to autonomously execute soccer strategies that emerge from the interactions of individual agents. In particular, we propose a multi-role approach leveraging the capabilities of LLMs to generate and refine plans for a robotic soccer team. The potential of the proposed method is demonstrated through an experimental evaluation,carried out simulating multiple matches where robots with AI-generated plans play against robots running human-built code.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation
Authors:
Hamideh Kerdegari,
Kyle Higgins,
Dennis Veselkov,
Ivan Laponogov,
Inese Polaka,
Miguel Coimbra,
Junior Andrea Pescino,
Marcis Leja,
Mario Dinis-Ribeiro,
Tania Fleitas Kanonnikoff,
Kirill Veselkov
Abstract:
The integration of artificial intelligence (AI) in medical diagnostics represents a significant advancement in managing upper gastrointestinal (GI) cancer, a major cause of global cancer mortality. Specifically for gastric cancer (GC), chronic inflammation causes changes in the mucosa such as atrophy, intestinal metaplasia (IM), dysplasia and ultimately cancer. Early detection through endoscopic r…
▽ More
The integration of artificial intelligence (AI) in medical diagnostics represents a significant advancement in managing upper gastrointestinal (GI) cancer, a major cause of global cancer mortality. Specifically for gastric cancer (GC), chronic inflammation causes changes in the mucosa such as atrophy, intestinal metaplasia (IM), dysplasia and ultimately cancer. Early detection through endoscopic regular surveillance is essential for better outcomes. Foundation models (FM), which are machine or deep learning models trained on diverse data and applicable to broad use cases, offer a promising solution to enhance the accuracy of endoscopy and its subsequent pathology image analysis. This review explores the recent advancements, applications, and challenges associated with FM in endoscopy and pathology imaging. We started by elucidating the core principles and architectures underlying these models, including their training methodologies and the pivotal role of large-scale data in develo** their predictive capabilities. Moreover, this work discusses emerging trends and future research directions, emphasizing the integration of multimodal data, the development of more robust and equitable models, and the potential for real-time diagnostic support. This review aims to provide a roadmap for researchers and practitioners in navigating the complexities of incorporating FM into clinical practice for prevention/management of GC cases, thereby improving patient outcomes.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Enhancing Data Privacy in Large Language Models through Private Association Editing
Authors:
Davide Venditti,
Elena Sofia Ruzzetti,
Giancarlo A. Xompero,
Cristina Giannone,
Andrea Favalli,
Raniero Romagnoli,
Fabio Massimo Zanzotto
Abstract:
Large Language Models (LLMs) are powerful tools with extensive applications, but their tendency to memorize private information raises significant concerns as private data leakage can easily happen. In this paper, we introduce Private Association Editing (PAE), a novel defense approach for private data leakage. PAE is designed to effectively remove Personally Identifiable Information (PII) without…
▽ More
Large Language Models (LLMs) are powerful tools with extensive applications, but their tendency to memorize private information raises significant concerns as private data leakage can easily happen. In this paper, we introduce Private Association Editing (PAE), a novel defense approach for private data leakage. PAE is designed to effectively remove Personally Identifiable Information (PII) without retraining the model. Our approach consists of a four-step procedure: detecting memorized PII, applying PAE cards to mitigate memorization of private data, verifying resilience to targeted data extraction (TDE) attacks, and ensuring consistency in the post-edit LLMs. The versatility and efficiency of PAE, which allows for batch modifications, significantly enhance data privacy in LLMs. Experimental results demonstrate the effectiveness of PAE in mitigating private data leakage. We believe PAE will serve as a critical tool in the ongoing effort to protect data privacy in LLMs, encouraging the development of safer models for real-world applications.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
SincVAE: a New Approach to Improve Anomaly Detection on EEG Data Using SincNet and Variational Autoencoder
Authors:
Andrea Pollastro,
Francesco Isgrò,
Roberto Prevete
Abstract:
Over the past few decades, electroencephalography (EEG) monitoring has become a pivotal tool for diagnosing neurological disorders, particularly for detecting seizures. Epilepsy, one of the most prevalent neurological diseases worldwide, affects approximately the 1 \% of the population. These patients face significant risks, underscoring the need for reliable, continuous seizure monitoring in dail…
▽ More
Over the past few decades, electroencephalography (EEG) monitoring has become a pivotal tool for diagnosing neurological disorders, particularly for detecting seizures. Epilepsy, one of the most prevalent neurological diseases worldwide, affects approximately the 1 \% of the population. These patients face significant risks, underscoring the need for reliable, continuous seizure monitoring in daily life. Most of the techniques discussed in the literature rely on supervised Machine Learning (ML) methods. However, the challenge of accurately labeling variations in epileptic EEG waveforms complicates the use of these approaches. Additionally, the rarity of ictal events introduces an high imbalancing within the data, which could lead to poor prediction performance in supervised learning approaches. Instead, a semi-supervised approach allows to train the model only on data not containing seizures, thus avoiding the issues related to the data imbalancing. This work proposes a semi-supervised approach for detecting epileptic seizures from EEG data, utilizing a novel Deep Learning-based method called SincVAE. This proposal incorporates the learning of an ad-hoc array of bandpass filter as a first layer of a Variational Autoencoder (VAE), potentially eliminating the preprocessing stage where informative band frequencies are identified and isolated. Results indicate that SincVAE improves seizure detection in EEG data and is capable of identifying early seizures during the preictal stage as well as monitoring patients throughout the postictal stage.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark
Authors:
Fabio Mercorio,
Mario Mezzanzanica,
Daniele Potertì,
Antonio Serino,
Andrea Seveso
Abstract:
Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to generate and manipulate human language, highlighting their potential across various applications. Evaluating LLMs in languages other than English is crucial for ensuring their linguistic versatility, cultural relevance, and applicability in diverse global contexts, thus broadening their usability and e…
▽ More
Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to generate and manipulate human language, highlighting their potential across various applications. Evaluating LLMs in languages other than English is crucial for ensuring their linguistic versatility, cultural relevance, and applicability in diverse global contexts, thus broadening their usability and effectiveness. We tackle this challenge by introducing a structured benchmark using the INVALSI tests, a set of well-established assessments designed to measure educational competencies across Italy. Our study makes three primary contributions: Firstly, we adapt the INVALSI benchmark for automated LLM evaluation, which involves rigorous adaptation of the test format to suit automated processing while retaining the essence of the original tests. Secondly, we provide a detailed assessment of current LLMs, offering a crucial reference point for the academic community. Finally, we visually compare the performance of these models against human results. Additionally, researchers are invited to submit their models for ongoing evaluation, ensuring the benchmark remains a current and valuable resource.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Unveiling Cognitive Constraints in Language Production: Extracting and Validating the Active Ego Network of Words
Authors:
Kilian Ollivier,
Chiara Boldrini,
Andrea Passarella,
Marco Conti
Abstract:
The "ego network of words" model captures structural properties in language production associated with cognitive constraints. While previous research focused on the layer-based structure and its semantic properties, this paper argues that an essential element, the concept of an active network, is missing. The active part of the ego network of words only includes words that are regularly used by in…
▽ More
The "ego network of words" model captures structural properties in language production associated with cognitive constraints. While previous research focused on the layer-based structure and its semantic properties, this paper argues that an essential element, the concept of an active network, is missing. The active part of the ego network of words only includes words that are regularly used by individuals, akin to the ego networks in the social domain, where the active part includes relationships regularly nurtured by individuals and hence demanding cognitive effort. In this work, we define a methodology for extracting the active part of the ego network of words and validate it using interview transcripts and tweets. The robustness of our method to varying input data sizes and temporal stability is demonstrated. We also demonstrate that without the active network concept (and a tool for properly extracting the active network from data), the "ego network of words" model is not able to properly estimate the cognitive effort involved and it becomes vulnerable to the amount of data considered (leading to the disappearance of the layered structure in large datasets). Our results are well-aligned with prior analyses of the ego network of words, where the limitation of the data collected led automatically (and implicitly) to approximately consider the active part of the network only. Moreover, the validation on the transcripts dataset (MediaSum) highlights the generalizability of the model across diverse domains and the ingrained cognitive constraints in language usage.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings
Authors:
Andrea Posada,
Daniel Rueckert,
Felix Meissen,
Philip Müller
Abstract:
Since the emergence of the Transformer architecture, language model development has increased, driven by their promising potential. However, releasing these models into production requires properly understanding their behavior, particularly in sensitive domains such as medicine. Despite this need, the medical literature still lacks technical assessments of pre-trained language models, which are es…
▽ More
Since the emergence of the Transformer architecture, language model development has increased, driven by their promising potential. However, releasing these models into production requires properly understanding their behavior, particularly in sensitive domains such as medicine. Despite this need, the medical literature still lacks technical assessments of pre-trained language models, which are especially valuable in resource-constrained settings in terms of computational power or limited budget. To address this gap, we provide a comprehensive survey of language models in the medical domain. In addition, we selected a subset of these models for thorough evaluation, focusing on classification and text generation tasks. Our subset encompasses 53 models, ranging from 110 million to 13 billion parameters, spanning the three families of Transformer-based models and from diverse knowledge domains. This study employs a series of approaches for text classification together with zero-shot prompting instead of model training or fine-tuning, which closely resembles the limited resource setting in which many users of language models find themselves. Encouragingly, our findings reveal remarkable performance across various tasks and datasets, underscoring the latent potential of certain models to contain medical knowledge, even without domain specialization. Consequently, our study advocates for further exploration of model applications in medical contexts, particularly in resource-constrained settings. The code is available on https://github.com/anpoc/Language-models-in-medicine.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Natural Measures on Polyominoes Induced by the Abelian Sandpile Model
Authors:
Andrea Sportiello
Abstract:
We introduce a natural Boltzmann measure over polyominoes induced by boundary avalanches in the Abelian Sandpile Model. Through the study of a suitable associated process, we give an argument suggesting that the probability distribution of the avalnche sizes has a power-law decay with exponent 3/2, in contrast with the present understanding of bulk avalanches in the model (which has some exponent…
▽ More
We introduce a natural Boltzmann measure over polyominoes induced by boundary avalanches in the Abelian Sandpile Model. Through the study of a suitable associated process, we give an argument suggesting that the probability distribution of the avalnche sizes has a power-law decay with exponent 3/2, in contrast with the present understanding of bulk avalanches in the model (which has some exponent between 1 and 5/4), and to the ordinary generating function of polyominoes (which is conjectured to have a logarithmic singularity, i.e. exponent 1). We provide some numerical evidence for our claims, and evaluate some other statistical observables on our process, most notably the density of triple points.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
High-resolution open-vocabulary object 6D pose estimation
Authors:
Jaime Corsetti,
Davide Boscaini,
Francesco Giuliari,
Changjae Oh,
Andrea Cavallaro,
Fabio Poiesi
Abstract:
The generalisation to unseen objects in the 6D pose estimation task is very challenging. While Vision-Language Models (VLMs) enable using natural language descriptions to support 6D pose estimation of unseen objects, these solutions underperform compared to model-based methods. In this work we present Horyon, an open-vocabulary VLM-based architecture that addresses relative pose estimation between…
▽ More
The generalisation to unseen objects in the 6D pose estimation task is very challenging. While Vision-Language Models (VLMs) enable using natural language descriptions to support 6D pose estimation of unseen objects, these solutions underperform compared to model-based methods. In this work we present Horyon, an open-vocabulary VLM-based architecture that addresses relative pose estimation between two scenes of an unseen object, described by a textual prompt only. We use the textual prompt to identify the unseen object in the scenes and then obtain high-resolution multi-scale features. These features are used to extract cross-scene matches for registration. We evaluate our model on a benchmark with a large variety of unseen objects across four datasets, namely REAL275, Toyota-Light, Linemod, and YCB-Video. Our method achieves state-of-the-art performance on all datasets, outperforming by 12.6 in Average Recall the previous best-performing approach.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Deep-MPC: A DAGGER-Driven Imitation Learning Strategy for Optimal Constrained Battery Charging
Authors:
Jorge Espin,
Dong Zhang,
Daniele Toti,
Andrea Pozzi
Abstract:
In the realm of battery charging, several complex aspects demand meticulous attention, including thermal management, capacity degradation, and the need for rapid charging while maintaining safety and battery lifespan. By employing the imitation learning paradigm, this manuscript introduces an innovative solution to confront the inherent challenges often associated with conventional predictive cont…
▽ More
In the realm of battery charging, several complex aspects demand meticulous attention, including thermal management, capacity degradation, and the need for rapid charging while maintaining safety and battery lifespan. By employing the imitation learning paradigm, this manuscript introduces an innovative solution to confront the inherent challenges often associated with conventional predictive control strategies for constrained battery charging. A significant contribution of this study lies in the adaptation of the Dataset Aggregation (DAGGER) algorithm to address scenarios where battery parameters are uncertain, and internal states are unobservable. Results drawn from a practical battery simulator that incorporates an electrochemical model highlight substantial improvements in battery charging performance, particularly in meeting all safety constraints and outperforming traditional strategies in computational processing.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
Perks and Pitfalls of Faithfulness in Regular, Self-Explainable and Domain Invariant GNNs
Authors:
Steve Azzolin,
Antonio Longa,
Stefano Teso,
Andrea Passerini
Abstract:
As Graph Neural Networks (GNNs) become more pervasive, it becomes paramount to build robust tools for computing explanations of their predictions. A key desideratum is that these explanations are faithful, i.e., that they portray an accurate picture of the GNN's reasoning process. A number of different faithfulness metrics exist, begging the question of what faithfulness is exactly, and what its p…
▽ More
As Graph Neural Networks (GNNs) become more pervasive, it becomes paramount to build robust tools for computing explanations of their predictions. A key desideratum is that these explanations are faithful, i.e., that they portray an accurate picture of the GNN's reasoning process. A number of different faithfulness metrics exist, begging the question of what faithfulness is exactly, and what its properties are. We begin by showing that existing metrics are not interchangeable -- i.e., explanations attaining high faithfulness according to one metric may be unfaithful according to others -- and can be systematically insensitive to important properties of the explanation, and suggest how to address these issues. We proceed to show that, surprisingly, optimizing for faithfulness is not always a sensible design goal. Specifically, we show that for injective regular GNN architectures, perfectly faithful explanations are completely uninformative. The situation is different for modular GNNs, such as self-explainable and domain-invariant architectures, where optimizing faithfulness does not compromise informativeness, and is also unexpectedly tied to out-of-distribution generalization.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
GraLMatch: Matching Groups of Entities with Graphs and Language Models
Authors:
Fernando De Meer Pardo,
Claude Lehmann,
Dennis Gehrig,
Andrea Nagy,
Stefano Nicoli,
Branka Hadji Misheva,
Martin Braschler,
Kurt Stockinger
Abstract:
In this paper, we present an end-to-end multi-source Entity Matching problem, which we call entity group matching, where the goal is to assign to the same group, records originating from multiple data sources but representing the same real-world entity. We focus on the effects of transitively matched records, i.e. the records connected by paths in the graph G = (V,E) whose nodes and edges represen…
▽ More
In this paper, we present an end-to-end multi-source Entity Matching problem, which we call entity group matching, where the goal is to assign to the same group, records originating from multiple data sources but representing the same real-world entity. We focus on the effects of transitively matched records, i.e. the records connected by paths in the graph G = (V,E) whose nodes and edges represent the records and whether they are a match or not. We present a real-world instance of this problem, where the challenge is to match records of companies and financial securities originating from different data providers. We also introduce two new multi-source benchmark datasets that present similar matching challenges as real-world records. A distinctive characteristic of these records is that they are regularly updated following real-world events, but updates are not applied uniformly across data sources. This phenomenon makes the matching of certain groups of records only possible through the use of transitive information.
In our experiments, we illustrate how considering transitively matched records is challenging since a limited amount of false positive pairwise match predictions can throw off the group assignment of large quantities of records. Thus, we propose GraLMatch, a method that can partially detect and remove false positive pairwise predictions through graph-based properties. Finally, we showcase how fine-tuning a Transformer-based model (DistilBERT) on a reduced number of labeled samples yields a better final entity group matching than training on more samples and/or incorporating fine-tuning optimizations, illustrating how precision becomes the deciding factor in the entity group matching of large volumes of records.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Modelling Legislative Systems into Property Graphs to Enable Advanced Pattern Detection
Authors:
Andrea Colombo,
Anna Bernasconi,
Stefano Ceri
Abstract:
Legislative systems face growing complexity due to the ever-increasing number of laws and intricate interdependencies between them. Traditional methods of storing and analyzing legal systems, mainly based on RDF, struggle with this complexity, hindering efficient knowledge discovery, as required by domain experts. In this paper, we propose to model legislation into a property graph, where edges re…
▽ More
Legislative systems face growing complexity due to the ever-increasing number of laws and intricate interdependencies between them. Traditional methods of storing and analyzing legal systems, mainly based on RDF, struggle with this complexity, hindering efficient knowledge discovery, as required by domain experts. In this paper, we propose to model legislation into a property graph, where edges represent citations, modifications, and abrogations between laws and their articles or attachments, both represented as nodes and edges with properties. As a practical use case, we implement the model in the Italian legislative system. First, we describe our approach to extracting knowledge from legal texts. To this aim, we leverage the recently internationally adopted XML law standard, Akoma Ntoso, to parse and identify entities, relationships and properties. Next, we describe the model and the schema implemented using Neo4j, the market-leading graph database management system. The schema is designed to capture the structure and hierarchy of laws, together with their interdependencies. We show how such a property graph enables an efficient answer to complex and relevant queries previously impractical on raw text. By leveraging other implementations of the Akoma Ntoso standard and the proposed property graph approach, we are confident that this work will facilitate a comprehensive comparison of legislative systems and their complexities.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Talking the Talk Does Not Entail Walking the Walk: On the Limits of Large Language Models in Lexical Entailment Recognition
Authors:
Candida M. Greco,
Lucio La Cava,
Andrea Tagarelli
Abstract:
Verbs form the backbone of language, providing the structure and meaning to sentences. Yet, their intricate semantic nuances pose a longstanding challenge. Understanding verb relations through the concept of lexical entailment is crucial for comprehending sentence meanings and gras** verb dynamics. This work investigates the capabilities of eight Large Language Models in recognizing lexical enta…
▽ More
Verbs form the backbone of language, providing the structure and meaning to sentences. Yet, their intricate semantic nuances pose a longstanding challenge. Understanding verb relations through the concept of lexical entailment is crucial for comprehending sentence meanings and gras** verb dynamics. This work investigates the capabilities of eight Large Language Models in recognizing lexical entailment relations among verbs through differently devised prompting strategies and zero-/few-shot settings over verb pairs from two lexical databases, namely WordNet and HyperLex. Our findings unveil that the models can tackle the lexical entailment recognition task with moderately good performance, although at varying degree of effectiveness and under different conditions. Also, utilizing few-shot prompting can enhance the models' performance. However, perfectly solving the task arises as an unmet challenge for all examined LLMs, which raises an emergence for further research developments on this topic.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Optimizing Quantile-based Trading Strategies in Electricity Arbitrage
Authors:
Ciaran O'Connor,
Joseph Collins,
Steven Prestwich,
Andrea Visentin
Abstract:
Efficiently integrating renewable resources into electricity markets is vital for addressing the challenges of matching real-time supply and demand while reducing the significant energy wastage resulting from curtailments. To address this challenge effectively, the incorporation of storage devices can enhance the reliability and efficiency of the grid, improving market liquidity and reducing price…
▽ More
Efficiently integrating renewable resources into electricity markets is vital for addressing the challenges of matching real-time supply and demand while reducing the significant energy wastage resulting from curtailments. To address this challenge effectively, the incorporation of storage devices can enhance the reliability and efficiency of the grid, improving market liquidity and reducing price volatility. In short-term electricity markets, participants navigate numerous options, each presenting unique challenges and opportunities, underscoring the critical role of the trading strategy in maximizing profits. This study delves into the optimization of day-ahead and balancing market trading, leveraging quantile-based forecasts. Employing three trading approaches with practical constraints, our research enhances forecast assessment, increases trading frequency, and employs flexible timestamp orders. Our findings underscore the profit potential of simultaneous participation in both day-ahead and balancing markets, especially with larger battery storage systems; despite increased costs and narrower profit margins associated with higher-volume trading, the implementation of high-frequency strategies plays a significant role in maximizing profits and addressing market challenges. Finally, we modelled four commercial battery storage systems and evaluated their economic viability through a scenario analysis, with larger batteries showing a shorter return on investment.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Locating and measuring marine aquaculture production from space: a computer vision approach in the French Mediterranean
Authors:
Sebastian Quaade,
Andrea Vallebueno,
Olivia D. N. Alcabes,
Kit T. Rodolfa,
Daniel E. Ho
Abstract:
Aquaculture production -- the cultivation of aquatic plants and animals -- has grown rapidly since the 1990s, but sparse, self-reported and aggregate production data limits the effective understanding and monitoring of the industry's trends and potential risks. Building on a manual survey of aquaculture production from remote sensing imagery, we train a computer vision model to identify marine aqu…
▽ More
Aquaculture production -- the cultivation of aquatic plants and animals -- has grown rapidly since the 1990s, but sparse, self-reported and aggregate production data limits the effective understanding and monitoring of the industry's trends and potential risks. Building on a manual survey of aquaculture production from remote sensing imagery, we train a computer vision model to identify marine aquaculture cages from aerial and satellite imagery, and generate a spatially explicit dataset of finfish production locations in the French Mediterranean from 2000-2021 that includes 4,010 cages (69m2 average cage area). We demonstrate the value of our method as an easily adaptable, cost-effective approach that can improve the speed and reliability of aquaculture surveys, and enables downstream analyses relevant to researchers and regulators. We illustrate its use to compute independent estimates of production, and develop a flexible framework to quantify uncertainty in these estimates. Overall, our study presents an efficient, scalable and highly adaptable method for monitoring aquaculture production from remote sensing imagery.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Fine-Tuning BERTs for Definition Extraction from Mathematical Text
Authors:
Lucy Horowitz,
Ryan Hathaway
Abstract:
In this paper, we fine-tuned three pre-trained BERT models on the task of "definition extraction" from mathematical English written in LaTeX. This is presented as a binary classification problem, where either a sentence contains a definition of a mathematical term or it does not. We used two original data sets, "Chicago" and "TAC," to fine-tune and test these models. We also tested on WFMALL, a da…
▽ More
In this paper, we fine-tuned three pre-trained BERT models on the task of "definition extraction" from mathematical English written in LaTeX. This is presented as a binary classification problem, where either a sentence contains a definition of a mathematical term or it does not. We used two original data sets, "Chicago" and "TAC," to fine-tune and test these models. We also tested on WFMALL, a dataset presented by Vanetik and Litvak in 2021 and compared the performance of our models to theirs. We found that a high-performance Sentence-BERT transformer model performed best based on overall accuracy, recall, and precision metrics, achieving comparable results to the earlier models with less computational effort.
△ Less
Submitted 27 June, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Quantum automata and languages of finite index
Authors:
Andrea Benso,
Flavio D'Alessandro,
Paolo Papi
Abstract:
This paper is a continuation of a previous study on the so-called measure once finite quantum automata model introduced by Moore and Crutchfield in 2000. We investigate conditions assuring that, given a language recognized by such a device and a language generated by a context-free grammar of finite index or by a matrix context-free grammar, it is recursively decidable whether or not they have a n…
▽ More
This paper is a continuation of a previous study on the so-called measure once finite quantum automata model introduced by Moore and Crutchfield in 2000. We investigate conditions assuring that, given a language recognized by such a device and a language generated by a context-free grammar of finite index or by a matrix context-free grammar, it is recursively decidable whether or not they have a nonempty intersection.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
A Primal-Dual Framework for Transformers and Neural Networks
Authors:
Tan M. Nguyen,
Tam Nguyen,
Nhat Ho,
Andrea L. Bertozzi,
Richard G. Baraniuk,
Stanley J. Osher
Abstract:
Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that the self-attention corresp…
▽ More
Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that the self-attention corresponds to the support vector expansion derived from a support vector regression problem, whose primal formulation has the form of a neural network layer. Using our framework, we derive popular attention layers used in practice and propose two new attentions: 1) the Batch Normalized Attention (Attention-BN) derived from the batch normalization layer and 2) the Attention with Scaled Head (Attention-SH) derived from using less training data to fit the SVR model. We empirically demonstrate the advantages of the Attention-BN and Attention-SH in reducing head redundancy, increasing the model's accuracy, and improving the model's efficiency in a variety of practical applications including image and time-series classification.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.