Search | arXiv e-print repository

Towards a multimodal framework for remote sensing image change retrieval and captioning

Authors: Roger Ferrod, Luigi Di Caro, Dino Ienco

Abstract: Recently, there has been increasing interest in multimodal applications that integrate text with other modalities, such as images, audio and video, to facilitate natural language interactions with multimodal AI systems. While applications involving standard modalities have been extensively explored, there is still a lack of investigation into specific data modalities such as remote sensing (RS) da… ▽ More Recently, there has been increasing interest in multimodal applications that integrate text with other modalities, such as images, audio and video, to facilitate natural language interactions with multimodal AI systems. While applications involving standard modalities have been extensively explored, there is still a lack of investigation into specific data modalities such as remote sensing (RS) data. Despite the numerous potential applications of RS data, including environmental protection, disaster monitoring and land planning, available solutions are predominantly focused on specific tasks like classification, captioning and retrieval. These solutions often overlook the unique characteristics of RS data, such as its capability to systematically provide information on the same geographical areas over time. This ability enables continuous monitoring of changes in the underlying landscape. To address this gap, we propose a novel foundation model for bi-temporal RS image pairs, in the context of change detection analysis, leveraging Contrastive Learning and the LEVIR-CC dataset for both captioning and text-image retrieval. By jointly training a contrastive encoder and captioning decoder, our model add text-image retrieval capabilities, in the context of bi-temporal change detection, while maintaining captioning performances that are comparable to the state of the art. We release the source code and pretrained weights at: https://github.com/rogerferrod/RSICRC. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.11840 [pdf, other]

LLaNA: Large Language and NeRF Assistant

Authors: Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano

Abstract: Multimodal Large Language Models (MLLMs) have demonstrated an excellent understanding of images and 3D data. However, both modalities have shortcomings in holistically capturing the appearance and geometry of objects. Meanwhile, Neural Radiance Fields (NeRFs), which encode information within the weights of a simple Multi-Layer Perceptron (MLP), have emerged as an increasingly widespread modality t… ▽ More Multimodal Large Language Models (MLLMs) have demonstrated an excellent understanding of images and 3D data. However, both modalities have shortcomings in holistically capturing the appearance and geometry of objects. Meanwhile, Neural Radiance Fields (NeRFs), which encode information within the weights of a simple Multi-Layer Perceptron (MLP), have emerged as an increasingly widespread modality that simultaneously encodes the geometry and photorealistic appearance of objects. This paper investigates the feasibility and effectiveness of ingesting NeRF into MLLM. We create LLaNA, the first general-purpose NeRF-language assistant capable of performing new tasks such as NeRF captioning and Q\&A. Notably, our method directly processes the weights of the NeRF's MLP to extract information about the represented objects without the need to render images or materialize 3D data structures. Moreover, we build a dataset of NeRFs with text annotations for various NeRF-language tasks with no human intervention. Based on this dataset, we develop a benchmark to evaluate the NeRF understanding capability of our method. Results show that processing NeRF weights performs favourably against extracting 2D or 3D representations from NeRFs. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Under review. Project page: https://andreamaduzzi.github.io/llana/

arXiv:2405.12731 [pdf, other]

From Today's Code to Tomorrow's Symphony: The AI Transformation of Developer's Routine by 2030

Authors: Matteo Ciniselli, Niccolò Puccinelli, Ketai Qiu, Luca Di Grazia

Abstract: In the rapidly evolving landscape of software engineering, the integration of Artificial Intelligence (AI) into the Software Development Life-Cycle (SDLC) heralds a transformative era for developers. Recently, we have assisted to a pivotal shift towards AI-assisted programming, exemplified by tools like GitHub Copilot and OpenAI's ChatGPT, which have become a crucial element for coding, debugging,… ▽ More In the rapidly evolving landscape of software engineering, the integration of Artificial Intelligence (AI) into the Software Development Life-Cycle (SDLC) heralds a transformative era for developers. Recently, we have assisted to a pivotal shift towards AI-assisted programming, exemplified by tools like GitHub Copilot and OpenAI's ChatGPT, which have become a crucial element for coding, debugging, and software design. In this paper we provide a comparative analysis between the current state of AI-assisted programming in 2024 and our projections for 2030, by exploring how AI advancements are set to enhance the implementation phase, fundamentally altering developers' roles from manual coders to orchestrators of AI-driven development ecosystems. We envision HyperAssistant, an augmented AI tool that offers comprehensive support to 2030 developers, addressing current limitations in mental health support, fault detection, code optimization, team interaction, and skill development. We emphasize AI as a complementary force, augmenting developers' capabilities rather than replacing them, leading to the creation of sophisticated, reliable, and secure software solutions. Our vision seeks to anticipate the evolution of programming practices, challenges, and future directions, sha** a new paradigm where developers and AI collaborate more closely, promising a significant leap in SE efficiency, security and creativity. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.07387 [pdf, other]

Semantic Loss Functions for Neuro-Symbolic Structured Prediction

Authors: Kareem Ahmed, Stefano Teso, Paolo Morettin, Luca Di Liello, Pierfrancesco Ardino, Jacopo Gobbi, Yitao Liang, Eric Wang, Kai-Wei Chang, Andrea Passerini, Guy Van den Broeck

Abstract: Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g. a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which inject… ▽ More Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g. a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which injects knowledge about such structure, defined symbolically, into training by minimizing the network's violation of such dependencies, steering the network towards predicting distributions satisfying the underlying structure. At the same time, it is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby, while also enabling efficient end-to-end training and inference. We also discuss key improvements and applications of the semantic loss. One limitations of the semantic loss is that it does not exploit the association of every data point with certain features certifying its membership in a target class. We should therefore prefer minimum-entropy distributions over valid structures, which we obtain by additionally minimizing the neuro-symbolic entropy. We empirically demonstrate the benefits of this more refined formulation. Moreover, the semantic loss is designed to be modular and can be combined with both discriminative and generative neural models. This is illustrated by integrating it into generative adversarial networks, yielding constrained adversarial networks, a novel class of deep generative models able to efficiently synthesize complex objects obeying the structure of the underlying domain. △ Less

Submitted 12 May, 2024; originally announced May 2024.

Comments: Preprint of Ch. 22 "Semantic Loss Functions for Neuro-Symbolic Structured Prediction" in "Compendium of Neurosymbolic Artificial Intelligence", https://ebooks.iospress.nl/ISBN/978-1-64368-406-2. arXiv admin note: substantial text overlap with arXiv:2201.11250, arXiv:2007.13197

arXiv:2405.05828 [pdf, other]

MAD-ICP: It Is All About Matching Data -- Robust and Informed LiDAR Odometry

Authors: Simone Ferrari, Luca Di Giammarino, Leonardo Brizi, Giorgio Grisetti

Abstract: LiDAR odometry is the task of estimating the ego-motion of the sensor from sequential laser scans. This problem has been addressed by the community for more than two decades, and many effective solutions are available nowadays. Most of these systems implicitly rely on assumptions about the operating environment, the sensor used, and motion pattern. When these assumptions are violated, several well… ▽ More LiDAR odometry is the task of estimating the ego-motion of the sensor from sequential laser scans. This problem has been addressed by the community for more than two decades, and many effective solutions are available nowadays. Most of these systems implicitly rely on assumptions about the operating environment, the sensor used, and motion pattern. When these assumptions are violated, several well-known systems tend to perform poorly. This paper presents a LiDAR odometry system that can overcome these limitations and operate well under different operating conditions while achieving performance comparable with domain-specific methods. Our algorithm follows the well-known ICP paradigm that leverages a PCA-based kd-tree implementation that is used to extract structural information about the clouds being registered and to compute the minimization metric for the alignment. The drift is bound by managing the local map based on the estimated uncertainty of the tracked pose. To benefit the community, we release an open-source C++ anytime real-time implementation. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: https://github.com/rvp-group/mad-icp

arXiv:2405.01085 [pdf, other]

Single Image Super-Resolution Based on Global-Local Information Synergy

Authors: Nianzu Qiao, Lamei Di, Changyin Sun

Abstract: Although several image super-resolution solutions exist, they still face many challenges. CNN-based algorithms, despite the reduction in computational complexity, still need to improve their accuracy. While Transformer-based algorithms have higher accuracy, their ultra-high computational complexity makes them difficult to be accepted in practical applications. To overcome the existing challenges,… ▽ More Although several image super-resolution solutions exist, they still face many challenges. CNN-based algorithms, despite the reduction in computational complexity, still need to improve their accuracy. While Transformer-based algorithms have higher accuracy, their ultra-high computational complexity makes them difficult to be accepted in practical applications. To overcome the existing challenges, a novel super-resolution reconstruction algorithm is proposed in this paper. The algorithm achieves a significant increase in accuracy through a unique design while maintaining a low complexity. The core of the algorithm lies in its cleverly designed Global-Local Information Extraction Module and Basic Block Module. By combining global and local information, the Global-Local Information Extraction Module aims to understand the image content more comprehensively so as to recover the global structure and local details in the image more accurately, which provides rich information support for the subsequent reconstruction process. Experimental results show that the comprehensive performance of the algorithm proposed in this paper is optimal, providing an efficient and practical new solution in the field of super-resolution reconstruction. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.01083 [pdf, other]

MCMS: Multi-Category Information and Multi-Scale Stripe Attention for Blind Motion Deblurring

Authors: Nianzu Qiao, Lamei Di, Changyin Sun

Abstract: Deep learning-based motion deblurring techniques have advanced significantly in recent years. This class of techniques, however, does not carefully examine the inherent flaws in blurry images. For instance, low edge and structural information are traits of blurry images. The high-frequency component of blurry images is edge information, and the low-frequency component is structure information. A b… ▽ More Deep learning-based motion deblurring techniques have advanced significantly in recent years. This class of techniques, however, does not carefully examine the inherent flaws in blurry images. For instance, low edge and structural information are traits of blurry images. The high-frequency component of blurry images is edge information, and the low-frequency component is structure information. A blind motion deblurring network (MCMS) based on multi-category information and multi-scale stripe attention mechanism is proposed. Given the respective characteristics of the high-frequency and low-frequency components, a three-stage encoder-decoder model is designed. Specifically, the first stage focuses on extracting the features of the high-frequency component, the second stage concentrates on extracting the features of the low-frequency component, and the third stage integrates the extracted low-frequency component features, the extracted high-frequency component features, and the original blurred image in order to recover the final clear image. As a result, the model effectively improves motion deblurring by fusing the edge information of the high-frequency component and the structural information of the low-frequency component. In addition, a grouped feature fusion technique is developed so as to achieve richer, more three-dimensional and comprehensive utilization of various types of features at a deep level. Next, a multi-scale stripe attention mechanism (MSSA) is designed, which effectively combines the anisotropy and multi-scale information of the image, a move that significantly enhances the capability of the deep model in feature representation. Large-scale comparative studies on various datasets show that the strategy in this paper works better than the recently published measures. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.16558 [pdf, other]

doi 10.1049/ell2.13191

DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation

Authors: Leandro Di Bella, Yangxintong Lyu, Adrian Munteanu

Abstract: This paper presents DeepKalPose, a novel approach for enhancing temporal consistency in monocular vehicle pose estimation applied on video through a deep-learning-based Kalman Filter. By integrating a Bi-directional Kalman filter strategy utilizing forward and backward time-series processing, combined with a learnable motion model to represent complex motion patterns, our method significantly impr… ▽ More This paper presents DeepKalPose, a novel approach for enhancing temporal consistency in monocular vehicle pose estimation applied on video through a deep-learning-based Kalman Filter. By integrating a Bi-directional Kalman filter strategy utilizing forward and backward time-series processing, combined with a learnable motion model to represent complex motion patterns, our method significantly improves pose accuracy and robustness across various conditions, particularly for occluded or distant vehicles. Experimental validation on the KITTI dataset confirms that DeepKalPose outperforms existing methods in both pose accuracy and temporal consistency. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: 4 pages, 3 Figures, published to IET Electronic Letters

Journal ref: Electronics Letters (ISSN: 00135194), jaar: 2024, volume: 60, nummer: 8, startpagina: ?

arXiv:2404.11322 [pdf, other]

VBR: A Vision Benchmark in Rome

Authors: Leonardo Brizi, Emanuele Giacomini, Luca Di Giammarino, Simone Ferrari, Omar Salem, Lorenzo De Rebotti, Giorgio Grisetti

Abstract: This paper presents a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM, to advance the research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns… ▽ More This paper presents a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM, to advance the research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsic and extrinsic of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, urban and highway scenarios. Combining handheld and car-based data collections, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-dof ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment. All sequences divided in training and testing are accessible through our website. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted at IEEE ICRA 2024 Website: https://rvp-group.net/datasets/slam.html

arXiv:2404.08245 [pdf, other]

A Distributed Approach for Persistent Homology Computation on a Large Scale

Authors: Riccardo Ceccaroni, Lorenzo Di Rocco, Umberto Ferraro Petrillo, Pierpaolo Brutti

Abstract: Persistent homology (PH) is a powerful mathematical method to automatically extract relevant insights from images, such as those obtained by high-resolution imaging devices like electron microscopes or new-generation telescopes. However, the application of this method comes at a very high computational cost, that is bound to explode more because new imaging devices generate an ever-growing amount… ▽ More Persistent homology (PH) is a powerful mathematical method to automatically extract relevant insights from images, such as those obtained by high-resolution imaging devices like electron microscopes or new-generation telescopes. However, the application of this method comes at a very high computational cost, that is bound to explode more because new imaging devices generate an ever-growing amount of data. In this paper we present PixHomology, a novel algorithm for efficiently computing $0$-dimensional PH on 2D images, optimizing memory and processing time. By leveraging the Apache Spark framework, we also present a distributed version of our algorithm with several optimized variants, able to concurrently process large batches of astronomical images. Finally, we present the results of an experimental analysis showing that our algorithm and its distributed version are efficient in terms of required memory, execution time, and scalability, consistently outperforming existing state-of-the-art PH computation tools when used to process large datasets. △ Less

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.07993 [pdf, other]

Connecting NeRFs, Images, and Text

Authors: Francesco Ballerini, Pierluigi Zama Ramirez, Roberto Mirabella, Samuele Salti, Luigi Di Stefano

Abstract: Neural Radiance Fields (NeRFs) have emerged as a standard framework for representing 3D scenes and objects, introducing a novel data type for information exchange and storage. Concurrently, significant progress has been made in multimodal representation learning for text and image data. This paper explores a novel research direction that aims to connect the NeRF modality with other modalities, sim… ▽ More Neural Radiance Fields (NeRFs) have emerged as a standard framework for representing 3D scenes and objects, introducing a novel data type for information exchange and storage. Concurrently, significant progress has been made in multimodal representation learning for text and image data. This paper explores a novel research direction that aims to connect the NeRF modality with other modalities, similar to established methodologies for images and text. To this end, we propose a simple framework that exploits pre-trained models for NeRF representations alongside multimodal models for text and image processing. Our framework learns a bidirectional map** between NeRF embeddings and those obtained from corresponding images and text. This map** unlocks several novel and useful applications, including NeRF zero-shot classification and NeRF retrieval from images or text. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: Accepted at CVPRW-INRV 2024

arXiv:2404.05133 [pdf, other]

EcoVerse: An Annotated Twitter Dataset for Eco-Relevance Classification, Environmental Impact Analysis, and Stance Detection

Authors: Francesca Grasso, Stefano Locci, Giovanni Siragusa, Luigi Di Caro

Abstract: Anthropogenic ecological crisis constitutes a significant challenge that all within the academy must urgently face, including the Natural Language Processing (NLP) community. While recent years have seen increasing work revolving around climate-centric discourse, crucial environmental and ecological topics outside of climate change remain largely unaddressed, despite their prominent importance. Ma… ▽ More Anthropogenic ecological crisis constitutes a significant challenge that all within the academy must urgently face, including the Natural Language Processing (NLP) community. While recent years have seen increasing work revolving around climate-centric discourse, crucial environmental and ecological topics outside of climate change remain largely unaddressed, despite their prominent importance. Mainstream NLP tasks, such as sentiment analysis, dominate the scene, but there remains an untouched space in the literature involving the analysis of environmental impacts of certain events and practices. To address this gap, this paper presents EcoVerse, an annotated English Twitter dataset of 3,023 tweets spanning a wide spectrum of environmental topics. We propose a three-level annotation scheme designed for Eco-Relevance Classification, Stance Detection, and introducing an original approach for Environmental Impact Analysis. We detail the data collection, filtering, and labeling process that led to the creation of the dataset. Remarkable Inter-Annotator Agreement indicates that the annotation scheme produces consistent annotations of high quality. Subsequent classification experiments using BERT-based models, including ClimateBERT, are presented. These yield encouraging results, while also indicating room for a model specifically tailored for environmental texts. The dataset is made freely available to stimulate further research. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.03743 [pdf, other]

Test Time Training for Industrial Anomaly Segmentation

Authors: Alex Costanzino, Pierluigi Zama Ramirez, Mirko Del Moro, Agostino Aiezzo, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano

Abstract: Anomaly Detection and Segmentation (AD&S) is crucial for industrial quality control. While existing methods excel in generating anomaly scores for each pixel, practical applications require producing a binary segmentation to identify anomalies. Due to the absence of labeled anomalies in many real scenarios, standard practices binarize these maps based on some statistics derived from a validation s… ▽ More Anomaly Detection and Segmentation (AD&S) is crucial for industrial quality control. While existing methods excel in generating anomaly scores for each pixel, practical applications require producing a binary segmentation to identify anomalies. Due to the absence of labeled anomalies in many real scenarios, standard practices binarize these maps based on some statistics derived from a validation set containing only nominal samples, resulting in poor segmentation performance. This paper addresses this problem by proposing a test time training strategy to improve the segmentation performance. Indeed, at test time, we can extract rich features directly from anomalous samples to train a classifier that can discriminate defects effectively. Our general approach can work downstream to any AD&S method that provides an anomaly score map as output, even in multimodal settings. We demonstrate the effectiveness of our approach over baselines through extensive experimentation and evaluation on MVTec AD and MVTec 3D-AD. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted at VAND 2.0, CVPRW 2024

arXiv:2401.06619 [pdf, other]

doi 10.1145/3597503.3639184

PyTy: Repairing Static Type Errors in Python

Authors: Yiu Wai Chow, Luca Di Grazia, Michael Pradel

Abstract: Gradual ty** enables developers to annotate types of their own choosing, offering a flexible middle ground between no type annotations and a fully statically typed language. As more and more code bases get type-annotated, static type checkers detect an increasingly large number of type errors. Unfortunately, fixing these errors requires manual effort, hampering the adoption of gradual ty** in… ▽ More Gradual ty** enables developers to annotate types of their own choosing, offering a flexible middle ground between no type annotations and a fully statically typed language. As more and more code bases get type-annotated, static type checkers detect an increasingly large number of type errors. Unfortunately, fixing these errors requires manual effort, hampering the adoption of gradual ty** in practice. This paper presents PyTy, an automated program repair approach targeted at statically detectable type errors in Python. The problem of repairing type errors deserves specific attention because it exposes particular repair patterns, offers a warning message with hints about where and how to apply a fix, and because gradual type checking serves as an automatic way to validate fixes. We addresses this problem through three contributions: (i) an empirical study that investigates how developers fix Python type errors, showing a diverse set of fixing strategies with some recurring patterns; (ii) an approach to automatically extract type error fixes, which enables us to create a dataset of 2,766 error-fix pairs from 176 GitHub repositories, named PyTyDefects; (iii) the first learning-based repair technique for fixing type errors in Python. Motivated by the relative data scarcity of the problem, the neural model at the core of PyTy is trained via cross-lingual transfer learning. Our evaluation shows that PyTy offers fixes for ten frequent categories of type errors, successfully addressing 85.4% of 281 real-world errors. This effectiveness outperforms state-of-the-art large language models asked to repair type errors (by 2.1x) and complements a previous technique aimed at type errors that manifest at runtime. Finally, 20 out of 30 pull requests with PyTy-suggested fixes have been merged by developers, showing the usefulness of PyTy in practice. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Journal ref: ICSE 2024

arXiv:2312.13277 [pdf, other]

Deep Learning on 3D Neural Fields

Authors: Pierluigi Zama Ramirez, Luca De Luigi, Daniele Sirocchi, Adriano Cardace, Riccardo Spezialetti, Francesco Ballerini, Samuele Salti, Luigi Di Stefano

Abstract: In recent years, Neural Fields (NFs) have emerged as an effective tool for encoding diverse continuous signals such as images, videos, audio, and 3D shapes. When applied to 3D data, NFs offer a solution to the fragmentation and limitations associated with prevalent discrete representations. However, given that NFs are essentially neural networks, it remains unclear whether and how they can be seam… ▽ More In recent years, Neural Fields (NFs) have emerged as an effective tool for encoding diverse continuous signals such as images, videos, audio, and 3D shapes. When applied to 3D data, NFs offer a solution to the fragmentation and limitations associated with prevalent discrete representations. However, given that NFs are essentially neural networks, it remains unclear whether and how they can be seamlessly integrated into deep learning pipelines for solving downstream tasks. This paper addresses this research problem and introduces nf2vec, a framework capable of generating a compact latent representation for an input NF in a single inference pass. We demonstrate that nf2vec effectively embeds 3D objects represented by the input NFs and showcase how the resulting embeddings can be employed in deep learning pipelines to successfully address various tasks, all while processing exclusively NFs. We test this framework on several NFs used to represent 3D surfaces, such as unsigned/signed distance and occupancy fields. Moreover, we demonstrate the effectiveness of our approach with more complex NFs that encompass both geometry and appearance of 3D objects such as neural radiance fields. △ Less

Submitted 20 December, 2023; originally announced December 2023.

Comments: Extended version of the paper "Deep Learning on Implicit Neural Representations of Shapes" that was presented at ICLR 2023. arXiv admin note: text overlap with arXiv:2302.05438

arXiv:2312.04521 [pdf, other]

Multimodal Industrial Anomaly Detection by Crossmodal Feature Map**

Authors: Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano

Abstract: The paper explores the industrial multimodal Anomaly Detection (AD) task, which exploits point clouds and RGB images to localize anomalies. We introduce a novel light and fast framework that learns to map features from one modality to the other on nominal samples. At test time, anomalies are detected by pinpointing inconsistencies between observed and mapped features. Extensive experiments show th… ▽ More The paper explores the industrial multimodal Anomaly Detection (AD) task, which exploits point clouds and RGB images to localize anomalies. We introduce a novel light and fast framework that learns to map features from one modality to the other on nominal samples. At test time, anomalies are detected by pinpointing inconsistencies between observed and mapped features. Extensive experiments show that our approach achieves state-of-the-art detection and segmentation performance in both the standard and few-shot settings on the MVTec 3D-AD dataset while achieving faster inference and occupying less memory than previous multimodal AD methods. Moreover, we propose a layer-pruning technique to improve memory and time efficiency with a marginal sacrifice in performance. △ Less

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.03197 [pdf, other]

Stable Linear Subspace Identification: A Machine Learning Approach

Authors: Loris Di Natale, Muhammad Zakwan, Bratislav Svetozarevic, Philipp Heer, Giancarlo Ferrari-Trecate, Colin N. Jones

Abstract: Machine Learning (ML) and linear System Identification (SI) have been historically developed independently. In this paper, we leverage well-established ML tools - especially the automatic differentiation framework - to introduce SIMBa, a family of discrete linear multi-step-ahead state-space SI methods using backpropagation. SIMBa relies on a novel Linear-Matrix-Inequality-based free parametrizati… ▽ More Machine Learning (ML) and linear System Identification (SI) have been historically developed independently. In this paper, we leverage well-established ML tools - especially the automatic differentiation framework - to introduce SIMBa, a family of discrete linear multi-step-ahead state-space SI methods using backpropagation. SIMBa relies on a novel Linear-Matrix-Inequality-based free parametrization of Schur matrices to ensure the stability of the identified model. We show how SIMBa generally outperforms traditional linear state-space SI methods, and sometimes significantly, although at the price of a higher computational burden. This performance gap is particularly remarkable compared to other SI methods with stability guarantees, where the gain is frequently above 25% in our investigations, hinting at SIMBa's ability to simultaneously achieve state-of-the-art fitting performance and enforce stability. Interestingly, these observations hold for a wide variety of input-output systems and on both simulated and real-world data, showcasing the flexibility of the proposed approach. We postulate that this new SI paradigm presents a great extension potential to identify structured nonlinear models from data, and we hence open-source SIMBa on https://github.com/Cemempamoi/simba. △ Less

Submitted 26 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: Accepted at ECC 2024

arXiv:2310.01140 [pdf, other]

Neural Processing of Tri-Plane Hybrid Neural Fields

Authors: Adriano Cardace, Pierluigi Zama Ramirez, Francesco Ballerini, Allan Zhou, Samuele Salti, Luigi Di Stefano

Abstract: Driven by the appealing properties of neural fields for storing and communicating 3D data, the problem of directly processing them to address tasks such as classification and part segmentation has emerged and has been investigated in recent works. Early approaches employ neural fields parameterized by shared networks trained on the whole dataset, achieving good task performance but sacrificing rec… ▽ More Driven by the appealing properties of neural fields for storing and communicating 3D data, the problem of directly processing them to address tasks such as classification and part segmentation has emerged and has been investigated in recent works. Early approaches employ neural fields parameterized by shared networks trained on the whole dataset, achieving good task performance but sacrificing reconstruction quality. To improve the latter, later methods focus on individual neural fields parameterized as large Multi-Layer Perceptrons (MLPs), which are, however, challenging to process due to the high dimensionality of the weight space, intrinsic weight space symmetries, and sensitivity to random initialization. Hence, results turn out significantly inferior to those achieved by processing explicit representations, e.g., point clouds or meshes. In the meantime, hybrid representations, in particular based on tri-planes, have emerged as a more effective and efficient alternative to realize neural fields, but their direct processing has not been investigated yet. In this paper, we show that the tri-plane discrete data structure encodes rich information, which can be effectively processed by standard deep-learning machinery. We define an extensive benchmark covering a diverse set of fields such as occupancy, signed/unsigned distance, and, for the first time, radiance fields. While processing a field with the same reconstruction quality, we achieve task performance far superior to frameworks that process large MLPs and, for the first time, almost on par with architectures handling explicit representations. △ Less

Submitted 30 January, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Accepted at ICLR 2024

arXiv:2310.00758 [pdf, other]

Data-driven adaptive building thermal controller tuning with constraints: A primal-dual contextual Bayesian optimization approach

Authors: Wenjie Xu, Bratislav Svetozarevic, Loris Di Natale, Philipp Heer, Colin N Jones

Abstract: We study the problem of tuning the parameters of a room temperature controller to minimize its energy consumption, subject to the constraint that the daily cumulative thermal discomfort of the occupants is below a given threshold. We formulate it as an online constrained black-box optimization problem where, on each day, we observe some relevant environmental context and adaptively select the cont… ▽ More We study the problem of tuning the parameters of a room temperature controller to minimize its energy consumption, subject to the constraint that the daily cumulative thermal discomfort of the occupants is below a given threshold. We formulate it as an online constrained black-box optimization problem where, on each day, we observe some relevant environmental context and adaptively select the controller parameters. In this paper, we propose to use a data-driven Primal-Dual Contextual Bayesian Optimization (PDCBO) approach to solve this problem. In a simulation case study on a single room, we apply our algorithm to tune the parameters of a Proportional Integral (PI) heating controller and the pre-heating time. Our results show that PDCBO can save up to 4.7% energy consumption compared to other state-of-the-art Bayesian optimization-based methods while kee** the daily thermal discomfort below the given tolerable threshold on average. Additionally, PDCBO can automatically track time-varying tolerable thresholds while existing methods fail to do so. We then study an alternative constrained tuning problem where we aim to minimize the thermal discomfort with a given energy budget. With this formulation, PDCBO reduces the average discomfort by up to 63% compared to state-of-the-art safe optimization methods while kee** the average daily energy consumption below the required threshold. △ Less

Submitted 1 October, 2023; originally announced October 2023.

arXiv:2309.08272 [pdf, other]

Structural Self-Supervised Objectives for Transformers

Authors: Luca Di Liello

Abstract: This thesis focuses on improving the pre-training of natural language models using unsupervised raw data to make them more efficient and aligned with downstream applications. In the first part, we introduce three alternative pre-training objectives to BERT's Masked Language Modeling (MLM), namely Random Token Substitution (RTS), Cluster-based Random Token Substitution (C-RTS), and Swapped Langua… ▽ More This thesis focuses on improving the pre-training of natural language models using unsupervised raw data to make them more efficient and aligned with downstream applications. In the first part, we introduce three alternative pre-training objectives to BERT's Masked Language Modeling (MLM), namely Random Token Substitution (RTS), Cluster-based Random Token Substitution (C-RTS), and Swapped Language Modeling (SLM). These objectives involve token swap** instead of masking, with RTS and C-RTS aiming to predict token originality and SLM predicting the original token values. Results show that RTS and C-RTS require less pre-training time while maintaining performance comparable to MLM. Surprisingly, SLM outperforms MLM on certain tasks despite using the same computational budget. In the second part, we proposes self-supervised pre-training tasks that align structurally with downstream applications, reducing the need for labeled data. We use large corpora like Wikipedia and CC-News to train models to recognize if text spans originate from the same paragraph or document in several ways. By doing continuous pre-training, starting from existing models like RoBERTa, ELECTRA, DeBERTa, BART, and T5, we demonstrate significant performance improvements in tasks like Fact Verification, Answer Sentence Selection, and Summarization. These improvements are especially pronounced when limited annotation data is available. The proposed objectives also achieve state-of-the-art results on various benchmark datasets, including FEVER (dev set), ASNQ, WikiQA, and TREC-QA, as well as enhancing the quality of summaries. Importantly, these techniques can be easily integrated with other methods without altering the internal structure of Transformer models, making them versatile for various NLP applications. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: Ph.D. Thesis

arXiv:2309.07917 [pdf, other]

Looking at words and points with attention: a benchmark for text-to-shape coherence

Authors: Andrea Amaduzzi, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano

Abstract: While text-conditional 3D object generation and manipulation have seen rapid progress, the evaluation of coherence between generated 3D shapes and input textual descriptions lacks a clear benchmark. The reason is twofold: a) the low quality of the textual descriptions in the only publicly available dataset of text-shape pairs; b) the limited effectiveness of the metrics used to quantitatively asse… ▽ More While text-conditional 3D object generation and manipulation have seen rapid progress, the evaluation of coherence between generated 3D shapes and input textual descriptions lacks a clear benchmark. The reason is twofold: a) the low quality of the textual descriptions in the only publicly available dataset of text-shape pairs; b) the limited effectiveness of the metrics used to quantitatively assess such coherence. In this paper, we propose a comprehensive solution that addresses both weaknesses. Firstly, we employ large language models to automatically refine textual descriptions associated with shapes. Secondly, we propose a quantitative metric to assess text-to-shape coherence, through cross-attention mechanisms. To validate our approach, we conduct a user study and compare quantitatively our metric with existing ones. The refined dataset, the new metric and a set of text-shape pairs validated by the user study comprise a novel, fine-grained benchmark that we publicly release to foster research on text-to-shape coherence of text-conditioned 3D generative models. Benchmark available at https://cvlab-unibo.github.io/CrossCoherence-Web/. △ Less

Submitted 14 September, 2023; originally announced September 2023.

Comments: ICCV 2023 Workshop "AI for 3D Content Creation", Project page: https://cvlab-unibo.github.io/CrossCoherence-Web/, 26 pages

arXiv:2309.07874 [pdf, other]

Ca$^2$Lib: Simple and Accurate LiDAR-RGB Calibration using Small Common Markers

Authors: Emanuele Giacomini, Leonardo Brizi, Luca Di Giammarino, Omar Salem, Patrizio Perugini, Giorgio Grisetti

Abstract: In many fields of robotics, knowing the relative position and orientation between two sensors is a mandatory precondition to operate with multiple sensing modalities. In this context, the pair LiDAR-RGB cameras offer complementary features: LiDARs yield sparse high quality range measurements, while RGB cameras provide a dense color measurement of the environment. Existing techniques often rely eit… ▽ More In many fields of robotics, knowing the relative position and orientation between two sensors is a mandatory precondition to operate with multiple sensing modalities. In this context, the pair LiDAR-RGB cameras offer complementary features: LiDARs yield sparse high quality range measurements, while RGB cameras provide a dense color measurement of the environment. Existing techniques often rely either on complex calibration targets that are expensive to obtain, or extracted virtual correspondences that can hinder the estimate's accuracy. In this paper we address the problem of LiDAR-RGB calibration using typical calibration patterns (i.e. A3 chessboard) with minimal human intervention. Our approach exploits the planarity of the target to find correspondences between the sensors measurements, leading to features that are robust to LiDAR noise. Moreover, we estimate a solution by solving a joint non-linear optimization problem. We validated our approach by carrying on quantitative and comparative experiments with other state-of-the-art approaches. Our results show that our simple schema performs on par or better than other approches using complex calibration targets. Finally, we release an open-source C++ implementation at \url{https://github.com/srrg-sapienza/ca2lib} △ Less

Submitted 14 September, 2023; originally announced September 2023.

Comments: 7 pages, 10 figures

arXiv:2309.01206 [pdf]

Comparative Safety Performance of Autonomous- and Human Drivers: A Real-World Case Study of the Waymo One Service

Authors: Luigi Di Lillo, Tilia Gode, Xilin Zhou, Margherita Atzei, Ruoshu Chen, Trent Victor

Abstract: This study compares the safety of autonomous- and human drivers. It finds that the Waymo One autonomous service is significantly safer towards other road users than human drivers are, as measured via collision causation. The result is determined by comparing Waymo's third party liability insurance claims data with mileage- and zip-code-calibrated Swiss Re (human driver) private passenger vehicle b… ▽ More This study compares the safety of autonomous- and human drivers. It finds that the Waymo One autonomous service is significantly safer towards other road users than human drivers are, as measured via collision causation. The result is determined by comparing Waymo's third party liability insurance claims data with mileage- and zip-code-calibrated Swiss Re (human driver) private passenger vehicle baselines. A liability claim is a request for compensation when someone is responsible for damage to property or injury to another person, typically following a collision. Liability claims reporting and their development is designed using insurance industry best practices to assess crash causation contribution and predict future crash contributions. In over 3.8 million miles driven without a human being behind the steering wheel in rider-only (RO) mode, the Waymo Driver incurred zero bodily injury claims in comparison with the human driver baseline of 1.11 claims per million miles (cpmm). The Waymo Driver also significantly reduced property damage claims to 0.78 cpmm in comparison with the human driver baseline of 3.26 cpmm. Similarly, in a more statistically robust dataset of over 35 million miles during autonomous testing operations (TO), the Waymo Driver, together with a human autonomous specialist behind the steering wheel monitoring the automation, also significantly reduced both bodily injury and property damage cpmm compared to the human driver baselines. △ Less

Submitted 3 September, 2023; originally announced September 2023.

arXiv:2308.01050 [pdf, other]

A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles' Riskiness

Authors: Alessandro Zanardi, Andrea Censi, Margherita Atzei, Luigi Di Lillo, Emilio Frazzoli

Abstract: Autonomous Vehicles (AVs) promise a range of societal advantages, including broader access to mobility, reduced road accidents, and enhanced transportation efficiency. However, evaluating the risks linked to AVs is complex due to limited historical data and the swift progression of technology. This paper presents a data-driven framework for assessing the risk of different AVs' behaviors in various… ▽ More Autonomous Vehicles (AVs) promise a range of societal advantages, including broader access to mobility, reduced road accidents, and enhanced transportation efficiency. However, evaluating the risks linked to AVs is complex due to limited historical data and the swift progression of technology. This paper presents a data-driven framework for assessing the risk of different AVs' behaviors in various operational design domains (ODDs), based on counterfactual simulations of "misbehaving" road users. We propose the notion of counterfactual safety margin, which represents the minimum deviation from nominal behavior that could cause a collision. This methodology not only pinpoints the most critical scenarios but also quantifies the (relative) risk's frequency and severity concerning AVs. Importantly, we show that our approach is applicable even when the AV's behavioral policy remains undisclosed, through worst- and best-case analyses, benefiting external entities like regulators and risk evaluators. Our experimental outcomes demonstrate the correlation between the safety margin, the quality of the driving policy, and the ODD, shedding light on the relative risks of different AV providers. Overall, this work contributes to the safety assessment of AVs and addresses legislative and insurance concerns surrounding this burgeoning technology. △ Less

Submitted 28 November, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

Comments: updated experiments

arXiv:2307.15052 [pdf, other]

Learning Depth Estimation for Transparent and Mirror Surfaces

Authors: Alex Costanzino, Pierluigi Zama Ramirez, Matteo Poggi, Fabio Tosi, Stefano Mattoccia, Luigi Di Stefano

Abstract: Inferring the depth of transparent or mirror (ToM) surfaces represents a hard challenge for either sensors, algorithms, or deep networks. We propose a simple pipeline for learning to estimate depth properly for such surfaces with neural networks, without requiring any ground-truth annotation. We unveil how to obtain reliable pseudo labels by in-painting ToM objects in images and processing them wi… ▽ More Inferring the depth of transparent or mirror (ToM) surfaces represents a hard challenge for either sensors, algorithms, or deep networks. We propose a simple pipeline for learning to estimate depth properly for such surfaces with neural networks, without requiring any ground-truth annotation. We unveil how to obtain reliable pseudo labels by in-painting ToM objects in images and processing them with a monocular depth estimation model. These labels can be used to fine-tune existing monocular or stereo networks, to let them learn how to deal with ToM surfaces. Experimental results on the Booster dataset show the dramatic improvements enabled by our remarkably simple proposal. △ Less

Submitted 27 July, 2023; originally announced July 2023.

Comments: Accepted at ICCV 2023. Project Page: https://cvlab-unibo.github.io/Depth4ToM

arXiv:2307.09776 [pdf, other]

LTL Synthesis on Infinite-State Arenas defined by Programs

Authors: Shaun Azzopardi, Nir Piterman, Gerardo Schneider, Luca di Stefano

Abstract: This paper deals with the problem of automatically and correctly controlling infinite-state reactive programs to achieve LTL goals. Applications include adapting a program to new requirements, or to repair bugs discovered in the original specification or program code. Existing approaches are able to solve this problem for safety and some reachability properties, but require an a priori template of… ▽ More This paper deals with the problem of automatically and correctly controlling infinite-state reactive programs to achieve LTL goals. Applications include adapting a program to new requirements, or to repair bugs discovered in the original specification or program code. Existing approaches are able to solve this problem for safety and some reachability properties, but require an a priori template of the solution for more general properties. Fully automated approaches for full LTL exist, reducing the problem into successive finite LTL reactive synthesis problems in an abstraction-refinement loop. However, they do not terminate when the number of steps to be completed depends on unbounded variables. Our main insight is that safety abstractions of the program are not enough -- fairness properties are also essential to be able to decide many interesting problems, something missed by existing automated approaches. We thus go beyond the state-of-the-art to allow for automated reactive program control for full LTL, with automated discovery of the knowledge, including fairness, of the program needed to determine realisability. We further implement the approach in a tool, with an associated DSL for reactive programs, and illustrate the approach through several case studies. △ Less

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2306.05801 [pdf, other]

Strategies to exploit XAI to improve classification systems

Authors: Andrea Apicella, Luca Di Lorenzo, Francesco Isgrò, Andrea Pollastro, Roberto Prevete

Abstract: Explainable Artificial Intelligence (XAI) aims to provide insights into the decision-making process of AI models, allowing users to understand their results beyond their decisions. A significant goal of XAI is to improve the performance of AI models by providing explanations for their decision-making processes. However, most XAI literature focuses on how to explain an AI system, while less attenti… ▽ More Explainable Artificial Intelligence (XAI) aims to provide insights into the decision-making process of AI models, allowing users to understand their results beyond their decisions. A significant goal of XAI is to improve the performance of AI models by providing explanations for their decision-making processes. However, most XAI literature focuses on how to explain an AI system, while less attention has been given to how XAI methods can be exploited to improve an AI system. In this work, a set of well-known XAI methods typically used with Machine Learning (ML) classification tasks are investigated to verify if they can be exploited, not just to provide explanations but also to improve the performance of the model itself. To this aim, two strategies to use the explanation to improve a classification system are reported and empirically evaluated on three datasets: Fashion-MNIST, CIFAR10, and STL10. Results suggest that explanations built by Integrated Gradients highlight input features that can be effectively used to improve classification performance. △ Less

Submitted 9 June, 2023; originally announced June 2023.

Comments: This work has been accepted to be presented to The 1st World Conference on eXplainable Artificial Intelligence (xAI 2023), July 26-28, 2023 - Lisboa, Portugal

arXiv:2305.15358 [pdf, other]

Context-Aware Transformer Pre-Training for Answer Sentence Selection

Authors: Luca Di Liello, Siddhant Garg, Alessandro Moschitti

Abstract: Answer Sentence Selection (AS2) is a core component for building an accurate Question Answering pipeline. AS2 models rank a set of candidate sentences based on how likely they answer a given question. The state of the art in AS2 exploits pre-trained transformers by transferring them on large annotated datasets, while using local contextual information around the candidate sentence. In this paper,… ▽ More Answer Sentence Selection (AS2) is a core component for building an accurate Question Answering pipeline. AS2 models rank a set of candidate sentences based on how likely they answer a given question. The state of the art in AS2 exploits pre-trained transformers by transferring them on large annotated datasets, while using local contextual information around the candidate sentence. In this paper, we propose three pre-training objectives designed to mimic the downstream fine-tuning task of contextual AS2. This allows for specializing LMs when fine-tuning for contextual AS2. Our experiments on three public and two large-scale industrial datasets show that our pre-training approaches (applied to RoBERTa and ELECTRA) can improve baseline contextual AS2 accuracy by up to 8% on some datasets. △ Less

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted at ACL 2023

arXiv:2304.10448 [pdf, other]

ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects

Authors: Marco Toschi, Riccardo De Matteo, Riccardo Spezialetti, Daniele De Gregorio, Luigi Di Stefano, Samuele Salti

Abstract: In this paper, we focus on the problem of rendering novel views from a Neural Radiance Field (NeRF) under unobserved light conditions. To this end, we introduce a novel dataset, dubbed ReNe (Relighting NeRF), framing real world objects under one-light-at-time (OLAT) conditions, annotated with accurate ground-truth camera and light poses. Our acquisition pipeline leverages two robotic arms holding,… ▽ More In this paper, we focus on the problem of rendering novel views from a Neural Radiance Field (NeRF) under unobserved light conditions. To this end, we introduce a novel dataset, dubbed ReNe (Relighting NeRF), framing real world objects under one-light-at-time (OLAT) conditions, annotated with accurate ground-truth camera and light poses. Our acquisition pipeline leverages two robotic arms holding, respectively, a camera and an omni-directional point-wise light source. We release a total of 20 scenes depicting a variety of objects with complex geometry and challenging materials. Each scene includes 2000 images, acquired from 50 different points of views under 40 different OLAT conditions. By leveraging the dataset, we perform an ablation study on the relighting capability of variants of the vanilla NeRF architecture and identify a lightweight architecture that can render novel views of an object under novel light conditions, which we use to establish a non-trivial baseline for the dataset. Dataset and benchmark are available at https://eyecan-ai.github.io/rene. △ Less

Submitted 20 April, 2023; originally announced April 2023.

Comments: Accepted at CVPR 2023 as a highlight

arXiv:2304.02991 [pdf, other]

Exploiting the Complementarity of 2D and 3D Networks to Address Domain-Shift in 3D Semantic Segmentation

Authors: Adriano Cardace, Pierluigi Zama Ramirez, Samuele Salti, Luigi Di Stefano

Abstract: 3D semantic segmentation is a critical task in many real-world applications, such as autonomous driving, robotics, and mixed reality. However, the task is extremely challenging due to ambiguities coming from the unstructured, sparse, and uncolored nature of the 3D point clouds. A possible solution is to combine the 3D information with others coming from sensors featuring a different modality, such… ▽ More 3D semantic segmentation is a critical task in many real-world applications, such as autonomous driving, robotics, and mixed reality. However, the task is extremely challenging due to ambiguities coming from the unstructured, sparse, and uncolored nature of the 3D point clouds. A possible solution is to combine the 3D information with others coming from sensors featuring a different modality, such as RGB cameras. Recent multi-modal 3D semantic segmentation networks exploit these modalities relying on two branches that process the 2D and 3D information independently, striving to maintain the strength of each modality. In this work, we first explain why this design choice is effective and then show how it can be improved to make the multi-modal semantic segmentation more robust to domain shift. Our surprisingly simple contribution achieves state-of-the-art performances on four popular multi-modal unsupervised domain adaptation benchmarks, as well as better results in a domain generalization scenario. △ Less

Submitted 6 April, 2023; originally announced April 2023.

Comments: Accepted at the CVPR2023 Workshop on Autonomous Driving (WAD)

arXiv:2303.16878 [pdf, other]

Photometric LiDAR and RGB-D Bundle Adjustment

Authors: Luca Di Giammarino, Emanuele Giacomini, Leonardo Brizi, Omar Salem, Giorgio Grisetti

Abstract: The joint optimization of the sensor trajectory and 3D map is a crucial characteristic of Simultaneous Localization and Map** (SLAM) systems. To achieve this, the gold standard is Bundle Adjustment (BA). Modern 3D LiDARs now retain higher resolutions that enable the creation of point cloud images resembling those taken by conventional cameras. Nevertheless, the typical effective global refinemen… ▽ More The joint optimization of the sensor trajectory and 3D map is a crucial characteristic of Simultaneous Localization and Map** (SLAM) systems. To achieve this, the gold standard is Bundle Adjustment (BA). Modern 3D LiDARs now retain higher resolutions that enable the creation of point cloud images resembling those taken by conventional cameras. Nevertheless, the typical effective global refinement techniques employed for RGB-D sensors are not widely applied to LiDARs. This paper presents a novel BA photometric strategy that accounts for both RGB-D and LiDAR in the same way. Our work can be used on top of any SLAM/GNSS estimate to improve and refine the initial trajectory. We conducted different experiments using these two depth sensors on public benchmarks. Our results show that our system performs on par or better compared to other state-of-the-art ad-hoc SLAM/BA strategies, free from data association and without making assumptions about the environment. In addition, we present the benefit of jointly using RGB-D and LiDAR within our unified method. We finally release an open-source CUDA/C++ implementation. △ Less

Submitted 29 March, 2023; originally announced March 2023.

Comments: 11 pages, 9 figures

arXiv:2303.15356 [pdf, other]

doi 10.1093/comnet/cnad019

Hypergraphx: a library for higher-order network analysis

Authors: Quintino Francesco Lotito, Martina Contisciani, Caterina De Bacco, Leonardo Di Gaetano, Luca Gallo, Alberto Montresor, Federico Musciotto, Nicolò Ruggeri, Federico Battiston

Abstract: From social to biological systems, many real-world systems are characterized by higher-order, non-dyadic interactions. Such systems are conveniently described by hypergraphs, where hyperedges encode interactions among an arbitrary number of units. Here, we present an open-source python library, hypergraphx (HGX), providing a comprehensive collection of algorithms and functions for the analysis of… ▽ More From social to biological systems, many real-world systems are characterized by higher-order, non-dyadic interactions. Such systems are conveniently described by hypergraphs, where hyperedges encode interactions among an arbitrary number of units. Here, we present an open-source python library, hypergraphx (HGX), providing a comprehensive collection of algorithms and functions for the analysis of higher-order networks. These include different ways to convert data across distinct higher-order representations, a large variety of measures of higher-order organization at the local and the mesoscale, statistical filters to sparsify higher-order data, a wide array of static and dynamic generative models, and an implementation of different dynamical processes with higher-order interactions. Our computational framework is general, and allows to analyse hypergraphs with weighted, directed, signed, temporal and multiplex group interactions. We provide visual insights on higher-order data through a variety of different visualization tools. We accompany our code with an extended higher-order data repository, and demonstrate the ability of HGX to analyse real-world systems through a systematic analysis of a social network with higher-order interactions. The library is conceived as an evolving, community-based effort, which will further extend its functionalities over the years. Our software is available at https://github.com/HGX-Team/hypergraphx △ Less

Submitted 27 March, 2023; originally announced March 2023.

Journal ref: Journal of Complex Networks, Volume 11, Issue 3, June 2023

arXiv:2303.07312 [pdf, other]

Enhancing LiDAR performance: Robust De-skewing Exclusively Relying on Range Measurements

Authors: Omar Salem, Emanuele Giacomini, Leonardo Brizi, Luca Di Giammarino, Giorgio Grisetti

Abstract: Most commercially available Light Detection and Ranging (LiDAR)s measure the distances along a 2D section of the environment by sequentially sampling the free range along directions centered at the sensor's origin. When the sensor moves during the acquisition, the measured ranges are affected by a phenomenon known as "skewing", which appears as a distortion in the acquired scan. Skewing potentiall… ▽ More Most commercially available Light Detection and Ranging (LiDAR)s measure the distances along a 2D section of the environment by sequentially sampling the free range along directions centered at the sensor's origin. When the sensor moves during the acquisition, the measured ranges are affected by a phenomenon known as "skewing", which appears as a distortion in the acquired scan. Skewing potentially affects all systems that rely on LiDAR data, however, it could be compensated if the position of the sensor were known each time a single range is measured. Most methods to de-skew a LiDAR are based on external sensors such as IMU or wheel odometry, to estimate these intermediate LiDAR positions. In this paper, we present a method that relies exclusively on range measurements to effectively estimate the robot velocities which are then used for de-skewing. Our approach is suitable for low-frequency LiDAR where the skewing is more evident. It can be seamlessly integrated into existing pipelines, enhancing their performance at a negligible computational cost. △ Less

Submitted 16 October, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: 6 pages , 5 figures

arXiv:2302.05438 [pdf, other]

Deep Learning on Implicit Neural Representations of Shapes

Authors: Luca De Luigi, Adriano Cardace, Riccardo Spezialetti, Pierluigi Zama Ramirez, Samuele Salti, Luigi Di Stefano

Abstract: Implicit Neural Representations (INRs) have emerged in the last few years as a powerful tool to encode continuously a variety of different signals like images, videos, audio and 3D shapes. When applied to 3D shapes, INRs allow to overcome the fragmentation and shortcomings of the popular discrete representations used so far. Yet, considering that INRs consist in neural networks, it is not clear wh… ▽ More Implicit Neural Representations (INRs) have emerged in the last few years as a powerful tool to encode continuously a variety of different signals like images, videos, audio and 3D shapes. When applied to 3D shapes, INRs allow to overcome the fragmentation and shortcomings of the popular discrete representations used so far. Yet, considering that INRs consist in neural networks, it is not clear whether and how it may be possible to feed them into deep learning pipelines aimed at solving a downstream task. In this paper, we put forward this research problem and propose inr2vec, a framework that can compute a compact latent representation for an input INR in a single inference pass. We verify that inr2vec can embed effectively the 3D shapes represented by the input INRs and show how the produced embeddings can be fed into deep learning pipelines to solve several tasks by processing exclusively INRs. △ Less

Submitted 10 February, 2023; originally announced February 2023.

Comments: Accepted at ICLR 2023

arXiv:2301.11310 [pdf, other]

Learning Good Features to Transfer Across Tasks and Domains

Authors: Pierluigi Zama Ramirez, Adriano Cardace, Luca De Luigi, Alessio Tonioni, Samuele Salti, Luigi Di Stefano

Abstract: Availability of labelled data is the major obstacle to the deployment of deep learning algorithms for computer vision tasks in new domains. The fact that many frameworks adopted to solve different tasks share the same architecture suggests that there should be a way of reusing the knowledge learned in a specific setting to solve novel tasks with limited or no additional supervision. In this work,… ▽ More Availability of labelled data is the major obstacle to the deployment of deep learning algorithms for computer vision tasks in new domains. The fact that many frameworks adopted to solve different tasks share the same architecture suggests that there should be a way of reusing the knowledge learned in a specific setting to solve novel tasks with limited or no additional supervision. In this work, we first show that such knowledge can be shared across tasks by learning a map** between task-specific deep features in a given domain. Then, we show that this map** function, implemented by a neural network, is able to generalize to novel unseen domains. Besides, we propose a set of strategies to constrain the learned feature spaces, to ease learning and increase the generalization capability of the map** network, thereby considerably improving the final performance of our framework. Our proposal obtains compelling results in challenging synthetic-to-real adaptation scenarios by transferring knowledge between monocular depth estimation and semantic segmentation tasks. △ Less

Submitted 26 January, 2023; originally announced January 2023.

Comments: Extended version of the paper "Learning Across Tasks and Domains" presented at ICCV 2019. Accepted at TPAMI

arXiv:2301.08245 [pdf, other]

Booster: a Benchmark for Depth from Images of Specular and Transparent Surfaces

Authors: Pierluigi Zama Ramirez, Alex Costanzino, Fabio Tosi, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano

Abstract: Estimating depth from images nowadays yields outstanding results, both in terms of in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. Purposely, we propose a novel dataset that includes accurate and dense ground-truth labels at high resolution, featu… ▽ More Estimating depth from images nowadays yields outstanding results, both in terms of in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. Purposely, we propose a novel dataset that includes accurate and dense ground-truth labels at high resolution, featuring scenes containing several specular and transparent surfaces. Our acquisition pipeline leverages a novel deep space-time stereo framework, enabling easy and accurate labeling with sub-pixel precision. The dataset is composed of 606 samples collected in 85 different scenes, each sample includes both a high-resolution pair (12 Mpx) as well as an unbalanced stereo pair (Left: 12 Mpx, Right: 1.1 Mpx), typical of modern mobile devices that mount sensors with different resolutions. Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. The dataset is composed of a train set and two test sets, the latter devoted to the evaluation of stereo and monocular depth estimation networks. Our experiments highlight the open challenges and future research directions in this field. △ Less

Submitted 30 January, 2024; v1 submitted 19 January, 2023; originally announced January 2023.

Comments: Extension of the paper "Open Challenges in Deep Stereo: the Booster Dataset" presented at CVPR 2022. Accepted at TPAMI

arXiv:2212.12380 [pdf, other]

Towards Scalable Physically Consistent Neural Networks: an Application to Data-driven Multi-zone Thermal Building Models

Authors: Loris Di Natale, Bratislav Svetozarevic, Philipp Heer, Colin Neil Jones

Abstract: With more and more data being collected, data-driven modeling methods have been gaining in popularity in recent years. While physically sound, classical gray-box models are often cumbersome to identify and scale, and their accuracy might be hindered by their limited expressiveness. On the other hand, classical black-box methods, typically relying on Neural Networks (NNs) nowadays, often achieve im… ▽ More With more and more data being collected, data-driven modeling methods have been gaining in popularity in recent years. While physically sound, classical gray-box models are often cumbersome to identify and scale, and their accuracy might be hindered by their limited expressiveness. On the other hand, classical black-box methods, typically relying on Neural Networks (NNs) nowadays, often achieve impressive performance, even at scale, by deriving statistical patterns from data. However, they remain completely oblivious to the underlying physical laws, which may lead to potentially catastrophic failures if decisions for real-world physical systems are based on them. Physically Consistent Neural Networks (PCNNs) were recently developed to address these aforementioned issues, ensuring physical consistency while still leveraging NNs to attain state-of-the-art accuracy. In this work, we scale PCNNs to model building temperature dynamics and propose a thorough comparison with classical gray-box and black-box methods. More precisely, we design three distinct PCNN extensions, thereby exemplifying the modularity and flexibility of the architecture, and formally prove their physical consistency. In the presented case study, PCNNs are shown to achieve state-of-the-art accuracy, even outperforming classical NN-based models despite their constrained structure. Our investigations furthermore provide a clear illustration of NNs achieving seemingly good performance while remaining completely physics-agnostic, which can be misleading in practice. While this performance comes at the cost of computational complexity, PCNNs on the other hand show accuracy improvements of 17-35% compared to all other physically consistent methods, paving the way for scalable physically consistent models with state-of-the-art performance. △ Less

Submitted 4 April, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

Comments: Accepted in Applied Energy

arXiv:2211.16691 [pdf, other]

Computationally Efficient Reinforcement Learning: Targeted Exploration leveraging Simple Rules

Authors: Loris Di Natale, Bratislav Svetozarevic, Philipp Heer, Colin N. Jones

Abstract: Model-free Reinforcement Learning (RL) generally suffers from poor sample complexity, mostly due to the need to exhaustively explore the state-action space to find well-performing policies. On the other hand, we postulate that expert knowledge of the system often allows us to design simple rules we expect good policies to follow at all times. In this work, we hence propose a simple yet effective m… ▽ More Model-free Reinforcement Learning (RL) generally suffers from poor sample complexity, mostly due to the need to exhaustively explore the state-action space to find well-performing policies. On the other hand, we postulate that expert knowledge of the system often allows us to design simple rules we expect good policies to follow at all times. In this work, we hence propose a simple yet effective modification of continuous actor-critic frameworks to incorporate such rules and avoid regions of the state-action space that are known to be suboptimal, thereby significantly accelerating the convergence of RL agents. Concretely, we saturate the actions chosen by the agent if they do not comply with our intuition and, critically, modify the gradient update step of the policy to ensure the learning process is not affected by the saturation step. On a room temperature control case study, it allows agents to converge to well-performing policies up to 6-7x faster than classical agents without computational overhead and while retaining good final performance. △ Less

Submitted 12 September, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: Accepted to CDC 2023

arXiv:2211.13762 [pdf, other]

ScanNeRF: a Scalable Benchmark for Neural Radiance Fields

Authors: Luca De Luigi, Damiano Bolognini, Federico Domeniconi, Daniele De Gregorio, Matteo Poggi, Luigi Di Stefano

Abstract: In this paper, we propose the first-ever real benchmark thought for evaluating Neural Radiance Fields (NeRFs) and, in general, Neural Rendering (NR) frameworks. We design and implement an effective pipeline for scanning real objects in quantity and effortlessly. Our scan station is built with less than 500$ hardware budget and can collect roughly 4000 images of a scanned object in just 5 minutes.… ▽ More In this paper, we propose the first-ever real benchmark thought for evaluating Neural Radiance Fields (NeRFs) and, in general, Neural Rendering (NR) frameworks. We design and implement an effective pipeline for scanning real objects in quantity and effortlessly. Our scan station is built with less than 500$ hardware budget and can collect roughly 4000 images of a scanned object in just 5 minutes. Such a platform is used to build ScanNeRF, a dataset characterized by several train/val/test splits aimed at benchmarking the performance of modern NeRF methods under different conditions. Accordingly, we evaluate three cutting-edge NeRF variants on it to highlight their strengths and weaknesses. The dataset is available on our project page, together with an online benchmark to foster the development of better and better NeRFs. △ Less

Submitted 20 December, 2022; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: WACV 2023. The first three authors contributed equally. Project page: https://eyecan-ai.github.io/scannerf/

arXiv:2211.06130 [pdf, other]

Physically Consistent Neural ODEs for Learning Multi-Physics Systems

Authors: Muhammad Zakwan, Loris Di Natale, Bratislav Svetozarevic, Philipp Heer, Colin N. Jones, Giancarlo Ferrari Trecate

Abstract: Despite the immense success of neural networks in modeling system dynamics from data, they often remain physics-agnostic black boxes. In the particular case of physical systems, they might consequently make physically inconsistent predictions, which makes them unreliable in practice. In this paper, we leverage the framework of Irreversible port-Hamiltonian Systems (IPHS), which can describe most m… ▽ More Despite the immense success of neural networks in modeling system dynamics from data, they often remain physics-agnostic black boxes. In the particular case of physical systems, they might consequently make physically inconsistent predictions, which makes them unreliable in practice. In this paper, we leverage the framework of Irreversible port-Hamiltonian Systems (IPHS), which can describe most multi-physics systems, and rely on Neural Ordinary Differential Equations (NODEs) to learn their parameters from data. Since IPHS models are consistent with the first and second principles of thermodynamics by design, so are the proposed Physically Consistent NODEs (PC-NODEs). Furthermore, the NODE training procedure allows us to seamlessly incorporate prior knowledge of the system properties in the learned dynamics. We demonstrate the effectiveness of the proposed method by learning the thermodynamics of a building from the real-world measurements and the dynamics of a simulated gas-piston system. Thanks to the modularity and flexibility of the IPHS framework, PC-NODEs can be extended to learn physically consistent models of multi-physics distributed systems. △ Less

Submitted 11 November, 2022; originally announced November 2022.

Comments: First two authors contributed equally. Submitted to IFAC 2023

arXiv:2211.05035 [pdf, ps, other]

Combining Contrastive Learning and Knowledge Graph Embeddings to develop medical word embeddings for the Italian language

Authors: Denys Amore Bondarenko, Roger Ferrod, Luigi Di Caro

Abstract: Word embeddings play a significant role in today's Natural Language Processing tasks and applications. While pre-trained models may be directly employed and integrated into existing pipelines, they are often fine-tuned to better fit with specific languages or domains. In this paper, we attempt to improve available embeddings in the uncovered niche of the Italian medical domain through the combinat… ▽ More Word embeddings play a significant role in today's Natural Language Processing tasks and applications. While pre-trained models may be directly employed and integrated into existing pipelines, they are often fine-tuned to better fit with specific languages or domains. In this paper, we attempt to improve available embeddings in the uncovered niche of the Italian medical domain through the combination of Contrastive Learning (CL) and Knowledge Graph Embedding (KGE). The main objective is to improve the accuracy of semantic similarity between medical terms, which is also used as an evaluation task. Since the Italian language lacks medical texts and controlled vocabularies, we have developed a specific solution by combining preexisting CL methods (multi-similarity loss, contextualization, dynamic sampling) and the integration of KGEs, creating a new variant of the loss. Although without having outperformed the state-of-the-art, represented by multilingual models, the obtained results are encouraging, providing a significant leap in performance compared to the starting model, while using a significantly lower amount of data. △ Less

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.13536 [pdf, other]

Effective Pre-Training Objectives for Transformer-based Autoencoders

Authors: Luca Di Liello, Matteo Gabburo, Alessandro Moschitti

Abstract: In this paper, we study trade-offs between efficiency, cost and accuracy when pre-training Transformer encoders with different pre-training objectives. For this purpose, we analyze features of common objectives and combine them to create new effective pre-training approaches. Specifically, we designed light token generators based on a straightforward statistical approach, which can replace ELECTRA… ▽ More In this paper, we study trade-offs between efficiency, cost and accuracy when pre-training Transformer encoders with different pre-training objectives. For this purpose, we analyze features of common objectives and combine them to create new effective pre-training approaches. Specifically, we designed light token generators based on a straightforward statistical approach, which can replace ELECTRA computationally heavy generators, thus highly reducing cost. Our experiments also show that (i) there are more efficient alternatives to BERT's MLM, and (ii) it is possible to efficiently pre-train Transformer-based models using lighter generators without a significant drop in performance. △ Less

Submitted 24 October, 2022; originally announced October 2022.

Comments: Accepted at EMNLP 2022 Findings

arXiv:2210.08226 [pdf, other]

Self-Distillation for Unsupervised 3D Domain Adaptation

Authors: Adriano Cardace, Riccardo Spezialetti, Pierluigi Zama Ramirez, Samuele Salti, Luigi Di Stefano

Abstract: Point cloud classification is a popular task in 3D vision. However, previous works, usually assume that point clouds at test time are obtained with the same procedure or sensor as those at training time. Unsupervised Domain Adaptation (UDA) instead, breaks this assumption and tries to solve the task on an unlabeled target domain, leveraging only on a supervised source domain. For point cloud class… ▽ More Point cloud classification is a popular task in 3D vision. However, previous works, usually assume that point clouds at test time are obtained with the same procedure or sensor as those at training time. Unsupervised Domain Adaptation (UDA) instead, breaks this assumption and tries to solve the task on an unlabeled target domain, leveraging only on a supervised source domain. For point cloud classification, recent UDA methods try to align features across domains via auxiliary tasks such as point cloud reconstruction, which however do not optimize the discriminative power in the target domain in feature space. In contrast, in this work, we focus on obtaining a discriminative feature space for the target domain enforcing consistency between a point cloud and its augmented version. We then propose a novel iterative self-training methodology that exploits Graph Neural Networks in the UDA context to refine pseudo-labels. We perform extensive experiments and set the new state-of-the-art in standard UDA benchmarks for point cloud classification. Finally, we show how our approach can be extended to more complex tasks such as part segmentation. △ Less

Submitted 15 October, 2022; originally announced October 2022.

Comments: WACV 2023, Project Page: https://cvlab-unibo.github.io/FeatureDistillation/

arXiv:2209.00648 [pdf, other]

Cross-Spectral Neural Radiance Fields

Authors: Matteo Poggi, Pierluigi Zama Ramirez, Fabio Tosi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano

Abstract: We propose X-NeRF, a novel method to learn a Cross-Spectral scene representation given images captured from cameras with different light spectrum sensitivity, based on the Neural Radiance Fields formulation. X-NeRF optimizes camera poses across spectra during training and exploits Normalized Cross-Device Coordinates (NXDC) to render images of different modalities from arbitrary viewpoints, which a… ▽ More We propose X-NeRF, a novel method to learn a Cross-Spectral scene representation given images captured from cameras with different light spectrum sensitivity, based on the Neural Radiance Fields formulation. X-NeRF optimizes camera poses across spectra during training and exploits Normalized Cross-Device Coordinates (NXDC) to render images of different modalities from arbitrary viewpoints, which are aligned and at the same resolution. Experiments on 16 forward-facing scenes, featuring color, multi-spectral and infrared images, confirm the effectiveness of X-NeRF at modeling Cross-Spectral scene representations. △ Less

Submitted 1 September, 2022; originally announced September 2022.

Comments: 3DV 2022. Project page: https://cvlab-unibo.github.io/xnerf-web/

arXiv:2207.06355 [pdf, other]

Contextual Decision Trees

Authors: Tommaso Aldinucci, Enrico Civitelli, Leonardo di Gangi, Alessandro Sestini

Abstract: Focusing on Random Forests, we propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble. The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output. In this way, we obtain local interpretations by observing the rules of th… ▽ More Focusing on Random Forests, we propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble. The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output. In this way, we obtain local interpretations by observing the rules of the recommended tree. The carried out experiments reveal that our dynamic method is superior to an independent fitted CART decision tree and comparable to the whole black-box Random Forest in terms of predictive performances. △ Less

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2207.01582 [pdf, other]

doi 10.1109/LRA.2021.3125046.

HiPE: Hierarchical Initialization for Pose Graphs

Authors: Tiziano Guadagnino, Luca Di Giammarino, Giorgio Grisetti

Abstract: Pose graph optimization is a non-convex optimization problem encountered in many areas of robotics perception. Its convergence to an accurate solution is conditioned by two factors: the non-linearity of the cost function in use and the initial configuration of the pose variables. In this paper, we present HiPE, a novel hierarchical algorithm for pose graph initialization. Our approach exploits a c… ▽ More Pose graph optimization is a non-convex optimization problem encountered in many areas of robotics perception. Its convergence to an accurate solution is conditioned by two factors: the non-linearity of the cost function in use and the initial configuration of the pose variables. In this paper, we present HiPE, a novel hierarchical algorithm for pose graph initialization. Our approach exploits a coarse-grained graph that encodes an abstract representation of the problem geometry. We construct this graph by combining maximum likelihood estimates coming from local regions of the input. By leveraging the sparsity of this representation, we can initialize the pose graph in a non-linear fashion, without computational overhead compared to existing methods. The resulting initial guess can effectively bootstrap the fine-grained optimization that is used to obtain the final solution. In addition, we perform an empirical analysis on the impact of different cost functions on the final estimate. Our experimental evaluation shows that the usage of HiPE leads to a more efficient and robust optimization process, comparing favorably with state-of-the-art methods. △ Less

Submitted 4 July, 2022; originally announced July 2022.

Journal ref: IEEE Robotics and Automation Letters, vol. 7, no. 1, pp. 287-294, Jan. 2022

arXiv:2206.07047 [pdf, other]

RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation

Authors: Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano

Abstract: We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolution by solving stereo matching correspondences. Purposely, we introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels in the form of disparity… ▽ More We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolution by solving stereo matching correspondences. Purposely, we introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels in the form of disparity maps. To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera, required only during training data acquisition. In this setup, we can conveniently learn cross-modal matching in the absence of ground-truth labels by distilling knowledge from an easier RGB-RGB matching task based on a collection of about 11K unlabeled image triplets. Experiments show that the proposed pipeline sets a good performance bar (1.16 pixels average registration error) for future research on this novel, challenging task. △ Less

Submitted 14 June, 2022; originally announced June 2022.

Comments: CVPR 2022, New Orleans. Project page: https://cvlab-unibo.github.io/rgb-ms-web/

arXiv:2206.05194 [pdf, other]

Learning the Space of Deep Models

Authors: Gianluca Berardi, Luca De Luigi, Samuele Salti, Luigi Di Stefano

Abstract: Embedding of large but redundant data, such as images or text, in a hierarchy of lower-dimensional spaces is one of the key features of representation learning approaches, which nowadays provide state-of-the-art solutions to problems once believed hard or impossible to solve. In this work, in a plot twist with a strong meta aftertaste, we show how trained deep models are as redundant as the data t… ▽ More Embedding of large but redundant data, such as images or text, in a hierarchy of lower-dimensional spaces is one of the key features of representation learning approaches, which nowadays provide state-of-the-art solutions to problems once believed hard or impossible to solve. In this work, in a plot twist with a strong meta aftertaste, we show how trained deep models are as redundant as the data they are optimized to process, and how it is therefore possible to use deep learning models to embed deep learning models. In particular, we show that it is possible to use representation learning to learn a fixed-size, low-dimensional embedding space of trained deep models and that such space can be explored by interpolation or optimization to attain ready-to-use models. We find that it is possible to learn an embedding space of multiple instances of the same architecture and of multiple architectures. We address image classification and neural representation of signals, showing how our embedding space can be learnt so as to capture the notions of performance and 3D shape, respectively. In the Multi-Architecture setting we also show how an embedding trained only on a subset of architectures can learn to generate already-trained instances of architectures it never sees instantiated at training time. △ Less

Submitted 10 June, 2022; originally announced June 2022.

Comments: Accepted at ICPR2022

arXiv:2206.04671 [pdf, other]

Open Challenges in Deep Stereo: the Booster Dataset

Authors: Pierluigi Zama Ramirez, Fabio Tosi, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano

Abstract: We present a novel high-resolution and challenging stereo dataset framing indoor scenes annotated with dense and accurate ground-truth disparities. Peculiar to our dataset is the presence of several specular and transparent surfaces, i.e. the main causes of failures for state-of-the-art stereo networks. Our acquisition pipeline leverages a novel deep space-time stereo framework which allows for ea… ▽ More We present a novel high-resolution and challenging stereo dataset framing indoor scenes annotated with dense and accurate ground-truth disparities. Peculiar to our dataset is the presence of several specular and transparent surfaces, i.e. the main causes of failures for state-of-the-art stereo networks. Our acquisition pipeline leverages a novel deep space-time stereo framework which allows for easy and accurate labeling with sub-pixel precision. We release a total of 419 samples collected in 64 different scenes and annotated with dense ground-truth disparities. Each sample include a high-resolution pair (12 Mpx) as well as an unbalanced pair (Left: 12 Mpx, Right: 1.1 Mpx). Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. We evaluate state-of-the-art deep networks based on our dataset, highlighting their limitations in addressing the open challenges in stereo and drawing hints for future research. △ Less

Submitted 9 June, 2022; originally announced June 2022.

Comments: CVPR 2022, New Orleans. Project page: https://cvlab-unibo.github.io/booster-web/

arXiv:2205.15703 [pdf, other]

Lessons Learned from Data-Driven Building Control Experiments: Contrasting Gaussian Process-based MPC, Bilevel DeePC, and Deep Reinforcement Learning

Authors: Loris Di Natale, Yingzhao Lian, Emilio T. Maddalena, Jicheng Shi, Colin N. Jones

Abstract: This manuscript offers the perspective of experimentalists on a number of modern data-driven techniques: model predictive control relying on Gaussian processes, adaptive data-driven control based on behavioral theory, and deep reinforcement learning. These techniques are compared in terms of data requirements, ease of use, computational burden, and robustness in the context of real-world applicati… ▽ More This manuscript offers the perspective of experimentalists on a number of modern data-driven techniques: model predictive control relying on Gaussian processes, adaptive data-driven control based on behavioral theory, and deep reinforcement learning. These techniques are compared in terms of data requirements, ease of use, computational burden, and robustness in the context of real-world applications. Our remarks and observations stem from a number of experimental investigations carried out in the field of building control in diverse environments, from lecture halls and apartment spaces to a hospital surgery center. The final goal is to support others in identifying what technique is best suited to tackle their own problems. △ Less

Submitted 31 May, 2022; originally announced May 2022.

Showing 1–50 of 127 results for author: Di, L