-
Estimating Map Completeness in Robot Exploration
Authors:
Matteo Luperto,
Marco Maria Ferrara,
Giacomo Boracchi,
Francesco Amigoni
Abstract:
In this paper, we propose a method that, given a partial grid map of an indoor environment built by an autonomous mobile robot, estimates the amount of the explored area represented in the map, as well as whether the uncovered part is still worth being explored or not. Our method is based on a deep convolutional neural network trained on data from partially explored environments with annotations derived from the knowledge of the entire map (which is not available when the network is used for inference). We show how such a network can be used to define a stopping criterion to terminate the exploration process when it is no longer adding relevant details about the environment to the map, saving, on average, 40% of the total exploration time with respect to covering all the area of the environment.
Submitted 19 June, 2024;
originally announced June 2024.
-
An expert-driven data generation pipeline for histological images
Authors:
Roberto Basla,
Loris Giulivi,
Luca Magri,
Giacomo Boracchi
Abstract:
Deep Learning (DL) models have been successfully applied to many applications including biomedical cell segmentation and classification in histological images. These models require large amounts of annotated data which might not always be available, especially in the medical field where annotations are scarce and expensive. To overcome this limitation, we propose a novel pipeline for generating synthetic datasets for cell segmentation. Given only a handful of annotated images, our method generates a large dataset of images which can be used to effectively train DL instance segmentation models. Our solution is designed to generate cells of realistic shapes and placement by allowing experts to incorporate domain knowledge during the generation of the dataset.
Submitted 3 June, 2024;
originally announced June 2024.
-
Explaining Multi-modal Large Language Models by Analyzing their Vision Perception
Authors:
Loris Giulivi,
Giacomo Boracchi
Abstract:
Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in understanding and generating content across various modalities, such as images and text. However, their interpretability remains a challenge, hindering their adoption in critical applications. This research proposes a novel approach to enhance the interpretability of MLLMs by focusing on the image embedding component. We combine an open-world localization model with an MLLM, thus creating a new architecture able to simultaneously produce text and object localization outputs from the same vision embedding. The proposed architecture greatly promotes interpretability, enabling us to design a novel saliency map to explain any output token, to identify model hallucinations, and to assess model biases through semantic adversarial perturbations.
Submitted 28 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
SE3D: A Framework For Saliency Method Evaluation In 3D Imaging
Authors:
Mariusz Wiśniewski,
Loris Giulivi,
Giacomo Boracchi
Abstract:
For more than a decade, deep learning models have been dominating in various 2D imaging tasks. Their application is now extending to 3D imaging, with 3D Convolutional Neural Networks (3D CNNs) being able to process LIDAR, MRI, and CT scans, with significant implications for fields such as autonomous driving and medical imaging. In these critical settings, explaining the model's decisions is fundamental. Despite recent advances in Explainable Artificial Intelligence, however, little effort has been devoted to explaining 3D CNNs, and many works explain these models via inadequate extensions of 2D saliency methods.
One fundamental limitation to the development of 3D saliency methods is the lack of a benchmark to quantitatively assess them on 3D data. To address this issue, we propose SE3D: a framework for Saliency method Evaluation in 3D imaging. We propose modifications to ShapeNet, ScanNet, and BraTS datasets, and evaluation metrics to assess saliency methods for 3D CNNs. We evaluate both state-of-the-art saliency methods designed for 3D data and extensions of popular 2D saliency methods to 3D. Our experiments show that 3D saliency methods do not provide explanations of sufficient quality, and that there is margin for future improvements and safer applications of 3D CNNs in critical fields.
Submitted 23 May, 2024;
originally announced May 2024.
-
Concept Visualization: Explaining the CLIP Multi-modal Embedding Using WordNet
Authors:
Loris Giulivi,
Giacomo Boracchi
Abstract:
Advances in multi-modal embeddings, and in particular CLIP, have recently driven several breakthroughs in Computer Vision (CV). CLIP has shown impressive performance on a variety of tasks, yet, its inherently opaque architecture may hinder the application of models employing CLIP as backbone, especially in fields where trust and model explainability are imperative, such as in the medical domain. Current explanation methodologies for CV models rely on Saliency Maps computed through gradient analysis or input perturbation. However, these Saliency Maps can only be computed to explain classes relevant to the end task, often smaller in scope than the backbone training classes. In the context of models implementing CLIP as their vision backbone, a substantial portion of the information embedded within the learned representations is thus left unexplained.
In this work, we propose Concept Visualization (ConVis), a novel saliency methodology that explains the CLIP embedding of an image by exploiting the multi-modal nature of the embeddings. ConVis makes use of lexical information from WordNet to compute task-agnostic Saliency Maps for any concept, not limited to concepts the end model was trained on. We validate our use of WordNet via an out of distribution detection experiment, and test ConVis on an object localization benchmark, showing that Concept Visualizations correctly identify and localize the image's semantic content. Additionally, we perform a user study demonstrating that our methodology can give users insight on the model's functioning.
Submitted 23 May, 2024;
originally announced May 2024.
-
Comparing flow-based and anatomy-based features in the data-driven study of nasal pathologies
Authors:
Andrea Schillaci,
Kazuto Hasegawa,
Carlotta Pipolo,
Giacomo Boracchi,
Maurizio Quadrio
Abstract:
In several problems involving fluid flows, Computational Fluid Dynamics (CFD) provides detailed quantitative information, and often allows the designer to successfully optimize the system, by minimizing a cost function. Sometimes, however, one cannot improve the system with CFD alone, because a suitable cost function is not readily available: one notable example is diagnosis in medicine. The field of interest considered here is rhinology: a correct air flow is key for the functioning of the human nose, yet the notion of a functionally normal nose is not available, and a cost function cannot be written. An alternative and attractive pathway to diagnosis and surgery planning is offered by data-driven methods. In this work, we consider the machine-learning study of nasal pathologies caused by anatomic malformations, with the aim of understanding whether fluid dynamic features, available after a CFD analysis, are more effective than purely geometric features in the training of a neural network for regression. Our experiments are carried out on an extremely simplified anatomic model and a correspondingly simple CFD approach; nevertheless, they demonstrate that flow-based features perform better than geometry-based ones, and allow the training of a neural network with fewer inputs, a crucial advantage in fields like medicine.
Submitted 18 December, 2023;
originally announced December 2023.
-
Extracting a functional representation from a dictionary for non-rigid shape matching
Authors:
Michele Colombo,
Giacomo Boracchi,
Simone Melzi
Abstract:
Shape matching is a fundamental problem in computer graphics with many applications. Functional maps translate the point-wise shape-matching problem into its functional counterpart and have inspired numerous solutions over the last decade. Nearly all the solutions based on functional maps rely on the eigenfunctions of the Laplace-Beltrami Operator (LB) to describe the functional spaces defined on the surfaces and then convert the functional correspondences into point-wise correspondences. However, this final step is often error-prone and inaccurate in tiny regions and protrusions, where the energy of LB does not uniformly cover the surface. We propose a new functional basis Principal Components of a Dictionary (PCD) to address such intrinsic limitation. PCD constructs an orthonormal basis from the Principal Component Analysis (PCA) of a dictionary of functions defined over the shape. These dictionaries can target specific properties of the final basis, such as achieving an even spreading of energy. Our experimental evaluation compares seven different dictionaries on established benchmarks, showing that PCD is suited to target different shape-matching scenarios, resulting in more accurate point-wise maps than the LB basis when used in the same pipeline. This evidence provides a promising alternative for improving correspondence estimation, confirming the power and flexibility of functional maps.
Submitted 17 May, 2023;
originally announced May 2023.
-
Class Distribution Monitoring for Concept Drift Detection
Authors:
Diego Stucchi,
Luca Frittoli,
Giacomo Boracchi
Abstract:
We introduce Class Distribution Monitoring (CDM), an effective concept-drift detection scheme that monitors the class-conditional distributions of a datastream. In particular, our solution leverages multiple instances of an online and nonparametric change-detection algorithm based on QuantTree. CDM reports a concept drift after detecting a distribution change in any class, thus identifying which classes are affected by the concept drift. This can be precious information for diagnostics and adaptation. Our experiments on synthetic and real-world datastreams show that when the concept drift affects a few classes, CDM outperforms algorithms monitoring the overall data distribution, while achieving similar detection delays when the drift affects all the classes. Moreover, CDM outperforms comparable approaches that monitor the classification error, particularly when the change is not very apparent. Finally, we demonstrate that CDM inherits the properties of the underlying change detector, yielding an effective control over the expected time before a false alarm, or Average Run Length (ARL$_0$).
Submitted 16 October, 2022;
originally announced October 2022.
-
Composite Layers for Deep Anomaly Detection on 3D Point Clouds
Authors:
Alberto Floris,
Luca Frittoli,
Diego Carrera,
Giacomo Boracchi
Abstract:
Deep neural networks require specific layers to process point clouds, as the scattered and irregular location of points prevents us from using convolutional filters. Here we introduce the composite layer, a new convolutional operator for point clouds. The peculiarity of our composite layer is that it extracts and compresses the spatial information from the position of points before combining it with their feature vectors. Compared to well-known point-convolutional layers such as those of ConvPoint and KPConv, our composite layer provides additional regularization and guarantees greater flexibility in terms of design and number of parameters. To demonstrate the design flexibility, we also define an aggregate composite layer that combines spatial information and features in a nonlinear manner, and we use these layers to implement a convolutional and an aggregate CompositeNet. We train our CompositeNets to perform classification and, most remarkably, unsupervised anomaly detection. Our experiments on synthetic and real-world datasets show that, in both tasks, our CompositeNets outperform ConvPoint and achieve similar results as KPConv despite having a much simpler architecture. Moreover, our CompositeNets substantially outperform existing solutions for anomaly detection on point clouds.
Submitted 23 September, 2022;
originally announced September 2022.
-
Nonparametric and Online Change Detection in Multivariate Datastreams using QuantTree
Authors:
Luca Frittoli,
Diego Carrera,
Giacomo Boracchi
Abstract:
We address the problem of online change detection in multivariate datastreams, and we introduce QuantTree Exponentially Weighted Moving Average (QT-EWMA), a nonparametric change-detection algorithm that can control the expected time before a false alarm, yielding a desired Average Run Length (ARL$_0$). Controlling false alarms is crucial in many applications and is rarely guaranteed by online change-detection algorithms that can monitor multivariate datastreams without knowing the data distribution. Like many change-detection algorithms, QT-EWMA builds a model of the data distribution, in our case a QuantTree histogram, from a stationary training set. To monitor datastreams even when the training set is extremely small, we propose QT-EWMA-update, which incrementally updates the QuantTree histogram during monitoring, always keeping the ARL$_0$ under control. Our experiments, performed on synthetic and real-world datastreams, demonstrate that QT-EWMA and QT-EWMA-update control the ARL$_0$ and the false alarm rate better than state-of-the-art methods operating in similar conditions, achieving lower or comparable detection delays.
Submitted 30 August, 2022;
originally announced August 2022.
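As an illustration of the idea of running an EWMA over histogram bins, here is a minimal one-dimensional sketch. It is not the authors' QT-EWMA: the equal-frequency bins stand in for a QuantTree histogram, the bin probabilities are assumed to be exactly 1/k, and no threshold controlling ARL$_0$ is computed. The function names `fit_equal_frequency_edges` and `ewma_bin_statistic`, the smoothing weight `lam`, and the synthetic stream are all assumptions made for this example.

```python
import bisect
import random


def fit_equal_frequency_edges(train, k):
    """Return k-1 bin edges so each bin holds about 1/k of the training data."""
    ordered = sorted(train)
    n = len(ordered)
    return [ordered[(j * n) // k] for j in range(1, k)]


def ewma_bin_statistic(stream, edges, lam=0.03):
    """Track an EWMA of each bin's indicator and return a Pearson-like statistic.

    The statistic at each step is sum_j (z_j - 1/k)^2 / (1/k), where z_j is the
    EWMA of the indicator "the sample fell in bin j". It stays near zero while
    the stream matches the training distribution and grows after a change.
    """
    k = len(edges) + 1
    target = 1.0 / k              # assumed bin probability (equal-frequency bins)
    z = [target] * k              # start each EWMA at its stationary value
    stats = []
    for x in stream:
        hit = bisect.bisect_right(edges, x)   # index of the bin containing x
        for j in range(k):
            indicator = 1.0 if j == hit else 0.0
            z[j] = (1.0 - lam) * z[j] + lam * indicator
        stats.append(sum((zj - target) ** 2 / target for zj in z))
    return stats


# Synthetic stream: 500 stationary samples, then a mean shift from 0 to 3.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(1000)]
stream = ([rng.gauss(0.0, 1.0) for _ in range(500)]
          + [rng.gauss(3.0, 1.0) for _ in range(500)])
edges = fit_equal_frequency_edges(train, k=8)
stats = ewma_bin_statistic(stream, edges)
```

In a real detector the statistic would be compared at each step against a threshold chosen to obtain the desired ARL$_0$; that calibration is the part this sketch leaves out.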
-
Deep Open-Set Recognition for Silicon Wafer Production Monitoring
Authors:
Luca Frittoli,
Diego Carrera,
Beatrice Rossi,
Pasqualina Fragneto,
Giacomo Boracchi
Abstract:
The chips contained in any electronic device are manufactured over circular silicon wafers, which are monitored by inspection machines at different production stages. Inspection machines detect and locate any defect within the wafer and return a Wafer Defect Map (WDM), i.e., a list of the coordinates where defects lie, which can be considered a huge, sparse, and binary image. In normal conditions, wafers exhibit a small number of randomly distributed defects, while defects grouped in specific patterns might indicate known or novel categories of failures in the production line. Needless to say, a primary concern of semiconductor industries is to identify these patterns and intervene as soon as possible to restore normal production conditions.
Here we address WDM monitoring as an open-set recognition problem to accurately classify WDM in known categories and promptly detect novel patterns. In particular, we propose a comprehensive pipeline for wafer monitoring based on a Submanifold Sparse Convolutional Network, a deep architecture designed to process sparse data at an arbitrary resolution, which is trained on the known classes. To detect novelties, we define an outlier detector based on a Gaussian Mixture Model fitted on the latent representation of the classifier. Our experiments on a real dataset of WDMs show that directly processing full-resolution WDMs by Submanifold Sparse Convolutions yields superior classification performance on known classes than traditional Convolutional Neural Networks, which require a preliminary binning to reduce the size of the binary images representing WDMs. Moreover, our solution outperforms state-of-the-art open-set recognition solutions in detecting novelties.
Submitted 30 August, 2022;
originally announced August 2022.
-
Deep Autoencoders for Anomaly Detection in Textured Images using CW-SSIM
Authors:
Andrea Bionda,
Luca Frittoli,
Giacomo Boracchi
Abstract:
Detecting anomalous regions in images is a frequently encountered problem in industrial monitoring. A relevant example is the analysis of tissues and other products that in normal conditions conform to a specific texture, while defects introduce changes in the normal pattern. We address the anomaly detection problem by training a deep autoencoder, and we show that adopting a loss function based on Complex Wavelet Structural Similarity (CW-SSIM) yields superior detection performance on this type of images compared to traditional autoencoder loss functions. Our experiments on well-known anomaly detection benchmarks show that a simple model trained with this loss function can achieve comparable or superior performance to state-of-the-art methods leveraging deeper, larger and more computationally demanding neural networks.
Submitted 30 August, 2022;
originally announced August 2022.
-
Perception Visualization: Seeing Through the Eyes of a DNN
Authors:
Loris Giulivi,
Mark James Carman,
Giacomo Boracchi
Abstract:
Artificial intelligence (AI) systems power the world we live in. Deep neural networks (DNNs) are able to solve tasks in an ever-expanding landscape of scenarios, but our eagerness to apply these powerful models leads us to focus on their performance and deprioritises our ability to understand them. Current research in the field of explainable AI tries to bridge this gap by developing various perturbation or gradient-based explanation techniques. For images, these techniques fail to fully capture and convey the semantic information needed to elucidate why the model makes the predictions it does. In this work, we develop a new form of explanation that is radically different in nature from current explanation methods, such as Grad-CAM. Perception visualization provides a visual representation of what the DNN perceives in the input image by depicting what visual patterns the latent representation corresponds to. Visualizations are obtained through a reconstruction model that inverts the encoded features, such that the parameters and predictions of the original models are not modified. Results of our user study demonstrate that humans can better understand and predict the system's decisions when perception visualizations are available, thus easing the debugging and deployment of deep models as trusted systems.
Submitted 21 April, 2022;
originally announced April 2022.
-
Adversarial Scratches: Deployable Attacks to CNN Classifiers
Authors:
Loris Giulivi,
Malhar Jere,
Loris Rossi,
Farinaz Koushanfar,
Gabriela Ciocarlie,
Briland Hitaj,
Giacomo Boracchi
Abstract:
A growing body of work has shown that deep neural networks are susceptible to adversarial examples. These take the form of small perturbations applied to the model's input which lead to incorrect predictions. Unfortunately, most literature focuses on visually imperceivable perturbations to be applied to digital images that often are, by design, impossible to be deployed to physical targets. We present Adversarial Scratches: a novel L0 black-box attack, which takes the form of scratches in images, and which possesses much greater deployability than other state-of-the-art attacks. Adversarial Scratches leverage Bézier Curves to reduce the dimension of the search space and possibly constrain the attack to a specific location. We test Adversarial Scratches in several scenarios, including a publicly available API and images of traffic signs. Results show that, often, our attack achieves higher fooling rate than other deployable state-of-the-art methods, while requiring significantly fewer queries and modifying very few pixels.
Submitted 18 May, 2023; v1 submitted 20 April, 2022;
originally announced April 2022.
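To illustrate why Bézier curves shrink the search space, the sketch below rasterizes a quadratic Bézier curve, B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2, onto a pixel grid: three control points (six numbers) determine every pixel a scratch would touch. This is only an illustration of the parametrization, not the attack itself; the function names, the 32x32 grid, and the control points are assumptions chosen for the example.

```python
def quadratic_bezier(p0, p1, p2, t):
    """Evaluate a quadratic Bezier curve with control points p0, p1, p2 at t in [0, 1]."""
    u = 1.0 - t
    x = u * u * p0[0] + 2.0 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2.0 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def scratch_pixels(p0, p1, p2, width, height, samples=200):
    """Return the set of (row, col) pixels touched by the sampled curve."""
    pixels = set()
    for i in range(samples + 1):
        x, y = quadratic_bezier(p0, p1, p2, i / samples)
        col, row = int(round(x)), int(round(y))
        if 0 <= col < width and 0 <= row < height:   # drop samples outside the image
            pixels.add((row, col))
    return pixels


# Hypothetical control points given as (x, y) on a 32x32 image.
pixels = scratch_pixels((2, 3), (16, 30), (29, 5), width=32, height=32)
```

A black-box search would then optimize only the control points (and, if desired, a color) rather than every pixel value independently.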
-
Scratch that! An Evolution-based Adversarial Attack against Neural Networks
Authors:
Malhar Jere,
Loris Rossi,
Briland Hitaj,
Gabriela Ciocarlie,
Giacomo Boracchi,
Farinaz Koushanfar
Abstract:
We study black-box adversarial attacks for image classifiers in a constrained threat model, where adversaries can only modify a small fraction of pixels in the form of scratches on an image. We show that it is possible for adversaries to generate localized \textit{adversarial scratches} that cover less than $5\%$ of the pixels in an image and achieve targeted success rates of $98.77\%$ and $97.20\%$ on ImageNet and CIFAR-10 trained ResNet-50 models, respectively. We demonstrate that our scratches are effective under diverse shapes, such as straight lines or parabolic Bézier curves, with single or multiple colors. In an extreme condition, in which our scratches are a single color, we obtain a targeted attack success rate of $66\%$ on CIFAR-10 with an order of magnitude fewer queries than comparable attacks. We successfully launch our attack against Microsoft's Cognitive Services Image Captioning API and propose various mitigation strategies.
Submitted 6 August, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss
Authors:
Cesare Alippi,
Giacomo Boracchi,
Diego Carrera,
Manuel Roveri
Abstract:
We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods have to face when the data dimension scales. In particular, we consider a general approach where changes are detected by comparing the distribution of the log-likelihood of the datastream over different time windows. Although this approach underlies several change-detection methods, its effectiveness when the data dimension scales has never been investigated, which is the goal of our paper. We show that the magnitude of the change can be naturally measured by the symmetric Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens when the data dimension increases. This problem, which we refer to as \emph{detectability loss}, is due to the linear relationship between the variance of the log-likelihood and the data dimension. We analytically derive the detectability loss on Gaussian-distributed datastreams, and empirically demonstrate that this problem also holds on real-world datasets and that it can be harmful even at low data dimensions (say, 10).
Submitted 27 April, 2016; v1 submitted 16 October, 2015;
originally announced October 2015.
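The linear growth of the log-likelihood variance with the data dimension can be checked numerically in the simplest Gaussian case. For x drawn from a standard d-dimensional Gaussian, the log-likelihood is a constant minus half the squared norm of x; the squared norm is chi-square with d degrees of freedom and variance 2d, so the log-likelihood has variance d/2. The sketch below estimates that variance by simulation. The function name, sample size, and dimensions are assumptions for illustration, and the standard Gaussian is a simplifying choice rather than the paper's experimental setup.

```python
import math
import random


def gaussian_loglik_variance(d, n=20000, seed=0):
    """Empirical variance of log N(x; 0, I_d) when x is drawn from N(0, I_d)."""
    rng = random.Random(seed)
    const = -0.5 * d * math.log(2.0 * math.pi)   # normalization term of the density
    values = []
    for _ in range(n):
        sq_norm = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
        values.append(const - 0.5 * sq_norm)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Compare the empirical variance with the theoretical value d / 2.
for d in (2, 8, 32):
    print(d, round(gaussian_loglik_variance(d), 2), d / 2)
```

Because the spread of the log-likelihood grows with d while a change of fixed magnitude shifts it by a fixed amount, the same change becomes harder to distinguish from normal fluctuation as d increases.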
-
RTI Goes Wild: Radio Tomographic Imaging for Outdoor People Detection and Localization
Authors:
Cesare Alippi,
Maurizio Bocca,
Giacomo Boracchi,
Neal Patwari,
Manuel Roveri
Abstract:
RF sensor networks are used to localize people indoors without requiring them to wear invasive electronic devices. These wireless mesh networks, formed by low-power radio transceivers, continuously measure the received signal strength (RSS) of the links. Radio Tomographic Imaging (RTI) is a technique that generates 2D images of the change in the electromagnetic field inside the area covered by the radio transceivers to spot the presence and movements of animates (e.g., people, large animals) or large metallic objects (e.g., cars). Here, we present an RTI system for localizing and tracking people outdoors. Unlike in indoor environments, where the RSS does not change significantly with time unless people are found in the monitored area, the outdoor RSS signal is time-variant, e.g., due to rainfall or wind-driven foliage. We present a novel outdoor RTI method that, despite the nonstationary noise introduced in the RSS data by the environment, achieves high localization accuracy and dramatically reduces the energy consumption of the sensing units. Experimental results demonstrate that the system accurately detects and tracks a person in real-time in a large forested area under varying environmental conditions, significantly reducing false positives, localization error and energy consumption compared to state-of-the-art RTI methods.
Submitted 30 July, 2014;
originally announced July 2014.