-
For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives
Authors:
Lia Morra,
Antonio Santangelo,
Pietro Basci,
Luca Piano,
Fabio Garcea,
Fabrizio Lamberti,
Massimo Leone
Abstract:
Social networks are creating a digital world in which the cognitive, emotional, and pragmatic value of the imagery of human faces and bodies is arguably changing. However, researchers in the digital humanities are often ill-equipped to study these phenomena at scale. This work presents FRESCO (Face Representation in E-Societies through Computational Observation), a framework designed to explore the socio-cultural implications of images on social media platforms at scale. FRESCO deconstructs images into numerical and categorical variables using state-of-the-art computer vision techniques, aligning with the principles of visual semiotics. The framework analyzes images across three levels: the plastic level, encompassing fundamental visual features like lines and colors; the figurative level, representing specific entities or concepts; and the enunciation level, which focuses particularly on constructing the point of view of the spectator and observer. These levels are analyzed to discern deeper narrative layers within the imagery. Experimental validation confirms the reliability and utility of FRESCO, and we assess its consistency and precision across two public datasets. Subsequently, we introduce the FRESCO score, a metric derived from the framework's output that serves as a reliable measure of similarity in image content.
Submitted 3 July, 2024;
originally announced July 2024.
-
Toward a Realistic Benchmark for Out-of-Distribution Detection
Authors:
Pietro Recalcati,
Fabio Garcea,
Luca Piano,
Fabrizio Lamberti,
Lia Morra
Abstract:
Deep neural networks are increasingly used in a wide range of technologies and services, but remain highly susceptible to out-of-distribution (OOD) samples, that is, drawn from a different distribution than the original training set. A common approach to address this issue is to endow deep neural networks with the ability to detect OOD samples. Several benchmarks have been proposed to design and validate OOD detection techniques. However, many of them are based on far-OOD samples drawn from very different distributions, and thus lack the complexity needed to capture the nuances of real-world scenarios. In this work, we introduce a comprehensive benchmark for OOD detection, based on ImageNet and Places365, that assigns individual classes as in-distribution or out-of-distribution depending on the semantic similarity with the training set. Several techniques can be used to determine which classes should be considered in-distribution, yielding benchmarks with varying properties. Experimental results on different OOD detection techniques show how their measured efficacy depends on the selected benchmark and how confidence-based techniques may outperform classifier-based ones on near-OOD samples.
Submitted 16 April, 2024;
originally announced April 2024.
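The abstract above contrasts confidence-based and classifier-based OOD detectors without naming specific methods. As a purely illustrative sketch of what a confidence-based detector can look like, the snippet below uses the maximum softmax probability as the confidence score; the choice of this score, the function names, and the 0.5 threshold are assumptions made here, not details taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_score(logits):
    """Confidence score: the largest softmax probability."""
    return max(softmax(logits))

def is_ood(logits, threshold=0.5):
    """Flag a sample as OOD when confidence falls below the threshold.

    The threshold is a hypothetical value; in practice it would be
    tuned on held-out data.
    """
    return msp_score(logits) < threshold

confident_sample = [5.0, 0.0, 0.0]    # one class clearly dominates
uncertain_sample = [0.1, 0.0, 0.05]   # nearly flat prediction
print(is_ood(confident_sample), is_ood(uncertain_sample))
```

A near-OOD sample can still produce a peaked prediction, which is one reason the measured efficacy of such a detector depends on how the benchmark is built.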
-
Latent Diffusion Models for Attribute-Preserving Image Anonymization
Authors:
Luca Piano,
Pietro Basci,
Fabrizio Lamberti,
Lia Morra
Abstract:
Generative techniques for image anonymization have great potential to generate datasets that protect the privacy of those depicted in the images, while achieving high data fidelity and utility. Existing methods have focused extensively on preserving facial attributes, but failed to embrace a more comprehensive perspective that incorporates the scene and background into the anonymization process. This paper presents, to the best of our knowledge, the first approach to image anonymization based on Latent Diffusion Models (LDMs). Every element of a scene is maintained to convey the same meaning, yet manipulated in a way that makes re-identification difficult. We propose two LDMs for this purpose: CAMOUFLaGE-Base exploits a combination of pre-trained ControlNets, and a new controlling mechanism designed to increase the distance between the real and anonymized images. CAMOUFLaGE-Light is based on the Adapter technique, coupled with an encoding designed to efficiently represent the attributes of different persons in a scene. The former solution achieves superior performance on most metrics and benchmarks, while the latter cuts the inference time in half at the cost of fine-tuning a lightweight module. We show through extensive experimental comparison that the proposed method is competitive with the state-of-the-art concerning identity obfuscation whilst better preserving the original content of the image and tackling unresolved challenges that current solutions fail to address.
Submitted 21 March, 2024;
originally announced March 2024.
-
Iterative Zero-Shot LLM Prompting for Knowledge Graph Construction
Authors:
Salvatore Carta,
Alessandro Giuliani,
Leonardo Piano,
Alessandro Sebastian Podda,
Livio Pompianu,
Sandro Gabriele Tiddia
Abstract:
In the current digitalization era, capturing and effectively representing knowledge is crucial in most real-world scenarios. In this context, knowledge graphs represent a potent tool for retrieving and organizing a vast amount of information in a properly interconnected and interpretable structure. However, their generation is still challenging and often requires considerable human effort and domain expertise, hampering the scalability and flexibility across different application fields. This paper proposes an innovative knowledge graph generation approach that leverages the potential of the latest generative large language models, such as GPT-3.5, that can address all the main critical issues in knowledge graph building. The approach is conveyed in a pipeline that comprises novel iterative zero-shot and external knowledge-agnostic strategies in the main stages of the generation process. Our multifaceted approach may offer significant benefits to the scientific community. In particular, the main contributions can be summarized as: (i) an innovative strategy for iteratively prompting large language models to extract relevant components of the final graph; (ii) a zero-shot strategy for each prompt, meaning that there is no need for providing examples for "guiding" the prompt result; (iii) a scalable solution, as the adoption of LLMs avoids the need for any external resources or human expertise. To assess the effectiveness of our proposed model, we performed experiments on a dataset that covered a specific domain. We claim that our proposal is a suitable solution for scalable and versatile knowledge graph construction and may be applied to different and novel contexts.
Submitted 3 July, 2023;
originally announced July 2023.
-
Bent & Broken Bicycles: Leveraging synthetic data for damaged object re-identification
Authors:
Luca Piano,
Filippo Gabriele Pratticò,
Alessandro Sebastian Russo,
Lorenzo Lanari,
Lia Morra,
Fabrizio Lamberti
Abstract:
Instance-level object re-identification is a fundamental computer vision task, with applications from image retrieval to intelligent monitoring and fraud detection. In this work, we propose the novel task of damaged object re-identification, which aims at distinguishing changes in visual appearance due to deformations or missing parts from subtle intra-class variations. To explore this task, we leverage the power of computer-generated imagery to create, in a semi-automatic fashion, high-quality synthetic images of the same bike before and after damage occurs. The resulting dataset, Bent & Broken Bicycles (BBBicycles), contains 39,200 images and 2,800 unique bike instances spanning 20 different bike models. As a baseline for this task, we propose TransReI3D, a multi-task, transformer-based deep network unifying damage detection (framed as a multi-label classification task) with object re-identification. The BBBicycles dataset is available at https://huggingface.co/datasets/GrainsPolito/BBBicycles
Submitted 16 April, 2023;
originally announced April 2023.
-
Variance-based sensitivity analysis: The quest for better estimators and designs between explorativity and economy
Authors:
Samuele Lo Piano,
Federico Ferretti,
Arnald Puy,
Daniel Albrecht,
Andrea Saltelli
Abstract:
Variance-based sensitivity indices have established themselves as a reference among practitioners of sensitivity analysis of model outputs. A variance-based sensitivity analysis typically produces the first-order sensitivity indices $S_j$ and the so-called total-effect sensitivity indices $T_j$ for the uncertain factors of the mathematical model under analysis.
The cost of the analysis depends upon the number of model evaluations needed to obtain stable and accurate values of the estimates. While efficient estimation procedures are available for $S_j$, this availability is less the case for $T_j$. When estimating these indices, one can either use a sample-based approach whose computational cost depends on the number of factors or use approaches based on meta modelling/emulators.
The present work focuses on sample-based estimation procedures for $T_j$ and tests different avenues to achieve an algorithmic improvement over the existing best practices. To improve the exploration of the space of the input factors (design) and the formula to compute the indices (estimator), we propose strategies based on the concepts of economy and explorativity. We then discuss how several existing estimators perform along these characteristics.
We conclude that: a) sample-based approaches based on the use of multiple matrices to enhance the economy are outperformed by designs using fewer matrices but with better explorativity; b) among the latter, asymmetric designs perform the best and outperform symmetric designs having corrective terms for spurious correlations; c) improving on the existing best practices is fraught with difficulties; and d) ameliorating the results comes at the cost of introducing extra design parameters.
Submitted 1 March, 2022;
originally announced March 2022.
-
Unpacking uncertainty in the modelling process for energy policy making
Authors:
Samuele Lo Piano,
Máté János Lőrincz,
Arnald Puy,
Steve Pye,
Andrea Saltelli,
Stefán Thor Smith,
Jeroen P. van der Sluijs
Abstract:
This paper explores how the modelling of energy systems may lead to undue closure of alternatives by generating an excess of certainty around some of the possible policy options. We exemplify the problem with two cases: first, the International Institute for Applied Systems Analysis (IIASA) global modelling in the 1980s; and second, the modelling activity undertaken in support of the construction of a radioactive waste repository at Yucca Mountain (Nevada, USA). We discuss different methodologies for quality assessment that may help remedy this issue, which include NUSAP (Numeral Unit Spread Assessment Pedigree), diagnostic diagrams, and sensitivity auditing. We demonstrate the potential of these reflexive modelling practices in energy policy making with four additional cases: (i) stakeholders' evaluation of the assessment of the external costs of a potential large-scale nuclear accident in Belgium in the context of the ExternE (External Costs of Energy) project; (ii) the case of the ESME (Energy System Modelling Environment) for the creation of UK energy policy; (iii) the NETs (Negative Emission Technologies) uptake in Integrated Assessment Models (IAMs); and (iv) the Ecological Footprint (EF) indicator. We encourage modellers to widely adopt these approaches to achieve more robust and inclusive modelling activities in the field of energy modelling.
Submitted 1 November, 2021;
originally announced November 2021.
-
sensobol: an R package to compute variance-based sensitivity indices
Authors:
Arnald Puy,
Samuele Lo Piano,
Andrea Saltelli,
Simon A. Levin
Abstract:
The R package "sensobol" provides several functions to conduct variance-based uncertainty and sensitivity analysis, from the estimation of sensitivity indices to the visual representation of the results. It implements several state-of-the-art first and total-order estimators and allows the computation of up to third-order effects, as well as of the approximation error, in a swift and user-friendly way. Its flexibility makes it also appropriate for models with either a scalar or a multivariate output. We illustrate its functionality by conducting a variance-based sensitivity analysis of three classic models: the Sobol' (1998) G function, the logistic population growth model of Verhulst (1845), and the spruce budworm and forest model of Ludwig, Jones and Holling (1976).
Submitted 3 December, 2021; v1 submitted 22 January, 2021;
originally announced January 2021.
-
Is VARS more intuitive and efficient than Sobol' indices?
Authors:
Arnald Puy,
Samuele Lo Piano,
Andrea Saltelli
Abstract:
The Variogram Analysis of Response Surfaces (VARS) has been proposed by Razavi and Gupta as a new comprehensive framework in sensitivity analysis. According to these authors, VARS provides a more intuitive notion of sensitivity and it is much more computationally efficient than Sobol' indices. Here we review these arguments and critically compare the performance of VARS-TO, for total-order index, against the total-order Jansen estimator. We argue that, unlike classic variance-based methods, VARS lacks a clear definition of what an "important" factor is, and show that the alleged computational superiority of VARS does not withstand scrutiny. We conclude that while VARS enriches the spectrum of existing methods for sensitivity analysis, especially for a diagnostic use of mathematical models, it complements rather than substitutes classic estimators used in variance-based sensitivity analysis.
Submitted 22 November, 2020; v1 submitted 26 September, 2020;
originally announced September 2020.
-
A comprehensive comparison of total-order estimators for global sensitivity analysis
Authors:
Arnald Puy,
William Becker,
Samuele Lo Piano,
Andrea Saltelli
Abstract:
Sensitivity analysis helps identify which model inputs convey the most uncertainty to the model output. One of the most authoritative measures in global sensitivity analysis is the Sobol' total-order index, which can be computed with several different estimators. Although previous comparisons exist, it is hard to know which estimator performs best since the results are contingent on the benchmark setting defined by the analyst (the sampling method, the distribution of the model inputs, the number of model runs, the test function or model and its dimensionality, the weight of higher order effects or the performance measure selected). Here we compare several total-order estimators in an eight-dimension hypercube where these benchmark parameters are treated as random parameters. This arrangement significantly relaxes the dependency of the results on the benchmark design. We observe that the most accurate estimators are Razavi and Gupta's, Jansen's or Janon/Monod's for factor prioritization, and Jansen's, Janon/Monod's or Azzini and Rosati's for approaching the "true" total-order indices. The rest lag considerably behind. Our work helps analysts navigate the myriad of total-order formulae by reducing the uncertainty in the selection of the most appropriate estimator.
Submitted 29 July, 2021; v1 submitted 2 September, 2020;
originally announced September 2020.
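To make the notion of a total-order estimator concrete, here is a minimal sketch of the Jansen estimator named in the abstract above, in its commonly cited form $T_i \approx \frac{1}{2N}\sum_{v=1}^{N}\big(f(A)_v - f(A_B^{(i)})_v\big)^2 / V(Y)$, where $A_B^{(i)}$ is matrix $A$ with column $i$ taken from $B$. The plain random sampling, the variance estimate from the pooled $A$ and $B$ runs, the function names, and the toy test model are all assumptions made for illustration; they do not reproduce the paper's benchmark.

```python
import random

def jansen_total_order(model, k, n, seed=0):
    """Estimate total-order indices T_i for a model with k inputs in [0, 1)."""
    rng = random.Random(seed)
    # Two independent sample matrices A and B (plain random sampling here;
    # quasi-random sequences are a common alternative).
    A = [[rng.random() for _ in range(k)] for _ in range(n)]
    B = [[rng.random() for _ in range(k)] for _ in range(n)]
    y_a = [model(row) for row in A]
    y_b = [model(row) for row in B]

    # Unconditional output variance, estimated from the pooled runs.
    pooled = y_a + y_b
    mean_y = sum(pooled) / len(pooled)
    var_y = sum((y - mean_y) ** 2 for y in pooled) / len(pooled)

    totals = []
    for i in range(k):
        squared_diff = 0.0
        for a_row, b_row, ya in zip(A, B, y_a):
            mixed = list(a_row)
            mixed[i] = b_row[i]          # row of A with only input i resampled
            squared_diff += (ya - model(mixed)) ** 2
        totals.append(squared_diff / (2.0 * n * var_y))
    return totals

def linear_model(x):
    """Toy additive model: analytic total-order indices are 0.2 and 0.8."""
    return x[0] + 2.0 * x[1]

totals = jansen_total_order(linear_model, k=2, n=20000)
print([round(t, 2) for t in totals])
```

This sketch spends $N(k+2)$ model runs because it also evaluates $B$ for the variance estimate; the dependence of the cost on the number of factors $k$ is the point the abstract makes about sample-based approaches.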
-
Bridging the gap between Natural and Medical Images through Deep Colorization
Authors:
Lia Morra,
Luca Piano,
Fabrizio Lamberti,
Tatiana Tommasi
Abstract:
Deep learning has thrived by training on large-scale datasets. However, in many applications, such as medical image diagnosis, obtaining massive amounts of data is still prohibitive due to privacy, lack of acquisition homogeneity and annotation cost. In this scenario, transfer learning from natural image collections is a standard practice that attempts to tackle shape, texture and color discrepancies all at once through pretrained model fine-tuning. In this work, we propose to disentangle those challenges and design a dedicated network module that focuses on color adaptation. We combine learning from scratch of the color module with transfer learning of different classification backbones, obtaining an end-to-end, easy-to-train architecture for diagnostic image recognition on X-ray images. Extensive experiments showed how our approach is particularly efficient in cases of data scarcity and provides a new path for further transferring the learned color information across multiple medical datasets.
Submitted 19 October, 2020; v1 submitted 21 May, 2020;
originally announced May 2020.
-
Are the results of the groundwater model robust?
Authors:
Arnald Puy,
Emanuele Borgonovo,
Samuele Lo Piano,
Andrea Saltelli
Abstract:
De Graaf et al. (2019) suggest that groundwater pumping will bring 42--79\% of worldwide watersheds close to environmental exhaustion by 2050. We are skeptical of these figures due to several non-unique assumptions behind the calculation of irrigation water demands and the perfunctory exploration of the model's uncertainty space. Their sensitivity analysis reveals a widespread lack of elementary concepts of design of experiments among modellers, and cannot be taken as proof that their conclusions are robust.
Submitted 16 December, 2019;
originally announced December 2019.
-
A sensitivity analysis of the PAWN sensitivity index
Authors:
Arnald Puy,
Samuele Lo Piano,
Andrea Saltelli
Abstract:
The PAWN index is gaining traction among the modelling community as a sensitivity measure. However, the robustness to its design parameters has not yet been scrutinized: the size ($N$) and sampling ($\varepsilon$) of the model output, the number of conditioning intervals ($n$) or the summary statistic ($θ$). Here we fill this gap by running a sensitivity analysis of a PAWN-based sensitivity analysis. We compare the results with the design uncertainties of the Sobol' total-order index ($S_{Ti}^*$). Unlike in $S_{Ti}^*$, the design uncertainties in PAWN create non-negligible chances of producing biased results when ranking or screening inputs. The dependence of PAWN upon ($N,n,\varepsilon, θ$) is difficult to tame, as these parameters interact with one another. Even in an ideal setting in which the optimum choice for ($N,n,\varepsilon, θ$) is known in advance, PAWN might not allow one to distinguish an influential, non-additive model input from a truly non-influential model input.
Submitted 27 February, 2020; v1 submitted 9 April, 2019;
originally announced April 2019.
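The design parameters listed in the abstract above can be seen in a rough sketch of a PAWN-style index. It assumes the usual description of PAWN as a summary statistic of Kolmogorov-Smirnov distances between the unconditional output distribution and the distributions obtained when one input is restricted to each of $n$ conditioning intervals. The "given-data" slicing of a single sample of size $N$, the function names, and the toy model are assumptions made here; the sampling parameter $\varepsilon$ is not modelled, and this is not the authors' implementation.

```python
import bisect
import random
import statistics

def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    dist = 0.0
    for v in a + b:  # the supremum is attained at a sample point
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        dist = max(dist, abs(fa - fb))
    return dist

def pawn_sketch(model, k, n_samples, n_intervals,
                summary=statistics.median, seed=0):
    """PAWN-style index per input, for inputs sampled uniformly in [0, 1)."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(k)] for _ in range(n_samples)]
    y = [model(row) for row in X]
    indices = []
    for i in range(k):
        # Group outputs by which conditioning interval input i falls into.
        slices = [[] for _ in range(n_intervals)]
        for row, value in zip(X, y):
            j = min(int(row[i] * n_intervals), n_intervals - 1)
            slices[j].append(value)
        ks_values = [ks_distance(s, y) for s in slices if s]
        indices.append(summary(ks_values))  # summary statistic (theta)
    return indices

def toy_model(x):
    """Only the first input matters; the second is non-influential."""
    return x[0]

pawn = pawn_sketch(toy_model, k=2, n_samples=2000, n_intervals=10)
print([round(p, 2) for p in pawn])
```

Note that the non-influential input does not score exactly zero: finite-sample noise in the conditional CDFs leaves a residual distance that depends on `n_samples`, `n_intervals`, and the `summary` function. That dependence on design choices is what the abstract scrutinizes.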
-
A new sample-based algorithms to compute the total sensitivity index
Authors:
Samuele Lo Piano,
Federico Ferretti,
Arnald Puy,
Daniel Albrecht,
Stefano Tarantola,
Andrea Saltelli
Abstract:
Variance-based sensitivity indices have established themselves as a reference among practitioners of sensitivity analysis of model output. It is not unusual to consider a variance-based sensitivity analysis as informative if it produces at least the first order sensitivity indices S_j and the so-called total-effect sensitivity indices T_j for all the uncertain factors of the mathematical model under analysis. Computational economy is critical in sensitivity analysis. It depends mostly upon the number of model evaluations needed to obtain stable values of the estimates. While efficient estimation procedures independent of the number of factors under analysis are available for the first order indices, this is less the case for the total sensitivity indices. When estimating T_j, one can either use a sample-based approach, whose computational cost depends on the number of factors, or approaches based on meta-modelling/emulators, e.g. based on Gaussian processes. The present work focuses on sample-based estimation procedures for T_j and tries different avenues to achieve an algorithmic improvement over the designs proposed in the existing best practices. We conclude that some proposed sample-based improvements found in the literature do not work as claimed, and that improving on the existing best practice is indeed fraught with difficulties. We motivate our conclusions by introducing the concepts of explorativity and efficiency of the design.
Submitted 8 May, 2019; v1 submitted 16 March, 2017;
originally announced March 2017.