Search | arXiv e-print repository

arXiv:2407.03268

For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives

Authors: Lia Morra, Antonio Santangelo, Pietro Basci, Luca Piano, Fabio Garcea, Fabrizio Lamberti, Massimo Leone

Abstract: Social networks are creating a digital world in which the cognitive, emotional, and pragmatic value of the imagery of human faces and bodies is arguably changing. However, researchers in the digital humanities are often ill-equipped to study these phenomena at scale. This work presents FRESCO (Face Representation in E-Societies through Computational Observation), a framework designed to explore the socio-cultural implications of images on social media platforms at scale. FRESCO deconstructs images into numerical and categorical variables using state-of-the-art computer vision techniques, aligning with the principles of visual semiotics. The framework analyzes images across three levels: the plastic level, encompassing fundamental visual features like lines and colors; the figurative level, representing specific entities or concepts; and the enunciation level, which focuses particularly on constructing the point of view of the spectator and observer. These levels are analyzed to discern deeper narrative layers within the imagery. Experimental validation confirms the reliability and utility of FRESCO, and we assess its consistency and precision across two public datasets. Subsequently, we introduce the FRESCO score, a metric derived from the framework's output that serves as a reliable measure of similarity in image content.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2404.10474

doi 10.1109/DSAA60987.2023.10302486

Toward a Realistic Benchmark for Out-of-Distribution Detection

Authors: Pietro Recalcati, Fabio Garcea, Luca Piano, Fabrizio Lamberti, Lia Morra

Abstract: Deep neural networks are increasingly used in a wide range of technologies and services, but remain highly susceptible to out-of-distribution (OOD) samples, that is, samples drawn from a different distribution than the original training set. A common approach to address this issue is to endow deep neural networks with the ability to detect OOD samples. Several benchmarks have been proposed to design and validate OOD detection techniques. However, many of them are based on far-OOD samples drawn from very different distributions, and thus lack the complexity needed to capture the nuances of real-world scenarios. In this work, we introduce a comprehensive benchmark for OOD detection, based on ImageNet and Places365, that assigns individual classes as in-distribution or out-of-distribution depending on the semantic similarity with the training set. Several techniques can be used to determine which classes should be considered in-distribution, yielding benchmarks with varying properties. Experimental results on different OOD detection techniques show how their measured efficacy depends on the selected benchmark and how confidence-based techniques may outperform classifier-based ones on near-OOD samples.

Submitted 16 April, 2024; originally announced April 2024.

Journal ref: 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA)

arXiv:2403.14790

Latent Diffusion Models for Attribute-Preserving Image Anonymization

Authors: Luca Piano, Pietro Basci, Fabrizio Lamberti, Lia Morra

Abstract: Generative techniques for image anonymization have great potential to generate datasets that protect the privacy of those depicted in the images, while achieving high data fidelity and utility. Existing methods have focused extensively on preserving facial attributes, but have failed to embrace a more comprehensive perspective that brings the scene and background into the anonymization process. This paper presents, to the best of our knowledge, the first approach to image anonymization based on Latent Diffusion Models (LDMs). Every element of a scene is maintained to convey the same meaning, yet manipulated in a way that makes re-identification difficult. We propose two LDMs for this purpose: CAMOUFLaGE-Base exploits a combination of pre-trained ControlNets, and a new controlling mechanism designed to increase the distance between the real and anonymized images. CAMOUFLaGE-Light is based on the Adapter technique, coupled with an encoding designed to efficiently represent the attributes of different persons in a scene. The former solution achieves superior performance on most metrics and benchmarks, while the latter cuts the inference time in half at the cost of fine-tuning a lightweight module. We show through extensive experimental comparison that the proposed method is competitive with the state-of-the-art concerning identity obfuscation whilst better preserving the original content of the image and tackling unresolved challenges that current solutions fail to address.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.08536

doi 10.1007/978-3-031-44067-0_25

HOLMES: HOLonym-MEronym based Semantic inspection for Convolutional Image Classifiers

Authors: Francesco Dibitonto, Fabio Garcea, André Panisson, Alan Perotti, Lia Morra

Abstract: Convolutional Neural Networks (CNNs) are nowadays the model of choice in Computer Vision, thanks to their ability to automate the feature extraction process in visual tasks. However, the knowledge acquired during training is fully subsymbolic, and hence difficult to understand and explain to end users. In this paper, we propose a new technique called HOLMES (HOLonym-MEronym based Semantic inspection) that decomposes a label into a set of related concepts, and provides component-level explanations for an image classification model. Specifically, HOLMES leverages ontologies, web scraping and transfer learning to automatically construct meronym (parts)-based detectors for a given holonym (class). Then, it produces heatmaps at the meronym level and finally, by probing the holonym CNN with occluded images, it highlights the importance of each part on the classification output. Compared to state-of-the-art saliency methods, HOLMES takes a step further and provides information about both where and what the holonym CNN is looking at, without relying on densely annotated datasets and without forcing concepts to be associated to single computational units. Extensive experimental evaluation on different categories of objects (animals, tools and vehicles) shows the feasibility of our approach. On average, HOLMES explanations include at least two meronyms, and the ablation of a single meronym roughly halves the holonym model confidence. The resulting heatmaps were quantitatively evaluated using the deletion/insertion/preservation curves. All metrics were comparable to those achieved by GradCAM, while offering the advantage of further decomposing the heatmap into human-understandable concepts, thus highlighting both the relevance of meronyms to object classification and HOLMES's ability to capture it. The code is available at https://github.com/FrancesC0de/HOLMES.

Submitted 13 March, 2024; originally announced March 2024.

Comments: This work has been accepted for presentation at the 1st World Conference on eXplainable Artificial Intelligence (xAI 2023), July 26-28, 2023, Lisboa, Portugal

Journal ref: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1902. Springer, Cham

arXiv:2307.16019

Fuzzy Logic Visual Network (FLVN): A neuro-symbolic approach for visual features matching

Authors: Francesco Manigrasso, Lia Morra, Fabrizio Lamberti

Abstract: Neuro-symbolic integration aims at harnessing the power of symbolic knowledge representation combined with the learning capabilities of deep neural networks. In particular, Logic Tensor Networks (LTNs) allow the incorporation of background knowledge in the form of logical axioms by grounding a first-order logic language as differentiable operations between real tensors. Yet, few studies have investigated the potential benefits of this approach to improve zero-shot learning (ZSL) classification. In this study, we present the Fuzzy Logic Visual Network (FLVN) that formulates the task of learning a visual-semantic embedding space within a neuro-symbolic LTN framework. FLVN incorporates prior knowledge in the form of class hierarchies (classes and macro-classes) along with robust high-level inductive biases. The latter make it possible, for instance, to handle exceptions in class-level attributes, and to enforce similarity between images of the same class, preventing premature overfitting to seen classes and improving overall performance. FLVN reaches state-of-the-art performance on the Generalized ZSL (GZSL) benchmarks AWA2 and CUB, improving by 1.3% and 3%, respectively. Overall, it achieves competitive performance to recent ZSL methods with less computational overhead. FLVN is available at https://gitlab.com/grains2/flvn.

Submitted 29 July, 2023; originally announced July 2023.

Comments: Accepted for publication at ICIAP 2023

arXiv:2304.07883

doi 10.1109/WACV56688.2023.00486

Bent & Broken Bicycles: Leveraging synthetic data for damaged object re-identification

Authors: Luca Piano, Filippo Gabriele Pratticò, Alessandro Sebastian Russo, Lorenzo Lanari, Lia Morra, Fabrizio Lamberti

Abstract: Instance-level object re-identification is a fundamental computer vision task, with applications from image retrieval to intelligent monitoring and fraud detection. In this work, we propose the novel task of damaged object re-identification, which aims at distinguishing changes in visual appearance due to deformations or missing parts from subtle intra-class variations. To explore this task, we leverage the power of computer-generated imagery to create, in a semi-automatic fashion, high-quality synthetic images of the same bike before and after damage occurs. The resulting dataset, Bent & Broken Bicycles (BBBicycles), contains 39,200 images and 2,800 unique bike instances spanning 20 different bike models. As a baseline for this task, we propose TransReI3D, a multi-task, transformer-based deep network unifying damage detection (framed as a multi-label classification task) with object re-identification. The BBBicycles dataset is available at https://huggingface.co/datasets/GrainsPolito/BBBicycles

Submitted 16 April, 2023; originally announced April 2023.

Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023, pp. 4881-4891

arXiv:2207.00433

PROTOtypical Logic Tensor Networks (PROTO-LTN) for Zero Shot Learning

Authors: Simone Martone, Francesco Manigrasso, Fabrizio Lamberti, Lia Morra

Abstract: Semantic image interpretation can vastly benefit from approaches that combine sub-symbolic distributed representation learning with the capability to reason at a higher level of abstraction. Logic Tensor Networks (LTNs) are a class of neuro-symbolic systems based on a differentiable, first-order logic grounded into a deep neural network. LTNs replace the classical concept of a training set with a knowledge base of fuzzy logical axioms. By defining a set of differentiable operators to approximate the role of connectives, predicates, functions and quantifiers, a loss function is automatically specified so that LTNs can learn to satisfy the knowledge base. We focus here on the subsumption or isOfClass predicate, which is fundamental to encode most semantic image interpretation tasks. Unlike conventional LTNs, which rely on a separate predicate for each class (e.g., dog, cat), each with its own set of learnable weights, we propose a common isOfClass predicate, whose level of truth is a function of the distance between an object embedding and the corresponding class prototype. The PROTOtypical Logic Tensor Networks (PROTO-LTN) extend the current formulation by grounding abstract concepts as parametrized class prototypes in a high-dimensional embedding space, while reducing the number of parameters required to ground the knowledge base. We show how this architecture can be effectively trained in the few- and zero-shot learning scenarios. Experiments on Generalized Zero Shot Learning benchmarks validate the proposed implementation as a competitive alternative to traditional embedding-based approaches. The proposed formulation opens up new opportunities in zero-shot learning settings, as the LTN formalism allows background knowledge to be integrated in the form of logical axioms to compensate for the lack of labelled examples.

Submitted 26 June, 2022; originally announced July 2022.
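
The PROTO-LTN abstract above describes a shared isOfClass predicate whose truth value depends on the distance between an object embedding and a class prototype. A minimal sketch of that idea follows. The Gaussian-style mapping `exp(-alpha * d^2)`, the `alpha` parameter, and the toy two-dimensional prototypes are assumptions chosen for illustration; the abstract does not state which distance-to-truth function the authors use.

```python
import math

def is_of_class(embedding, prototype, alpha=1.0):
    """Fuzzy truth value in (0, 1] that decays with squared Euclidean distance.

    The exponential form is an assumed choice, not taken from the paper.
    """
    sq_dist = sum((e - p) ** 2 for e, p in zip(embedding, prototype))
    return math.exp(-alpha * sq_dist)

# Hypothetical class prototypes in a 2-D embedding space.
prototypes = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}

# A single predicate is evaluated against every prototype,
# instead of keeping one learnable predicate per class.
x = [0.9, 0.1]
truths = {label: is_of_class(x, proto) for label, proto in prototypes.items()}
best = max(truths, key=truths.get)
```

In this sketch an embedding that coincides with a prototype has truth value 1, and the value falls towards 0 as the embedding moves away, so the only per-class parameters are the prototype coordinates.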

arXiv:2107.01877

doi 10.1007/978-3-030-86340-1_4

Faster-LTN: a neuro-symbolic, end-to-end object detection architecture

Authors: Francesco Manigrasso, Filomeno Davide Miro, Lia Morra, Fabrizio Lamberti

Abstract: The detection of semantic relationships between objects represented in an image is one of the fundamental challenges in image interpretation. Neural-Symbolic techniques, such as Logic Tensor Networks (LTNs), allow the combination of semantic knowledge representation and reasoning with the ability to efficiently learn from examples typical of neural networks. We here propose Faster-LTN, an object detector composed of a convolutional backbone and an LTN. To the best of our knowledge, this is the first attempt to combine both frameworks in an end-to-end training setting. This architecture is trained by optimizing a grounded theory which combines labelled examples with prior knowledge, in the form of logical axioms. Experimental comparisons show competitive performance with respect to the traditional Faster R-CNN architecture.

Submitted 5 July, 2021; originally announced July 2021.

Comments: accepted for presentation at ICANN 2021

arXiv:2104.12218

doi 10.1109/ACCESS.2021.3072997

Breast Mass Detection with Faster R-CNN: On the Feasibility of Learning from Noisy Annotations

Authors: Sina Famouri, Lia Morra, Leonardo Mangia, Fabrizio Lamberti

Abstract: In this work we study the impact of noise on the training of object detection networks for the medical domain, and how it can be mitigated by improving the training procedure. Annotating large medical datasets for training data-hungry deep learning models is expensive and time-consuming. Leveraging information that is already collected in clinical practice, in the form of text reports, bookmarks or lesion measurements, would substantially reduce this cost. Obtaining precise lesion bounding boxes through automatic mining procedures, however, is difficult. We provide here a quantitative evaluation of the effect of bounding box coordinate noise on the performance of Faster R-CNN object detection networks for breast mass detection. Varying degrees of noise are simulated by randomly modifying the bounding boxes: in our experiments, bounding boxes could be enlarged up to six times the original size. The noise is injected in the CBIS-DDSM collection, a well-curated public mammography dataset for which accurate lesion location is available. We show how, due to an imperfect matching between the ground truth and the network bounding box proposals, the noise is propagated during training and reduces the ability of the network to correctly distinguish lesions from background. When using the standard Intersection over Union criterion, the area under the FROC curve decreases by up to 9%. A novel matching criterion is proposed to improve tolerance to noise.

Submitted 25 April, 2021; originally announced April 2021.

Journal ref: IEEE Access, 2021
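
The breast mass detection abstract above refers to the standard Intersection over Union (IoU) criterion for matching ground-truth boxes with proposals. The sketch below shows the standard IoU computation and how an enlarged annotation lowers it. The box coordinates are made up for illustration, and the example does not reproduce the paper's noise model or its proposed matching criterion.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical lesion: the accurate box, and a noisy annotation that
# doubles each side around the same centre (four times the area).
true_box = (10.0, 10.0, 20.0, 20.0)
noisy_box = (5.0, 5.0, 25.0, 25.0)

# A proposal that fits the lesion perfectly still overlaps the noisy label poorly.
perfect_proposal = true_box
overlap_with_noisy_label = iou(perfect_proposal, noisy_box)
```

Here the overlap with the noisy label is 0.25, so under an IoU threshold of 0.5 (a common choice, assumed here) a proposal that tightly fits the lesion would be treated as background during training.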

arXiv:2102.02783

Comparing State-of-the-Art and Emerging Augmented Reality Interfaces for Autonomous Vehicle-to-Pedestrian Communication

Authors: F. Gabriele Pratticò, Fabrizio Lamberti, Alberto Cannavò, Lia Morra, Paolo Montuschi

Abstract: Providing pedestrians and other vulnerable road users with a clear indication of a fully autonomous vehicle's status and intentions is crucial for their coexistence. In the last few years, a variety of external interfaces have been proposed, leveraging different paradigms and technologies including vehicle-mounted devices (like LED panels), short-range on-road projections, and road infrastructure interfaces (e.g., special asphalts with embedded displays). These designs were tested in different settings, using mockups, specially prepared vehicles, or virtual environments, with heterogeneous evaluation metrics. Promising interfaces based on Augmented Reality (AR) have been proposed too, but their usability and effectiveness have not been tested yet. This paper aims to complement this body of literature by presenting a comparison of state-of-the-art interfaces and new designs under common conditions. To this aim, an immersive Virtual Reality-based simulation was developed, recreating a well-known scenario represented by pedestrians crossing in urban environments under non-regulated conditions. A user study was then performed to investigate the various dimensions of vehicle-to-pedestrian interaction leveraging objective and subjective metrics. Even though no interface clearly stood out over all the considered dimensions, one of the AR designs achieved state-of-the-art results in terms of safety and trust, at the cost of higher cognitive effort and lower intuitiveness compared to LED panels showing anthropomorphic features. Together with rankings on the various dimensions, indications about the advantages and drawbacks of the various alternatives that emerged from this study could provide important information for future developments in the field.

Submitted 4 February, 2021; originally announced February 2021.

Comments: Accepted for publication in IEEE Transactions on Vehicular Technology

arXiv:2007.13371

doi 10.1109/TVT.2019.2933601

Building Trust in Autonomous Vehicles: Role of Virtual Reality Driving Simulators in HMI Design

Authors: Lia Morra, Fabrizio Lamberti, F. Gabriele Pratticó, Salvatore La Rosa, Paolo Montuschi

Abstract: The investigation of the factors that lead humans to trust Autonomous Vehicles (AVs) will play a fundamental role in the adoption of such technology. The user's ability to form a mental model of the AV, which is crucial to establish trust, depends on effective user-vehicle communication; thus, the importance of Human-Machine Interaction (HMI) is poised to increase. In this work, we propose a methodology to validate the user experience in AVs based on continuous, objective information gathered from physiological signals, while the user is immersed in a Virtual Reality-based driving simulation. We applied this methodology to the design of a head-up display interface delivering visual cues about the vehicle's sensory and planning systems. Through this approach, we obtained qualitative and quantitative evidence that a complete picture of the vehicle's surroundings, despite the higher cognitive load, is conducive to a less stressful experience. Moreover, after having been exposed to a more informative interface, users involved in the study were also more willing to test a real AV. The proposed methodology could be extended by adjusting the simulation environment, the HMI and/or the vehicle's Artificial Intelligence modules to dig into other aspects of the user experience.

Submitted 27 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Vehicular Technology, 68(10), pp.9438-9450, 2019

arXiv:2006.12061

doi 10.1007/978-3-030-50516-5_9

Object Tracking through Residual and Dense LSTMs

Authors: Fabio Garcea, Alessandro Cucco, Lia Morra, Fabrizio Lamberti

Abstract: The visual object tracking task is steadily gaining importance in several fields of application, such as traffic monitoring, robotics, and surveillance, to name a few. Dealing with changes in the appearance of the tracked object is paramount to achieve high tracking accuracy, and is usually achieved by continually learning features. Recently, deep learning-based trackers based on Long Short-Term Memory (LSTM) recurrent neural networks have emerged as a powerful alternative, bypassing the need to retrain the feature extraction in an online fashion. Inspired by the success of residual and dense networks in image recognition, we propose here to enhance the capabilities of hybrid trackers using residual and/or dense LSTMs. By introducing skip connections, it is possible to increase the depth of the architecture while ensuring a fast convergence. Experimental results on the Re3 tracker show that DenseLSTMs outperform Residual and regular LSTMs, and offer a higher resilience to nuisances such as occlusions and out-of-view objects. Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.

Submitted 22 June, 2020; originally announced June 2020.

Journal ref: Proceedings of 17th International Conference On Image Analysis and Recognition (ICIAR 2020)

arXiv:2005.10589

Bridging the gap between Natural and Medical Images through Deep Colorization

Authors: Lia Morra, Luca Piano, Fabrizio Lamberti, Tatiana Tommasi

Abstract: Deep learning has thrived by training on large-scale datasets. However, in many applications, such as medical image diagnosis, obtaining massive amounts of data is still prohibitive due to privacy, lack of acquisition homogeneity and annotation cost. In this scenario, transfer learning from natural image collections is a standard practice that attempts to tackle shape, texture and color discrepancies all at once through pretrained model fine-tuning. In this work, we propose to disentangle those challenges and design a dedicated network module that focuses on color adaptation. We combine learning from scratch of the color module with transfer learning of different classification backbones, obtaining an end-to-end, easy-to-train architecture for diagnostic image recognition on X-ray images. Extensive experiments showed that our approach is particularly efficient in cases of data scarcity and provides a new path for further transferring the learned color information across multiple medical datasets.

Submitted 19 October, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

Comments: accepted for publication at ICPR2020

arXiv:2004.04147

doi 10.1007/978-3-030-50347-5_11

Slicing and dicing soccer: automatic detection of complex events from spatio-temporal data

Authors: Lia Morra, Francesco Manigrasso, Giuseppe Canto, Claudio Gianfrate, Enrico Guarino, Fabrizio Lamberti

Abstract: The automatic detection of events in sport videos has important applications for data analytics, as well as for broadcasting and media companies. This paper presents a comprehensive approach for detecting a wide range of complex events in soccer videos starting from positional data. The event detector is designed as a two-tier system that detects atomic and complex events. Atomic events are detected based on temporal and logical combinations of the detected objects, their relative distances, as well as spatio-temporal features such as velocity and acceleration. Complex events are defined as temporal and logical combinations of atomic and complex events, and are expressed by means of a declarative Interval Temporal Logic (ITL). The effectiveness of the proposed approach is demonstrated over 16 different events, including complex situations such as tackles and filtering passes. By formalizing events based on principled ITL, it is possible to easily perform reasoning tasks, such as understanding which passes or crosses result in a goal being scored. To counterbalance the lack of suitable, annotated public datasets, we built on an open source soccer simulation engine to release the synthetic SoccER (Soccer Event Recognition) dataset, which includes complete positional data and annotations for more than 1.6 million atomic events and 9,000 complex events. The dataset and code are available at https://gitlab.com/grains2/slicing-and-dicing-soccer

Submitted 10 April, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

Comments: accepted at 17th International Conference on Image Analysis and Recognition ICIAR 2020

arXiv:1907.02821

doi 10.1016/j.eswa.2019.05.002

Benchmarking unsupervised near-duplicate image detection

Authors: Lia Morra, Fabrizio Lamberti

Abstract: Unsupervised near-duplicate detection has many practical applications ranging from social media analysis and web-scale retrieval, to digital image forensics. It entails running a threshold-limited query on a set of descriptors extracted from the images, with the goal of identifying all possible near-duplicates, while limiting the false positives due to visually similar images. Since the rate of false alarms grows with the dataset size, a very high specificity is thus required, up to $1 - 10^{-9}$ for realistic use cases; this important requirement, however, is often overlooked in the literature. In recent years, descriptors based on deep convolutional neural networks have matched or surpassed traditional feature extraction methods in content-based image retrieval tasks. To the best of our knowledge, ours is the first attempt to establish the performance range of deep learning-based descriptors for unsupervised near-duplicate detection on a range of datasets, encompassing a broad spectrum of near-duplicate definitions. We leverage both established and new benchmarks, such as the Mir-Flick Near-Duplicate (MFND) dataset, in which a known ground truth is provided for all possible pairs over a general, large scale image collection. To compare the specificity of different descriptors, we reduce the problem of unsupervised detection to that of binary classification of near-duplicate vs. not-near-duplicate images. The latter can be conveniently characterized using the Receiver Operating Characteristic (ROC) curve. Our findings in general favor the choice of fine-tuning deep convolutional networks, as opposed to using off-the-shelf features, but differences at high specificity settings depend on the dataset and are often small. The best performance was observed on the MFND benchmark, achieving 96% sensitivity at a false positive rate of $1.43 \times 10^{-6}$.

Submitted 3 July, 2019; originally announced July 2019.

Comments: Accepted for publication in Expert Systems with Applications

Journal ref: Expert Systems with Applications, online first, 2019
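
The near-duplicate detection abstract above notes that false alarms grow with dataset size, which is why specificities as high as $1 - 10^{-9}$ are needed. The arithmetic can be sketched as follows, under two assumptions that are not taken from the paper: every image is compared with every other image, and essentially all pairs are true negatives. The collection size of one million images is a hypothetical figure.

```python
def num_pairs(n_images):
    """Number of unordered image pairs in an exhaustive pairwise comparison."""
    return n_images * (n_images - 1) // 2

def expected_false_alarms(n_images, false_positive_rate):
    """Expected false positives, assuming nearly all pairs are non-duplicates."""
    return num_pairs(n_images) * false_positive_rate

n = 1_000_000  # hypothetical collection size

# False positive rate implied by a specificity of 1 - 10^-9.
at_target = expected_false_alarms(n, 1e-9)
# False positive rate quoted in the abstract for the MFND benchmark.
at_reported = expected_false_alarms(n, 1.43e-6)
```

Because the number of pairs grows quadratically, one million images yield about 5 × 10^11 pairs. Under these assumptions a false positive rate of 10^-9 still produces roughly 500 false alarms, and a rate of 1.43 × 10^-6 produces roughly 715,000.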

arXiv:1811.05324

doi 10.1007/s00330-015-3784-2

Mammographic density: Comparison of visual assessment with fully automatic calculation on a multivendor dataset

Authors: Daniela Sacchetto, Lia Morra, Silvano Agliozzo, Daniela Bernardi, Tomas Bjorklund, Beniamino Brancato, Patrizia Bravetti, Luca A. Carbonaro, Loredana Correale, Carmen Fantò, Elisabetta Favettini, Laura Martincich, Luisella Milanesio, Sara Mombelloni, Francesco Monetti, Doralba Morrone, Marco Pellegrini, Barbara Pesce, Antonella Petrillo, Gianni Saguatti, Carmen Stevanin, Rubina M. Trimboli, Paola Tuttobene, Marvi Valentini, Vincenzo Marra, et al. (3 additional authors not shown)

Abstract: Objectives: To compare breast density (BD) assessment provided by an automated BD evaluator (ABDE) with that provided by a panel of experienced breast radiologists, on a multivendor dataset. Methods: Twenty-one radiologists assessed 613 screening/diagnostic digital mammograms from 9 centers and 6 different vendors, using the BI-RADS a, b, c, and d density classification. The same mammograms were also evaluated by an ABDE providing the ratio between fibroglandular and total breast area on a continuous scale and, automatically, the BI-RADS score. Panel majority report (PMR) was used as reference standard. Agreement (k) and accuracy (proportion of cases correctly classified) were calculated for binary (BI-RADS a-b versus c-d) and 4-class classification. Results: While the agreement of individual radiologists with PMR ranged from k=0.483 to k=0.885, the ABDE correctly classified 563/613 mammograms (92%). A substantial agreement for binary classification was found for individual reader pairs (k=0.620, standard deviation [SD]=0.140), individual versus PMR (k=0.736, SD=0.117), and individual versus ABDE (k=0.674, SD=0.095). Agreement between ABDE and PMR was almost perfect (k=0.831). Conclusions: The ABDE showed an almost perfect agreement with a 21-radiologist panel in binary BD classification on a multivendor dataset, earning a chance as a reproducible alternative to visual evaluation.

Submitted 13 November, 2018; originally announced November 2018.

Journal ref: Sacchetto, Daniela et al. "Mammographic density: comparison of visual assessment with fully automatic calculation on a multivendor dataset." European radiology 26, no. 1 (2016): 175-183
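
The mammographic density abstract above reports agreement (k) and accuracy for binary classification. The sketch below shows how both quantities are computed, assuming that k denotes Cohen's kappa, which the abstract does not state explicitly. The 2x2 confusion matrix is hypothetical and is not data from the study; only the 563/613 accuracy figure comes from the abstract.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: rater A, cols: rater B)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of the two raters' marginal frequencies.
    row_marg = [sum(confusion[i]) / total for i in range(n)]
    col_marg = [sum(confusion[i][j] for i in range(n)) / total for j in range(n)]
    expected = sum(r * c for r, c in zip(row_marg, col_marg))
    return (observed - expected) / (1 - expected)

# Hypothetical binary table (BI-RADS a-b versus c-d) for two readers.
example = [[40, 10],
           [10, 40]]
kappa = cohen_kappa(example)

# Accuracy quoted in the abstract: 563 of 613 mammograms correctly classified.
accuracy = 563 / 613
```

In the hypothetical table the two readers agree on 80% of cases, but half of that agreement is expected by chance, so kappa is about 0.6. This is why the abstract reports kappa alongside raw accuracy.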

Showing 1–16 of 16 results for author: Morra, L