-
Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models
Authors:
Cristiano Patrício,
Carlo Alberto Barbano,
Attilio Fiandrotti,
Riccardo Renzulli,
Marco Grangetto,
Luis F. Teixeira,
João C. Neves
Abstract:
Contrastive Analysis (CA) addresses the problem of identifying patterns in images that distinguish a background (BG) dataset (e.g., healthy subjects) from a target (TG) dataset (e.g., unhealthy subjects). Recent works on this topic rely on variational autoencoders (VAEs) or contrastive learning strategies to learn the patterns that separate TG samples from BG samples in a supervised manner. However, the dependency on target (unhealthy) samples can be challenging in medical scenarios due to their limited availability. Also, the blurred reconstructions of VAEs lack utility and interpretability. In this work, we redefine the CA task by employing a self-supervised contrastive encoder to learn a latent representation encoding only common patterns from input images, using samples exclusively from the BG dataset during training, and approximating the distribution of the target patterns by leveraging data augmentation techniques. Subsequently, we exploit state-of-the-art generative methods, namely diffusion models, conditioned on the learned latent representation to produce a realistic (healthy) version of the input image encoding solely the common patterns. Thorough validation on a facial image dataset and experiments across three brain MRI datasets demonstrate that conditioning the generative process of state-of-the-art generative methods with the latent representation from our self-supervised contrastive encoder improves both the quality of the generated images and the accuracy of image classification. The code is available at https://github.com/CristianoPatricio/unsupervised-contrastive-cond-diff.
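The abstract does not name the objective used to train the BG-only contrastive encoder. A common choice for self-supervised contrastive encoders is the SimCLR-style NT-Xent (InfoNCE) loss, in which two augmentations of the same image form a positive pair and the rest of the batch serves as negatives. The sketch below assumes that objective and is written in plain Python; the names `nt_xent_loss` and `temperature` are illustrative and are not taken from the authors' repository.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent_loss(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.5) -> float:
    """SimCLR-style NT-Xent (InfoNCE) loss for a batch of paired augmented views.

    view_a[i] and view_b[i] embed two augmentations of the same BG image (the
    positive pair); every other embedding in the batch acts as a negative.
    """
    z = list(view_a) + list(view_b)  # 2N embeddings
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Softmax denominator over all embeddings except the anchor itself.
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        log_denominator = math.log(sum(math.exp(logit) for logit in logits))
        total += log_denominator - cosine(z[i], z[positive]) / temperature
    return total / (2 * n)


aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True: matching views give a lower loss
```

Minimising this loss pulls the two views of each image together while pushing apart the other images in the batch.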
Submitted 4 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
A Multilevel Strategy to Improve People Tracking in a Real-World Scenario
Authors:
Cristiano B. de Oliveira,
Joao C. Neves,
Rafael O. Ribeiro,
David Menotti
Abstract:
The Palácio do Planalto, office of the President of Brazil, was invaded by protesters on January 8, 2023. Surveillance videos taken from inside the building were subsequently released by the Brazilian Supreme Court for public scrutiny. We used segments of this footage to create the UFPR-Planalto801 dataset for people tracking and re-identification in a real-world scenario. This dataset consists of more than 500,000 images. This paper presents a tracking approach targeting this dataset. The proposed method combines known state-of-the-art trackers in a multilevel hierarchy to correct ID associations along the trajectories. We evaluated our method using the IDF1, MOTA, MOTP and HOTA metrics. The results show improvements for every tracker used in the experiments, with the IDF1 score increasing by up to 9.5%.
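IDF1 and MOTA, two of the metrics cited above, have standard closed-form definitions in terms of matched counts. The following is a minimal sketch that assumes the counts (identity true positives, false positives and false negatives, plus ID switches) have already been produced by a matching step that is not shown. The function names are hypothetical, and the numbers are made up for illustration.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: harmonic mean of identity precision and identity recall."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy; negative when errors exceed the GT count."""
    return 1 - (false_negatives + false_positives + id_switches) / num_gt


# Illustrative counts only (not taken from the paper).
print(round(idf1(idtp=800, idfp=200, idfn=200), 3))  # 0.8
print(round(mota(false_negatives=100, false_positives=50, id_switches=50, num_gt=1000), 3))  # 0.8
```

IDF1 rewards keeping the same identity along a trajectory. MOTA only counts ID switches as one error type among several, which is why an ID-association correction step shows up most clearly in IDF1.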
Submitted 29 April, 2024;
originally announced April 2024.
-
The minimum distance of a parameterized code over an even cycle
Authors:
Eduardo Camps-Moreno,
Jorge Neves,
Eliseo Sarmiento
Abstract:
We compute the minimum distance of the parameterized code of order 1 over an even cycle.
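For intuition, the order-1 parameterized code of a graph can be brute-forced for very small cases. The code follows the construction of Rentería, Simis and Villarreal:

- each edge {i, j} of a graph on n vertices contributes the monomial x_i x_j;
- the toric set X collects the projective points [(x_i x_j) over all edges] for x in (F_q^*)^n;
- codewords are the evaluations of linear forms at the points of X.

The sketch below assumes q is prime, so that integers mod q form the field. It enumerates every linear form and reports the smallest nonzero Hamming weight. This is an exhaustive check for tiny graphs, not the method of the paper, and all names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Edge = Tuple[int, int]


def even_cycle(k: int) -> List[Edge]:
    """Edges of the even cycle C_{2k} on vertices 0, ..., 2k-1."""
    return [(i, (i + 1) % (2 * k)) for i in range(2 * k)]


def toric_points(n: int, edges: List[Edge], q: int) -> List[Tuple[int, ...]]:
    """Points of the toric set X in P^{m-1} over the prime field F_q.

    Each x in (F_q^*)^n gives the point [x_i * x_j for every edge {i, j}].
    All coordinates are nonzero, so scaling the first one to 1 picks a unique
    representative of each projective point.
    """
    points = set()
    for x in product(range(1, q), repeat=n):
        coords = [x[i] * x[j] % q for i, j in edges]
        inv = pow(coords[0], q - 2, q)  # inverse of coords[0] (Fermat, q prime)
        points.add(tuple(c * inv % q for c in coords))
    return sorted(points)


def min_distance_order_one(n: int, edges: List[Edge], q: int) -> int:
    """Minimum weight of the order-1 code: evaluate every linear form on X."""
    points = toric_points(n, edges, q)
    best = len(points)
    for coeffs in product(range(q), repeat=len(edges)):
        # Weight = number of points where the linear form does not vanish.
        weight = sum(1 for pt in points if sum(a * p for a, p in zip(coeffs, pt)) % q)
        if 0 < weight < best:
            best = weight
    return best


print(len(toric_points(4, even_cycle(2), 5)))       # code length: 16
print(min_distance_order_one(4, even_cycle(2), 5))  # 9
```

Small cases like this are handy for sanity-checking a closed formula. However, the search grows as q^m, so it is only usable at toy sizes.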
Submitted 18 March, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models
Authors:
Cristiano Patrício,
Luís F. Teixeira,
João C. Neves
Abstract:
Concept-based models naturally lend themselves to the development of inherently interpretable skin lesion diagnosis, as medical experts make decisions based on a set of visual patterns of the lesion. Nevertheless, the development of these models depends on the existence of concept-annotated datasets, which are scarce due to the specialized knowledge and expertise required in the annotation process. In this work, we show that vision-language models can be used to alleviate the dependence on a large number of concept-annotated samples. In particular, we propose an embedding learning strategy to adapt CLIP to the downstream task of skin lesion classification using concept-based descriptions as textual embeddings. Our experiments reveal that vision-language models not only attain better accuracy when using concepts as textual embeddings, but also require fewer concept-annotated samples to attain performance comparable to approaches specifically devised for automatic concept generation.
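At inference time, classifying with concept-based textual embeddings can be pictured as comparing an image embedding against text embeddings of each class's concept descriptions. The sketch below assumes the embeddings have already been produced by a CLIP-like image and text encoder (not shown). It scores each class by the mean cosine similarity of its concept descriptions and then applies a temperature softmax. It omits the paper's embedding learning strategy. The class names, concept phrases, vectors and `temperature` value are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def classify_with_concepts(
    image_emb: Vector,
    concept_text_embs: Dict[str, List[Vector]],
    temperature: float = 0.01,
) -> Tuple[str, Dict[str, float]]:
    """Score each class by the mean image-text similarity of its concept descriptions."""
    scores = {
        label: sum(cosine(image_emb, t) for t in texts) / len(texts)
        for label, texts in concept_text_embs.items()
    }
    # Temperature-scaled softmax; subtracting the max keeps exp() numerically stable.
    top = max(scores.values())
    exps = {label: math.exp((s - top) / temperature) for label, s in scores.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs


# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
concepts = {
    "melanoma": [[0.9, 0.1], [0.8, 0.3]],  # e.g. "atypical pigment network"
    "nevus": [[0.1, 0.9], [0.2, 0.8]],     # e.g. "typical pigment network"
}
label, probs = classify_with_concepts([1.0, 0.0], concepts)
print(label)  # melanoma
```

Because each class score decomposes into per-concept similarities, the same numbers can be reported as a concept-level explanation of the prediction.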
Submitted 6 March, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Embedding Aggregation for Forensic Facial Comparison
Authors:
Rafael Oliveira Ribeiro,
João C. R. Neves,
Arnout C. C. Ruifrok,
Flavio de Barros Vidal
Abstract:
In forensic facial comparison, questioned-source images are usually captured in uncontrolled environments, with non-uniform lighting, and from non-cooperative subjects. The poor quality of such material usually compromises its value as evidence in legal matters. On the other hand, in forensic casework, multiple images of the person of interest are usually available. In this paper, we propose to aggregate deep neural network embeddings from various images of the same person to improve performance in facial verification. We observe significant performance improvements, especially for very low-quality images. Further improvements are obtained by aggregating embeddings of more images and by applying quality-weighted aggregation. We demonstrate the benefits of this approach in forensic evaluation settings with the development and validation of score-based likelihood ratio systems and report improvements in Cllr of up to 95% (from 0.249 to 0.012) for CCTV images and of up to 96% (from 0.083 to 0.003) for social media images.
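Embedding aggregation and the Cllr metric can both be written in a few lines. The abstract does not specify the aggregation rule, the quality measure or the score-to-LR calibration, so the sketch below makes assumptions for each. It compares embeddings by cosine similarity and aggregates them as a (quality-)weighted mean of unit-normalised embeddings; it also assumes likelihood ratios are already available. The names `aggregate`, `cosine_score` and `cllr` are hypothetical. The Cllr function uses the standard definition over same-source and different-source likelihood ratios.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def aggregate(embeddings: List[Vector], qualities: Optional[List[float]] = None) -> List[float]:
    """(Quality-)weighted mean of L2-normalised embeddings, re-normalised to unit length."""
    weights = qualities if qualities is not None else [1.0] * len(embeddings)
    acc = [0.0] * len(embeddings[0])
    for emb, w in zip(embeddings, weights):
        for d, x in enumerate(l2_normalize(emb)):
            acc[d] += w * x
    return l2_normalize(acc)


def cosine_score(a: Vector, b: Vector) -> float:
    """Comparison score between two (aggregated) templates."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def cllr(same_source_lrs: List[float], different_source_lrs: List[float]) -> float:
    """Log-likelihood-ratio cost: 1 for an uninformative system (LR = 1), near 0 for a perfect one."""
    ss = sum(math.log2(1 + 1 / lr) for lr in same_source_lrs) / len(same_source_lrs)
    ds = sum(math.log2(1 + lr) for lr in different_source_lrs) / len(different_source_lrs)
    return 0.5 * (ss + ds)


# Relative Cllr reductions reported in the abstract:
print(f"{(0.249 - 0.012) / 0.249:.1%}")  # 95.2%
print(f"{(0.083 - 0.003) / 0.083:.1%}")  # 96.4%
```

Normalising before averaging keeps any single high-norm embedding from dominating the template. The quality weights then let sharper images contribute more.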
Submitted 29 April, 2023;
originally announced May 2023.
-
Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis
Authors:
Cristiano Patrício,
João C. Neves,
Luís F. Teixeira
Abstract:
Early detection of melanoma is crucial for preventing severe complications and increasing the chances of successful treatment. Existing deep learning approaches for melanoma skin lesion diagnosis are deemed black-box models, as they omit the rationale behind the model prediction, compromising the trustworthiness and acceptability of these diagnostic methods. Current attempts to provide concept-based explanations rely on post-hoc approaches, which depend on an additional model to derive interpretations. In this paper, we propose an inherently interpretable framework to improve the interpretability of concept-based models by incorporating a hard attention mechanism and a coherence loss term that ensures the visual coherence of the concept activations produced by the concept encoder, without requiring supervision from additional annotations. The proposed framework explains its decision in terms of human-interpretable concepts and their respective contributions to the final prediction, as well as a visual interpretation of the locations where each concept is present in the image. Experiments on skin image datasets demonstrate that our method outperforms existing black-box and concept-based models for skin lesion classification.
Submitted 17 April, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Face Super-Resolution Using Stochastic Differential Equations
Authors:
Marcelo dos Santos,
Rayson Laroca,
Rafael O. Ribeiro,
João Neves,
Hugo Proença,
David Menotti
Abstract:
Diffusion models have proven effective for various applications such as image, audio and graph generation. Other important applications are image super-resolution and the solution of inverse problems. More recently, some works have used stochastic differential equations (SDEs) to generalize diffusion models to continuous time. In this work, we introduce SDEs to generate super-resolution face images. To the best of our knowledge, this is the first time SDEs have been used for such an application. The proposed method achieves higher peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and consistency than existing super-resolution methods based on diffusion models. We also assess the potential application of this method to the face recognition task: a generic facial feature extractor is used to compare the super-resolution images with the ground truth, and superior results were obtained compared with other methods. Our code is publicly available at https://github.com/marcelowds/sr-sde
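PSNR, one of the metrics reported above, is a direct function of the mean squared error between the super-resolved image and the ground truth. The following is a minimal sketch over flattened pixel lists, assuming 8-bit images (peak value 255). It is the textbook definition, not the authors' evaluation code, and `psnr` is a hypothetical name.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images of equal size."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


print(round(psnr([0.0], [25.5]), 6))  # 20.0: MSE is 1/100 of the peak squared
```

Because PSNR depends only on pixel-wise error, it is usually reported together with a structural measure such as SSIM, as the abstract does.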
Submitted 24 September, 2022;
originally announced September 2022.
-
Explainable Deep Learning Methods in Medical Image Classification: A Survey
Authors:
Cristiano Patrício,
João C. Neves,
Luís F. Teixeira
Abstract:
The remarkable success of deep learning has prompted interest in its application to medical imaging diagnosis. Even though state-of-the-art deep learning models have achieved human-level accuracy on the classification of different types of medical data, these models are rarely adopted in clinical workflows, mainly due to their lack of interpretability. The black-box nature of deep learning models has raised the need for strategies that explain the decision process of these models, leading to the emergence of eXplainable Artificial Intelligence (XAI). In this context, we provide a thorough survey of XAI applied to medical imaging diagnosis, including visual, textual, example-based and concept-based explanation methods. Moreover, this work reviews the existing medical imaging datasets and the existing metrics for evaluating the quality of the explanations. In addition, we include a performance comparison among a set of report generation-based methods. Finally, the major challenges in applying XAI to medical imaging and the future research directions on the topic are also discussed.
Submitted 19 September, 2023; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Parameterized codes over graphs
Authors:
Jorge Neves,
Maria Vaz Pinto
Abstract:
In this article we review known results on parameterized linear codes over graphs, introduced by Rentería, Simis and Villarreal in 2011. Very little is known about their basic parameters and invariants. We review in detail the parameters dimension, regularity and minimum distance. As regards the parameter dimension, we explore the connection to Eulerian ideals in the ternary case and we give new combinatorial formulas.
Submitted 14 December, 2021;
originally announced December 2021.
-
Generative Adversarial Graph Convolutional Networks for Human Action Synthesis
Authors:
Bruno Degardin,
João Neves,
Vasco Lopes,
João Brito,
Ehsan Yaghoubi,
Hugo Proença
Abstract:
Synthesising the spatial and temporal dynamics of the human body skeleton remains a challenging task, not only in terms of the quality of the generated shapes, but also of their diversity, particularly when synthesising realistic body movements of a specific action (action conditioning). In this paper, we propose Kinetic-GAN, a novel architecture that leverages the benefits of Generative Adversarial Networks and Graph Convolutional Networks to synthesise the kinetics of the human body. The proposed adversarial architecture can condition on up to 120 different actions over local and global body movements while improving sample quality and diversity through latent space disentanglement and stochastic variations. Our experiments were carried out on three well-known datasets, where Kinetic-GAN notably surpasses the state-of-the-art methods in terms of distribution quality metrics while being able to synthesise more than an order of magnitude more distinct actions. Our code and models are publicly available at https://github.com/DegardinBruno/Kinetic-GAN.
Submitted 25 October, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
ZSpeedL -- Evaluating the Performance of Zero-Shot Learning Methods using Low-Power Devices
Authors:
Cristiano Patrício,
João Neves
Abstract:
The recognition of unseen objects from a semantic representation or textual description, usually denoted as zero-shot learning (ZSL), is more likely to be used in real-world scenarios than traditional object recognition. Nevertheless, no work has evaluated the feasibility of deploying zero-shot learning approaches in these scenarios, particularly when using low-power devices. In this paper, we provide the first benchmark on the inference time of zero-shot learning, comprising an evaluation of state-of-the-art approaches regarding their speed/accuracy trade-off. An analysis of the processing time of the different phases of the ZSL inference stage reveals that visual feature extraction is the major bottleneck in this paradigm, but we show that lightweight networks can dramatically reduce the overall inference time without reducing the accuracy obtained by the de facto ResNet101 architecture. Also, this benchmark evaluates how different ZSL approaches perform on low-power devices, and how the visual feature extraction phase could be optimized on this hardware. To foster the research and deployment of ZSL systems capable of operating in real-world scenarios, we release the evaluation framework used in this benchmark (https://github.com/CristianoPatricio/zsl-methods).
Submitted 9 October, 2021;
originally announced October 2021.
-
Unlocking New York City Crime Insights using Relational Database Embeddings
Authors:
Apoorva Nitsure,
Rajesh Bordawekar,
Jose Neves
Abstract:
This version withdrawn by arXiv administrators because the author did not have the right to agree to our license at the time of submission.
Submitted 20 May, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Kollaps: Decentralized and Dynamic Topology Emulation
Authors:
Paulo Gouveia,
João Neves,
Carlos Segarra,
Luca Liechti,
Shady Issa,
Valerio Schiavoni,
Miguel Matos
Abstract:
The performance and behavior of large-scale distributed applications are highly influenced by network properties such as latency, bandwidth, packet loss, and jitter. For instance, an engineer might need to answer questions such as: What is the impact of an increase in network latency on application response time? How does moving a cluster between geographical regions affect application throughput? How do network dynamics affect application stability? Answering these questions in a systematic and reproducible way is very hard, given the variability and lack of control over the underlying network. Unfortunately, state-of-the-art network emulators or testbeds scale poorly (e.g., MiniNet), focus exclusively on the control plane (e.g., CrystalNet) or ignore network dynamics (e.g., EmuLab). Kollaps is a fully distributed network emulator that addresses these limitations. Kollaps hinges on two key observations. First, from an application's perspective, what matters are the emergent end-to-end properties (e.g., latency, bandwidth, packet loss, and jitter) rather than the internal state of the routers and switches leading to those properties. This premise allows us to build a simpler, dynamically adaptable emulation model that avoids maintaining the full network state. Second, this simplified model is maintainable in a fully decentralized way, allowing the emulation to scale with the number of machines for the application. Kollaps is fully decentralized, agnostic of the application language and transport protocol, scales to thousands of processes and is accurate when compared against a bare-metal deployment or state-of-the-art approaches that emulate the full state of the network. We showcase how Kollaps can accurately reproduce results from the literature and predict the behaviour of a complex unmodified distributed key-value store (i.e., Cassandra) under different deployments.
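The first observation can be illustrated by collapsing a multi-hop path into the end-to-end properties an application actually experiences. The sketch below is a simplification under stated assumptions, not Kollaps's actual model:

- latency adds up along the path;
- available bandwidth is set by the bottleneck link;
- per-link losses are independent;
- per-link delay variations are independent, so their standard deviations add in quadrature.

It ignores bandwidth sharing among concurrent flows, which a real emulator must handle. `Link` and `collapse_path` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    latency_ms: float
    bandwidth_mbps: float
    loss: float       # packet-loss probability in [0, 1]
    jitter_ms: float  # standard deviation of per-link delay


def collapse_path(links: List[Link]) -> Link:
    """Reduce a multi-hop path to a single equivalent end-to-end link."""
    latency = sum(link.latency_ms for link in links)
    bandwidth = min(link.bandwidth_mbps for link in links)  # bottleneck link
    delivered = math.prod(1 - link.loss for link in links)  # packet survives every hop
    jitter = math.sqrt(sum(link.jitter_ms ** 2 for link in links))  # independent delays
    return Link(latency, bandwidth, 1 - delivered, jitter)


path = collapse_path([Link(10.0, 100.0, 0.1, 3.0), Link(5.0, 50.0, 0.1, 4.0)])
print(path.latency_ms, path.bandwidth_mbps, round(path.loss, 3), path.jitter_ms)  # 15.0 50.0 0.19 5.0
```

Only the collapsed values would need to be enforced at the endpoints, which is what makes the model cheap to maintain in a decentralized way.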
Submitted 5 April, 2020;
originally announced April 2020.
-
An Attention-Based Deep Learning Model for Multiple Pedestrian Attributes Recognition
Authors:
Ehsan Yaghoubi,
Diana Borza,
João Neves,
Aruna Kumar,
Hugo Proença
Abstract:
The automatic characterization of pedestrians in surveillance footage is a tough challenge, particularly when the data is extremely diverse with cluttered backgrounds, and subjects are captured from varying distances, under multiple poses, with partial occlusion. Having observed that state-of-the-art performance is still unsatisfactory, this paper provides a novel solution to the problem, with two-fold contributions: 1) considering the strong semantic correlation between the different full-body attributes, we propose a multi-task deep model that uses an element-wise multiplication layer to extract more comprehensive feature representations. In practice, this layer serves as a filter to remove irrelevant background features, and is particularly important to handle complex, cluttered data; and 2) we introduce a weighted-sum term to the loss function that not only weights the relative contribution of each task (kind of attribute) but also is crucial for performance improvement in multiple-attribute inference settings. Our experiments were performed on two well-known datasets (RAP and PETA) and point to the superiority of the proposed method with respect to the state-of-the-art. The code is available at https://github.com/Ehsan-Yaghoubi/MAN-PAR-.
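The two contributions can be sketched in plain Python:

- an element-wise product that suppresses features wherever a mask is close to zero;
- a loss that weights the per-task attribute losses before summing them.

The abstract does not give the mask source, the grouping of attributes into tasks, or the weight values, so those are assumptions for illustration. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def masked_features(features: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Element-wise product: mask values near 0 suppress (e.g. background) features."""
    return [f * m for f, m in zip(features, mask)]


def binary_cross_entropy(prob: float, target: int, eps: float = 1e-7) -> float:
    p = min(max(prob, eps), 1 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def weighted_multitask_loss(
    predictions: Dict[str, List[float]],
    targets: Dict[str, List[int]],
    weights: Dict[str, float],
) -> float:
    """Weighted sum over tasks of the mean per-attribute binary cross-entropy."""
    total = 0.0
    for task, probs in predictions.items():
        losses = [binary_cross_entropy(p, t) for p, t in zip(probs, targets[task])]
        total += weights[task] * sum(losses) / len(losses)
    return total


print(masked_features([1.0, 2.0, 3.0], [1.0, 0.0, 0.5]))  # [1.0, 0.0, 1.5]
loss = weighted_multitask_loss(
    {"gender": [0.5], "upper_body": [0.9, 0.2]},
    {"gender": [1], "upper_body": [1, 0]},
    {"gender": 2.0, "upper_body": 1.0},
)
```

Raising a task's weight makes its attributes dominate the gradient, which is one way to keep rare or hard attribute groups from being neglected.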
Submitted 2 April, 2020;
originally announced April 2020.
-
GANprintR: Improved Fakes and Evaluation of the State of the Art in Face Manipulation Detection
Authors:
João C. Neves,
Ruben Tolosana,
Ruben Vera-Rodriguez,
Vasco Lopes,
Hugo Proença,
Julian Fierrez
Abstract:
The availability of large-scale facial databases, together with the remarkable progress of deep learning technologies, in particular Generative Adversarial Networks (GANs), has led to the generation of extremely realistic fake facial content, raising obvious concerns about the potential for misuse. Such concerns have fostered research on manipulation detection methods that, contrary to humans, have already achieved astonishing results in various scenarios. In this study, we focus on the synthesis of entire facial images, which is a specific type of facial manipulation. The main contributions of this study are four-fold: i) a novel strategy based on autoencoders to remove GAN "fingerprints" from synthetic fake images, in order to spoof facial manipulation detection systems while keeping the visual quality of the resulting images; ii) an in-depth analysis of the recent literature in facial manipulation detection; iii) a complete experimental assessment of this type of facial manipulation, considering the state-of-the-art fake detection systems (based on holistic deep networks, steganalysis, and local artifacts), highlighting how challenging this task is in unconstrained scenarios; and finally iv) a novel public database, named iFakeFaceDB, resulting from the application of our proposed GAN-fingerprint Removal approach (GANprintR) to already very realistic synthetic fake images.
The results obtained in our empirical evaluation show that additional efforts are required to develop robust facial manipulation detection systems against unseen conditions and spoof techniques, such as the one proposed in this study.
Submitted 1 July, 2020; v1 submitted 13 November, 2019;
originally announced November 2019.
-
Vanishing ideals over graphs and even cycles
Authors:
Jorge Neves,
Maria Vaz Pinto,
Rafael H. Villarreal
Abstract:
Let X be an algebraic toric set in a projective space over a finite field. We study the vanishing ideal, I(X), of X and show some useful degree bounds for a minimal set of generators of I(X). We give an explicit description of a set of generators of I(X), when X is the algebraic toric set associated to an even cycle or to a connected bipartite graph with pairwise disjoint even cycles. In this case, a formula for the regularity of I(X) is given. We show an upper bound for this invariant, when X is associated to a (not necessarily connected) bipartite graph. The upper bound is sharp if the graph is connected. We are able to show a formula for the length of the parameterized linear code associated with any graph, in terms of the number of bipartite and non-bipartite components.
Submitted 8 March, 2012; v1 submitted 27 November, 2011;
originally announced November 2011.