Search | arXiv e-print repository

McCatch: Scalable Microcluster Detection in Dimensional and Nondimensional Datasets

Authors: Braulio V. Sánchez Vinces, Robson L. F. Cordeiro, Christos Faloutsos

Abstract: How could we have an outlier detector that works even with nondimensional data, and ranks together both singleton microclusters ('one-off' outliers) and nonsingleton microclusters by their anomaly scores? How can we obtain principled scores in a scalable and 'hands-off' manner? Microclusters of outliers indicate coalition or repetition in fraud activities, etc.; their identification is thus highly desirable. This paper presents McCatch: a new algorithm that detects microclusters by leveraging our proposed 'Oracle' plot (1NN Distance versus Group 1NN Distance). We study 31 real and synthetic datasets with up to 1M data elements to show that McCatch is the only method that answers both of the questions above, and that it outperforms 11 other methods, especially when the data has nonsingleton microclusters or is nondimensional. We also showcase McCatch's ability to detect meaningful microclusters in graphs, fingerprints, logs of network connections, text data, and satellite imagery. For example, it found a 30-element microcluster of confirmed 'Denial of Service' attacks in the network logs, taking only ~3 minutes for 222K data elements on a stock desktop.
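One axis of the 'Oracle' plot is each element's 1NN Distance. The minimal sketch below computes only that axis, assuming a brute-force O(n²) search and any user-supplied distance function; the Group 1NN Distance and McCatch's scoring are not defined in the abstract and are not reproduced. Because only a distance function is needed, the same code applies to nondimensional data, illustrated here with strings and a hypothetical `levenshtein` helper.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def one_nn_distances(items: Sequence[T], dist: Callable[[T, T], float]) -> List[float]:
    """For each item, return the distance to its nearest *other* item (brute force)."""
    n = len(items)
    if n < 2:
        raise ValueError("need at least two items")
    result: List[float] = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if i != j:
                best = min(best, dist(items[i], items[j]))
        result.append(best)
    return result


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (illustrative metric for nondimensional data)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
```

For example, `one_nn_distances(["login", "logon", "login", "zzzzzz"], levenshtein)` gives a large 1NN distance only for the isolated string `"zzzzzz"`.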

Submitted 12 March, 2024; originally announced March 2024.

Comments: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

arXiv:2401.12712

Moutard hyperquadrics and generalized Darboux directions

Authors: Fernanda Py Silva Cordeiro, Marcos Craizer

Abstract: The higher-order contact of a quadric with a surface in 3-space at a non-degenerate point is obtained by the Moutard quadric in the Darboux direction. In this paper, we discuss the extension of this result to hypersurfaces in arbitrary dimensions.

Submitted 6 May, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: 10 pages

MSC Class: 53A15; 53A20

arXiv:2308.05820

Recognizing Handwritten Mathematical Expressions of Vertical Addition and Subtraction

Authors: Daniel Rosa, Filipe R. Cordeiro, Ruan Carvalho, Everton Souza, Sergio Chevtchenko, Luiz Rodrigues, Marcelo Marinho, Thales Vieira, Valmir Macario

Abstract: Handwritten Mathematical Expression Recognition (HMER) is a challenging task with many educational applications. Recent HMER methods have been developed for complex mathematical expressions in the standard horizontal format. However, solutions for elementary mathematical expressions, such as vertical addition and subtraction, have not been explored in the literature. This work proposes a new handwritten elementary mathematical expression dataset composed of addition and subtraction expressions in a vertical format. We also extended the MNIST dataset to generate artificial images with this structure. Furthermore, we propose a solution for offline HMER that recognizes vertical addition and subtraction expressions. Our analysis evaluated the object detection algorithms YOLO v7, YOLO v8, YOLO-NAS, NanoDet and FCOS for identifying the mathematical symbols. We also propose a transcription method that maps the bounding boxes from the object detection stage to a mathematical expression in LaTeX markup. Results show that our approach is efficient, achieving a high expression recognition rate. The code and dataset are available at https://github.com/Danielgol/HME-VAS

Submitted 10 August, 2023; originally announced August 2023.

Comments: Paper accepted at SIBGRAPI 2023

arXiv:2308.03486

Improving Mass Detection in Mammography Images: A Study of Weakly Supervised Learning and Class Activation Map Methods

Authors: Vicente Sampaio, Filipe R. Cordeiro

Abstract: In recent years, weakly supervised models have aided mass detection in mammography images, decreasing the need for pixel-level annotations. However, most existing models in the literature rely on Class Activation Maps (CAM) as the activation method, overlooking the potential benefits of other activation techniques. This work presents a study that explores and compares different activation maps in conjunction with state-of-the-art methods for weakly supervised training on mammography images. Specifically, we investigate the CAM, GradCAM, GradCAM++, XGradCAM, and LayerCAM methods within the framework of the GMIC model for mass detection in mammography images. The evaluation is conducted on the VinDr-Mammo dataset, using the metrics Accuracy, True Positive Rate (TPR), False Negative Rate (FNR), and False Positives Per Image (FPPI). Results show that using different activation-map strategies during the training and test stages improves the model. With this strategy, we improve the results of the GMIC method, decreasing FPPI and increasing TPR.
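The detection metrics named above follow standard count-based definitions: TPR = TP / (TP + FN), FNR = FN / (TP + FN), and FPPI = total false positives / number of images. The sketch below assumes detections have already been matched to ground-truth masses; the matching rule (for example, an overlap threshold) is not specified in the abstract, and the `DetectionCounts` container is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    tp: int        # ground-truth masses matched by a detection
    fn: int        # ground-truth masses with no matching detection
    fp: int        # detections that match no ground-truth mass
    n_images: int  # number of images evaluated


def tpr(c: DetectionCounts) -> float:
    """True Positive Rate (sensitivity): fraction of masses that were found."""
    return c.tp / (c.tp + c.fn)


def fnr(c: DetectionCounts) -> float:
    """False Negative Rate: fraction of masses that were missed (equals 1 - TPR)."""
    return c.fn / (c.tp + c.fn)


def fppi(c: DetectionCounts) -> float:
    """False Positives Per Image: average number of spurious detections per image."""
    return c.fp / c.n_images
```

With 80 detected and 20 missed masses and 50 false positives over 200 images, TPR is 0.8, FNR is 0.2, and FPPI is 0.25.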

Submitted 7 August, 2023; originally announced August 2023.

Comments: Accepted for publication at SIBGRAPI 2023

arXiv:2307.05795

Research Protocol for the Google Health Digital Well-being Study

Authors: Daniel McDuff, Andrew Barakat, Ari Winbush, Allen Jiang, Felicia Cordeiro, Ryann Crowley, Lauren E. Kahn, John Hernandez, Nicholas B. Allen

Abstract: The impact of digital device use on health and well-being is a pressing question to which individuals, families, schools, policy makers, legislators, and digital designers are all demanding answers. However, the scientific literature on this topic to date is marred by small and/or unrepresentative samples, poor measurement of core constructs (e.g., device use, smartphone addiction), and a limited ability to address the psychological and behavioral mechanisms that may underlie the relationships between device use and well-being. A number of recent authoritative reviews have made urgent calls for future research projects to address these limitations. The critical role of research is to identify which patterns of use are associated with benefits versus risks, and who is more vulnerable to harmful versus beneficial outcomes, so that we can pursue evidence-based product design, education, and regulation aimed at maximizing benefits and minimizing risks of smartphones and other digital devices. We describe a protocol for a Digital Well-Being (DWB) study to help answer these questions.

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2210.08212

D.MCA: Outlier Detection with Explicit Micro-Cluster Assignments

Authors: Shuli Jiang, Robson Leonardo Ferreira Cordeiro, Leman Akoglu

Abstract: How can we detect outliers, both scattered and clustered, and also explicitly assign them to respective micro-clusters, without knowing a priori how many micro-clusters exist? How can we perform both tasks in-house, i.e., without any post-hoc processing, so that detection and assignment can benefit from each other simultaneously? Presenting outliers in separate micro-clusters is informative to analysts in many real-world applications. However, a naïve solution based on post-hoc clustering of the outliers detected by any existing method suffers from two main drawbacks: (a) appropriate hyperparameter values for clustering are commonly unknown, and most algorithms struggle with clusters of varying shapes and densities; (b) detection and assignment cannot benefit from one another. In this paper, we propose D.MCA to Detect outliers with explicit Micro-Cluster Assignment. Our method performs detection and assignment iteratively and in-house, using a novel strategy that prunes entire micro-clusters out of the training set to improve detection performance. It also benefits from a novel strategy that prevents clustered outliers from masking each other, a well-known problem in the literature. Also, D.MCA is designed to be robust to a critical hyperparameter by employing a hyperensemble "warm up" phase. Experiments performed on 16 real-world and synthetic datasets demonstrate that D.MCA outperforms 8 state-of-the-art competitors, especially on the explicit outlier micro-cluster assignment task.

Submitted 15 October, 2022; originally announced October 2022.

Comments: Proceedings of the 22nd IEEE International Conference on Data Mining (ICDM 2022)

arXiv:2208.11176

doi 10.1109/SIBGRAPI55357.2022.9991791

A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels

Authors: Emeson Santana, Gustavo Carneiro, Filipe R. Cordeiro

Abstract: Label noise is common in large real-world datasets, and its presence harms the training process of deep neural networks. Although several works have focused on training strategies to address this problem, few studies evaluate the impact of data augmentation as a design choice for training deep neural networks. In this work, we analyse model robustness under different data augmentations and their effect on training in the presence of noisy labels. We evaluate state-of-the-art and classical data augmentation strategies with different levels of synthetic noise on the MNIST, CIFAR-10 and CIFAR-100 datasets, and on the real-world dataset Clothing1M. We evaluate the methods using the accuracy metric. Results show that an appropriate selection of data augmentation can drastically improve model robustness to label noise, increasing the best test accuracy by up to 177.84% in relative terms compared to the baseline with no augmentation, and by up to 6% in absolute value with the state-of-the-art DivideMix training strategy.
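The abstract does not say how the "synthetic noise" levels are generated. A common choice in this literature is symmetric noise, where a given fraction of labels is replaced by a different class chosen uniformly at random; the sketch below assumes that variant (other conventions allow the replacement to equal the true class), and `add_symmetric_noise` is a hypothetical helper, not the authors' code.

```python
import random
from typing import List, Optional, Sequence


def add_symmetric_noise(
    labels: Sequence[int],
    num_classes: int,
    noise_rate: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Flip round(noise_rate * n) labels to a different, uniformly chosen class."""
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = rng if rng is not None else random.Random()
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    # Choose which samples to corrupt, without replacement.
    for i in rng.sample(range(len(labels)), n_flip):
        wrong_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(wrong_classes)
    return noisy
```

Under this convention, a noise rate of 0.4 on 1,000 labels corrupts exactly 400 of them.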

Submitted 7 August, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Paper accepted at SIBGRAPI 2022

arXiv:2110.11809

PropMix: Hard Sample Filtering and Proportional MixUp for Learning with Noisy Labels

Authors: Filipe R. Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: The most competitive noisy label learning methods rely on an unsupervised classification of clean and noisy samples, where samples classified as noisy are re-labelled and "MixMatched" with the clean samples. These methods have two issues in large noise rate problems: 1) the noisy set is more likely to contain hard samples that are incorrectly re-labelled, and 2) the number of samples produced by MixMatch tends to be reduced because it is constrained by the small clean set size. In this paper, we introduce the learning algorithm PropMix to handle these issues. PropMix filters out hard noisy samples, with the goal of increasing the likelihood of correctly re-labelling the easy noisy samples. Also, PropMix places clean and re-labelled easy noisy samples in a training set that is augmented with MixUp, removing the clean set size constraint and including a large proportion of correctly re-labelled easy noisy samples. We also include self-supervised pre-training to improve robustness in high noisy label scenarios. Our experiments show that PropMix achieves state-of-the-art (SOTA) results on CIFAR-10/-100 (with symmetric, asymmetric and semantic label noise), Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and WebVision. In severe label noise benchmarks, our results are substantially better than those of other methods. The code is available at https://github.com/filipe-research/PropMix.
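MixUp, which PropMix uses to augment its combined training set, trains on convex combinations of pairs of examples and their one-hot labels, x̃ = λ·xᵢ + (1 − λ)·xⱼ with λ drawn from Beta(α, α). The sketch below shows only this standard MixUp step on plain Python lists; PropMix's hard-sample filtering, re-labelling, and self-supervised pre-training are not shown, and α = 0.4 is an illustrative assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def mixup_batch(
    xs: Sequence[Vector],
    ys: Sequence[Vector],
    alpha: float = 0.4,
    lam: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Vector], List[Vector], float]:
    """Mix each example (and its one-hot label) with a random partner from the batch."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    rng = rng if rng is not None else random.Random()
    if lam is None:
        # Standard MixUp draws the mixing weight from Beta(alpha, alpha).
        lam = rng.betavariate(alpha, alpha)
    perm = list(range(len(xs)))
    rng.shuffle(perm)  # perm[i] is the partner index of example i

    def mix(a: Vector, b: Vector) -> Vector:
        return [lam * ai + (1.0 - lam) * bi for ai, bi in zip(a, b)]

    x_mix = [mix(xs[i], xs[j]) for i, j in enumerate(perm)]
    y_mix = [mix(ys[i], ys[j]) for i, j in enumerate(perm)]
    return x_mix, y_mix, lam
```

Because the same λ and partner are used for inputs and labels, each mixed label row still sums to 1, which is what allows training with a standard cross-entropy loss on soft targets.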

Submitted 22 October, 2021; originally announced October 2021.

Comments: Paper accepted at BMVC'21: The 32nd British Machine Vision Conference

arXiv:2110.08257

C-AllOut: Catching & Calling Outliers by Type

Authors: Guilherme D. F. Silva, Leman Akoglu, Robson L. F. Cordeiro

Abstract: Given an unlabeled dataset, wherein we have access only to pairwise similarities (or distances), how can we effectively (1) detect outliers, and (2) annotate/tag the outliers by type? Outlier detection has a large literature, yet we find a key gap in the field: to our knowledge, no existing work addresses the outlier annotation problem. Outliers are broadly classified into 3 types, representing distinct patterns that could be valuable to analysts: (a) global outliers are severe yet isolated cases that do not repeat, e.g., a data collection error; (b) local outliers diverge from their peers within a context, e.g., a particularly short basketball player; and (c) collective outliers are isolated micro-clusters that may indicate coalition or repetitions, e.g., frauds that exploit the same loophole. This paper presents C-AllOut: a novel and effective outlier detector that annotates outliers by type. It is parameter-free and scalable, and it can work with only pairwise similarities (or distances) when needed. We show that C-AllOut achieves performance on par with or significantly better than state-of-the-art detectors when spotting outliers regardless of their type. It is also highly effective in annotating outliers of particular types, a task that none of the baselines can perform.

Submitted 13 October, 2021; originally announced October 2021.

Comments: 9+4 pages, 3 figures, 11 tables

arXiv:2109.02174

doi 10.34117/bjdv8n2-110

Performance evaluation in the reconstruction of 2D images of computed tomography using massively parallel programming CUDA

Authors: Alexssandro Ferreira Cordeiro, Pedro Luiz de Paula Filho, Hamilton Pereira da Silva, Arnaldo Candido Junior, Edresson Casanova, Jandrei Sartori Spancerski

Abstract: This work analyzes the processing time and the similarity of the images generated by CPU and GPU architectures under sequential and parallel programming. For image processing, a computer with an AMD FX-8350 processor and an Nvidia GTX 960 Maxwell GPU was used, along with the CUDAFY library and the C# programming language in the Visual Studio IDE. The comparisons indicate that sequential programming on a CPU generates reliable images at a high cost in time compared to parallel programming on CPU and GPU, whereas parallel programming generates results faster but with increased noise in the reconstructed image. For the float data type, the GPU obtained the best average time, equivalent to about 1/3 of the CPU's time; for the double data type, the parallel CPU approach obtained the best average time. Regarding image quality, the sequential approach obtained similar outputs, while the parallel approaches generated noise in their outputs.

Submitted 5 September, 2021; originally announced September 2021.

arXiv:2103.11395

ScanMix: Learning from Severe Label Noise via Semantic Clustering and Semi-Supervised Learning

Authors: Ragav Sachdeva, Filipe R Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: We propose a new training algorithm, ScanMix, that explores semantic clustering and semi-supervised learning (SSL) to allow superior robustness to severe label noise and competitive robustness to non-severe label noise problems, in comparison to state-of-the-art (SOTA) methods. ScanMix is based on the expectation maximisation framework, where the E-step estimates the latent variable to cluster the training images based on their appearance and classification results, and the M-step optimises the SSL classification and learns effective feature representations via semantic clustering. We present a theoretical result that shows the correctness and convergence of ScanMix, and an empirical result that shows that ScanMix has SOTA results on CIFAR-10/-100 (with symmetric, asymmetric and semantic label noise), Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and WebVision. In all benchmarks with severe label noise, our results are competitive with the current SOTA.

Submitted 16 October, 2022; v1 submitted 21 March, 2021; originally announced March 2021.

Comments: Paper accepted at Pattern Recognition

arXiv:2103.04173

doi 10.1016/j.patcog.2022.109013

LongReMix: Robust Learning with High Confidence Samples in a Noisy Label Environment

Authors: Filipe R. Cordeiro, Ragav Sachdeva, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: Deep neural network models are robust to a limited amount of label noise, but their ability to memorise noisy labels in high noise rate problems is still an open issue. The most competitive noisy-label learning algorithms rely on a 2-stage process comprising an unsupervised learning stage that classifies training samples as clean or noisy, followed by a semi-supervised learning stage that minimises the empirical vicinal risk (EVR) using a labelled set formed by samples classified as clean and an unlabelled set with samples classified as noisy. In this paper, we hypothesise that the generalisation of such 2-stage noisy-label learning methods depends on the precision of the unsupervised classifier and on the size of the training set used to minimise the EVR. We empirically validate these two hypotheses and propose the new 2-stage noisy-label training algorithm LongReMix. We test LongReMix on the noisy-label benchmarks CIFAR-10, CIFAR-100, WebVision, Clothing1M, and Food101-N. The results show that LongReMix generalises better than competing approaches, particularly in high label noise problems. Furthermore, our approach achieves state-of-the-art performance on most datasets. The code is available at https://github.com/filipe-research/LongReMix.

Submitted 4 September, 2022; v1 submitted 6 March, 2021; originally announced March 2021.

Comments: Published at Pattern Recognition 2022

arXiv:2103.03629

Self-supervised Mean Teacher for Semi-supervised Chest X-ray Classification

Authors: Fengbei Liu, Yu Tian, Filipe R. Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: The training of deep learning models generally requires a large amount of annotated data for effective convergence and generalisation. However, obtaining high-quality annotations is a laborious and expensive process due to the need for expert radiologists in the labelling task. The study of semi-supervised learning in medical image analysis is therefore of crucial importance, given that it is much less expensive to obtain unlabelled images than to acquire images labelled by expert radiologists. Essentially, semi-supervised methods leverage large sets of unlabelled data to enable better training convergence and generalisation than using only the small set of labelled images. In this paper, we propose Self-supervised Mean Teacher for Semi-supervised (S²MTS²) learning, which combines self-supervised mean-teacher pre-training with semi-supervised fine-tuning. The main innovation of S²MTS² is the self-supervised mean-teacher pre-training based on joint contrastive learning, which uses an infinite number of pairs of positive query and key features to improve the mean-teacher representation. The model is then fine-tuned using the exponential moving average teacher framework trained with semi-supervised learning. We validate S²MTS² on the multi-label classification problems from Chest X-ray14 and CheXpert, and on the multi-class classification problem from ISIC2018, where we show that it outperforms the previous SOTA semi-supervised learning methods by a large margin.
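In the mean-teacher framework mentioned above, the teacher's weights are an exponential moving average (EMA) of the student's weights: θ_teacher ← d·θ_teacher + (1 − d)·θ_student after each training step. The sketch below shows only this update rule, assuming parameters are stored as a plain name-to-float dictionary (a real model would hold tensors); the decay value is illustrative, and the contrastive pre-training of S²MTS² is not reproduced.

```python
from typing import Dict

Params = Dict[str, float]


def ema_update(teacher: Params, student: Params, decay: float = 0.99) -> Params:
    """Return new teacher weights: an exponential moving average of the student's."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must have the same parameter names")
    # A high decay makes the teacher a slowly changing, smoothed copy of the student.
    return {name: decay * teacher[name] + (1.0 - decay) * student[name] for name in teacher}
```

Repeatedly applying the update with a fixed student moves the teacher toward the student geometrically: after k steps, the remaining gap is decay**k times the initial gap.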

Submitted 4 November, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

Comments: MLMI-MICCAI 2021

arXiv:2012.03087

MyFood: A Food Segmentation and Classification System to Aid Nutritional Monitoring

Authors: Charles N. C. Freitas, Filipe R. Cordeiro, Valmir Macario

Abstract: The absence of food monitoring has contributed significantly to the increase in the population's weight. Due to lack of time and busy routines, most people do not control and record what they consume in their diet. Some computer vision solutions have been proposed to recognize food images, but few are specialized in nutritional monitoring. This work presents the development of an intelligent system that classifies and segments food in images to support automatic monitoring of a user's diet and nutritional intake. This work also presents a comparative study of state-of-the-art methods for image classification and segmentation applied to food recognition. In our methodology, we compare the FCN, ENet, SegNet, DeepLabV3+, and Mask RCNN algorithms. We built a dataset of the most consumed Brazilian food types, containing nine classes and a total of 1250 images. The models were evaluated using the following metrics: Intersection over Union, Sensitivity, Specificity, Balanced Precision, and Positive Predictive Value. We also propose a system integrated into a mobile application that automatically recognizes a meal and estimates its nutrients, helping people monitor their nutrition. The proposed solution showed better results than existing solutions on the market. The dataset is publicly available at http://doi.org/10.5281/zenodo.4041488
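Intersection over Union, the first segmentation metric listed above, is the number of pixels marked in both the predicted and ground-truth masks divided by the number marked in either. The sketch below computes it for one pair of binary masks stored as nested lists; returning 1.0 when both masks are empty is an assumed convention, and per-class averaging over the nine food classes is left out.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union of two same-shaped binary masks."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p_on, t_on = bool(p), bool(t)
            inter += p_on and t_on  # pixel is food in both masks
            union += p_on or t_on   # pixel is food in at least one mask
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union
```

For a prediction `[[1, 1, 0], [0, 1, 0]]` against the target `[[1, 0, 0], [0, 1, 1]]`, two pixels overlap out of four covered in total, so the IoU is 0.5.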

Submitted 5 December, 2020; originally announced December 2020.

Comments: Paper published at SIBGRAPI 2020 (camera-ready version)

Journal ref: 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)

arXiv:2012.03061

A Survey on Deep Learning with Noisy Labels: How to train your model when you cannot trust on the annotations?

Authors: Filipe R. Cordeiro, Gustavo Carneiro

Abstract: Noisy labels are commonly present in data sets automatically collected from the internet, mislabeled by non-specialist annotators, or even by specialists in a challenging task, such as in the medical field. Although deep learning models have shown significant improvements in different domains, an open issue is their ability to memorize noisy labels during training, which reduces their generalization potential. As deep learning models depend on correctly labeled data sets and label correctness is difficult to guarantee, it is crucial to consider the presence of noisy labels when training deep learning models. Several approaches have been proposed in the literature to improve the training of deep learning models in the presence of noisy labels. This paper presents a survey of the main techniques in the literature, in which we classify the algorithms into the following groups: robust losses, sample weighting, sample selection, meta-learning, and combined approaches. We also present the commonly used experimental setup, data sets, and results of the state-of-the-art models.

Submitted 5 December, 2020; originally announced December 2020.

Comments: Paper published at SIBGRAPI 2020 (camera-ready version)

Journal ref: 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)

arXiv:2011.05704

EvidentialMix: Learning with Combined Open-set and Closed-set Noisy Labels

Authors: Ragav Sachdeva, Filipe R. Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: The efficacy of deep learning depends on large-scale data sets that have been carefully curated with reliable data acquisition and annotation processes. However, acquiring such large-scale data sets with precise annotations is very expensive and time-consuming, and the cheap alternatives often yield data sets that have noisy labels. The field has addressed this problem by focusing on training models under two types of label noise: 1) closed-set noise, where some training samples are incorrectly annotated to a training label other than their known true class; and 2) open-set noise, where the training set includes samples that possess a true class that is (strictly) not contained in the set of known training labels. In this work, we study a new variant of the noisy label problem that combines open-set and closed-set noisy labels, and introduce a benchmark evaluation to assess the performance of training algorithms under this setup. We argue that such a problem is more general and better reflects the noisy label scenarios encountered in practice. Furthermore, we propose a novel algorithm, called EvidentialMix, that addresses this problem, and compare its performance with state-of-the-art methods for both closed-set and open-set noise on the proposed benchmark. Our results show that our method produces superior classification results and better feature representations than previous state-of-the-art methods. The code is available at https://github.com/ragavsachdeva/EvidentialMix.

Submitted 11 November, 2020; originally announced November 2020.

Comments: Paper accepted at WACV'21: Winter Conference on Applications of Computer Vision

arXiv:1801.01443

doi 10.1016/j.eswa.2016.08.016

A semi-supervised fuzzy GrowCut algorithm to segment and classify regions of interest of mammographic images

Authors: Filipe Rolim Cordeiro, Wellington Pinheiro dos Santos, Abel Guilhermino da Silva Filho

Abstract: According to the World Health Organization, breast cancer is the most common form of cancer in women. It is the second leading cause of death among women around the world, making it one of the deadliest forms of cancer. Mammographic image segmentation is a fundamental task to support image analysis and diagnosis, taking into account shape analysis of mammary lesions and their borders. However, mammogram segmentation is very hard, since it is highly dependent on the types of mammary tissue. In this work, we present a new semi-supervised segmentation algorithm, based on a modification of the GrowCut algorithm, that performs automatic mammographic image segmentation once a region of interest is selected by a specialist. In our proposal, we use fuzzy Gaussian membership functions to modify the evolution rule of the original GrowCut algorithm, in order to estimate the uncertainty of a pixel being object or background. The main impact of the proposed method is a significant reduction of the expert effort needed to initialize the GrowCut seed points for accurate segmentation, since it removes the need to select background seeds. We also constructed an automatic point selection process based on the simulated annealing optimization method, avoiding the need for human intervention. The proposed approach was qualitatively compared with other state-of-the-art segmentation techniques, considering the shape of the segmented regions. To validate our proposal, we built an image classifier using a classical multilayer perceptron. We used Zernike moments to extract features from the segmented images. This analysis employed 685 mammograms from the IRMA breast cancer database, using fat and fibroid tissues. Results show that the proposed technique achieved a classification rate of 91.28% for fat tissues, evidencing the feasibility of our approach.

Submitted 3 December, 2017; originally announced January 2018.

Journal ref: Expert Systems With Applications, 65 (2016), 116-126

arXiv:1712.07312

doi 10.1080/21681163.2015.1127775

Analysis of supervised and semi-supervised GrowCut applied to segmentation of masses in mammography images

Authors: Filipe Rolim Cordeiro, Wellington Pinheiro dos Santos, Abel Guilhermino da Silva Filho

Abstract: Breast cancer is already one of the most common forms of cancer worldwide. Mammography image analysis is still the most effective diagnostic method to promote the early detection of breast cancer. Accurately segmenting tumors in digital mammography images is important to improve the diagnostic capabilities of health specialists and avoid misdiagnosis. In this work, we evaluate the feasibility of applying GrowCut to segment regions of tumor and we propose two semi-supervised versions of GrowCut. All the analysis was performed by evaluating the application of segmentation techniques to a set of images obtained from the Mini-MIAS mammography image database. GrowCut segmentation was compared to Region Growing, Active Contours, Random Walks and Graph Cut techniques. Experiments showed that GrowCut, when compared to the other techniques, achieved better results for the metrics analyzed. Moreover, the proposed semi-supervised versions of GrowCut were shown to have a clinically satisfactory quality of segmentation.

Submitted 19 December, 2017; originally announced December 2017.

Journal ref: Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, v. 5, p. 1-19, 2017

arXiv:1004.2799

doi 10.1007/JHEP02(2011)093

Anomaly-Free Supersymmetric SO(2N+2)/U(N+1) sigma-Model Based on the SO(2N+1) Lie Algebra of the Fermion Operators

Authors: Seiya Nishiyama, Joao da Providencia, Constanca Providencia, Flavio Cordeiro

Abstract: The extended supersymmetric (SUSY) sigma-model has been proposed on the basis of the SO(2N+1) Lie algebra spanned by fermion annihilation-creation operators and pair operators. The canonical transformation, an extension of an SO(2N) Bogoliubov transformation to an SO(2N+1) group, is introduced. Embedding the SO(2N+1) group into an SO(2N+2) group and using SO(2N+2)/U(N+1) coset variables, we have investigated the SUSY sigma-model on the Kaehler manifold, the coset space SO(2N+2)/U(N+1). We have constructed the Killing potential, an extension of the potential in the SO(2N)/U(N) coset space to that in the SO(2N+2)/U(N+1) coset space. It is equivalent to the generalized density matrix whose diagonal-block part is related to a reduced scalar potential with a Fayet-Iliopoulos term. The f-deformed reduced scalar potential is optimized with respect to the vacuum expectation value of the sigma-model fields, and a solution for one of the SO(2N+1) group parameters has been obtained. The solution, however, is only a small part of all solutions obtained from anomaly-free SUSY coset models. To construct the coset models consistently, we must embed a coset coordinate in an anomaly-free spinor representation (rep) of the SO(2N+2) group and give the corresponding Kaehler and Killing potentials for an anomaly-free SO(2N+2)/U(N+1) model based on each positive chiral spinor rep.
Using such mathematical manipulation, we successfully construct the anomaly-free SO(2N+2)/U(N+1) SUSY sigma-model and investigate new aspects which have never been seen in the SUSY sigma-model on the Kaehler coset space SO(2N)/U(N). We reach an f-deformed reduced scalar potential. It is minimized with respect to the vacuum expectation value of the anomaly-free SUSY sigma-model fields. Thus we find an interesting f-deformed solution very different from the previous solution for an anomaly-free SO(2·5+2)/(SU(5+1)×U(1)) SUSY sigma-model.

Submitted 21 October, 2010; v1 submitted 16 April, 2010; originally announced April 2010.

Comments: 24 pages, no figures

Journal ref: JHEP 1102:093,2011

arXiv:0912.0688

doi 10.1142/S0219887810004439

Reduction and construction of Poisson quasi-Nijenhuis manifolds with background

Authors: Flavio Cordeiro, Joana M. Nunes da Costa

Abstract: We extend the Falceto-Zambon version of Marsden-Ratiu Poisson reduction to Poisson quasi-Nijenhuis structures with background on manifolds. We define gauge transformations of Poisson quasi-Nijenhuis structures with background, study some of their properties and show that they are compatible with the reduction procedure. We use gauge transformations to construct Poisson quasi-Nijenhuis structures with background.

Submitted 3 December, 2009; originally announced December 2009.

Comments: to appear in IJGMMP

arXiv:0909.3072

doi 10.1016/j.aop.2009.04.009

The Bonn nuclear quark model revisited

Authors: Constança Providência, João da Providência, Flávio Cordeiro, Masatoshi Yamamura, Yasuhiko Tsue, Seiya Nishiyama

Abstract: We present the exact solutions to the equations of the lowest energy states of the colored and color-symmetric sectors of the Bonn quark model, which is SU(3) symmetric and is defined in terms of an effective pairing force with $su(4)$ algebraic structure. We show that the groundstate of the model is not color symmetrical except for a narrow interval in the range of possible quark numbers. We also study the performance of the Glauber coherent state, as well as of superconducting states of the BCS type, with respect to the description, not only of the absolute (colored) groundstate, but also of the minimum energy state of the color-symmetrical sector, finding that it is remarkably good. We use the model to discuss, in a schematic context, some controversial aspects of the conventional treatment of color superconductivity.

Submitted 16 September, 2009; originally announced September 2009.

Journal ref: Annals Phys.324:1666-1675,2009

arXiv:0901.3473

doi 10.3842/SIGMA.2009.009

Self-Consistent-Field Method and $τ$-Functional Method on Group Manifold in Soliton Theory: a Review and New Results

Authors: Seiya Nishiyama, Joao da Providencia, Constanca Providencia, Flavio Cordeiro, Takao Komatsu

Abstract: The maximally-decoupled method has been considered as a theory to apply a basic idea of an integrability condition to certain multiple parametrized symmetries. The method is regarded as a mathematical tool to describe a symmetry of a collective submanifold in which a canonicity condition makes the collective variables an orthogonal coordinate system. For this aim we adopt a concept of curvature unfamiliar in the conventional time-dependent (TD) self-consistent field (SCF) theory. Our basic idea lies in the introduction of a sort of Lagrange manner familiar to fluid dynamics to describe a collective coordinate system. This manner enables us to take a one-form which is linearly composed of a TD SCF Hamiltonian and infinitesimal generators induced by collective variable differentials of a canonical transformation on a group. The integrability condition of the system reads as vanishing curvature, C=0. Our method is constructed so as to manifest the structure of the group under consideration. >...

Submitted 22 January, 2009; originally announced January 2009.

Journal ref: SIGMA 5 (2009), 009, 76 pages

arXiv:0712.4208

doi 10.1016/j.nuclphysb.2008.05.008

Extended Supersymmetric sigma-Model Based on the SO(2N+1) Lie Algebra of the Fermion Operators

Authors: Seiya Nishiyama, Joao da Providencia, Constanca Providencia, Flavio Cordeiro

Abstract: An extended supersymmetric sigma-model is given, based on the SO(2N+1) Lie algebra of fermion operators composed of annihilation-creation operators and pair operators. A canonical transformation, the extension of the SO(2N) Bogoliubov transformation to the SO(2N+1) group, is introduced. Embedding the SO(2N+1) group into an SO(2N+2) group and using SO(2N+2)/U(N+1) coset variables, we investigate a new aspect of the supersymmetric sigma-model on the Kaehler manifold of the symmetric space SO(2N+2)/U(N+1). We construct a Killing potential which is just the extension of the Killing potential in the SO(2N)/U(N) coset space given by van Holten et al. to that in the SO(2N+2)/U(N+1) coset space. To our great surprise, the Killing potential is equivalent to the generalized density matrix. Its diagonal-block matrix is related to a reduced scalar potential with a Fayet-Iliopoulos term. The reduced scalar potential is optimized in order to see the behaviour of the vacuum expectation value of the sigma-model fields, and a proper solution for one of the SO(2N+1) group parameters is obtained. We give a bosonization of the SO(2N+2) Lie operators, vacuum functions and differential forms for their bosons expressed in terms of the SO(2N+2)/U(N+1) coset variables, a U(1) phase and the corresponding Kaehler potential.

Submitted 27 December, 2007; originally announced December 2007.

Comments: 28 pages, submitted to Nucl. Phys. B

Journal ref: Nucl.Phys.B802:121-145,2008

Showing 1–23 of 23 results for author: Cordeiro, F