Search | arXiv e-print repository

FAU-Net: An Attention U-Net Extension with Feature Pyramid Attention for Prostate Cancer Segmentation

Authors: Pablo Cesar Quihui-Rubio, Daniel Flores-Araiza, Miguel Gonzalez-Mendoza, Christian Mata, Gilberto Ochoa-Ruiz

Abstract: This contribution presents a deep learning method for the segmentation of prostate zones in MRI images based on U-Net using additive and feature pyramid attention modules, which can improve the workflow of prostate cancer detection and diagnosis. The proposed model is compared to seven different U-Net-based architectures. The automatic segmentation performance of each model on the central zone (CZ), peripheral zone (PZ), transition zone (TZ) and tumor was evaluated using the Dice Score (DSC) and Intersection over Union (IoU) metrics. The proposed alternative achieved a mean DSC of 84.15% and IoU of 76.9% on the test set, outperforming most of the models studied in this work, except for the R2U-Net and Attention R2U-Net architectures.
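As an illustration of the two metrics named in the abstract above, here is a minimal sketch of per-class Dice and IoU on flattened label maps. The function name `dice_and_iou`, the toy labels, and the convention of returning 1.0 when a class is absent from both maps are assumptions made for this example, not details taken from the paper.

```python
def dice_and_iou(pred, target, label):
    """Return (Dice, IoU) for one class over two flat label sequences."""
    pred_pos = [p == label for p in pred]
    target_pos = [t == label for t in target]

    # Pixels assigned to the class in both prediction and ground truth.
    intersection = sum(p and t for p, t in zip(pred_pos, target_pos))
    pred_count = sum(pred_pos)
    target_count = sum(target_pos)
    union = pred_count + target_count - intersection

    if union == 0:
        # Assumed convention: class absent from both maps counts as a perfect match.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_count + target_count)
    iou = intersection / union
    return dice, iou


# Toy label maps (hypothetical): 0 = background, 1 and 2 = two zones.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
dice_2, iou_2 = dice_and_iou(pred, target, label=2)
```

A mean score such as the one reported in the abstract would typically average these per-class values over all zones and test images, although the exact averaging scheme is not stated here.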

Submitted 3 September, 2023; originally announced September 2023.

Comments: This paper has been accepted at the 22nd Mexican International Conference on Artificial Intelligence (MICAI 2023)

arXiv:2309.01318 [pdf, other]

An FPGA smart camera implementation of segmentation models for drone wildfire imagery

Authors: Eduardo Guarduño-Martinez, Jorge Ciprian-Sanchez, Gerardo Valente, Vazquez-Garcia, Gerardo Rodriguez-Hernandez, Adriana Palacios-Rosas, Lucile Rossi-Tisson, Gilberto Ochoa-Ruiz

Abstract: Wildfires represent one of the most relevant natural disasters worldwide, due to their impact on various societal and environmental levels. Thus, a significant amount of research has been carried out to investigate and apply computer vision techniques to address this problem. One of the most promising approaches for wildfire fighting is the use of drones equipped with visible and infrared cameras for the detection, monitoring, and fire spread assessment in a remote manner but in close proximity to the affected areas. However, implementing effective computer vision algorithms on board is often prohibitive since deploying full-precision deep learning models running on GPU is not a viable option, due to their high power consumption and the limited payload a drone can handle. Thus, in this work, we posit that smart cameras, based on low-power consumption field-programmable gate arrays (FPGAs), in tandem with binarized neural networks (BNNs), represent a cost-effective alternative for implementing onboard computing on the edge. Herein we present the implementation of a segmentation model applied to the Corsican Fire Database. We optimized an existing U-Net model for such a task and ported the model to an edge device (a Xilinx Ultra96-v2 FPGA). By pruning and quantizing the original model, we reduce the number of parameters by 90%. Furthermore, additional optimizations enabled us to increase the throughput of the original model from 8 frames per second (FPS) to 33.63 FPS without loss in the segmentation performance: our model obtained 0.912 in Matthews correlation coefficient (MCC), 0.915 in F1 score and 0.870 in Hafiane quality index (HAF), and comparable qualitative segmentation results when contrasted to the original full-precision model. The final model was integrated into a low-cost FPGA, which was used to implement a neural network accelerator.
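Two of the scores cited above, MCC and F1, follow directly from binary confusion counts. The sketch below uses the standard textbook definitions; the function name `mcc_and_f1` and the choice to return 0.0 when a denominator is zero are assumptions for illustration, and the Hafiane quality index is not covered.

```python
import math


def mcc_and_f1(tp, tn, fp, fn):
    """Return (MCC, F1) from binary confusion counts of fire vs. background pixels."""
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # assumed fallback

    # F1 = 2*TP / (2*TP + FP + FN)
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0  # assumed fallback
    return mcc, f1


# Hypothetical counts: a prediction no better than chance.
chance_mcc, chance_f1 = mcc_and_f1(tp=1, tn=1, fp=1, fn=1)
```

Unlike F1, MCC also rewards correctly labelled background pixels, which may be one reason both are reported for a task where fire covers a small part of the image.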

Submitted 3 September, 2023; originally announced September 2023.

Comments: This paper has been accepted at the 22nd Mexican International Conference on Artificial Intelligence (MICAI 2023)

arXiv:2308.04653 [pdf, other]

Assessing the performance of deep learning-based models for prostate cancer segmentation using uncertainty scores

Authors: Pablo Cesar Quihui-Rubio, Daniel Flores-Araiza, Gilberto Ochoa-Ruiz, Miguel Gonzalez-Mendoza, Christian Mata

Abstract: This study focuses on comparing deep learning methods for the segmentation and quantification of uncertainty in prostate segmentation from MRI images. The aim is to improve the workflow of prostate cancer detection and diagnosis. Seven different U-Net-based architectures, augmented with Monte-Carlo dropout, are evaluated for automatic segmentation of the central zone, peripheral zone, transition zone, and tumor, with uncertainty estimation. The top-performing model in this study is the Attention R2U-Net, achieving a mean Intersection over Union (IoU) of 76.3% and Dice Similarity Coefficient (DSC) of 85% for segmenting all zones. Additionally, Attention R2U-Net exhibits the lowest uncertainty values, particularly in the boundaries of the transition zone and tumor, when compared to the other models.

Submitted 8 August, 2023; originally announced August 2023.

Comments: Article accepted at Cancer Prevention through early detecTion (CaPtTion) workshop at MICCAI 2023

arXiv:2307.07046 [pdf, other]

A metric learning approach for endoscopic kidney stone identification

Authors: Jorge Gonzalez-Zapata, Francisco Lopez-Tiro, Elias Villalvazo-Avila, Daniel Flores-Araiza, Jacques Hubert, Andres Mendez-Vazquez, Gilberto Ochoa-Ruiz, Christian Daul

Abstract: Several Deep Learning (DL) methods have recently been proposed for the automated identification of kidney stones during a ureteroscopy to enable rapid therapeutic decisions. Even if these DL approaches led to promising results, they are mainly appropriate for kidney stone types for which numerous labelled data are available. However, only a few labelled images are available for some rare kidney stone types. This contribution exploits Deep Metric Learning (DML) methods i) to handle such classes with few samples, ii) to generalize well to out-of-distribution samples, and iii) to cope better with new classes which are added to the database. The proposed Guided Deep Metric Learning approach is based on a novel architecture which was designed to learn data representations in an improved way. The solution was inspired by Few-Shot Learning (FSL) and makes use of a teacher-student approach. The teacher model (GEMINI) generates a reduced hypothesis space based on prior knowledge from the labeled data, and is used as a guide to a student model (i.e., ResNet50) through a Knowledge Distillation scheme. Extensive tests were first performed on two datasets separately used for the recognition, namely a set of images acquired for the surfaces of the kidney stone fragments, and a set of images of the fragment sections. The proposed DML approach improved the identification accuracy by 10% and 12% in comparison to DL methods and other DML approaches, respectively. Moreover, model embeddings from the two dataset types were merged in an organized way through a multi-view scheme to simultaneously exploit the information of surface and section fragments. Tests with the resulting mixed model improve the identification accuracy by at least 3% and up to 30% with respect to DL models and shallow machine learning methods, respectively.

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2305.09062 [pdf, other]

SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification

Authors: Mauricio Mendez-Ruiz, Jorge Gonzalez-Zapata, Ivan Reyes-Amezcua, Daniel Flores-Araiza, Francisco Lopez-Tiro, Andres Mendez-Vazquez, Gilberto Ochoa-Ruiz

Abstract: Few-shot learning is a challenging area of research that aims to learn new concepts with only a few labeled samples of data. Recent works based on metric-learning approaches leverage the meta-learning approach, which is encompassed by episodic tasks that make use of a support (training) set and a query (test) set with the objective of learning a similarity comparison metric between those sets. Due to the lack of data, the learning process of the embedding network becomes an important part of the few-shot task. Previous works have addressed this problem using metric learning approaches, but the properties of the underlying latent space and the separability of the different classes in it were not entirely enforced. In this work, we propose two different loss functions which consider the importance of the embedding vectors by looking at the intra-class and inter-class distances between the few data. The first loss function is the Proto-Triplet Loss, which is based on the original triplet loss with the modifications needed to work better in few-shot scenarios. The second loss function, which we dub ICNN loss, is based on an inter- and intra-class nearest neighbors score, which helps us to assess the quality of the embeddings obtained from the trained network. Our results, obtained from an extensive experimental setup, show a significant improvement in accuracy on the miniImageNet benchmark compared to other metric-based few-shot learning methods by a margin of 2%, demonstrating the capability of these loss functions to allow the network to generalize better to previously unseen classes. In our experiments, we demonstrate competitive generalization capabilities to other domains, such as the Caltech CUB, Dogs and Cars datasets, compared with the state of the art.

Submitted 18 May, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: Paper submitted to a journal for publication

arXiv:2304.05403 [pdf, other]

Isolated Sign Language Recognition based on Tree Structure Skeleton Images

Authors: David Laines, Gissella Bejarano, Miguel Gonzalez-Mendoza, Gilberto Ochoa-Ruiz

Abstract: Sign Language Recognition (SLR) systems aim to be embedded in video stream platforms to recognize the sign performed in front of a camera. SLR research has taken advantage of recent advances in pose estimation models to use skeleton sequences estimated from videos instead of RGB information to predict signs. This approach can make HAR-related tasks less complex and more robust to diverse backgrounds, lighting conditions, and physical appearances. In this work, we explore the use of a spatio-temporal skeleton representation such as Tree Structure Skeleton Image (TSSI) as an alternative input to improve the accuracy of skeleton-based models for SLR. TSSI converts a skeleton sequence into an RGB image where the columns represent the joints of the skeleton in a depth-first tree traversal order, the rows represent the temporal evolution of the joints, and the three channels represent the (x, y, z) coordinates of the joints. We trained a DenseNet-121 using this type of input and compared it with other skeleton-based deep learning methods using a large-scale American Sign Language (ASL) dataset, WLASL. Our model (SL-TSSI-DenseNet) surpasses the state of the art among skeleton-based models. Moreover, when including data augmentation our proposal achieves better results than both skeleton-based and RGB-based models. We evaluated the effectiveness of our model on the Ankara University Turkish Sign Language (TSL) dataset, AUTSL, and a Mexican Sign Language (LSM) dataset. On the AUTSL dataset, the model achieves results similar to the state of the art among skeleton-based models. On the LSM dataset, the model achieves higher results than the baseline. Code has been made available at: https://github.com/davidlainesv/SL-TSSI-DenseNet.
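The TSSI layout described in the abstract (columns as joints in depth-first order, rows as frames, channels as coordinates) can be sketched as follows. This is a simplified illustration under stated assumptions: the five-joint skeleton, the joint names, and the helper names `dfs_order` and `skeleton_to_tssi` are hypothetical, and a plain pre-order traversal is used, so the published TSSI layout may order or repeat joints differently.

```python
def dfs_order(children, root):
    """Return joint names in depth-first (pre-order) traversal of the skeleton tree."""
    order = []
    stack = [root]
    while stack:
        joint = stack.pop()
        order.append(joint)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(children.get(joint, [])))
    return order


def skeleton_to_tssi(frames, children, root):
    """Build a rows-by-columns grid: one row per frame, one column per joint.

    Each cell holds the (x, y, z) coordinates, playing the role of the
    three image channels.
    """
    order = dfs_order(children, root)
    return [[tuple(frame[joint]) for joint in order] for frame in frames]


# Hypothetical toy skeleton, not the joint set used in the paper.
children = {
    "neck": ["l_shoulder", "r_shoulder"],
    "l_shoulder": ["l_hand"],
    "r_shoulder": ["r_hand"],
}
frames = [
    {"neck": (0.0, 0.0, 0.0), "l_shoulder": (-1.0, 0.0, 0.0), "l_hand": (-2.0, 1.0, 0.0),
     "r_shoulder": (1.0, 0.0, 0.0), "r_hand": (2.0, 1.0, 0.0)},
    {"neck": (0.0, 0.0, 0.0), "l_shoulder": (-1.0, 0.0, 0.0), "l_hand": (-2.0, 2.0, 0.5),
     "r_shoulder": (1.0, 0.0, 0.0), "r_hand": (2.0, 2.0, 0.5)},
]
image = skeleton_to_tssi(frames, children, "neck")
```

Ordering columns by tree traversal keeps physically connected joints in neighbouring columns, which is presumably what lets an image classifier such as DenseNet-121 pick up local spatial patterns.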

Submitted 9 April, 2023; originally announced April 2023.

Comments: This paper has been accepted at the LatinX in Computer Vision Research Workshop at CVPR 2023

arXiv:2304.04077 [pdf, other]

Deep Prototypical-Parts Ease Morphological Kidney Stone Identification and are Competitively Robust to Photometric Perturbations

Authors: Daniel Flores-Araiza, Francisco Lopez-Tiro, Jonathan El-Beze, Jacques Hubert, Miguel Gonzalez-Mendoza, Gilberto Ochoa-Ruiz, Christian Daul

Abstract: Identifying the type of kidney stones can allow urologists to determine their cause of formation, improving the prescription of appropriate treatments to diminish future relapses. Currently, the associated ex-vivo diagnosis (known as Morpho-constitutional Analysis, MCA) is time-consuming, expensive and requires a great deal of experience, as it requires a visual analysis component that is highly operator dependent. Recently, machine learning methods have been developed for in-vivo endoscopic stone recognition. Deep Learning (DL) based methods outperform non-DL methods in terms of accuracy but lack explainability. Despite this trade-off, when it comes to making high-stakes decisions, it's important to prioritize understandable Computer-Aided Diagnosis (CADx) that suggests a course of action based on reasonable evidence, rather than a model prescribing a course of action. In this proposal, we learn Prototypical Parts (PPs) per kidney stone subtype, which are used by the DL model to generate an output classification. Using PPs in the classification task enables case-based reasoning explanations for such output, thus making the model interpretable. In addition, we modify global visual characteristics to describe their relevance to the PPs and the sensitivity of our model's performance. With this, we provide explanations with additional information at the sample, class and model levels in contrast to previous works. Although our implementation's average accuracy is lower than state-of-the-art (SOTA) non-interpretable DL models by 1.5%, our models perform 2.8% better on perturbed images with a lower standard deviation, without adversarial training. Thus, learning PPs has the potential to create more robust DL models.

Submitted 8 April, 2023; originally announced April 2023.

Comments: This paper has been accepted at the LatinX in Computer Vision Research Workshop at CVPR 2023 as a full paper and it will appear in the CVPR proceedings

arXiv:2304.03193 [pdf, other]

Improving automatic endoscopic stone recognition using a multi-view fusion approach enhanced with two-step transfer learning

Authors: Francisco Lopez-Tiro, Elias Villalvazo-Avila, Juan Pablo Betancur-Rengifo, Ivan Reyes-Amezcua, Jacques Hubert, Gilberto Ochoa-Ruiz, Christian Daul

Abstract: This contribution presents a deep-learning method for extracting and fusing image information acquired from different viewpoints, with the aim to produce more discriminant object features for the identification of the type of kidney stones seen in endoscopic images. The model was further improved with a two-step transfer learning approach and by attention blocks to refine the learned feature maps. Deep feature fusion strategies improved the results of single view extraction backbone models by more than 6% in terms of accuracy of the kidney stones classification.

Submitted 22 August, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: This paper has been accepted at the LatinX in Computer Vision (LXCV) Research workshop at ICCV 2023 (Paris, France)

arXiv:2304.03171 [pdf, other]

Deep learning-based image exposure enhancement as a pre-processing for an accurate 3D colon surface reconstruction

Authors: Ricardo Espinosa, Carlos Axel Garcia-Vega, Gilberto Ochoa-Ruiz, Dominique Lamarque, Christian Daul

Abstract: This contribution shows how an appropriate image pre-processing can improve a deep-learning based 3D reconstruction of colon parts. The assumption is that, rather than global image illumination corrections, local under- and over-exposures should be corrected in colonoscopy. An overview of the pipeline including the image exposure correction and an RNN-SLAM is first given. Then, this paper quantifies the reconstruction accuracy of the endoscope trajectory in the colon with and without appropriate illumination correction.

Submitted 14 April, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: This article has been submitted to GRESTI 2023 for review

arXiv:2211.04658 [pdf, other]

SUPRA: Superpixel Guided Loss for Improved Multi-modal Segmentation in Endoscopy

Authors: Rafael Martinez-Garcia-Peña, Mansoor Ali Teevno, Gilberto Ochoa-Ruiz, Sharib Ali

Abstract: Domain shift is a well-known problem in the medical imaging community. In particular, for endoscopic image analysis where the data can have different modalities, the performance of deep learning (DL) methods is adversely affected. In other words, methods developed on one modality cannot be used for a different modality. However, in real clinical settings, endoscopists switch between modalities for better mucosal visualisation. In this paper, we explore the domain generalisation technique to enable DL methods to be used in such scenarios. To this end, we propose to use superpixels generated with Simple Linear Iterative Clustering (SLIC), which we refer to as "SUPRA" for SUPeRpixel Augmented method. SUPRA first generates a preliminary segmentation mask making use of our new loss "SLICLoss" that encourages both an accurate and color-consistent segmentation. We demonstrate that SLICLoss, when combined with Binary Cross Entropy loss (BCE), can improve the model's generalisability with data that presents significant domain shift. We validate this novel compound loss on a vanilla U-Net using the EndoUDA dataset, which contains images for Barrett's Esophagus and polyps from two modalities. We show that our method yields an improvement of nearly 20% in the target domain set compared to the baseline.

Submitted 9 April, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

Comments: This work has been accepted at the LatinX in Computer Vision Research Workshop at CVPR 2023

arXiv:2211.02967 [pdf, other]

Improved Kidney Stone Recognition Through Attention and Multi-View Feature Fusion Strategies

Authors: Elias Villalvazo-Avila, Francisco Lopez-Tiro, Jonathan El-Beze, Jacques Hubert, Miguel Gonzalez-Mendoza, Gilberto Ochoa-Ruiz, Christian Daul

Abstract: This contribution presents a deep learning method for the extraction and fusion of information relating to kidney stone fragments acquired from different viewpoints of the endoscope. Surface and section fragment images are jointly used during the training of the classifier to improve the discrimination power of the features by adding attention layers at the end of each convolutional block. This approach is specifically designed to mimic the morpho-constitutional analysis performed ex vivo by biologists to visually identify kidney stones by inspecting both views. The addition of attention mechanisms to the backbone improved the results of single view extraction backbones by 4% on average. Moreover, in comparison to the state of the art, the fusion of the deep features improved the overall results by up to 11% in terms of kidney stone classification accuracy.

Submitted 5 November, 2022; originally announced November 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2210.15033 [pdf, other]

Multi-Scale Structural-aware Exposure Correction for Endoscopic Imaging

Authors: Axel Garcia-Vega, Ricardo Espinosa, Luis Ramirez-Guzman, Thomas Bazin, Luis Falcon-Morales, Gilberto Ochoa-Ruiz, Dominique Lamarque, Christian Daul

Abstract: Endoscopy is the most widely used imaging technique for the diagnosis of cancerous lesions in hollow organs. However, endoscopic images are often affected by illumination artifacts: image parts may be over- or underexposed according to the light source pose and the tissue orientation. These artifacts have a strong negative impact on the performance of computer vision or AI-based diagnosis tools. Although endoscopic image enhancement methods are greatly required, little effort has been devoted to over- and underexposure enhancement in real time. This contribution presents an extension to the objective function of LMSPEC, a method originally introduced to enhance images from natural scenes. It is used here for the exposure correction in endoscopic imaging and the preservation of structural information. To the best of our knowledge, this contribution is the first one that addresses the enhancement of endoscopic images using deep learning (DL) methods. Tested on the Endo4IE dataset, the proposed implementation has yielded a significant improvement over LMSPEC, reaching an SSIM increase of 4.40% and 4.21% for over- and underexposed images, respectively.

Submitted 26 October, 2022; originally announced October 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2210.13654 [pdf, other]

Boosting Kidney Stone Identification in Endoscopic Images Using Two-Step Transfer Learning

Authors: Francisco Lopez-Tiro, Juan Pablo Betancur-Rengifo, Arturo Ruiz-Sanchez, Ivan Reyes-Amezcua, Jonathan El-Beze, Jacques Hubert, Michel Daudon, Gilberto Ochoa-Ruiz, Christian Daul

Abstract: Knowing the cause of kidney stone formation is crucial to establish treatments that prevent recurrence. There are currently different approaches for determining the kidney stone type. However, the reference ex-vivo identification procedure can take up to several weeks, while an in-vivo visual recognition requires highly trained specialists. Machine learning models have been developed to provide urologists with an automated classification of kidney stones during a ureteroscopy; however, there is a general lack in terms of quality of the training data and methods. In this work, a two-step transfer learning approach is used to train the kidney stone classifier. The proposed approach transfers knowledge learned on a set of images of kidney stones acquired with a CCD camera (ex-vivo dataset) to a final model that classifies endoscopic images (ex-vivo dataset). The results show that learning features from different domains with similar information helps to improve the performance of a model that performs classification in real conditions (for instance, uncontrolled lighting conditions and blur). Finally, in comparison to models that are trained from scratch or initialized with ImageNet weights, the obtained results suggest that the two-step approach extracts features that improve the identification of kidney stones in endoscopic images.

Submitted 24 October, 2022; originally announced October 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2208.09926 [pdf, other]

doi 10.1080/21681163.2022.2150688

A semi-supervised Teacher-Student framework for surgical tool detection and localization

Authors: Mansoor Ali, Gilberto Ochoa-Ruiz, Sharib Ali

Abstract: Surgical tool detection in minimally invasive surgery is an essential part of computer-assisted interventions. Current approaches are mostly based on supervised methods which require large fully labeled data to train supervised models and suffer from pseudo label bias because of class imbalance issues. However, large image datasets with bounding box annotations are often scarcely available. Semi-supervised learning (SSL) has recently emerged as a means for training large models using only a modest amount of annotated data; apart from reducing the annotation cost, SSL has also shown promise to produce models that are more robust and generalizable. Therefore, in this paper we introduce a semi-supervised learning (SSL) framework in the surgical tool detection paradigm which aims to mitigate the scarcity of training data and the data imbalance through a knowledge distillation approach. In the proposed work, we train a model with labeled data which initialises the Teacher-Student joint learning, where the Student is trained on Teacher-generated pseudo labels from unlabeled data. We propose a multi-class distance with a margin-based classification loss function in the region-of-interest head of the detector to effectively segregate foreground classes from the background region. Our results on the m2cai16-tool-locations dataset indicate the superiority of our approach on different supervised data settings (1%, 2%, 5%, 10% of annotated data) where our model achieves overall improvements of 8%, 12% and 27% in mAP (on 1% labeled data) over the state-of-the-art SSL methods and a fully supervised baseline, respectively. The code is available at https://github.com/Mansoor-at/Semi-supervised-surgical-tool-det

Submitted 21 August, 2022; originally announced August 2022.

Comments: Paper accepted at Augmented Reality, Augmented Environments for Computer Assisted Interventions (AE-CAI), Computer Assisted and Robotic Endoscopy (CARE) and Context-Aware Operating Theaters (OR 2.0) at MICCAI 2022

arXiv:2207.09530 [pdf, other]

Knowledge distillation with a class-aware loss for endoscopic disease detection

Authors: Pedro E. Chavarrias-Solanon, Mansoor Ali-Teevno, Gilberto Ochoa-Ruiz, Sharib Ali

Abstract: The prevalence of gastrointestinal (GI) cancer is growing alarmingly every year, leading to a substantial increase in the mortality rate. Endoscopic detection provides crucial diagnostic support; however, subtle lesions in the upper and lower GI tract are quite hard to detect and cause considerable missed detections. In this work, we leverage deep learning to develop a framework to improve the localization of difficult-to-detect lesions and minimize the missed detection rate. We propose an end-to-end student-teacher learning setup where class probabilities of a teacher model trained on one class with a larger dataset are used to penalize a multi-class student network. Our model achieves higher performance in terms of mean average precision (mAP) on both the endoscopic disease detection (EDD2020) challenge and Kvasir-SEG datasets. Additionally, we show that using such a learning paradigm, our model is generalizable to an unseen test set, giving higher APs for the clinically crucial neoplastic and polyp categories.

Submitted 19 July, 2022; originally announced July 2022.

Comments: Paper accepted at the CaPTion workshop at MICCAI2022

arXiv:2207.09483 [pdf, other]

Comparison of automatic prostate zones segmentation models in MRI images using U-net-like architectures

Authors: Pablo Cesar Quihui-Rubio, Gilberto Ochoa-Ruiz, Miguel Gonzalez-Mendoza, Gerardo Rodriguez-Hernandez, Christian Mata

Abstract: Prostate cancer is the second-most frequently diagnosed cancer and the sixth leading cause of cancer death in males worldwide. The main problem that specialists face during the diagnosis of prostate cancer is the localization of Regions of Interest (ROI) containing tumor tissue. Currently, the segmentation of this ROI is in most cases carried out manually by expert doctors, but the procedure is plagued with low detection rates (of about 27-44%) or overdiagnosis in some patients. Therefore, several research works have tackled the challenge of automatically segmenting and extracting features of the ROI from magnetic resonance images, as this process can greatly facilitate many diagnostic and therapeutic applications. However, the lack of clear prostate boundaries, the heterogeneity inherent to the prostate tissue, and the variety of prostate shapes make this process very difficult to automate. In this work, six deep learning models were trained and analyzed with a dataset of MRI images obtained from the Centre Hospitalaire de Dijon and Universitat Politecnica de Catalunya. We carried out a comparison of multiple deep learning models (i.e., U-Net, Attention U-Net, Dense-UNet, Attention Dense-UNet, R2U-Net, and Attention R2U-Net) using a categorical cross-entropy loss function. The analysis was performed using three metrics commonly used for image segmentation: Dice score, Jaccard index, and mean squared error. The model that gave the best result segmenting all the zones was R2U-Net, which achieved 0.869, 0.782, and 0.00013 for Dice, Jaccard and mean squared error, respectively.

Submitted 19 July, 2022; originally announced July 2022.
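
For readers unfamiliar with the three metrics named in the abstract above, the following is a minimal sketch of how they are commonly computed for a single binary mask. It is not the authors' evaluation code: the function names and the smoothing constant `eps` are assumptions made for illustration, and a multi-zone evaluation would typically apply these per class and average.

```python
import numpy as np


def dice_score(pred, target, eps=1e-7):
    """Dice score 2|A∩B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    # eps (an assumed smoothing term) avoids 0/0 when both masks are empty.
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def jaccard_index(pred, target, eps=1e-7):
    """Jaccard index (IoU) |A∩B| / |A∪B| for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))


def mean_squared_error(pred, target):
    """Mean of the squared pixel-wise differences."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


# Toy 2x2 masks: the prediction marks two pixels, the ground truth marks one.
pred_mask = [[1, 1], [0, 0]]
true_mask = [[1, 0], [0, 0]]
print(dice_score(pred_mask, true_mask))
print(jaccard_index(pred_mask, true_mask))
print(mean_squared_error(pred_mask, true_mask))
```

Dice weights the overlap twice, so it is never lower than the Jaccard index on the same pair of masks, which is consistent with the ordering of the two values reported above.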

arXiv:2207.04010 [pdf, other]

MACFE: A Meta-learning and Causality Based Feature Engineering Framework

Authors: Ivan Reyes-Amezcua, Daniel Flores-Araiza, Gilberto Ochoa-Ruiz, Andres Mendez-Vazquez, Eduardo Rodriguez-Tello

Abstract: Feature engineering has become one of the most important steps to improve model prediction performance, and to produce quality datasets. However, this process requires non-trivial domain knowledge and is time-consuming. Therefore, automating this process has become an active area of research and of interest in industrial applications. In this paper, a novel method, called Meta-learning and Causality Based Feature Engineering (MACFE), is proposed; our method is based on the use of meta-learning, feature distribution encoding, and causality feature selection. In MACFE, meta-learning is used to find the best transformations, then the search is accelerated by pre-selecting "original" features given their causal relevance. Experimental evaluations on popular classification datasets show that MACFE can improve the prediction performance across eight classifiers, outperforms the current state-of-the-art methods on average by at least 6.54%, and obtains an improvement of 2.71% over the best previous works.

Submitted 8 July, 2022; originally announced July 2022.

arXiv:2207.02396 [pdf, other]

A Novel Hybrid Endoscopic Dataset for Evaluating Machine Learning-based Photometric Image Enhancement Models

Authors: Axel Garcia-Vega, Ricardo Espinosa, Gilberto Ochoa-Ruiz, Thomas Bazin, Luis Eduardo Falcon-Morales, Dominique Lamarque, Christian Daul

Abstract: Endoscopy is the most widely used medical technique for cancer and polyp detection inside hollow organs. However, images acquired by an endoscope are frequently affected by illumination artefacts due to the orientation of the light source. Two major issues arise when the pose of the endoscope's light source suddenly changes: overexposed and underexposed tissue areas are produced. These two scenarios can result in misdiagnosis due to the lack of information in the affected zones or hamper the performance of various computer vision methods (e.g., SLAM, structure from motion, optical flow) used during the non-invasive examination. The aim of this work is two-fold: i) to introduce a new synthetic dataset generated by generative adversarial techniques and ii) to explore both shallow and deep learning-based image-enhancement methods in overexposed and underexposed lighting conditions. The best quantitative results (i.e., metric-based results) were obtained by the deep-learning-based LMSPEC method, which also has a running time of around 7.6 fps.

Submitted 5 July, 2022; originally announced July 2022.

arXiv:2206.07580 [pdf, other]

Evaluating object detector ensembles for improving the robustness of artifact detection in endoscopic video streams

Authors: Pedro Esteban Chavarrias-Solano, Carlos Axel Garcia-Vega, Francisco Javier Lopez-Tiro, Gilberto Ochoa-Ruiz, Thomas Bazin, Dominique Lamarque, Christian Daul

Abstract: In this contribution we use an ensemble deep-learning method for combining the predictions of two individual one-stage detectors (i.e., YOLOv4 and Yolact) with the aim of detecting artefacts in endoscopic images. This ensemble strategy enabled us to improve the robustness of the individual models without harming their real-time computation capabilities. We demonstrated the effectiveness of our approach by training and testing the two individual models and various ensemble configurations on the "Endoscopic Artifact Detection Challenge" dataset. Extensive experiments show the superiority, in terms of mean average precision, of the ensemble approach over the individual models and previous works in the state of the art.

Submitted 15 June, 2022; originally announced June 2022.

arXiv:2206.02110 [pdf, other]

Computer Vision-based Characterization of Large-scale Jet Flames using a Synthetic Infrared Image Generation Approach

Authors: Carmina Pérez-Guerrero, Jorge Francisco Ciprián-Sánchez, Adriana Palacios, Gilberto Ochoa-Ruiz, Miguel Gonzalez-Mendoza, Vahid Foroughi, Elsa Pastor, Gerardo Rodriguez-Hernandez

Abstract: Among the different kinds of fire accidents that can occur during industrial activities that involve hazardous materials, jet fires are one of the lesser-known types. This is because they are often involved in a process that generates a sequence of other accidents of greater magnitude, known as the domino effect. Flame impingement usually causes domino effects, and jet fires present specific features that can significantly increase the probability of this happening. These features become relevant from a risk analysis perspective, making their proper characterization a crucial task. Deep Learning approaches have become extensively used for tasks such as jet fire characterization; however, these methods are heavily dependent on the amount of data and the quality of the labels. Data acquisition of jet fires involves expensive experiments, especially if infrared imagery is used. Therefore, this paper proposes the use of Generative Adversarial Networks to produce plausible infrared images from visible ones, making experiments less expensive and allowing for other potential applications. The results suggest that it is possible to realistically replicate the results for experiments carried out using both visible and infrared cameras. The obtained results are compared with some previous experiments, and it is shown that similar results were obtained.

Submitted 5 June, 2022; originally announced June 2022.

Comments: Pre-print submitted to Engineering Science and Technology, an International Journal

arXiv:2206.02029 [pdf, other]

Guided Deep Metric Learning

Authors: Jorge Gonzalez-Zapata, Ivan Reyes-Amezcua, Daniel Flores-Araiza, Mauricio Mendez-Ruiz, Gilberto Ochoa-Ruiz, Andres Mendez-Vazquez

Abstract: Deep Metric Learning (DML) methods have been proven relevant for visual similarity learning. However, they sometimes lack generalization properties because they are often trained using an inappropriate sample selection strategy or because of the difficulty of the dataset caused by a distributional shift in the data. These represent a significant drawback when attempting to learn the underlying data manifold. Therefore, there is a pressing need to develop better ways of obtaining generalization and representation of the underlying manifold. In this paper, we propose a novel approach to DML that we call Guided Deep Metric Learning, a novel architecture oriented to learning more compact clusters, improving generalization under distributional shifts in DML. This novel architecture consists of two independent models: a multi-branch master model, inspired by a Few-Shot Learning (FSL) perspective, generates a reduced hypothesis space based on prior knowledge from labeled data, which guides or regularizes the decision boundary of a student model during training under an offline knowledge distillation scheme. Experiments have shown that the proposed method is capable of better manifold generalization and representation, with up to a 40% improvement (Recall@1, CIFAR10), using guidelines suggested by Musgrave et al. to perform a fairer and more realistic comparison, which is currently absent in the literature.

Submitted 4 June, 2022; originally announced June 2022.

arXiv:2206.00536 [pdf, other]

Impact of loss function in Deep Learning methods for accurate retinal vessel segmentation

Authors: Daniela Herrera, Gilberto Ochoa-Ruiz, Miguel Gonzalez-Mendoza, Christian Mata

Abstract: The retinal vessel network studied through fundus images contributes to the diagnosis of multiple diseases not only found in the eye. The segmentation of this system may help the specialized task of analyzing these images by assisting in the quantification of morphological characteristics. Due to its relevance, several Deep Learning-based architectures have been tested for tackling this problem automatically. However, the impact of loss function selection on the segmentation of the intricate retinal blood vessel system hasn't been systematically evaluated. In this work, we present the comparison of the loss functions Binary Cross Entropy, Dice, Tversky, and Combo loss using the deep learning architectures (i.e., U-Net, Attention U-Net, and Nested UNet) with the DRIVE dataset. Their performance is assessed using four metrics: the AUC, the mean squared error, the Dice score, and the Hausdorff distance. The models were trained with the same number of parameters and epochs. Using Dice score and AUC, the best combination was SA-UNet with Combo loss, which had an average of 0.9442 and 0.809 respectively. The best averages of Hausdorff distance and mean squared error were obtained using the Nested U-Net with the Dice loss function, which had an average of 6.32 and 0.0241 respectively. The results showed that there is a significant difference in the selection of loss function.

Submitted 1 June, 2022; originally announced June 2022.

Comments: Paper submitted to MICAI 2022
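
The four loss functions compared in the abstract above can be sketched as follows for a flat vector of predicted foreground probabilities. This is an illustrative simplification, not the paper's training code: the function names, the default `alpha`/`beta` values, the smoothing term `eps`, and in particular the Combo loss written as a plain weighted sum of cross-entropy and Dice loss are all assumptions.

```python
import numpy as np


def bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy averaged over all pixels."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    return float(-np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob)))


def tversky_loss(prob, target, alpha=0.5, beta=0.5, eps=1e-7):
    """1 - TP / (TP + alpha*FP + beta*FN), using soft (probabilistic) counts."""
    prob = np.asarray(prob, dtype=float)
    target = np.asarray(target, dtype=float)
    tp = np.sum(prob * target)
    fp = np.sum(prob * (1.0 - target))
    fn = np.sum((1.0 - prob) * target)
    return float(1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps))


def dice_loss(prob, target, eps=1e-7):
    """Dice loss is the Tversky loss with alpha = beta = 0.5."""
    return tversky_loss(prob, target, alpha=0.5, beta=0.5, eps=eps)


def combo_loss(prob, target, weight=0.5):
    """Assumed simplified Combo loss: weighted sum of BCE and Dice loss."""
    return weight * bce_loss(prob, target) + (1.0 - weight) * dice_loss(prob, target)


# Toy example: four pixels, two of them vessel (1) and two background (0).
probs = [0.9, 0.1, 0.8, 0.2]
labels = [1, 0, 1, 0]
for name, fn in [("BCE", bce_loss), ("Dice", dice_loss),
                 ("Tversky", tversky_loss), ("Combo", combo_loss)]:
    print(name, fn(probs, labels))
```

In the Tversky loss, raising `beta` above `alpha` penalizes false negatives more than false positives, which is one reason it is often tried on thin, under-represented structures such as vessels.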

arXiv:2206.00252 [pdf, other]

Interpretable Deep Learning Classifier by Detection of Prototypical Parts on Kidney Stones Images

Authors: Daniel Flores-Araiza, Francisco Lopez-Tiro, Elias Villalvazo-Avila, Jonathan El-Beze, Jacques Hubert, Gilberto Ochoa-Ruiz, Christian Daul

Abstract: Identifying the type of kidney stones can allow urologists to determine their formation cause, improving the early prescription of appropriate treatments to diminish future relapses. However, currently, the associated ex-vivo diagnosis (known as morpho-constitutional analysis, MCA) is time-consuming, expensive, and requires a great deal of experience, as it requires a visual analysis component that is highly operator-dependent. Recently, machine learning methods have been developed for in-vivo endoscopic stone recognition. Shallow methods have been demonstrated to be reliable and interpretable but exhibit low accuracy, while deep learning-based methods yield high accuracy but are not explainable. However, high-stakes decisions require understandable computer-aided diagnosis (CAD) to suggest a course of action based on reasonable evidence, rather than merely prescribe one. Herein, we investigate means for learning part-prototypes (PPs) that enable interpretable models. Our proposal suggests a classification for a kidney stone patch image and provides explanations similar to those used in the MCA method.

Submitted 1 June, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Extended abstract accepted at LatinX in Computer Vision Research Workshop, at CVPR 2022

arXiv:2206.00069 [pdf, other]

Comparing feature fusion strategies for Deep Learning-based kidney stone identification

Authors: Elias Villalvazo-Avila, Francisco Lopez-Tiro, Daniel Flores-Araiza, Gilberto Ochoa-Ruiz, Jonathan El-Beze, Jacques Hubert, Christian Daul

Abstract: This contribution presents a deep-learning method for extracting and fusing image information acquired from different viewpoints with the aim of producing more discriminant object features. Our approach was specifically designed to mimic the morpho-constitutional analysis used by urologists to visually classify kidney stones by inspecting the sections and surfaces of their fragments. Deep feature fusion strategies improved the results of single-view extraction backbone models by more than 10% in terms of precision of the kidney stone classification.

Submitted 31 May, 2022; originally announced June 2022.

Comments: 4 pages, 3 figures, XXVIIIème Colloque Francophone de Traitement du Signal et des Images

arXiv:2201.08865 [pdf, other]

On the in vivo recognition of kidney stones using machine learning

Authors: Francisco Lopez-Tiro, Vincent Estrade, Jacques Hubert, Daniel Flores-Araiza, Miguel Gonzalez-Mendoza, Gilberto Ochoa-Ruiz, Christian Daul

Abstract: Determining the type of kidney stones allows urologists to prescribe a treatment to avoid recurrence of renal lithiasis. An automated in-vivo image-based classification method would be an important step towards an immediate identification of the kidney stone type required as a first phase of the diagnosis. In the literature, it was shown on ex-vivo data (i.e., in very controlled scene and image acquisition conditions) that an automated kidney stone classification is indeed feasible. This pilot study compares the kidney stone recognition performance of six shallow machine learning methods and three deep-learning architectures, which were tested with in-vivo images of the four most frequent urinary calculi types acquired with an endoscope during standard ureteroscopies. This contribution details the database construction and the design of the tested kidney stone classifiers. Even if the best results were obtained by the Inception v3 architecture (weighted precision, recall and F1-score of 0.97, 0.98 and 0.97, respectively), it is also shown that choosing an appropriate colour space and texture features allows a shallow machine learning method to closely approach the performance of the most promising deep-learning methods (the XGBoost classifier led to weighted precision, recall and F1-score values of 0.96). This paper is the first one that explores the most discriminant features to be extracted from images acquired during ureteroscopies.

Submitted 24 August, 2023; v1 submitted 21 January, 2022; originally announced January 2022.

Comments: Paper submitted to IEEE Access

arXiv:2201.07931 [pdf, other]

Experimental Large-Scale Jet Flames' Geometrical Features Extraction for Risk Management Using Infrared Images and Deep Learning Segmentation Methods

Authors: Carmina Pérez-Guerrero, Adriana Palacios, Gilberto Ochoa-Ruiz, Christian Mata, Joaquim Casal, Miguel Gonzalez-Mendoza, Luis Eduardo Falcón-Morales

Abstract: Jet fires are relatively small and have the least severe effects among the diverse fire accidents that can occur in industrial plants; however, they are usually involved in a process known as the domino effect, which leads to more severe events, such as explosions or the initiation of another fire, making the analysis of such fires an important part of risk analysis. This research work explores the application of deep learning models in an alternative approach that uses the semantic segmentation of jet fire flames to extract main geometrical attributes, relevant for fire risk assessments. A comparison is made between traditional image processing methods and some state-of-the-art deep learning models. It is found that the best approach is a deep learning architecture known as UNet, along with its two improvements, Attention UNet and UNet++. The models are then used to segment a group of vertical jet flames of varying pipe outlet diameters to extract their main geometrical characteristics. Attention UNet obtained the best general performance in the approximation of both height and area of the flames, while also showing a statistically significant difference between it and UNet++. UNet obtained the best overall performance for the approximation of the lift-off distances; however, there is not enough data to prove a statistically significant difference between Attention UNet and UNet++. The only instance where UNet++ outperformed the other models was while obtaining the lift-off distances of the jet flames with 0.01275 m pipe outlet diameter. In general, the explored models show good agreement between the experimental and predicted values for relatively large turbulent propane jet flames, released in sonic and subsonic regimes, thus making these radiation-zone segmentation models a suitable approach for different jet flame risk management scenarios.

Submitted 19 January, 2022; originally announced January 2022.

arXiv:2111.04911 [pdf, other]

Real-time Instance Segmentation of Surgical Instruments using Attention and Multi-scale Feature Fusion

Authors: Juan Carlos Angeles-Ceron, Gilberto Ochoa-Ruiz, Leonardo Chang, Sharib Ali

Abstract: Precise instrument segmentation aids surgeons in navigating the body more easily and increases patient safety. While accurate tracking of surgical instruments in real time plays a crucial role in minimally invasive computer-assisted surgeries, it is a challenging task to achieve, mainly due to 1) the complex surgical environment, and 2) the need for a model design with both optimal accuracy and speed. Deep learning gives us the opportunity to learn complex environments from large surgical scenes and the placement of these instruments in real-world scenarios. The Robust Medical Instrument Segmentation 2019 challenge (ROBUST-MIS) provides more than 10,000 frames with surgical tools in different clinical settings. In this paper, we use a light-weight single-stage instance segmentation model complemented with a convolutional block attention module for achieving both faster and more accurate inference. We further improve accuracy through data augmentation and optimal anchor localisation strategies. To our knowledge, this is the first work that explicitly focuses on both real-time performance and improved accuracy. Our approach outperformed the top team performances in the ROBUST-MIS challenge with over 44% improvement on both the area-based metric MI_DSC and the distance-based metric MI_NSD. We also demonstrate real-time performance (> 60 frames per second) with different but competitive variants of our final approach.

Submitted 9 November, 2021; v1 submitted 8 November, 2021; originally announced November 2021.

arXiv:2107.06992 [pdf, other]

doi 10.1007/978-3-030-89817-5_10

Finding Significant Features for Few-Shot Learning using Dimensionality Reduction

Authors: Mauricio Mendez-Ruiz, Ivan Garcia, Jorge Gonzalez-Zapata, Gilberto Ochoa-Ruiz, Andres Mendez-Vazquez

Abstract: Few-shot learning is a relatively new technique that specializes in problems where we have small amounts of data. The goal of these methods is to classify categories that have not been seen before with just a handful of samples. Recent approaches, such as metric learning, adopt the meta-learning strategy in which we have episodic tasks composed of support (training) data and query (test) data. Metric learning methods have demonstrated that simple models can achieve good performance by learning a similarity function to compare the support and the query data. However, the feature space learned by a given metric learning approach may not exploit the information given by a specific few-shot task. In this work, we explore the use of dimension reduction techniques as a way to find task-significant features, helping to make better predictions. We measure the performance of the reduced features by assigning a score based on the intra-class and inter-class distance, and selecting a feature reduction method in which instances of different classes are far apart and instances of the same class are close. This module helps to improve accuracy by allowing the similarity function, given by the metric learning method, to have more discriminative features for the classification. Our method outperforms the metric learning baselines on the miniImageNet dataset by around 2% in accuracy.

Submitted 6 July, 2021; originally announced July 2021.

Comments: This paper is currently under review for the Mexican International Conference on Artificial Intelligence (MICAI) 2021

arXiv:2107.03461 [pdf, other]

doi 10.1007/978-3-030-89817-5_12

Comparing Machine Learning based Segmentation Models on Jet Fire Radiation Zones

Authors: Carmina Pérez-Guerrero, Adriana Palacios, Gilberto Ochoa-Ruiz, Christian Mata, Miguel Gonzalez-Mendoza, Luis Eduardo Falcón-Morales

Abstract: Risk assessment is relevant in any workplace; however, there is a degree of unpredictability when dealing with flammable or hazardous materials, so that detection of fire accidents by itself may not be enough. An example of this is the impingement of jet fires, where the heat fluxes of the flame could reach nearby equipment and dramatically increase the probability of a domino effect with catastrophic results. Because of this, the characterization of such fire accidents is important from a risk management point of view. One such characterization would be the segmentation of different radiation zones within the flame, so this paper presents exploratory research regarding several traditional computer vision and Deep Learning segmentation approaches to solve this specific problem. A data set of propane jet fires is used to train and evaluate the different approaches and, given the difference in the distribution of the zones and background of the images, different loss functions that seek to alleviate data imbalance are also explored. Additionally, different metrics are correlated to a manual ranking performed by experts to make an evaluation that closely resembles the experts' criteria. The Hausdorff Distance and Adjusted Rand Index were the metrics with the highest correlation, and the best results were obtained from the UNet architecture with a Weighted Cross-Entropy Loss. These results can be used in future research to extract more geometric information from the segmentation masks or could even be implemented on other types of fire accidents.

Submitted 1 November, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

Journal ref: Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science, 13067 (2021), 161-172

arXiv:2104.05124 [pdf, other]

doi 10.1109/cvprw53098.2021.00140

A Bop and Beyond: A Second Order Optimizer for Binarized Neural Networks

Authors: Cuauhtemoc Daniel Suarez-Ramirez, Miguel Gonzalez-Mendoza, Leonardo Chang-Fernandez, Gilberto Ochoa-Ruiz, Mario Alberto Duran-Vega

Abstract: The optimization of Binary Neural Networks (BNNs) relies on approximating the real-valued weights with their binarized representations. Current techniques for weight-updating use the same approaches as traditional Neural Networks (NNs) with the extra requirement of using an approximation to the derivative of the sign function (as it is the Dirac-Delta function) for back-propagation; thus, efforts are focused on adapting full-precision techniques to work on BNNs. In the literature, only one previous effort has tackled the problem of directly training the BNNs with bit-flips by using the first raw moment estimate of the gradients and comparing it against a threshold for deciding when to flip a weight (Bop). In this paper, we take an approach parallel to Adam which also uses the second raw moment estimate to normalize the first raw moment before doing the comparison with the threshold; we call this method Bop2ndOrder. We present two versions of the proposed optimizer: a biased one and a bias-corrected one, each with its own applications. Also, we present a complete ablation study of the hyperparameter space, as well as the effect of using schedulers on each of them. For these studies, we tested the optimizer on CIFAR10 using the BinaryNet architecture. Also, we tested it on ImageNet 2012 with the XnorNet and BiRealNet architectures for accuracy. On both datasets our approach proved to converge faster, was robust to changes of the hyperparameters, and achieved better accuracy values.

Submitted 11 April, 2021; originally announced April 2021.

Comments: 9 pages, 12 figures, Preprint accepted to the LatinX in CV Research Workshop at CVPR'21
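
The flip rule described in the abstract above can be illustrated with a rough sketch of a single update step. This is an interpretation of the abstract only, not the authors' implementation: the function name `bop2nd_step`, the hyperparameter names `gamma`, `sigma`, `tau` and `eps`, their default values, the exact normalization `m / (sqrt(v) + eps)`, and the sign-agreement flip condition are all assumptions. Only the biased variant is shown; the bias-corrected one is omitted.

```python
import numpy as np


def bop2nd_step(w, grad, m, v, gamma=1e-3, sigma=1e-3, tau=1e-6, eps=1e-10):
    """One assumed Bop2ndOrder-style update on binary weights w in {-1, +1}.

    m and v are exponential moving averages of the gradient and of the
    squared gradient (first and second raw moment estimates).
    """
    m = (1.0 - gamma) * m + gamma * grad
    v = (1.0 - sigma) * v + sigma * grad ** 2
    # Normalize the first moment by the second, as Adam does (assumed form).
    ratio = m / (np.sqrt(v) + eps)
    # Flip a weight when the normalized signal exceeds the threshold and
    # points in the same direction as the weight (assumed flip condition).
    flip = (np.abs(ratio) > tau) & (np.sign(ratio) == np.sign(w))
    w = np.where(flip, -w, w)
    return w, m, v


# Toy step with three weights and exaggerated hyperparameters.
w = np.array([1.0, -1.0, 1.0])
grad = np.array([1.0, 1.0, 0.0])
m = np.zeros(3)
v = np.zeros(3)
w, m, v = bop2nd_step(w, grad, m, v, gamma=0.5, sigma=0.5, tau=0.1)
print(w)
```

In this toy step only the first weight flips: its gradient signal is strong and has the same sign as the weight. The second weight already opposes its gradient, and the third receives no gradient signal at all.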

arXiv:2103.15997 [pdf, other]

doi 10.1109/EMBC46164.2021.9629914

Assessing YOLACT++ for real time and robust instance segmentation of medical instruments in endoscopic procedures

Authors: Juan Carlos Angeles Ceron, Leonardo Chang, Gilberto Ochoa-Ruiz, Sharib Ali

Abstract: Image-based tracking of laparoscopic instruments plays a fundamental role in computer and robotic-assisted surgeries by aiding surgeons and increasing patient safety. Computer vision contests, such as the Robust Medical Instrument Segmentation (ROBUST-MIS) Challenge, seek to encourage the development of robust models for such purposes, providing large, diverse, and annotated datasets. To date, most of the existing models for instance segmentation of medical instruments were based on two-stage detectors, which provide robust results but are nowhere near real time (5 frames per second (fps) at most). However, for the method to be clinically applicable, real-time capability is required along with high accuracy. In this paper, we propose the addition of attention mechanisms to the YOLACT architecture that allows real-time instance segmentation of instruments with improved accuracy on the ROBUST-MIS dataset. Our proposed approach achieves competitive performance compared to the winner of the 2019 ROBUST-MIS challenge in terms of robustness scores, obtaining 0.313 MI_DSC and 0.338 MI_NSD, while achieving real-time performance (37 fps).

Submitted 28 April, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: Preprint under review for EMBC 2021 following IEEE guidelines

arXiv:2101.11745 [pdf, other]

doi 10.1007/s00521-021-06691-3

FIRe-GAN: A novel Deep Learning-based infrared-visible fusion method for wildfire imagery

Authors: J. F. Ciprián-Sánchez, G. Ochoa-Ruiz, M. Gonzalez-Mendoza, L. Rossi

Abstract: Early wildfire detection is of paramount importance to avoid as much damage as possible to the environment, properties, and lives. Deep Learning (DL) models that can leverage both visible and infrared information have the potential to display state-of-the-art performance, with lower false-positive rates than existing techniques. However, most DL-based image fusion methods have not been evaluated in the domain of fire imagery. Additionally, to the best of our knowledge, no publicly available dataset contains visible-infrared fused fire images. There is a growing interest in DL-based image fusion techniques due to their reduced complexity. For this reason, we select three state-of-the-art, DL-based image fusion techniques and evaluate them for the specific task of fire image fusion. We compare the performance of these methods on selected metrics. Finally, we also present an extension to one of these methods, which we call FIRe-GAN, that improves the generation of artificial infrared images and fused ones on selected metrics.

Submitted 22 February, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: 16 pages, 10 figures. Submitted to the Special Issue (SI) in the Neural Computing and Applications Journal

arXiv:1909.08991 [pdf]

doi 10.1007/978-3-030-33749-0_1

Road Damage Detection Acquisition System based on Deep Neural Networks for Physical Asset Management

Authors: A. A. Angulo, J. A. Vega-Fernández, L. M. Aguilar-Lobo, S. Natraj, G Ochoa-Ruiz

Abstract: Research on damage detection of road surfaces has been an active area of research, but most studies have focused so far on the detection of the presence of damages. However, in real-world scenarios, road managers need to clearly understand the type of damage and its extent in order to take effective action in advance or to allocate the necessary resources. Moreover, currently there are few uniform and openly available road damage datasets, leading to a lack of a common benchmark for road damage detection. Such a dataset could be used in a great variety of applications; herein, it is intended to serve as the acquisition component of a physical asset management tool which can aid government agencies for planning purposes, or be used by infrastructure maintenance companies. In this paper, we make two contributions to address these issues. First, we present a large-scale road damage dataset, which includes a more balanced and representative set of damages. This dataset is composed of 18,034 road damage images captured with a smartphone, with 45,435 instances of road surface damage. Second, we trained different types of object detection methods, both traditional (an LBP-cascaded classifier) and deep learning-based, specifically, MobileNet and RetinaNet, which are amenable to embedded and mobile implementations with an acceptable performance for many applications. We compare the accuracy and inference time of all these models with others in the state of the art.

Submitted 19 September, 2019; originally announced September 2019.

Showing 1–33 of 33 results for author: Ochoa-Ruiz, G