Search | arXiv e-print repository

Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

Authors: Cristina N. Vasconcelos, Abdullah Rashwan, Austin Waters, Trevor Walker, Keyang Xu, Jimmy Yan, Rui Qian, Shixin Luo, Zarana Parekh, Andrew Bunner, Hongliang Fei, Roopal Garg, Mandy Guo, Ivana Kajic, Yeqing Li, Henna Nandwani, Jordi Pont-Tuset, Yasumasa Onoe, Sarah Rosston, Su Wang, Wenlei Zhou, Kevin Swersky, David J. Fleet, Jason M. Baldridge, Oliver Wang

Abstract: We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models. without the needs for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignm… ▽ More We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models. without the needs for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignment {\it vs.} high-resolution rendering. We first demonstrate the benefits of scaling a {\it Shallow UNet}, with no down(up)-sampling enc(dec)oder. Scaling its deep core layers is shown to improve alignment, object structure, and composition. Building on this core model, we propose a greedy algorithm that grows the architecture into high-resolution end-to-end models, while preserving the integrity of the pre-trained representation, stabilizing training, and reducing the need for large high-resolution datasets. This enables a single stage model capable of generating high-resolution images without the need of a super-resolution cascade. Our key results rely on public datasets and show that we are able to train non-cascaded models up to 8B parameters with no further regularization schemes. Vermeer, our full pipeline model trained with internal datasets to produce 1024x1024 images, without cascades, is preferred by 44.0% vs. 21.4% human evaluators over SDXL. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2404.02738 [pdf, other]

Adaptive Affinity-Based Generalization For MRI Imaging Segmentation Across Resource-Limited Settings

Authors: Eddardaa B. Loussaief, Mohammed Ayad, Domenc Puig, Hatem A. Rashwan

Abstract: The joint utilization of diverse data sources for medical imaging segmentation has emerged as a crucial area of research, aiming to address challenges such as data heterogeneity, domain shift, and data quality discrepancies. Integrating information from multiple data domains has shown promise in improving model generalizability and adaptability. However, this approach often demands substantial com… ▽ More The joint utilization of diverse data sources for medical imaging segmentation has emerged as a crucial area of research, aiming to address challenges such as data heterogeneity, domain shift, and data quality discrepancies. Integrating information from multiple data domains has shown promise in improving model generalizability and adaptability. However, this approach often demands substantial computational resources, hindering its practicality. In response, knowledge distillation (KD) has garnered attention as a solution. KD involves training light-weight models to emulate the behavior of more resource-intensive models, thereby mitigating the computational burden while maintaining performance. This paper addresses the pressing need to develop a lightweight and generalizable model for medical imaging segmentation that can effectively handle data integration challenges. Our proposed approach introduces a novel relation-based knowledge framework by seamlessly combining adaptive affinity-based and kernel-based distillation through a gram matrix that can capture the style representation across features. This methodology empowers the student model to accurately replicate the feature representations of the teacher model, facilitating robust performance even in the face of domain shift and data heterogeneity. To validate our innovative approach, we conducted experiments on publicly available multi-source prostate MRI data. The results demonstrate a significant enhancement in segmentation performance using lightweight networks. Notably, our method achieves this improvement while reducing both inference time and storage usage, rendering it a practical and efficient solution for real-time medical imaging segmentation. △ Less

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2312.06052 [pdf, other]

MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation

Authors: Abdullah Rashwan, Jiageng Zhang, Ali Taalimi, Fan Yang, Xingyi Zhou, Chaochao Yan, Liang-Chieh Chen, Yeqing Li

Abstract: In recent years, transformer-based models have dominated panoptic segmentation, thanks to their strong modeling capabilities and their unified representation for both semantic and instance classes as global binary masks. In this paper, we revisit pure convolution model and propose a novel panoptic architecture named MaskConver. MaskConver proposes to fully unify things and stuff representation by… ▽ More In recent years, transformer-based models have dominated panoptic segmentation, thanks to their strong modeling capabilities and their unified representation for both semantic and instance classes as global binary masks. In this paper, we revisit pure convolution model and propose a novel panoptic architecture named MaskConver. MaskConver proposes to fully unify things and stuff representation by predicting their centers. To that extent, it creates a lightweight class embedding module that can break the ties when multiple centers co-exist in the same location. Furthermore, our study shows that the decoder design is critical in ensuring that the model has sufficient context for accurate detection and segmentation. We introduce a powerful ConvNeXt-UNet decoder that closes the performance gap between convolution- and transformerbased models. With ResNet50 backbone, our MaskConver achieves 53.6% PQ on the COCO panoptic val set, outperforming the modern convolution-based model, Panoptic FCN, by 9.3% as well as transformer-based models such as Mask2Former (+1.7% PQ) and kMaX-DeepLab (+0.6% PQ). Additionally, MaskConver with a MobileNet backbone reaches 37.2% PQ, improving over Panoptic-DeepLab by +6.4% under the same FLOPs/latency constraints. A further optimized version of MaskConver achieves 29.7% PQ, while running in real-time on mobile devices. The code and model weights will be publicly available △ Less

Submitted 10 December, 2023; originally announced December 2023.

Comments: 11 pages, 5 figures

arXiv:2309.16139 [pdf, other]

Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling

Authors: Ke Yu, Stephen Albro, Giulia DeSalvo, Suraj Kothawade, Abdullah Rashwan, Sasan Tavakkol, Kayhan Batmanghelich, Xiaoqi Yin

Abstract: Training high-quality instance segmentation models requires an abundance of labeled images with instance masks and classifications, which is often expensive to procure. Active learning addresses this challenge by striving for optimum performance with minimal labeling cost by selecting the most informative and representative images for labeling. Despite its potential, active learning has been less… ▽ More Training high-quality instance segmentation models requires an abundance of labeled images with instance masks and classifications, which is often expensive to procure. Active learning addresses this challenge by striving for optimum performance with minimal labeling cost by selecting the most informative and representative images for labeling. Despite its potential, active learning has been less explored in instance segmentation compared to other tasks like image classification, which require less labeling. In this study, we propose a post-hoc active learning algorithm that integrates uncertainty-based sampling with diversity-based sampling. Our proposed algorithm is not only simple and easy to implement, but it also delivers superior performance on various datasets. Its practical application is demonstrated on a real-world overhead imagery dataset, where it increases the labeling efficiency fivefold. △ Less

Submitted 27 September, 2023; originally announced September 2023.

Comments: UNCV ICCV 2023

arXiv:2306.01736 [pdf, other]

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Authors: Xiuye Gu, Yin Cui, Jonathan Huang, Abdullah Rashwan, Xuan Yang, Xingyi Zhou, Golnaz Ghiasi, Weicheng Kuo, Huizhong Chen, Liang-Chieh Chen, David A Ross

Abstract: Observing the close relationship among panoptic, semantic and instance segmentation tasks, we propose to train a universal multi-dataset multi-task segmentation model: DaTaSeg.We use a shared representation (mask proposals with class predictions) for all tasks. To tackle task discrepancy, we adopt different merge operations and post-processing for different tasks. We also leverage weak-supervision… ▽ More Observing the close relationship among panoptic, semantic and instance segmentation tasks, we propose to train a universal multi-dataset multi-task segmentation model: DaTaSeg.We use a shared representation (mask proposals with class predictions) for all tasks. To tackle task discrepancy, we adopt different merge operations and post-processing for different tasks. We also leverage weak-supervision, allowing our segmentation model to benefit from cheaper bounding box annotations. To share knowledge across datasets, we use text embeddings from the same semantic embedding space as classifiers and share all network parameters among datasets. We train DaTaSeg on ADE semantic, COCO panoptic, and Objects365 detection datasets. DaTaSeg improves performance on all datasets, especially small-scale datasets, achieving 54.0 mIoU on ADE semantic and 53.5 PQ on COCO panoptic. DaTaSeg also enables weakly-supervised knowledge transfer on ADE panoptic and Objects365 instance segmentation. Experiments show DaTaSeg scales with the number of training datasets and enables open-vocabulary segmentation through direct transfer. In addition, we annotate an Objects365 instance segmentation set of 1,000 images and will release it as a public benchmark. △ Less

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2211.07521 [pdf, other]

PKCAM: Previous Knowledge Channel Attention Module

Authors: Eslam Mohamed Bakr, Ahmad El Sallab, Mohsen A. Rashwan

Abstract: Recently, attention mechanisms have been explored with ConvNets, both across the spatial and channel dimensions. However, from our knowledge, all the existing methods devote the attention modules to capture local interactions from a uni-scale. In this paper, we propose a Previous Knowledge Channel Attention Module(PKCAM), that captures channel-wise relations across different layers to model the gl… ▽ More Recently, attention mechanisms have been explored with ConvNets, both across the spatial and channel dimensions. However, from our knowledge, all the existing methods devote the attention modules to capture local interactions from a uni-scale. In this paper, we propose a Previous Knowledge Channel Attention Module(PKCAM), that captures channel-wise relations across different layers to model the global context. Our proposed module PKCAM is easily integrated into any feed-forward CNN architectures and trained in an end-to-end fashion with a negligible footprint due to its lightweight property. We validate our novel architecture through extensive experiments on image classification and object detection tasks with different backbones. Our experiments show consistent improvements in performances against their counterparts. Our code is published at https://github.com/eslambakr/EMCA. △ Less

Submitted 25 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

arXiv:2112.06782 [pdf, other]

GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network

Authors: Armin Masoumian, Hatem A. Rashwan, Saddam Abdulwahab, Julian Cristiano, Domenec Puig

Abstract: Depth estimation is a challenging task of 3D reconstruction to enhance the accuracy sensing of environment awareness. This work brings a new solution with a set of improvements, which increase the quantitative and qualitative understanding of depth maps compared to existing methods. Recently, a convolutional neural network (CNN) has demonstrated its extraordinary ability in estimating depth maps f… ▽ More Depth estimation is a challenging task of 3D reconstruction to enhance the accuracy sensing of environment awareness. This work brings a new solution with a set of improvements, which increase the quantitative and qualitative understanding of depth maps compared to existing methods. Recently, a convolutional neural network (CNN) has demonstrated its extraordinary ability in estimating depth maps from monocular videos. However, traditional CNN does not support topological structure and they can work only on regular image regions with determined size and weights. On the other hand, graph convolutional networks (GCN) can handle the convolution on non-Euclidean data and it can be applied to irregular image regions within a topological structure. Therefore, in this work in order to preserve object geometric appearances and distributions, we aim at exploiting GCN for a self-supervised depth estimation model. Our model consists of two parallel auto-encoder networks: the first is an auto-encoder that will depend on ResNet-50 and extract the feature from the input image and on multi-scale GCN to estimate the depth map. In turn, the second network will be used to estimate the ego-motion vector (i.e., 3D pose) between two consecutive frames based on ResNet-18. Both the estimated 3D pose and depth map will be used for constructing a target image. A combination of loss functions related to photometric, projection, and smoothness is used to cope with bad depth prediction and preserve the discontinuities of the objects. In particular, our method provided comparable and promising results with a high prediction accuracy of 89% on the publicly KITTI and Make3D datasets along with a reduction of 40% in the number of trainable parameters compared to the state of the art solutions. The source code is publicly available at https://github.com/ArminMasoumian/GCNDepth.git △ Less

Submitted 13 December, 2021; originally announced December 2021.

Comments: 10 pages, Submitted to IEEE transactions on intelligent transportation systems

arXiv:2111.05309 [pdf]

doi 10.1109/ICMRE49073.2020.9065161

Designing and Analyzing the PID and Fuzzy Control System for an Inverted Pendulum

Authors: Armin Masoumian, Pezhman kazemi, Mohammad Chehreghani Montazer, Hatem A. Rashwan, Domenec Puig Valls

Abstract: The inverted pendulum is a non-linear unbalanced system that needs to be controlled using motors to achieve stability and equilibrium. The inverted pendulum is constructed with Lego and using the Lego Mindstorm NXT, which is a programmable robot capable of completing many different functions. In this paper, an initial design of the inverted pendulum is proposed and the performance of different sen… ▽ More The inverted pendulum is a non-linear unbalanced system that needs to be controlled using motors to achieve stability and equilibrium. The inverted pendulum is constructed with Lego and using the Lego Mindstorm NXT, which is a programmable robot capable of completing many different functions. In this paper, an initial design of the inverted pendulum is proposed and the performance of different sensors, which are compatible with the Lego Mindstorm NXT was studied. Furthermore, the ability of computer vision to achieve the stability required to maintain the system is also investigated. The inverted pendulum is a conventional cart that can be controlled using a Fuzzy Logic controller that produces a self-tuning PID control for the cart to move on. The fuzzy logic and PID are simulated in MATLAB and Simulink, and the program for the robot is developed in the LabVIEW software. △ Less

Submitted 9 November, 2021; originally announced November 2021.

Comments: 5 pages, Accepted for The 6th International Conference on Mechatronics and Robotics Engineering (ICMRE 2020)

arXiv:2111.05308 [pdf]

doi 10.1109/ICMRE49073.2020.9065017

Using The Feedback of Dynamic Active-Pixel Vision Sensor (Davis) to Prevent Slip in Real Time

Authors: Armin Masoumian, Pezhman kazemi, Mohammad Chehreghani Montazer, Hatem A. Rashwan, Domenec Puig Valls

Abstract: The objective of this paper is to describe an approach to detect the slip and contact force in real-time feedback. In this novel approach, the DAVIS camera is used as a vision tactile sensor due to its fast process speed and high resolution. Two hundred experiments were performed on four objects with different shapes, sizes, weights, and materials to compare the accuracy and response of the Baxter… ▽ More The objective of this paper is to describe an approach to detect the slip and contact force in real-time feedback. In this novel approach, the DAVIS camera is used as a vision tactile sensor due to its fast process speed and high resolution. Two hundred experiments were performed on four objects with different shapes, sizes, weights, and materials to compare the accuracy and response of the Baxter robot grippers to avoid slip**. The advanced approach is validated by using a force-sensitive resistor (FSR402). The events captured with the DAVIS camera are processed with specific algorithms to provide feedback to the Baxter robot aiding it to detect the slip. △ Less

Submitted 9 November, 2021; originally announced November 2021.

Comments: 5 pages, Accepted for The 6th International Conference on Mechatronics and Robotics Engineering (ICMRE 2020)

arXiv:2111.01715 [pdf]

doi 10.3233/FAIA210151

Absolute distance prediction based on deep learning object detection and monocular depth estimation models

Authors: Armin Masoumian, David G. F. Marei, Saddam Abdulwahab, Julian Cristiano, Domenec Puig, Hatem A. Rashwan

Abstract: Determining the distance between the objects in a scene and the camera sensor from 2D images is feasible by estimating depth images using stereo cameras or 3D cameras. The outcome of depth estimation is relative distances that can be used to calculate absolute distances to be applicable in reality. However, distance estimation is very challenging using 2D monocular cameras. This paper presents a d… ▽ More Determining the distance between the objects in a scene and the camera sensor from 2D images is feasible by estimating depth images using stereo cameras or 3D cameras. The outcome of depth estimation is relative distances that can be used to calculate absolute distances to be applicable in reality. However, distance estimation is very challenging using 2D monocular cameras. This paper presents a deep learning framework that consists of two deep networks for depth estimation and object detection using a single image. Firstly, objects in the scene are detected and localized using the You Only Look Once (YOLOv5) network. In parallel, the estimated depth image is computed using a deep autoencoder network to detect the relative distances. The proposed object detection based YOLO was trained using a supervised learning technique, in turn, the network of depth estimation was self-supervised training. The presented distance estimation framework was evaluated on real images of outdoor scenes. The achieved results show that the proposed framework is promising and it yields an accuracy of 96% with RMSE of 0.203 of the correct absolute distance. △ Less

Submitted 2 November, 2021; originally announced November 2021.

Comments: 10 pages, Submitted to 23rd International Conference of the Catalan Association for Artificial Intelligence (CCIA 2021)

arXiv:2103.12270 [pdf, other]

Dilated SpineNet for Semantic Segmentation

Authors: Abdullah Rashwan, Xianzhi Du, Xiaoqi Yin, **g Li

Abstract: Scale-permuted networks have shown promising results on object bounding box detection and instance segmentation. Scale permutation and cross-scale fusion of features enable the network to capture multi-scale semantics while preserving spatial resolution. In this work, we evaluate this meta-architecture design on semantic segmentation - another vision task that benefits from high spatial resolution… ▽ More Scale-permuted networks have shown promising results on object bounding box detection and instance segmentation. Scale permutation and cross-scale fusion of features enable the network to capture multi-scale semantics while preserving spatial resolution. In this work, we evaluate this meta-architecture design on semantic segmentation - another vision task that benefits from high spatial resolution and multi-scale feature fusion at different network stages. By further leveraging dilated convolution operations, we propose SpineNet-Seg, a network discovered by NAS that is searched from the DeepLabv3 system. SpineNet-Seg is designed with a better scale-permuted network topology with customized dilation ratios per block on a semantic segmentation task. SpineNet-Seg models outperform the DeepLabv3/v3+ baselines at all model scales on multiple popular benchmarks in speed and accuracy. In particular, our SpineNet-S143+ model achieves the new state-of-the-art on the popular Cityscapes benchmark at 83.04% mIoU and attained strong performance on the PASCAL VOC2012 benchmark at 85.56% mIoU. SpineNet-Seg models also show promising results on a challenging Street View segmentation dataset. Code and checkpoints will be open-sourced. △ Less

Submitted 22 March, 2021; originally announced March 2021.

Comments: 8 pages

arXiv:2002.10631 [pdf, other]

Batch norm with entropic regularization turns deterministic autoencoders into generative models

Authors: Amur Ghose, Abdullah Rashwan, Pascal Poupart

Abstract: The variational autoencoder is a well defined deep generative model that utilizes an encoder-decoder framework where an encoding neural network outputs a non-deterministic code for reconstructing an input. The encoder achieves this by sampling from a distribution for every input, instead of outputting a deterministic code per input. The great advantage of this process is that it allows the use of… ▽ More The variational autoencoder is a well defined deep generative model that utilizes an encoder-decoder framework where an encoding neural network outputs a non-deterministic code for reconstructing an input. The encoder achieves this by sampling from a distribution for every input, instead of outputting a deterministic code per input. The great advantage of this process is that it allows the use of the network as a generative model for sampling from the data distribution beyond provided samples for training. We show in this work that utilizing batch normalization as a source for non-determinism suffices to turn deterministic autoencoders into generative models on par with variational ones, so long as we add a suitable entropic regularization to the training objective. △ Less

Submitted 21 September, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

Journal ref: Published in the Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2020

arXiv:2001.03194 [pdf, other]

MatrixNets: A New Scale and Aspect Ratio Aware Architecture for Object Detection

Authors: Abdullah Rashwan, Rishav Agarwal, Agastya Kalra, Pascal Poupart

Abstract: We present MatrixNets (xNets), a new deep architecture for object detection. xNets map objects with similar sizes and aspect ratios into many specialized layers, allowing xNets to provide a scale and aspect ratio aware architecture. We leverage xNets to enhance single-stage object detection frameworks. First, we apply xNets on anchor-based object detection, for which we predict object centers and… ▽ More We present MatrixNets (xNets), a new deep architecture for object detection. xNets map objects with similar sizes and aspect ratios into many specialized layers, allowing xNets to provide a scale and aspect ratio aware architecture. We leverage xNets to enhance single-stage object detection frameworks. First, we apply xNets on anchor-based object detection, for which we predict object centers and regress the top-left and bottom-right corners. Second, we use MatrixNets for corner-based object detection by predicting top-left and bottom-right corners. Each corner predicts the center location of the object. We also enhance corner-based detection by replacing the embedding layer with center regression. Our final architecture achieves mAP of 47.8 on MS COCO, which is higher than its CornerNet counterpart by +5.6 mAP while also closing the gap between single-stage and two-stage detectors. The code is available at https://github.com/arashwan/matrixnet. △ Less

Submitted 9 January, 2020; originally announced January 2020.

Comments: This is the full paper for arXiv:1908.04646 with more applications, experiments, and ablation study

arXiv:1908.04646 [pdf, other]

Matrix Nets: A New Deep Architecture for Object Detection

Authors: Abdullah Rashwan, Agastya Kalra, Pascal Poupart

Abstract: We present Matrix Nets (xNets), a new deep architecture for object detection. xNets map objects with different sizes and aspect ratios into layers where the sizes and the aspect ratios of the objects within their layers are nearly uniform. Hence, xNets provide a scale and aspect ratio aware architecture. We leverage xNets to enhance key-points based object detection. Our architecture achieves mAP… ▽ More We present Matrix Nets (xNets), a new deep architecture for object detection. xNets map objects with different sizes and aspect ratios into layers where the sizes and the aspect ratios of the objects within their layers are nearly uniform. Hence, xNets provide a scale and aspect ratio aware architecture. We leverage xNets to enhance key-points based object detection. Our architecture achieves mAP of 47.8 on MS COCO, which is higher than any other single-shot detector while using half the number of parameters and training 3x faster than the next best architecture. △ Less

Submitted 14 August, 2019; v1 submitted 13 August, 2019; originally announced August 2019.

Comments: Short paper, stay tuned for the full paper!

arXiv:1907.02742 [pdf, other]

Adversarial Learning with Multiscale Features and Kernel Factorization for Retinal Blood Vessel Segmentation

Authors: Farhan Akram, Vivek Kumar Singh, Hatem A. Rashwan, Mohamed Abdel-Nasser, Md. Mostafa Kamal Sarker, Nidhi Pandey, Domenec Puig

Abstract: In this paper, we propose an efficient blood vessel segmentation method for the eye fundus images using adversarial learning with multiscale features and kernel factorization. In the generator network of the adversarial framework, spatial pyramid pooling, kernel factorization and squeeze excitation block are employed to enhance the feature representation in spatial domain on different scales with… ▽ More In this paper, we propose an efficient blood vessel segmentation method for the eye fundus images using adversarial learning with multiscale features and kernel factorization. In the generator network of the adversarial framework, spatial pyramid pooling, kernel factorization and squeeze excitation block are employed to enhance the feature representation in spatial domain on different scales with reduced computational complexity. In turn, the discriminator network of the adversarial framework is formulated by combining convolutional layers with an additional squeeze excitation block to differentiate the generated segmentation mask from its respective ground truth. Before feeding the images to the network, we pre-processed them by using edge sharpening and Gaussian regularization to reach an optimized solution for vessel segmentation. The output of the trained model is post-processed using morphological operations to remove the small speckles of noise. The proposed method qualitatively and quantitatively outperforms state-of-the-art vessel segmentation methods using DRIVE and STARE datasets. △ Less

Submitted 5 July, 2019; originally announced July 2019.

Comments: 9 pages, 4 figures

arXiv:1907.00887 [pdf, other]

An Efficient Solution for Breast Tumor Segmentation and Classification in Ultrasound Images Using Deep Adversarial Learning

Authors: Vivek Kumar Singh, Hatem A. Rashwan, Mohamed Abdel-Nasser, Md. Mostafa Kamal Sarker, Farhan Akram, Nidhi Pandey, Santiago Romani, Domenec Puig

Abstract: This paper proposes an efficient solution for tumor segmentation and classification in breast ultrasound (BUS) images. We propose to add an atrous convolution layer to the conditional generative adversarial network (cGAN) segmentation model to learn tumor features at different resolutions of BUS images. To automatically re-balance the relative impact of each of the highest level encoded features,… ▽ More This paper proposes an efficient solution for tumor segmentation and classification in breast ultrasound (BUS) images. We propose to add an atrous convolution layer to the conditional generative adversarial network (cGAN) segmentation model to learn tumor features at different resolutions of BUS images. To automatically re-balance the relative impact of each of the highest level encoded features, we also propose to add a channel-wise weighting block in the network. In addition, the SSIM and L1-norm loss with the typical adversarial loss are used as a loss function to train the model. Our model outperforms the state-of-the-art segmentation models in terms of the Dice and IoU metrics, achieving top scores of 93.76% and 88.82%, respectively. In the classification stage, we show that few statistics features extracted from the shape of the boundaries of the predicted masks can properly discriminate between benign and malignant tumors with an accuracy of 85%$ △ Less

Submitted 1 July, 2019; originally announced July 2019.

Comments: 9 pages

arXiv:1907.00856 [pdf, other]

SLSNet: Skin lesion segmentation using a lightweight generative adversarial network

Authors: Md. Mostafa Kamal Sarker, Hatem A. Rashwan, Farhan Akram, Vivek Kumar Singh, Syeda Furruka Banu, Forhad U H Chowdhury, Kabir Ahmed Choudhury, Sylvie Chambon, Petia Radeva, Domenec Puig, Mohamed Abdel-Nasser

Abstract: The determination of precise skin lesion boundaries in dermoscopic images using automated methods faces many challenges, most importantly, the presence of hair, inconspicuous lesion edges and low contrast in dermoscopic images, and variability in the color, texture and shapes of skin lesions. Existing deep learning-based skin lesion segmentation algorithms are expensive in terms of computational t… ▽ More The determination of precise skin lesion boundaries in dermoscopic images using automated methods faces many challenges, most importantly, the presence of hair, inconspicuous lesion edges and low contrast in dermoscopic images, and variability in the color, texture and shapes of skin lesions. Existing deep learning-based skin lesion segmentation algorithms are expensive in terms of computational time and memory. Consequently, running such segmentation algorithms requires a powerful GPU and high bandwidth memory, which are not available in dermoscopy devices. Thus, this article aims to achieve precise skin lesion segmentation with minimum resources: a lightweight, efficient generative adversarial network (GAN) model called SLSNet, which combines 1-D kernel factorized networks, position and channel attention, and multiscale aggregation mechanisms with a GAN model. The 1-D kernel factorized network reduces the computational cost of 2D filtering. The position and channel attention modules enhance the discriminative ability between the lesion and non-lesion feature representations in spatial and channel dimensions, respectively. A multiscale block is also used to aggregate the coarse-to-fine features of input skin images and reduce the effect of the artifacts. SLSNet is evaluated on two publicly available datasets: ISBI 2017 and the ISIC 2018. Although SLSNet has only 2.35 million parameters, the experimental results demonstrate that it achieves segmentation results on a par with the state-of-the-art skin lesion segmentation methods with an accuracy of 97.61%, and Dice and Jaccard similarity coefficients of 90.63% and 81.98%, respectively. SLSNet can run at more than 110 frames per second (FPS) in a single GTX1080Ti GPU, which is faster than well-known deep learning-based image segmentation models, such as FCN. Therefore, SLSNet can be used for practical dermoscopic applications. △ Less

Submitted 17 June, 2021; v1 submitted 1 July, 2019; originally announced July 2019.

Comments: Accepted in Expert Systems with Applications

arXiv:1809.06663 [pdf, other]

Support Vector Machine (SVM) Recognition Approach adapted to Individual and Touching Moths Counting in Trap Images

Authors: Mohamed Chafik Bakkay, Sylvie Chambon, Hatem A. Rashwan, Christian Lubat, Sébastien Barsotti

Abstract: This paper aims at develo** an automatic algorithm for moth recognition from trap images in real-world conditions. This method uses our previous work for detection [1] and introduces an adapted classification step. More precisely, SVM classifier is trained with a multi-scale descriptor, Histogram Of Curviness Saliency (HCS). This descriptor is robust to illumination changes and is able to detect… ▽ More This paper aims at develo** an automatic algorithm for moth recognition from trap images in real-world conditions. This method uses our previous work for detection [1] and introduces an adapted classification step. More precisely, SVM classifier is trained with a multi-scale descriptor, Histogram Of Curviness Saliency (HCS). This descriptor is robust to illumination changes and is able to detect and to describe the external and the internal contours of the target insect in multi-scale. The proposed classification method can be trained with a small set of images. Quantitative evaluations show that the proposed method is able to classify insects with higher accuracy (rate of 95.8%) than the state-of-the art approaches. △ Less

Submitted 18 September, 2018; originally announced September 2018.

arXiv:1809.01687 [pdf, other]

Breast Tumor Segmentation and Shape Classification in Mammograms using Generative Adversarial and Convolutional Neural Network

Authors: Vivek Kumar Singh, Hatem A. Rashwan, Santiago Romani, Farhan Akram, Nidhi Pandey, Md. Mostafa Kamal Sarker, Adel Saleh, Meritexell Arenas, Miguel Arquez, Domenec Puig, Jordina Torrents-Barrena

Abstract: Mammogram inspection in search of breast tumors is a tough assignment that radiologists must carry out frequently. Therefore, image analysis methods are needed for the detection and delineation of breast masses, which portray crucial morphological information that will support reliable diagnosis. In this paper, we proposed a conditional Generative Adversarial Network (cGAN) devised to segment a br… ▽ More Mammogram inspection in search of breast tumors is a tough assignment that radiologists must carry out frequently. Therefore, image analysis methods are needed for the detection and delineation of breast masses, which portray crucial morphological information that will support reliable diagnosis. In this paper, we proposed a conditional Generative Adversarial Network (cGAN) devised to segment a breast mass within a region of interest (ROI) in a mammogram. The generative network learns to recognize the breast mass area and to create the binary mask that outlines the breast mass. In turn, the adversarial network learns to distinguish between real (ground truth) and synthetic segmentations, thus enforcing the generative network to create binary masks as realistic as possible. The cGAN works well even when the number of training samples are limited. Therefore, the proposed method outperforms several state-of-the-art approaches. This hypothesis is corroborated by diverse experiments performed on two datasets, the public INbreast and a private in-house dataset. The proposed segmentation model provides a high Dice coefficient and Intersection over Union (IoU) of 94% and 87%, respectively. In addition, a shape descriptor based on a Convolutional Neural Network (CNN) is proposed to classify the generated masks into four mass shapes: irregular, lobular, oval and round. The proposed shape descriptor was trained on Digital Database for Screening Mammography (DDSM) yielding an overall accuracy of 80%, which outperforms the current state-of-the-art. △ Less

Submitted 23 October, 2018; v1 submitted 5 September, 2018; originally announced September 2018.

Comments: 33 pages, Submitted to Expert Systems with Applications

arXiv:1808.09829 [pdf, other]

MACNet: Multi-scale Atrous Convolution Networks for Food Places Classification in Egocentric Photo-streams

Authors: Md. Mostafa Kamal Sarker, Hatem A. Rashwan, Estefania Talavera, Syeda Furruka Banu, Petia Radeva, Domenec Puig

Abstract: First-person (wearable) camera continually captures unscripted interactions of the camera user with objects, people, and scenes reflecting his personal and relational tendencies. One of the preferences of people is their interaction with food events. The regulation of food intake and its duration has a great importance to protect against diseases. Consequently, this work aims to develop a smart mo… ▽ More First-person (wearable) camera continually captures unscripted interactions of the camera user with objects, people, and scenes reflecting his personal and relational tendencies. One of the preferences of people is their interaction with food events. The regulation of food intake and its duration has a great importance to protect against diseases. Consequently, this work aims to develop a smart model that is able to determine the recurrences of a person on food places during a day. This model is based on a deep end-to-end model for automatic food places recognition by analyzing egocentric photo-streams. In this paper, we apply multi-scale Atrous convolution networks to extract the key features related to food places of the input images. The proposed model is evaluated on an in-house private dataset called "EgoFoodPlaces". Experimental results shows promising results of food places classification recognition in egocentric photo-streams. △ Less

Submitted 29 August, 2018; originally announced August 2018.

Comments: 10 pages, accepted in ECCV at EPIC 2018

arXiv:1807.11433 [pdf, other]

REFUGE CHALLENGE 2018-Task 2:Deep Optic Disc and Cup Segmentation in Fundus Images Using U-Net and Multi-scale Feature Matching Networks

Authors: Vivek Kumar Singh, Hatem A. Rashwan, Adel Saleh, Farhan Akram, Md Mostafa Kamal Sarker, Nidhi Pandey, Saddam Abdulwahab

Abstract: In this paper, an optic disc and cup segmentation method is proposed using U-Net followed by a multi-scale feature matching network. The proposed method targets task 2 of the REFUGE challenge 2018. In order to solve the segmentation problem of task 2, we firstly crop the input image using single shot multibox detector (SSD). The cropped image is then passed to an encoder-decoder network with skip… ▽ More In this paper, an optic disc and cup segmentation method is proposed using U-Net followed by a multi-scale feature matching network. The proposed method targets task 2 of the REFUGE challenge 2018. In order to solve the segmentation problem of task 2, we firstly crop the input image using single shot multibox detector (SSD). The cropped image is then passed to an encoder-decoder network with skip connections also known as generator. Afterwards, both the ground truth and generated images are fed to a convolution neural network (CNN) to extract their multi-level features. A dice loss function is then used to match the features of the two images by minimizing the error at each layer. The aggregation of error from each layer is back-propagated through the generator network to enforce it to generate a segmented image closer to the ground truth. The CNN network improves the performance of the generator network without increasing the complexity of the model. △ Less

Submitted 30 July, 2018; originally announced July 2018.

Comments: EYE REFUGE CHALLENGE 2018, submitted 7 Pages

arXiv:1805.12081 [pdf, other]

CuisineNet: Food Attributes Classification using Multi-scale Convolution Network

Authors: Md. Mostafa Kamal Sarker, Mohammed Jabreel, Hatem A. Rashwan, Syeda Furruka Banu, Antonio Moreno, Petia Radeva, Domenec Puig

Abstract: Diversity of food and its attributes represents the culinary habits of peoples from different countries. Thus, this paper addresses the problem of identifying food culture of people around the world and its flavor by classifying two main food attributes, cuisine and flavor. A deep learning model based on multi-scale convotuional networks is proposed for extracting more accurate features from input… ▽ More Diversity of food and its attributes represents the culinary habits of peoples from different countries. Thus, this paper addresses the problem of identifying food culture of people around the world and its flavor by classifying two main food attributes, cuisine and flavor. A deep learning model based on multi-scale convotuional networks is proposed for extracting more accurate features from input images. The aggregation of multi-scale convolution layers with different kernel size is also used for weighting the features results from different scales. In addition, a joint loss function based on Negative Log Likelihood (NLL) is used to fit the model probability to multi labeled classes for multi-modal classification task. Furthermore, this work provides a new dataset for food attributes, so-called Yummly48K, extracted from the popular food website, Yummly. Our model is assessed on the constructed Yummly48K dataset. The experimental results show that our proposed method yields 65% and 62% average F1 score on validation and test set which outperforming the state-of-the-art models. △ Less

Submitted 8 June, 2018; v1 submitted 30 May, 2018; originally announced May 2018.

Comments: 8 pages, Submitted in CCIA 2018

arXiv:1805.10241 [pdf, other]

SLSDeep: Skin Lesion Segmentation Based on Dilated Residual and Pyramid Pooling Networks

Authors: Md. Mostafa Kamal Sarker, Hatem A. Rashwan, Farhan Akram, Syeda Furruka Banu, Adel Saleh, Vivek Kumar Singh, Forhad U H Chowdhury, Saddam Abdulwahab, Santiago Romani, Petia Radeva, Domenec Puig

Abstract: Skin lesion segmentation (SLS) in dermoscopic images is a crucial task for automated diagnosis of melanoma. In this paper, we present a robust deep learning SLS model, so-called SLSDeep, which is represented as an encoder-decoder network. The encoder network is constructed by dilated residual layers, in turn, a pyramid pooling network followed by three convolution layers is used for the decoder. U… ▽ More Skin lesion segmentation (SLS) in dermoscopic images is a crucial task for automated diagnosis of melanoma. In this paper, we present a robust deep learning SLS model, so-called SLSDeep, which is represented as an encoder-decoder network. The encoder network is constructed by dilated residual layers, in turn, a pyramid pooling network followed by three convolution layers is used for the decoder. Unlike the traditional methods employing a cross-entropy loss, we investigated a loss function by combining both Negative Log Likelihood (NLL) and End Point Error (EPE) to accurately segment the melanoma regions with sharp boundaries. The robustness of the proposed model was evaluated on two public databases: ISBI 2016 and 2017 for skin lesion analysis towards melanoma detection challenge. The proposed model outperforms the state-of-the-art methods in terms of segmentation accuracy. Moreover, it is capable to segment more than $100$ images of size 384x384 per second on a recent GPU. △ Less

Submitted 30 May, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

Comments: Accepted in MICCAI 2018, 9 pages

arXiv:1805.10207 [pdf, other]

Conditional Generative Adversarial and Convolutional Networks for X-ray Breast Mass Segmentation and Shape Classification

Authors: Vivek Kumar Singh, Santiago Romani, Hatem A. Rashwan, Farhan Akram, Nidhi Pandey, Md. Mostafa Kamal Sarker, Jordina Torrents Barrena, Saddam Abdulwahab, Adel Saleh, Miguel Arquez, Meritxell Arenas, Domenec Puig

Abstract: This paper proposes a novel approach based on conditional Generative Adversarial Networks (cGAN) for breast mass segmentation in mammography. We hypothesized that the cGAN structure is well-suited to accurately outline the mass area, especially when the training data is limited. The generative network learns intrinsic features of tumors while the adversarial network enforces segmentations to be si… ▽ More This paper proposes a novel approach based on conditional Generative Adversarial Networks (cGAN) for breast mass segmentation in mammography. We hypothesized that the cGAN structure is well-suited to accurately outline the mass area, especially when the training data is limited. The generative network learns intrinsic features of tumors while the adversarial network enforces segmentations to be similar to the ground truth. Experiments performed on dozens of malignant tumors extracted from the public DDSM dataset and from our in-house private dataset confirm our hypothesis with very high Dice coefficient and Jaccard index (>94% and >89%, respectively) outperforming the scores obtained by other state-of-the-art approaches. Furthermore, in order to detect portray significant morphological features of the segmented tumor, a specific Convolutional Neural Network (CNN) have also been designed for classifying the segmented tumor areas into four types (irregular, lobular, oval and round), which provides an overall accuracy about 72% with the DDSM dataset. △ Less

Submitted 10 June, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

Comments: 8 pages, Accepted at Medical Image Computing and Computer Assisted Intervention (MICCAI) 2018

arXiv:1802.09384 [pdf, other]

doi 10.1109/TIP.2019.2911484

Using Curvilinear Features in Focus for Registering a Single Image to a 3D Object

Authors: Hatem A. Rashwan, Sylvie Chambon, Pierre Gurdjos, Géraldine Morin, Vincent Charvillat

Abstract: In the context of 2D/3D registration, this paper introduces an approach that allows to match features detected in two different modalities: photographs and 3D models, by using a common 2D reprensentation. More precisely, 2D images are matched with a set of depth images, representing the 3D model. After introducing the concept of curvilinear saliency, related to curvature estimation, we propose a n… ▽ More In the context of 2D/3D registration, this paper introduces an approach that allows to match features detected in two different modalities: photographs and 3D models, by using a common 2D reprensentation. More precisely, 2D images are matched with a set of depth images, representing the 3D model. After introducing the concept of curvilinear saliency, related to curvature estimation, we propose a new ridge and valley detector for depth images rendered from 3D model. A variant of this detector is adapted to photographs, in particular by applying it in multi-scale and by combining this feature detector with the principle of focus curves. Finally, a registration algorithm for determining the correct viewpoint of the 3D model and thus the pose is proposed. It is based on using histogram of gradients features adapted to the features manipulated in 2D and in 3D, and the introduction of repeatability scores. The results presented highlight the quality of the features detected, in term of repeatability, and also the interest of the approach for registration and pose estimation. △ Less

Submitted 26 February, 2018; originally announced February 2018.

arXiv:1709.08271 [pdf]

3D Camouflaging Object using RGB-D Sensors

Authors: Ahmed M. Siddek, Mohsen A. Rashwan, Islam A. Eshrah

Abstract: This paper proposes a new optical camouflage system that uses RGB-D cameras, for acquiring point cloud of background scene, and tracking observers eyes. This system enables a user to conceal an object located behind a display that surrounded by 3D objects. If we considered here the tracked point of observer s eyes is a light source, the system will work on estimating shadow shape of the display de… ▽ More This paper proposes a new optical camouflage system that uses RGB-D cameras, for acquiring point cloud of background scene, and tracking observers eyes. This system enables a user to conceal an object located behind a display that surrounded by 3D objects. If we considered here the tracked point of observer s eyes is a light source, the system will work on estimating shadow shape of the display device that falls on the objects in background. The system uses the 3d observer s eyes and the locations of display corners to predict their shadow points which have nearest neighbors in the constructed point cloud of background scene. △ Less

Submitted 24 September, 2017; originally announced September 2017.

Comments: 6 pages, 12 figures, 2017 IEEE International Conference on SMC

Showing 1–26 of 26 results for author: Rashwan, A