Search | arXiv e-print repository

arXiv:2406.19675 [pdf]

Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey

Authors: Uchitha Rajapaksha, Ferdous Sohel, Hamid Laga, Dean Diepeveen, Mohammed Bennamoun

Abstract: Estimating depth from single RGB images and videos is of widespread interest due to its applications in many areas, including autonomous driving, 3D reconstruction, digital entertainment, and robotics. More than 500 deep learning-based papers have been published in the past 10 years, which indicates the growing interest in the task. This paper presents a comprehensive survey of the existing deep l… ▽ More Estimating depth from single RGB images and videos is of widespread interest due to its applications in many areas, including autonomous driving, 3D reconstruction, digital entertainment, and robotics. More than 500 deep learning-based papers have been published in the past 10 years, which indicates the growing interest in the task. This paper presents a comprehensive survey of the existing deep learning-based methods, the challenges they address, and how they have evolved in their architecture and supervision methods. It provides a taxonomy for classifying the current work based on their input and output modalities, network architectures, and learning methods. It also discusses the major milestones in the history of monocular depth estimation, and different pipelines, datasets, and evaluation metrics used in existing methods. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: 46 pages, 10 figures, The paper has been accepted for publication in ACM Computing Surveys 2024

ACM Class: I.2.10; I.4; I.5.1; I.4.8

arXiv:2406.04861 [pdf, other]

Normal-guided Detail-Preserving Neural Implicit Functions for High-Fidelity 3D Surface Reconstruction

Authors: Aarya Patel, Hamid Laga, Ojaswa Sharma

Abstract: Neural implicit representations have emerged as a powerful paradigm for 3D reconstruction. However, despite their success, existing methods fail to capture fine geometric details and thin structures, especially in scenarios where only sparse RGB views of the objects of interest are available. We hypothesize that current methods for learning neural implicit representations from RGB or RGBD images p… ▽ More Neural implicit representations have emerged as a powerful paradigm for 3D reconstruction. However, despite their success, existing methods fail to capture fine geometric details and thin structures, especially in scenarios where only sparse RGB views of the objects of interest are available. We hypothesize that current methods for learning neural implicit representations from RGB or RGBD images produce 3D surfaces with missing parts and details because they only rely on 0-order differential properties, i.e. the 3D surface points and their projections, as supervisory signals. Such properties, however, do not capture the local 3D geometry around the points and also ignore the interactions between points. This paper demonstrates that training neural representations with first-order differential properties, i.e. surface normals, leads to highly accurate 3D surface reconstruction even in situations where only as few as two RGB (front and back) images are available. Given multiview RGB images of an object of interest, we first compute the approximate surface normals in the image space using the gradient of the depth maps produced using an off-the-shelf monocular depth estimator such as Depth Anything model. An implicit surface regressor is then trained using a loss function that enforces the first-order differential properties of the regressed surface to match those estimated from Depth Anything. Our extensive experiments on a wide range of real and synthetic datasets show that the proposed method achieves an unprecedented level of reconstruction accuracy even when using as few as two RGB views. The detailed ablation study also demonstrates that normal-based supervision plays a key role in this significant improvement in performance, enabling the 3D reconstruction of intricate geometric details and thin structures that were previously challenging to capture. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: Original version. Project page with images and code: https://sn-nir.github.io/

arXiv:2402.17910 [pdf, other]

Box It to Bind It: Unified Layout Control and Attribute Binding in T2I Diffusion Models

Authors: Ashkan Taghipour, Morteza Ghahremani, Mohammed Bennamoun, Aref Miri Rekavandi, Hamid Laga, Farid Boussaid

Abstract: While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module - a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three… ▽ More While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module - a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three key challenges in T2I: catastrophic neglect, attribute binding, and layout guidance. The process encompasses two main steps: i) Object generation, which adjusts the latent encoding to guarantee object generation and directs it within specified bounding boxes, and ii) attribute binding, guaranteeing that generated objects adhere to their specified attributes in the prompt. B2B is designed as a compatible plug-and-play module for existing T2I models, markedly enhancing model performance in addressing the key challenges. We evaluate our technique using the established CompBench and TIFA score benchmarks, demonstrating significant performance improvements compared to existing methods. The source code will be made publicly available at https://github.com/nextaistudio/BoxIt2BindIt. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2308.03005 [pdf, other]

MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

Authors: Lian Xu, Mohammed Bennamoun, Farid Boussaid, Hamid Laga, Wanli Ouyang, Dan Xu

Abstract: This paper proposes a novel transformer-based framework that aims to enhance weakly supervised semantic segmentation (WSSS) by generating accurate class-specific object localization maps as pseudo labels. Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnostic localization map, we explore the potential of… ▽ More This paper proposes a novel transformer-based framework that aims to enhance weakly supervised semantic segmentation (WSSS) by generating accurate class-specific object localization maps as pseudo labels. Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnostic localization map, we explore the potential of the transformer model to capture class-specific attention for class-discriminative object localization by learning multiple class tokens. We introduce a Multi-Class Token transformer, which incorporates multiple class tokens to enable class-aware interactions with the patch tokens. To achieve this, we devise a class-aware training strategy that establishes a one-to-one correspondence between the output class tokens and the ground-truth class labels. Moreover, a Contrastive-Class-Token (CCT) module is proposed to enhance the learning of discriminative class tokens, enabling the model to better capture the unique characteristics and properties of each class. As a result, class-discriminative object localization maps can be effectively generated by leveraging the class-to-patch attentions associated with different class tokens. To further refine these localization maps, we propose the utilization of patch-level pairwise affinity derived from the patch-to-patch transformer attention. Furthermore, the proposed framework seamlessly complements the Class Activation Map** (CAM) method, resulting in significantly improved WSSS performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. These results underline the importance of the class token for WSSS. △ Less

Submitted 5 August, 2023; originally announced August 2023.

Comments: Journal extension for MCTformer

arXiv:2209.08305 [pdf, other]

Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods

Authors: Laurent Jospin, Allen Antony, Lian Xu, Hamid Laga, Farid Boussaid, Mohammed Bennamoun

Abstract: In stereo vision, self-similar or bland regions can make it difficult to match patches between two images. Active stereo-based methods mitigate this problem by projecting a pseudo-random pattern on the scene so that each patch of an image pair can be identified without ambiguity. However, the projected pattern significantly alters the appearance of the image. If this pattern acts as a form of adve… ▽ More In stereo vision, self-similar or bland regions can make it difficult to match patches between two images. Active stereo-based methods mitigate this problem by projecting a pseudo-random pattern on the scene so that each patch of an image pair can be identified without ambiguity. However, the projected pattern significantly alters the appearance of the image. If this pattern acts as a form of adversarial noise, it could negatively impact the performance of deep learning-based methods, which are now the de-facto standard for dense stereo vision. In this paper, we propose the Active-Passive SimStereo dataset and a corresponding benchmark to evaluate the performance gap between passive and active stereo images for stereo matching algorithms. Using the proposed benchmark and an additional ablation study, we show that the feature extraction and matching modules of a selection of twenty selected deep learning-based stereo matching methods generalize to active stereo without a problem. However, the disparity refinement modules of three of the twenty architectures (ACVNet, CascadeStereo, and StereoNet) are negatively affected by the active stereo patterns due to their reliance on the appearance of the input images. △ Less

Submitted 17 September, 2022; originally announced September 2022.

Comments: 22 pages, 12 figures, accepted in NeurIPS 2022 Datasets and Benchmarks Track

arXiv:2209.05082 [pdf, other]

Bayesian Learning for Disparity Map Refinement for Semi-Dense Active Stereo Vision

Authors: Laurent Valentin Jospin, Hamid Laga, Farid Boussaid, Mohammed Bennamoun

Abstract: A major focus of recent developments in stereo vision has been on how to obtain accurate dense disparity maps in passive stereo vision. Active vision systems enable more accurate estimations of dense disparity compared to passive stereo. However, subpixel-accurate disparity estimation remains an open problem that has received little attention. In this paper, we propose a new learning strategy to t… ▽ More A major focus of recent developments in stereo vision has been on how to obtain accurate dense disparity maps in passive stereo vision. Active vision systems enable more accurate estimations of dense disparity compared to passive stereo. However, subpixel-accurate disparity estimation remains an open problem that has received little attention. In this paper, we propose a new learning strategy to train neural networks to estimate high-quality subpixel disparity maps for semi-dense active stereo vision. The key insight is that neural networks can double their accuracy if they are able to jointly learn how to refine the disparity map while invalidating the pixels where there is insufficient information to correct the disparity estimate. Our approach is based on Bayesian modeling where validated and invalidated pixels are defined by their stochastic properties, allowing the model to learn how to choose by itself which pixels are worth its attention. Using active stereo datasets such as Active-Passive SimStereo, we demonstrate that the proposed method outperforms the current state-of-the-art active stereo models. We also demonstrate that the proposed approach compares favorably with state-of-the-art passive stereo models on the Middlebury dataset. △ Less

Submitted 12 September, 2022; originally announced September 2022.

Comments: 15 pages, 15 figures

arXiv:2112.07819 [pdf, other]

Weed Recognition using Deep Learning Techniques on Class-imbalanced Imagery

Authors: A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga, Michael G. K. Jones

Abstract: Most weed species can adversely impact agricultural productivity by competing for nutrients required by high-value crops. Manual weeding is not practical for large crop** areas. Many studies have been undertaken to develop automatic weed management systems for agricultural crops. In this process, one of the major tasks is to recognise the weeds from images. However, weed recognition is a challen… ▽ More Most weed species can adversely impact agricultural productivity by competing for nutrients required by high-value crops. Manual weeding is not practical for large crop** areas. Many studies have been undertaken to develop automatic weed management systems for agricultural crops. In this process, one of the major tasks is to recognise the weeds from images. However, weed recognition is a challenging task. It is because weed and crop plants can be similar in colour, texture and shape which can be exacerbated further by the imaging conditions, geographic or weather conditions when the images are recorded. Advanced machine learning techniques can be used to recognise weeds from imagery. In this paper, we have investigated five state-of-the-art deep neural networks, namely VGG16, ResNet-50, Inception-V3, Inception-ResNet-v2 and MobileNetV2, and evaluated their performance for weed recognition. We have used several experimental settings and multiple dataset combinations. In particular, we constructed a large weed-crop dataset by combining several smaller datasets, mitigating class imbalance by data augmentation, and using this dataset in benchmarking the deep neural networks. We investigated the use of transfer learning techniques by preserving the pre-trained weights for extracting the features and fine-tuning them using the images of crop and weed datasets. We found that VGG16 performed better than others on small-scale datasets, while ResNet-50 performed better than other deep networks on the large combined dataset. △ Less

Submitted 14 December, 2021; originally announced December 2021.

Comments: The paper is accepted by Crop and Pasture Science journal (https://www.publish.csiro.au/CP/justaccepted/CP21626)

arXiv:2112.00941 [pdf, other]

Generalized Closed-form Formulae for Feature-based Subpixel Alignment in Patch-based Matching

Authors: Laurent Valentin Jospin, Farid Boussaid, Hamid Laga, Mohammed Bennamoun

Abstract: Cost-based image patch matching is at the core of various techniques in computer vision, photogrammetry and remote sensing. When the subpixel disparity between the reference patch in the source and target images is required, either the cost function or the target image have to be interpolated. While cost-based interpolation is the easiest to implement, multiple works have shown that image based in… ▽ More Cost-based image patch matching is at the core of various techniques in computer vision, photogrammetry and remote sensing. When the subpixel disparity between the reference patch in the source and target images is required, either the cost function or the target image have to be interpolated. While cost-based interpolation is the easiest to implement, multiple works have shown that image based interpolation can increase the accuracy of the subpixel matching, but usually at the cost of expensive search procedures. This, however, is problematic, especially for very computation intensive applications such as stereo matching or optical flow computation. In this paper, we show that closed form formulae for subpixel disparity computation for the case of one dimensional matching, e.g., in the case of rectified stereo images where the search space is of one dimension, exists when using the standard NCC, SSD and SAD cost functions. We then demonstrate how to generalize the proposed formulae to the case of high dimensional search spaces, which is required for unrectified stereo matching and optical flow extraction. We also compare our results with traditional cost volume interpolation formulae as well as with state-of-the-art cost-based refinement methods, and show that the proposed formulae bring a small improvement over the state-of-the-art cost-based methods in the case of one dimensional search spaces, and a significant improvement when the search space is two dimensional. △ Less

Submitted 12 February, 2023; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: 29 pages, 10 figures

ACM Class: I.4.8

arXiv:2110.08693 [pdf, other]

Elastic Shape Analysis of Tree-like 3D Objects using Extended SRVF Representation

Authors: Guan Wang, Hamid Laga, Anuj Srivastava

Abstract: How can one analyze detailed 3D biological objects, such as neurons and botanical trees, that exhibit complex geometrical and topological variation? In this paper, we develop a novel mathematical framework for representing, comparing, and computing geodesic deformations between the shapes of such tree-like 3D objects. A hierarchical organization of subtrees characterizes these objects -- each subt… ▽ More How can one analyze detailed 3D biological objects, such as neurons and botanical trees, that exhibit complex geometrical and topological variation? In this paper, we develop a novel mathematical framework for representing, comparing, and computing geodesic deformations between the shapes of such tree-like 3D objects. A hierarchical organization of subtrees characterizes these objects -- each subtree has the main branch with some side branches attached -- and one needs to match these structures across objects for meaningful comparisons. We propose a novel representation that extends the Square-Root Velocity Function (SRVF), initially developed for Euclidean curves, to tree-shaped 3D objects. We then define a new metric that quantifies the bending, stretching, and branch sliding needed to deform one tree-shaped object into the other. Compared to the current metrics, such as the Quotient Euclidean Distance (QED) and the Tree Edit Distance (TED), the proposed representation and metric capture the full elasticity of the branches (i.e., bending and stretching) as well as the topological variations (i.e., branch death/birth and sliding). It completely avoids the shrinkage that results from the edge collapse and node split operations of the QED and TED metrics. We demonstrate the utility of this framework in comparing, matching, and computing geodesics between biological objects such as neurons and botanical trees. The framework is also applied to various shape analysis tasks: (i) symmetry analysis and symmetrization of tree-shaped 3D objects, (ii) computing summary statistics (means and modes of variations) of populations of tree-shaped 3D objects, (iii) fitting parametric probability distributions to such populations, and (iv) finally synthesizing novel tree-shaped 3D objects through random sampling from estimated probability distributions. △ Less

Submitted 26 November, 2023; v1 submitted 16 October, 2021; originally announced October 2021.

arXiv:2109.11844 [pdf, other]

Learnable Triangulation for Deep Learning-based 3D Reconstruction of Objects of Arbitrary Topology from Single RGB Images

Authors: Tarek Ben Charrada, Hedi Tabia, Aladine Chetouani, Hamid Laga

Abstract: We propose a novel deep reinforcement learning-based approach for 3D object reconstruction from monocular images. Prior works that use mesh representations are template based. Thus, they are limited to the reconstruction of objects that have the same topology as the template. Methods that use volumetric grids as intermediate representations are computationally expensive, which limits their applica… ▽ More We propose a novel deep reinforcement learning-based approach for 3D object reconstruction from monocular images. Prior works that use mesh representations are template based. Thus, they are limited to the reconstruction of objects that have the same topology as the template. Methods that use volumetric grids as intermediate representations are computationally expensive, which limits their application in real-time scenarios. In this paper, we propose a novel end-to-end method that reconstructs 3D objects of arbitrary topology from a monocular image. It is composed of of (1) a Vertex Generation Network (VGN), which predicts the initial 3D locations of the object's vertices from an input RGB image, (2) a differentiable triangulation layer, which learns in a non-supervised manner, using a novel reinforcement learning algorithm, the best triangulation of the object's vertices, and finally, (3) a hierarchical mesh refinement network that uses graph convolutions to refine the initial mesh. Our key contribution is the learnable triangulation process, which recovers in an unsupervised manner the topology of the input shape. Our experiments on ShapeNet and Pix3D benchmarks show that the proposed method outperforms the state-of-the-art in terms of visual quality, reconstruction accuracy, and computational time. △ Less

Submitted 24 September, 2021; originally announced September 2021.

arXiv:2103.01415 [pdf, other]

A Survey of Deep Learning Techniques for Weed Detection from Images

Authors: A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga, Michael G. K. Jones

Abstract: The rapid advances in Deep Learning (DL) techniques have enabled rapid detection, localisation, and recognition of objects from images or videos. DL techniques are now being used in many applications related to agriculture and farming. Automatic detection and classification of weeds can play an important role in weed management and so contribute to higher yields. Weed detection in crops from image… ▽ More The rapid advances in Deep Learning (DL) techniques have enabled rapid detection, localisation, and recognition of objects from images or videos. DL techniques are now being used in many applications related to agriculture and farming. Automatic detection and classification of weeds can play an important role in weed management and so contribute to higher yields. Weed detection in crops from imagery is inherently a challenging problem because both weeds and crops have similar colours ('green-on-green'), and their shapes and texture can be very similar at the growth phase. Also, a crop in one setting can be considered a weed in another. In addition to their detection, the recognition of specific weed species is essential so that targeted controlling mechanisms (e.g. appropriate herbicides and correct doses) can be applied. In this paper, we review existing deep learning-based weed detection and classification techniques. We cover the detailed literature on four main procedures, i.e., data acquisition, dataset preparation, DL techniques employed for detection, location and classification of weeds in crops, and evaluation metrics approaches. We found that most studies applied supervised learning techniques, they achieved high classification accuracy by fine-tuning pre-trained models on any plant dataset, and past experiments have already achieved high accuracy when a large amount of labelled data is available. △ Less

Submitted 1 March, 2021; originally announced March 2021.

arXiv:2101.09403 [pdf, other]

4D Atlas: Statistical Analysis of the Spatiotemporal Variability in Longitudinal 3D Shape Data

Authors: Hamid Laga, Marcel Padilla, Ian H. Jermyn, Sebastian Kurtek, Mohammed Bennamoun, Anuj Srivastava

Abstract: We propose a novel framework to learn the spatiotemporal variability in longitudinal 3D shape data sets, which contain observations of objects that evolve and deform over time. This problem is challenging since surfaces come with arbitrary parameterizations and thus, they need to be spatially registered. Also, different deforming objects, also called 4D surfaces, evolve at different speeds and thu… ▽ More We propose a novel framework to learn the spatiotemporal variability in longitudinal 3D shape data sets, which contain observations of objects that evolve and deform over time. This problem is challenging since surfaces come with arbitrary parameterizations and thus, they need to be spatially registered. Also, different deforming objects, also called 4D surfaces, evolve at different speeds and thus they need to be temporally aligned. We solve this spatiotemporal registration problem using a Riemannian approach. We treat a 3D surface as a point in a shape space equipped with an elastic Riemannian metric that measures the amount of bending and stretching that the surfaces undergo. A 4D surface can then be seen as a trajectory in this space. With this formulation, the statistical analysis of 4D surfaces can be cast as the problem of analyzing trajectories embedded in a nonlinear Riemannian manifold. However, performing the spatiotemporal registration, and subsequently computing statistics, on such nonlinear spaces is not straightforward as they rely on complex nonlinear optimizations. Our core contribution is the map** of the surfaces to the space of Square-Root Normal Fields where the L2 metric is equivalent to the partial elastic metric in the space of surfaces. Thus, by solving the spatial registration in the SRNF space, the problem of analyzing 4D surfaces becomes the problem of analyzing trajectories embedded in the SRNF space, which has a Euclidean structure. In this paper, we develop the building blocks that enable such analysis. These include: (1) the spatiotemporal registration of arbitrarily parameterized 4D surfaces in the presence of large elastic deformations and large variations in their execution rates; (2) the computation of geodesics between 4D surfaces; (3) the computation of statistical summaries; and (4) the synthesis of random 4D surfaces. △ Less

Submitted 20 August, 2021; v1 submitted 22 January, 2021; originally announced January 2021.

arXiv:2009.07532 [pdf]

RCNN for Region of Interest Detection in Whole Slide Images

Authors: A Nugaliyadde, Kok Wai Wong, Jeremy Parry, Ferdous Sohel, Hamid Laga, Upeka V. Somaratne, Chris Yeomans, Orchid Foster

Abstract: Digital pathology has attracted significant attention in recent years. Analysis of Whole Slide Images (WSIs) is challenging because they are very large, i.e., of Giga-pixel resolution. Identifying Regions of Interest (ROIs) is the first step for pathologists to analyse further the regions of diagnostic interest for cancer detection and other anomalies. In this paper, we investigate the use of RCNN… ▽ More Digital pathology has attracted significant attention in recent years. Analysis of Whole Slide Images (WSIs) is challenging because they are very large, i.e., of Giga-pixel resolution. Identifying Regions of Interest (ROIs) is the first step for pathologists to analyse further the regions of diagnostic interest for cancer detection and other anomalies. In this paper, we investigate the use of RCNN, which is a deep machine learning technique, for detecting such ROIs only using a small number of labelled WSIs for training. For experimentation, we used real WSIs from a public hospital pathology service in Western Australia. We used 60 WSIs for training the RCNN model and another 12 WSIs for testing. The model was further tested on a new set of unseen WSIs. The results show that RCNN can be effectively used for ROI detection from WSIs. △ Less

Submitted 17 September, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

Comments: This paper was accepted to the 27th International Conference on Neural Information Processing (ICONIP 2020) and will be published in the Springer CCIS Series

arXiv:2007.06823 [pdf, other]

doi 10.1109/MCI.2022.3155327

Hands-on Bayesian Neural Networks -- a Tutorial for Deep Learning Users

Authors: Laurent Valentin Jospin, Wray Buntine, Farid Boussaid, Hamid Laga, Mohammed Bennamoun

Abstract: Modern deep learning methods constitute incredibly powerful tools to tackle a myriad of challenging problems. However, since deep learning methods operate as black boxes, the uncertainty associated with their predictions is often challenging to quantify. Bayesian statistics offer a formalism to understand and quantify the uncertainty associated with deep neural network predictions. This tutorial p… ▽ More Modern deep learning methods constitute incredibly powerful tools to tackle a myriad of challenging problems. However, since deep learning methods operate as black boxes, the uncertainty associated with their predictions is often challenging to quantify. Bayesian statistics offer a formalism to understand and quantify the uncertainty associated with deep neural network predictions. This tutorial provides an overview of the relevant literature and a complete toolset to design, implement, train, use and evaluate Bayesian Neural Networks, i.e. Stochastic Artificial Neural Networks trained using Bayesian methods. △ Less

Submitted 3 January, 2022; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: 20 pages, 13 figures

MSC Class: 62-02 (Primary) ACM Class: G.3; I.2.6

Journal ref: IEEE Computational Intelligence Magazine ( Volume: 17, Issue: 2, May 2022)

arXiv:2006.02535 [pdf, other]

doi 10.1109/TPAMI.2020.3032602

A Survey on Deep Learning Techniques for Stereo-based Depth Estimation

Authors: Hamid Laga, Laurent Valentin Jospin, Farid Boussaid, Mohammed Bennamoun

Abstract: Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. Among the existing techniques, stereo matching remains one of the most widely used in the literature due to its strong connection to the human binocular system. Traditionally, stereo-based depth estimation has been addressed… ▽ More Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. Among the existing techniques, stereo matching remains one of the most widely used in the literature due to its strong connection to the human binocular system. Traditionally, stereo-based depth estimation has been addressed through matching hand-crafted features across multiple images. Despite the extensive amount of research, these traditional techniques still suffer in the presence of highly textured areas, large uniform regions, and occlusions. Motivated by their growing success in solving various 2D and 3D vision problems, deep learning for stereo-based depth estimation has attracted growing interest from the community, with more than 150 papers published in this area between 2014 and 2019. This new generation of methods has demonstrated a significant leap in performance, enabling applications such as autonomous driving and augmented reality. In this article, we provide a comprehensive survey of this new and continuously growing field of research, summarize the most commonly used pipelines, and discuss their benefits and limitations. In retrospect of what has been achieved so far, we also conjecture what the future may hold for deep learning-based stereo for depth estimation research. △ Less

Submitted 1 June, 2020; originally announced June 2020.

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020

arXiv:1907.09236 [pdf, other]

RGB-D image-based Object Detection: from Traditional Methods to Deep Learning Techniques

Authors: Isaac Ronald Ward, Hamid Laga, Mohammed Bennamoun

Abstract: Object detection from RGB images is a long-standing problem in image processing and computer vision. It has applications in various domains including robotics, surveillance, human-computer interaction, and medical diagnosis. With the availability of low cost 3D scanners, a large number of RGB-D object detection approaches have been proposed in the past years. This chapter provides a comprehensive… ▽ More Object detection from RGB images is a long-standing problem in image processing and computer vision. It has applications in various domains including robotics, surveillance, human-computer interaction, and medical diagnosis. With the availability of low cost 3D scanners, a large number of RGB-D object detection approaches have been proposed in the past years. This chapter provides a comprehensive survey of the recent developments in this field. We structure the chapter into two parts; the focus of the first part is on techniques that are based on hand-crafted features combined with machine learning algorithms. The focus of the second part is on the more recent work, which is based on deep learning. Deep learning techniques, coupled with the availability of large training datasets, have now revolutionized the field of computer vision, including RGB-D object detection, achieving an unprecedented level of performance. We survey the key contributions, summarize the most commonly used pipelines, discuss their benefits and limitations, and highlight some important directions for future research. △ Less

Submitted 22 July, 2019; originally announced July 2019.

Comments: Chapter in the book 'RGB-D Image Analysis and Processing' (Paul Rosin)

arXiv:1906.06543 [pdf, other]

doi 10.1109/TPAMI.2019.2954885

Image-based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era

Authors: Xian-Feng Han, Hamid Laga, Mohammed Bennamoun

Abstract: 3D reconstruction is a longstanding ill-posed problem, which has been explored for decades by the computer vision, computer graphics, and machine learning communities. Since 2015, image-based 3D reconstruction using convolutional neural networks (CNN) has attracted increasing interest and demonstrated an impressive performance. Given this new era of rapid evolution, this article provides a compreh… ▽ More 3D reconstruction is a longstanding ill-posed problem, which has been explored for decades by the computer vision, computer graphics, and machine learning communities. Since 2015, image-based 3D reconstruction using convolutional neural networks (CNN) has attracted increasing interest and demonstrated an impressive performance. Given this new era of rapid evolution, this article provides a comprehensive survey of the recent developments in this field. We focus on the works which use deep learning techniques to estimate the 3D shape of generic objects either from a single or multiple RGB images. We organize the literature based on the shape representations, the network architectures, and the training mechanisms they use. While this survey is intended for methods which reconstruct generic objects, we also review some of the recent works which focus on specific object classes such as human body shapes and faces. We provide an analysis and comparison of the performance of some key papers, summarize some of the open problems in this field, and discuss promising directions for future research. △ Less

Submitted 1 November, 2019; v1 submitted 15 June, 2019; originally announced June 2019.

Comments: arXiv admin note: text overlap with arXiv:1806.06098, arXiv:1712.06584, arXiv:1804.10975 by other authors

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, Nov. 2019

arXiv:1906.06113 [pdf, other]

A Survey on Deep Learning Architectures for Image-based Depth Reconstruction

Authors: Hamid Laga

Abstract: Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. In this article, we provide a comprehensive survey of the recent developments in this field. We will focus on the works which use deep learning techniques to estimate depth from one or multiple images. Deep learning, coupled… ▽ More Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. In this article, we provide a comprehensive survey of the recent developments in this field. We will focus on the works which use deep learning techniques to estimate depth from one or multiple images. Deep learning, coupled with the availability of large training datasets, have revolutionized the way the depth reconstruction problem is being approached by the research community. In this article, we survey more than 100 key contributions that appeared in the past five years, summarize the most commonly used pipelines, and discuss their benefits and limitations. In retrospect of what has been achieved so far, we also conjecture what the future may hold for learning-based depth reconstruction research. △ Less

Submitted 14 June, 2019; originally announced June 2019.

arXiv:1905.06812 [pdf, other]

doi 10.1016/j.jtbi.2019.110108

Statistical Analysis and Modeling of the Geometry and Topology of Plant Roots

Authors: Guan Wang, Hamid Laga, **yuan Jia, Stanley J. Miklavcic, Anuj Srivastava

Abstract: The root is an important organ of a plant since it is responsible for water and nutrient uptake. Analyzing and modelling variabilities in the geometry and topology of roots can help in assessing the plant's health, understanding its growth patterns, and modeling relations between plant species and between plants and their environment. In this article, we develop a framework for the statistical ana… ▽ More The root is an important organ of a plant since it is responsible for water and nutrient uptake. Analyzing and modelling variabilities in the geometry and topology of roots can help in assessing the plant's health, understanding its growth patterns, and modeling relations between plant species and between plants and their environment. In this article, we develop a framework for the statistical analysis and modeling of the geometry and topology of plant roots. We represent root structures as points in a tree-shape space equipped with a metric that quantifies geometric and topological differences between pairs of roots. We then use these building blocks to compute geodesics, i.e., optimal deformations under the metric between root structures, and to perform statistical analysis on root populations. We demonstrate the utility of the proposed framework through an application to a dataset of wheat roots grown in different environmental conditions. We also show that the framework can be used in various applications including classification and regression. △ Less

Submitted 15 October, 2019; v1 submitted 16 May, 2019; originally announced May 2019.

Journal ref: Journal of Theoretical Biology, 2020

arXiv:1812.10111 [pdf, other]

A Survey on Non-rigid 3D Shape Analysis

Authors: Hamid Laga

Abstract: Shape is an important physical property of natural and manmade 3D objects that characterizes their external appearances. Understanding differences between shapes and modeling the variability within and across shape classes, hereinafter referred to as \emph{shape analysis}, are fundamental problems to many applications, ranging from computer vision and computer graphics to biology and medicine. Thi… ▽ More Shape is an important physical property of natural and manmade 3D objects that characterizes their external appearances. Understanding differences between shapes and modeling the variability within and across shape classes, hereinafter referred to as \emph{shape analysis}, are fundamental problems to many applications, ranging from computer vision and computer graphics to biology and medicine. This chapter provides an overview of some of the recent techniques that studied the shape of 3D objects that undergo non-rigid deformations including bending and stretching. Recent surveys that covered some aspects such classification, retrieval, recognition, and rigid or nonrigid registration, focused on methods that use shape descriptors. Descriptors, however, provide abstract representations that do not enable the exploration of shape variability. In this chapter, we focus on recent techniques that treated the shape of 3D objects as points in some high dimensional space where paths describe deformations. Equip** the space with a suitable metric enables the quantification of the range of deformations of a given shape, which in turn enables (1) comparing and classifying 3D objects based on their shape, (2) computing smooth deformations, i.e. geodesics, between pairs of objects, and (3) modeling and exploring continuous shape variability in a collection of 3D models. This article surveys and classifies recent developments in this field, outlines fundamental issues, discusses their potential applications in computer vision and graphics, and highlights opportunities for future research. Our primary goal is to bridge the gap between various techniques that have been often independently proposed by different communities including mathematics and statistics, computer vision and graphics, and medical image analysis. △ Less

Submitted 25 December, 2018; originally announced December 2018.

arXiv:1810.04020 [pdf, other]

A Comprehensive Survey of Deep Learning for Image Captioning

Authors: Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga

Abstract: Generating a description of an image is called image captioning. Image captioning requires to recognize the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically correct sentences. Deep learning-based techniques are capable of handling the complexities and challenges of image captioning. In this survey paper, we aim to pre… ▽ More Generating a description of an image is called image captioning. Image captioning requires to recognize the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically correct sentences. Deep learning-based techniques are capable of handling the complexities and challenges of image captioning. In this survey paper, we aim to present a comprehensive review of existing deep learning-based image captioning techniques. We discuss the foundation of the techniques to analyze their performances, strengths and limitations. We also discuss the datasets and the evaluation metrics popularly used in deep learning based automatic image captioning. △ Less

Submitted 14 October, 2018; v1 submitted 6 October, 2018; originally announced October 2018.

Comments: 36 Pages, Accepted as a Journal Paper in ACM Computing Surveys (October 2018)

arXiv:1610.04531 [pdf, other]

doi 10.1109/TPAMI.2016.2647596

Numerical Inversion of SRNF Maps for Elastic Shape Analysis of Genus-Zero Surfaces

Authors: Hamid Laga, Qian Xie, Ian H. Jermyn, Anuj Srivastava

Abstract: Recent developments in elastic shape analysis (ESA) are motivated by the fact that it provides comprehensive frameworks for simultaneous registration, deformation, and comparison of shapes. These methods achieve computational efficiency using certain square-root representations that transform invariant elastic metrics into Euclidean metrics, allowing for applications of standard algorithms and sta… ▽ More Recent developments in elastic shape analysis (ESA) are motivated by the fact that it provides comprehensive frameworks for simultaneous registration, deformation, and comparison of shapes. These methods achieve computational efficiency using certain square-root representations that transform invariant elastic metrics into Euclidean metrics, allowing for applications of standard algorithms and statistical tools. For analyzing shapes of embeddings of $\mathbb{S}^2$ in $\mathbb{R}^3$, Jermyn et al. introduced square-root normal fields (SRNFs) that transformed an elastic metric, with desirable invariant properties, into the $\mathbb{L}^2$ metric. These SRNFs are essentially surface normals scaled by square-roots of infinitesimal area elements. A critical need in shape analysis is to invert solutions (deformations, averages, modes of variations, etc) computed in the SRNF space, back to the original surface space for visualizations and inferences. Due to the lack of theory for understanding SRNFs maps and their inverses, we take a numerical approach and derive an efficient multiresolution algorithm, based on solving an optimization problem in the surface space, that estimates surfaces corresponding to given SRNFs. This solution is found effective, even for complex shapes, e.g. human bodies and animals, that undergo significant deformations including bending and stretching. Specifically, we use this inversion for computing elastic shape deformations, transferring deformations, summarizing shapes, and for finding modes of variability in a given collection, while simultaneously registering the surfaces. We demonstrate the proposed algorithms using a statistical analysis of human body shapes, classification of generic surfaces and analysis of brain structures. △ Less

Submitted 14 October, 2016; originally announced October 2016.

Report number: Volume: 39 , Issue: 12

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017

Showing 1–22 of 22 results for author: Laga, H