Search | arXiv e-print repository

Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency

Authors: Christian Keilstrup Ingwersen, Anders Bjorholm Dahl, Janus Nørtoft Jensen, Morten Rieger Hannemose

Abstract: Deducing a 3D human pose from a single 2D image or 2D keypoints is inherently challenging, given the fundamental ambiguity wherein multiple 3D poses can correspond to the same 2D representation. The acquisition of 3D data, while invaluable for resolving pose ambiguity, is expensive and requires an intricate setup, often restricting its applicability to controlled lab environments. We improve performance of monocular human pose estimation models using multiview data for fine-tuning. We propose a novel loss function, multiview consistency, to enable adding additional training data with only 2D supervision. This loss enforces that the inferred 3D pose from one view aligns with the inferred 3D pose from another view under similarity transformations. Our consistency loss substantially improves performance for fine-tuning with no available 3D data. Our experiments demonstrate that two views offset by 90 degrees are enough to obtain good performance, with only marginal improvements by adding more views. Thus, we enable the acquisition of domain-specific data by capturing activities with off-the-shelf cameras, eliminating the need for elaborate calibration procedures. This research introduces new possibilities for domain adaptation in 3D pose estimation, providing a practical and cost-effective solution to customize models for specific applications. The used dataset, featuring additional views, will be made publicly available.

Submitted 21 November, 2023; originally announced November 2023.
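
The abstract above describes a loss that compares the 3D poses inferred from two views after aligning them under a similarity transformation. The following is a minimal NumPy sketch of that idea, not the authors' implementation: it assumes poses are given as (J, 3) joint arrays, uses a closed-form least-squares similarity alignment (scale, rotation, translation), and measures the mean per-joint distance after alignment. The function names `similarity_align` and `multiview_consistency` are hypothetical, and an actual training loss would need a differentiable framework rather than plain NumPy.

```python
import numpy as np


def similarity_align(source, target):
    """Least-squares similarity alignment of source (J, 3) onto target (J, 3)."""
    mu_s = source.mean(axis=0)
    mu_t = target.mean(axis=0)
    src = source - mu_s
    tgt = target - mu_t
    n = len(source)

    # Cross-covariance between centered target and centered source joints.
    cov = tgt.T @ src / n
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(U) * np.linalg.det(Vt) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt

    # Optimal isotropic scale given the rotation.
    var_s = (src ** 2).sum() / n
    scale = (S * np.diag(D)).sum() / var_s

    return scale * src @ R.T + mu_t


def multiview_consistency(pose_a, pose_b):
    """Mean per-joint distance between pose_b and similarity-aligned pose_a."""
    aligned = similarity_align(pose_a, pose_b)
    return float(np.linalg.norm(aligned - pose_b, axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pose = rng.normal(size=(17, 3))  # hypothetical 17-joint skeleton
    # The same pose seen after a 90-degree rotation about z, rescaled and shifted.
    Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    other_view = 2.0 * pose @ Rz.T + np.array([1.0, 2.0, 3.0])
    print(multiview_consistency(pose, other_view))
```

Because the alignment absorbs scale, rotation, and translation, two predictions of the same pose expressed in different camera frames give a value near zero, which is why such a comparison would not require calibrated cameras.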

arXiv:2304.01865 [pdf, other]

SportsPose -- A Dynamic 3D sports pose dataset

Authors: Christian Keilstrup Ingwersen, Christian Mikkelstrup, Janus Nørtoft Jensen, Morten Rieger Hannemose, Anders Bjorholm Dahl

Abstract: Accurate 3D human pose estimation is essential for sports analytics, coaching, and injury prevention. However, existing datasets for monocular pose estimation do not adequately capture the challenging and dynamic nature of sports movements. In response, we introduce SportsPose, a large-scale 3D human pose dataset consisting of highly dynamic sports movements. With more than 176,000 3D poses from 24 different subjects performing 5 different sports activities, SportsPose provides a diverse and comprehensive set of 3D poses that reflect the complex and dynamic nature of sports movements. Contrary to other markerless datasets, we have quantitatively evaluated the precision of SportsPose by comparing our poses with a commercial marker-based system and achieve a mean error of 34.5 mm across all evaluation sequences. This is comparable to the error reported on the commonly used 3DPW dataset. We further introduce a new metric, local movement, which describes the movement of the wrist and ankle joints in relation to the body. With this, we show that SportsPose contains more movement than the Human3.6M and 3DPW datasets in these extremum joints, indicating that our movements are more dynamic. The dataset with accompanying code can be downloaded from our website. We hope that SportsPose will allow researchers and practitioners to develop and evaluate more effective models for the analysis of sports performance and injury prevention. With its realistic and diverse dataset, SportsPose provides a valuable resource for advancing the state-of-the-art in pose estimation in sports.

Submitted 4 April, 2023; originally announced April 2023.

arXiv:2304.01838 [pdf, other]

BugNIST -- a Large Volumetric Dataset for Object Detection under Domain Shift

Authors: Patrick Møller Jensen, Vedrana Andersen Dahl, Carsten Gundlach, Rebecca Engberg, Hans Martin Kjer, Anders Bjorholm Dahl

Abstract: Domain shift significantly influences the performance of deep learning algorithms, particularly for object detection within volumetric 3D images. Annotated training data is essential for deep learning-based object detection. However, annotating densely packed objects is time-consuming and costly. Instead, we suggest training models on individually scanned objects, causing a domain shift between training and detection data. To address this challenge, we introduce the BugNIST dataset, comprising 9154 micro-CT volumes of 12 bug types and 388 volumes of tightly packed bug mixtures. This dataset is characterized by having objects with the same appearance in the source and target domains, which is uncommon for other benchmark datasets for domain shift. During training, individual bug volumes labeled by class are utilized, while testing employs mixtures with center point annotations and bug type labels. Together with the dataset, we provide a baseline detection analysis, with the aim of advancing the field of 3D object detection methods.

Submitted 7 July, 2024; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: 31 pages, 12 figures, 5 tables

ACM Class: I.2.10; I.4.6

arXiv:2206.10241 [pdf, other]

Deep Active Latent Surfaces for Medical Geometries

Authors: Patrick M. Jensen, Udaranga Wickramasinghe, Anders B. Dahl, Pascal Fua, Vedrana A. Dahl

Abstract: Shape priors have long been known to be effective when reconstructing 3D shapes from noisy or incomplete data. When using a deep-learning based shape representation, this often involves learning a latent representation, which can be either in the form of a single global vector or of multiple local ones. The latter allows more flexibility but is prone to overfitting. In this paper, we advocate a hybrid approach representing shapes in terms of 3D meshes with a separate latent vector at each vertex. During training the latent vectors are constrained to have the same value, which avoids overfitting. For inference, the latent vectors are updated independently while imposing spatial regularization constraints. We show that this gives us both flexibility and generalization capabilities, which we demonstrate on several medical image processing tasks.

Submitted 21 June, 2022; originally announced June 2022.

Comments: 14 pages, 9 figures, submitted for review

arXiv:2202.00418 [pdf, other]

Review of Serial and Parallel Min-Cut/Max-Flow Algorithms for Computer Vision

Authors: Patrick M. Jensen, Niels Jeppesen, Anders B. Dahl, Vedrana A. Dahl

Abstract: Minimum cut/maximum flow (min-cut/max-flow) algorithms solve a variety of problems in computer vision and thus significant effort has been put into developing fast min-cut/max-flow algorithms. As a result, it is difficult to choose an ideal algorithm for a given problem. Furthermore, parallel algorithms have not been thoroughly compared. In this paper, we evaluate the state-of-the-art serial and parallel min-cut/max-flow algorithms on the largest set of computer vision problems yet. We focus on generic algorithms, i.e., for unstructured graphs, but also compare with the specialized GridCut implementation. When applicable, GridCut performs best. Otherwise, the two pseudoflow algorithms, Hochbaum pseudoflow and excesses incremental breadth first search, achieve the overall best performance. The most memory efficient implementation tested is the Boykov-Kolmogorov algorithm. Amongst generic parallel algorithms, we find the bottom-up merging approach by Liu and Sun to be best, but no method is dominant. Of the generic parallel methods, only the parallel preflow push-relabel algorithm is able to efficiently scale with many processors across problem sizes, and no generic parallel method consistently outperforms serial algorithms. Finally, we provide and evaluate strategies for algorithm selection to obtain good expected performance. We make our dataset and implementations publicly available for further research.

Submitted 20 April, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

Comments: 20 pages, 13 figures, accepted for publication at T-PAMI
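
For readers unfamiliar with the min-cut/max-flow problem that the review above benchmarks, the following is a small illustrative sketch in plain Python. It uses the textbook Edmonds-Karp method (shortest augmenting paths found by breadth-first search), which is not one of the algorithms named in the abstract and is far slower than the implementations evaluated there. The function name `max_flow_min_cut` and the dictionary-based graph format are assumptions made for this example.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Return (max flow value, source side of a min cut).

    capacity: dict mapping node -> {neighbor: non-negative capacity}.
    """
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, neighbors in capacity.items():
        for v, cap in neighbors.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path left: the nodes still reachable from the
            # source form the source side of a minimum cut.
            return flow, set(parent)

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


if __name__ == "__main__":
    graph = {
        "s": {"a": 3, "b": 2},
        "a": {"b": 1, "t": 2},
        "b": {"t": 3},
    }
    value, source_side = max_flow_min_cut(graph, "s", "t")
    print(value, sorted(source_side))
```

In vision applications the nodes would typically be pixels or voxels plus two terminal nodes, and the returned source side would correspond to one segmentation label.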

arXiv:2109.06526 [pdf, other]

Image-Based Alignment of 3D Scans

Authors: Dolores Messer, Jakob Wilm, Eythor R. Eiriksson, Vedrana A. Dahl, Anders B. Dahl

Abstract: Full 3D scanning can efficiently be obtained using structured light scanning combined with a rotation stage. In this setting it is, however, necessary to reposition the object and scan it in different poses in order to cover the entire object. In this case, correspondence between the scans is lost, since the object was moved. In this paper, we propose a fully automatic method for aligning the scans of an object in two different poses. This is done by matching 2D features between images from two poses and utilizing correspondence between the images and the scanned point clouds. To demonstrate the approach, we present the results of scanning three dissimilar objects.

Submitted 14 September, 2021; originally announced September 2021.

Comments: 8 pages, 7 figures

arXiv:2011.14842 [pdf, other]

Sparse-View Spectral CT Reconstruction Using Deep Learning

Authors: Wail Mustafa, Christian Kehl, Ulrik Lund Olsen, Søren Kimmer Schou Gregersen, David Malmgren-Hansen, Jan Kehres, Anders Bjorholm Dahl

Abstract: Spectral computed tomography (CT) is an emerging technology capable of providing high chemical specificity, which is crucial for many applications such as detecting threats in luggage. This type of application requires both fast and high-quality image reconstruction and is often based on sparse-view (few) projections. The conventional filtered back projection (FBP) method is fast but it produces low-quality images dominated by noise and artifacts in sparse-view CT. Iterative methods with, e.g., total variation regularizers can circumvent that but they are computationally expensive, as the computational load proportionally increases with the number of spectral channels. Instead, we propose an approach for fast reconstruction of sparse-view spectral CT data using a U-Net convolutional neural network architecture with multi-channel input and output. The network is trained to output high-quality CT images from FBP input image reconstructions. Our method is fast at run-time and because the internal convolutions are shared between the channels, the computational load increases only at the first and last layers, making it an efficient approach to process spectral data with a large number of channels. We have validated our approach using real CT scans. Our results show qualitatively and quantitatively that our approach outperforms the state-of-the-art iterative methods. Furthermore, the results indicate that the network can exploit the coupling of the channels to enhance the overall quality and robustness.

Submitted 26 March, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

Comments: 13 pages, 9 figures, submitted to The IEEE Transactions on Computational Imaging

arXiv:2006.16120 [pdf, other]

doi 10.1016/j.ultramic.2021.113239

Shape from Projections via Differentiable Forward Projector for Computed Tomography

Authors: Jakeoung Koo, Anders B. Dahl, J. Andreas Bærentzen, Qiongyang Chen, Sara Bals, Vedrana A. Dahl

Abstract: In computed tomography, the reconstruction is typically obtained on a voxel grid. In this work, however, we propose a mesh-based reconstruction method. For tomographic problems, 3D meshes have mostly been studied to simulate data acquisition, but not for reconstruction, for which a 3D mesh means the inverse process of estimating shapes from projections. In this paper, we propose a differentiable forward model for 3D meshes that bridges the gap between the forward model for 3D surfaces and optimization. We view the forward projection as a rendering process, and make it differentiable by extending recent work in differentiable rendering. We use the proposed forward model to reconstruct 3D shapes directly from projections. Experimental results for single-object problems show that the proposed method outperforms traditional voxel-based methods on noisy simulated data. We also apply the proposed method on electron tomography images of nanoparticles to demonstrate the applicability of the method on real data.

Submitted 11 March, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

Comments: Accepted in Ultramicroscopy

arXiv:2002.03945 [pdf, other]

Dictionary-based Method for Vascular Segmentation for OCTA Images

Authors: Astrid M. E. Engberg, Vedrana A. Dahl, Anders B. Dahl

Abstract: Optical coherence tomography angiography (OCTA) is an imaging technique that allows for non-invasive investigation of the microvasculature in the retina. OCTA uses laser light reflectance to measure moving blood cells. Hereby, it visualizes the blood flow in the retina and can be used for determining regions with more or less blood flow. OCTA images contain the capillary network together with larger blood vessels, and in this paper we propose a method that segments larger vessels, capillaries and background. The segmentation is obtained using a dictionary-based machine learning method that requires training data to learn the parameters of the segmentation model. Here, we give a detailed description of how the method is applied to OCTA images, and we demonstrate how it robustly labels capillaries and blood vessels and hereby provides the basis for quantifying retinal blood flow.

Submitted 10 February, 2020; originally announced February 2020.

Comments: 11 pages, 6 figures

arXiv:1908.10579 [pdf, other]

Guiding 3D U-nets with signed distance fields for creating 3D models from images

Authors: Kristine Aavild Juhl, Rasmus Reinhold Paulsen, Anders Bjorholm Dahl, Vedrana Andersen Dahl, Ole de Backer, Klaus Fuglsang Kofoed, Oscar Camara

Abstract: Morphological analysis of the left atrial appendage is an important tool to assess risk of ischemic stroke. Most deep learning approaches for 3D segmentation are guided by binary labelmaps, which results in voxelized segmentations unsuitable for morphological analysis. We propose to use signed distance fields to guide a deep network towards morphologically consistent 3D models. The proposed strategy is evaluated on a synthetic dataset of simple geometries, as well as a set of cardiac computed tomography images containing the left atrial appendage. The proposed method produces smooth surfaces with a closer resemblance to the true surface in terms of segmentation overlap and surface distance.

Submitted 28 August, 2019; originally announced August 2019.

Comments: MIDL 2019 [arXiv:1907.08612]

Report number: MIDL/2019/ExtendedAbstract/rJgzz3Y4qV

arXiv:1810.11823 [pdf, other]

Multi-Spectral Imaging via Computed Tomography (MUSIC) - Comparing Unsupervised Spectral Segmentations for Material Differentiation

Authors: Christian Kehl, Wail Mustafa, Jan Kehres, Anders Bjorholm Dahl, Ulrik Lund Olsen

Abstract: Multi-spectral computed tomography is an emerging technology for the non-destructive identification of object materials and the study of their physical properties. Applications of this technology can be found in various scientific and industrial contexts, such as luggage scanning at airports. Material distinction and its identification is challenging, even with spectral x-ray information, due to acquisition noise, tomographic reconstruction artefacts and scanning setup application constraints. We present MUSIC, an open-access multi-spectral CT dataset in 2D and 3D, to promote further research in the area of material identification. We demonstrate the value of this dataset on the image analysis challenge of object segmentation purely based on the spectral response of its composing materials. In this context, we compare the segmentation accuracy of fast adaptive mean shift (FAMS) and unconstrained graph cuts on both datasets. We further discuss the impact of reconstruction artefacts and segmentation controls on the achievable results. Dataset, related software packages and further documentation are made available to the imaging community in an open-access manner to promote further data-driven research on the subject.

Submitted 28 October, 2018; originally announced October 2018.

Comments: 21 pages, 24 figures (in articles), includes 2 appendices with 8 additional figures

arXiv:1809.02226 [pdf, other]

Content-based Propagation of User Markings for Interactive Segmentation of Patterned Images

Authors: Vedrana Andersen Dahl, Monica Jane Emerson, Camilla Himmelstrup Trinderup, Anders Bjorholm Dahl

Abstract: Efficient and easy segmentation of images and volumes is of great practical importance. Segmentation problems that motivate our approach originate from microscopy imaging commonly used in materials science, medicine, and biology. We formulate image segmentation as a probabilistic pixel classification problem, and we apply segmentation as a step towards characterising image content. Our method allows the user to define structures of interest by interactively marking a subset of pixels. Thanks to the real-time feedback, the user can place new markings strategically, depending on the current outcome. The final pixel classification may be obtained from a very modest user input. An important ingredient of our method is a graph that encodes image content. This graph is built in an unsupervised manner during initialisation and is based on clustering of image features. Since we combine a limited amount of user-labelled data with the clustering information obtained from the unlabelled parts of the image, our method fits in the general framework of semi-supervised learning. We demonstrate how this can be a very efficient approach to segmentation through pixel classification.

Submitted 28 September, 2020; v1 submitted 6 September, 2018; originally announced September 2018.

Comments: 9 pages, 7 figures, PDFLaTeX

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020

arXiv:1704.04887 [pdf, other]

doi 10.1038/s41598-018-20461-7

Three Dimensional Polarimetric Neutron Tomography of Magnetic Fields

Authors: Morten Sales, Markus Strobl, Takenao Shinohara, Anton Tremsin, Luise Theil Kuhn, William R. B. Lionheart, Naeem M. Desai, Anders Bjorholm Dahl, Søren Schmidt

Abstract: Through the use of Time-of-Flight Three Dimensional Polarimetric Neutron Tomography (ToF 3DPNT) we have for the first time successfully demonstrated a technique capable of measuring and reconstructing three dimensional magnetic field strengths and directions unobtrusively and non-destructively with the potential to probe the interior of bulk samples which is not amenable otherwise. Using a pioneering polarimetric set-up for ToF neutron instrumentation in combination with a newly developed tailored reconstruction algorithm, the magnetic field generated by a current carrying solenoid has been measured and reconstructed, thereby providing the proof-of-principle of a technique able to reveal hitherto unobtainable information on the magnetic fields in the bulk of materials and devices, due to a high degree of penetration into many materials, including metals, and the sensitivity of neutron polarisation to magnetic fields. The technique puts the potential of the ToF time structure of pulsed neutron sources to full use in order to optimise the recorded information quality and reduce measurement time.

Submitted 2 February, 2018; v1 submitted 17 April, 2017; originally announced April 2017.

Comments: 12 pages, 4 figures

Journal ref: Scientific Reports, 8(1), (2018), 2214

arXiv:1411.5825 [pdf]

doi 10.1016/j.media.2014.11.010

Assessment of algorithms for mitosis detection in breast cancer histopathology images

Authors: Mitko Veta, Paul J. van Diest, Stefan M. Willems, Haibo Wang, Anant Madabhushi, Angel Cruz-Roa, Fabio Gonzalez, Anders B. L. Larsen, Jacob S. Vestergaard, Anders B. Dahl, Dan C. Cireşan, Jürgen Schmidhuber, Alessandro Giusti, Luca M. Gambardella, F. Boray Tek, Thomas Walter, Ching-Wei Wang, Satoshi Kondo, Bogdan J. Matuszewski, Frederic Precioso, Violet Snell, Josef Kittler, Teofilo E. de Campos, Adnan M. Khan, Nasir M. Rajpoot, et al. (4 additional authors not shown)

Abstract: The proliferative activity of breast tumors, which is routinely estimated by counting of mitotic figures in hematoxylin and eosin stained histology sections, is considered to be one of the most important prognostic markers. However, mitosis counting is laborious, subjective and may suffer from low inter-observer agreement. With the wider acceptance of whole slide images in pathology labs, automatic image analysis has been proposed as a potential solution for these issues. In this paper, the results from the Assessment of Mitosis Detection Algorithms 2013 (AMIDA13) challenge are described. The challenge was based on a data set consisting of 12 training and 11 testing subjects, with more than one thousand annotated mitotic figures by multiple observers. Short descriptions and results from the evaluation of eleven methods are presented. The top performing method has an error rate that is comparable to the inter-observer agreement among pathologists.

Submitted 21 November, 2014; originally announced November 2014.

Comments: 23 pages, 5 figures, accepted for publication in the journal Medical Image Analysis

Showing 1–14 of 14 results for author: Dahl, A B