-
NVS-MonoDepth: Improving Monocular Depth Prediction with Novel View Synthesis
Authors:
Zuria Bauer,
Zuoyue Li,
Sergio Orts-Escolano,
Miguel Cazorla,
Marc Pollefeys,
Martin R. Oswald
Abstract:
Building upon the recent progress in novel view synthesis, we propose its application to improve monocular depth estimation. In particular, we propose a novel training method split in three main steps. First, the prediction results of a monocular depth network are warped to an additional view point. Second, we apply an additional image synthesis network, which corrects and improves the quality of…
▽ More
Building upon the recent progress in novel view synthesis, we propose its application to improve monocular depth estimation. In particular, we propose a novel training method split in three main steps. First, the prediction results of a monocular depth network are warped to an additional view point. Second, we apply an additional image synthesis network, which corrects and improves the quality of the warped RGB image. The output of this network is required to look as similar as possible to the ground-truth view by minimizing the pixel-wise RGB reconstruction error. Third, we reapply the same monocular depth estimation onto the synthesized second view point and ensure that the depth predictions are consistent with the associated ground truth depth. Experimental results prove that our method achieves state-of-the-art or comparable performance on the KITTI and NYU-Depth-v2 datasets with a lightweight and simple vanilla U-Net architecture.
△ Less
Submitted 22 December, 2021;
originally announced December 2021.
-
A Large Visual, Qualitative and Quantitative Dataset of Web Pages
Authors:
Christian Mejia-Escobar,
Miguel Cazorla,
Ester Martinez-Martin
Abstract:
The World Wide Web is not only one of the most important platforms of communication and information at present, but also an area of growing interest for scientific research. This motivates a lot of work and projects that require large amounts of data. However, there is no dataset that integrates the parameters and visual appearance of Web pages, because its collection is a costly task in terms of…
▽ More
The World Wide Web is not only one of the most important platforms of communication and information at present, but also an area of growing interest for scientific research. This motivates a lot of work and projects that require large amounts of data. However, there is no dataset that integrates the parameters and visual appearance of Web pages, because its collection is a costly task in terms of time and effort. With the support of various computer tools and programming scripts, we have created a large dataset of 49,438 Web pages. It consists of visual, textual and numerical data types, includes all countries worldwide, and considers a broad range of topics such as art, entertainment, economy, business, education, government, news, media, science, and environment, covering different cultural characteristics and varied design preferences. In this paper, we describe the process of collecting, debugging and publishing the final product, which is freely available. To demonstrate the usefulness of our dataset, we expose a binary classification model for detecting error Web pages, and a multi-class Web subject-based categorization, both problems using convolutional neural networks.
△ Less
Submitted 14 May, 2021;
originally announced May 2021.
-
UMLS-ChestNet: A deep convolutional neural network for radiological findings, differential diagnoses and localizations of COVID-19 in chest x-rays
Authors:
Germán González,
Aurelia Bustos,
José María Salinas,
María de la Iglesia-Vaya,
Joaquín Galant,
Carlos Cano-Espinosa,
Xavier Barber,
Domingo Orozco-Beltrán,
Miguel Cazorla,
Antonio Pertusa
Abstract:
In this work we present a method for the detection of radiological findings, their location and differential diagnoses from chest x-rays. Unlike prior works that focus on the detection of few pathologies, we use a hierarchical taxonomy mapped to the Unified Medical Language System (UMLS) terminology to identify 189 radiological findings, 22 differential diagnosis and 122 anatomic locations, includ…
▽ More
In this work we present a method for the detection of radiological findings, their location and differential diagnoses from chest x-rays. Unlike prior works that focus on the detection of few pathologies, we use a hierarchical taxonomy mapped to the Unified Medical Language System (UMLS) terminology to identify 189 radiological findings, 22 differential diagnosis and 122 anatomic locations, including ground glass opacities, infiltrates, consolidations and other radiological findings compatible with COVID-19. We train the system on one large database of 92,594 frontal chest x-rays (AP or PA, standing, supine or decubitus) and a second database of 2,065 frontal images of COVID-19 patients identified by at least one positive Polymerase Chain Reaction (PCR) test. The reference labels are obtained through natural language processing of the radiological reports. On 23,159 test images, the proposed neural network obtains an AUC of 0.94 for the diagnosis of COVID-19. To our knowledge, this work uses the largest chest x-ray dataset of COVID-19 positive cases to date and is the first one to use a hierarchical labeling schema and to provide interpretability of the results, not only by using network attention methods, but also by indicating the radiological findings that have led to the diagnosis.
△ Less
Submitted 6 June, 2020;
originally announced June 2020.
-
BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients
Authors:
Maria de la Iglesia Vayá,
Jose Manuel Saborit,
Joaquim Angel Montell,
Antonio Pertusa,
Aurelia Bustos,
Miguel Cazorla,
Joaquin Galant,
Xavier Barber,
Domingo Orozco-Beltrán,
Francisco García-García,
Marisa Caparrós,
Germán González,
Jose María Salinas
Abstract:
This paper describes BIMCV COVID-19+, a large dataset from the Valencian Region Medical ImageBank (BIMCV) containing chest X-ray images CXR (CR, DX) and computed tomography (CT) imaging of COVID-19+ patients along with their radiological findings and locations, pathologies, radiological reports (in Spanish), DICOM metadata, Polymerase chain reaction (PCR), Immunoglobulin G (IgG) and Immunoglobulin…
▽ More
This paper describes BIMCV COVID-19+, a large dataset from the Valencian Region Medical ImageBank (BIMCV) containing chest X-ray images CXR (CR, DX) and computed tomography (CT) imaging of COVID-19+ patients along with their radiological findings and locations, pathologies, radiological reports (in Spanish), DICOM metadata, Polymerase chain reaction (PCR), Immunoglobulin G (IgG) and Immunoglobulin M (IgM) diagnostic antibody tests. The findings have been mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, unlike the considerably more reduced number of entities annotated in previous datasets. Images are stored in high resolution and entities are localized with anatomical labels and stored in a Medical Imaging Data Structure (MIDS) format. In addition, 10 images were annotated by a team of radiologists to include semantic segmentation of radiological findings. This first iteration of the database includes 1,380 CX, 885 DX and 163 CT studies from 1,311 COVID-19+ patients. This is, to the best of our knowledge, the largest COVID-19+ dataset of images available in an open format. The dataset can be downloaded from http://bimcv.cipf.es/bimcv-projects/bimcv-covid19.
△ Less
Submitted 5 June, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Computer Aided Detection for Pulmonary Embolism Challenge (CAD-PE)
Authors:
Germán González,
Daniel Jimenez-Carretero,
Sara Rodríguez-López,
Carlos Cano-Espinosa,
Miguel Cazorla,
Tanya Agarwal,
Vinit Agarwal,
Nima Tajbakhsh,
Michael B. Gotway,
Jianming Liang,
Mojtaba Masoudi,
Noushin Eftekhari,
Mahdi Saadatmand,
Hamid-Reza Pourreza,
Patricia Fraga-Rivas,
Eduardo Fraile,
Frank J. Rybicki,
Ara Kassarjian,
Raúl San José Estépar,
Maria J. Ledesma-Carbayo
Abstract:
Rationale: Computer aided detection (CAD) algorithms for Pulmonary Embolism (PE) algorithms have been shown to increase radiologists' sensitivity with a small increase in specificity. However, CAD for PE has not been adopted into clinical practice, likely because of the high number of false positives current CAD software produces. Objective: To generate a database of annotated computed tomography…
▽ More
Rationale: Computer aided detection (CAD) algorithms for Pulmonary Embolism (PE) algorithms have been shown to increase radiologists' sensitivity with a small increase in specificity. However, CAD for PE has not been adopted into clinical practice, likely because of the high number of false positives current CAD software produces. Objective: To generate a database of annotated computed tomography pulmonary angiographies, use it to compare the sensitivity and false positive rate of current algorithms and to develop new methods that improve such metrics. Methods: 91 Computed tomography pulmonary angiography scans were annotated by at least one radiologist by segmenting all pulmonary emboli visible on the study. 20 annotated CTPAs were open to the public in the form of a medical image analysis challenge. 20 more were kept for evaluation purposes. 51 were made available post-challenge. 8 submissions, 6 of them novel, were evaluated on the 20 evaluation CTPAs. Performance was measured as per embolus sensitivity vs. false positives per scan curve. Results: The best algorithms achieved a per-embolus sensitivity of 75% at 2 false positives per scan (fps) or of 70% at 1 fps, outperforming the state of the art. Deep learning approaches outperformed traditional machine learning ones, and their performance improved with the number of training cases. Significance: Through this work and challenge we have improved the state-of-the art of computer aided detection algorithms for pulmonary embolism. An open database and an evaluation benchmark for such algorithms have been generated, easing the development of further improvements. Implications on clinical practice will need further research.
△ Less
Submitted 30 March, 2020;
originally announced March 2020.
-
Large-scale Multiview 3D Hand Pose Dataset
Authors:
Francisco Gomez-Donoso,
Sergio Orts-Escolano,
Miguel Cazorla
Abstract:
Accurate hand pose estimation at joint level has several uses on human-robot interaction, user interfacing and virtual reality applications. Yet, it currently is not a solved problem. The novel deep learning techniques could make a great improvement on this matter but they need a huge amount of annotated data. The hand pose datasets released so far present some issues that make them impossible to…
▽ More
Accurate hand pose estimation at joint level has several uses on human-robot interaction, user interfacing and virtual reality applications. Yet, it currently is not a solved problem. The novel deep learning techniques could make a great improvement on this matter but they need a huge amount of annotated data. The hand pose datasets released so far present some issues that make them impossible to use on deep learning methods such as the few number of samples, high-level abstraction annotations or samples consisting in depth maps. In this work, we introduce a multiview hand pose dataset in which we provide color images of hands and different kind of annotations for each, i.e the bounding box and the 2D and 3D location on the joints in the hand. Besides, we introduce a simple yet accurate deep learning architecture for real-time robust 2D hand pose estimation.
△ Less
Submitted 18 July, 2017; v1 submitted 12 July, 2017;
originally announced July 2017.
-
Semantic Localization in the PCL library
Authors:
Jesús Martínez-Gómez,
Vicente Morell,
Miguel Cazorla,
Ismael García-Varea
Abstract:
The semantic localization problem in robotics consists in determining the place where a robot is located by means of semantic categories. The problem is usually addressed as a supervised classification process, where input data correspond to robot perceptions while classes to semantic categories, like kitchen or corridor.
In this paper we propose a framework, implemented in the PCL library, whic…
▽ More
The semantic localization problem in robotics consists in determining the place where a robot is located by means of semantic categories. The problem is usually addressed as a supervised classification process, where input data correspond to robot perceptions while classes to semantic categories, like kitchen or corridor.
In this paper we propose a framework, implemented in the PCL library, which provides a set of valuable tools to easily develop and evaluate semantic localization systems. The implementation includes the generation of 3D global descriptors following a Bag-of-Words approach. This allows the generation of dimensionality-fixed descriptors from any type of keypoint detector and feature extractor combinations. The framework has been designed, structured and implemented in order to be easily extended with different keypoint detectors, feature extractors as well as classification models.
The proposed framework has also been used to evaluate the performance of a set of already implemented descriptors, when used as input for a specific semantic localization system. The results obtained are discussed paying special attention to the internal parameters of the BoW descriptor generation process. Moreover, we also review the combination of some keypoint detectors with different 3D descriptor generation techniques.
△ Less
Submitted 29 January, 2016;
originally announced January 2016.