-
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Authors:
Bingxin Ke,
Anton Obukhov,
Shengyu Huang,
Nando Metzger,
Rodrigo Caye Daudt,
Konrad Schindler
Abstract:
Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth from a single image is geometrically ill-posed and requires scene understanding, so it is not surprising that the rise of deep learning has led to a breakthrough. The impressive progress of monocular depth estimators has mirrored the growth in model capacity, from relatively modest CNNs to large Transformer architectures. Still, monocular depth estimators tend to struggle when presented with images with unfamiliar content and layout, since their knowledge of the visual world is restricted by the data seen during training, and challenged by zero-shot generalization to new domains. This motivates us to explore whether the extensive priors captured in recent generative diffusion models can enable better, more generalizable depth estimation. We introduce Marigold, a method for affine-invariant monocular depth estimation that is derived from Stable Diffusion and retains its rich prior knowledge. The estimator can be fine-tuned in a couple of days on a single GPU using only synthetic training data. It delivers state-of-the-art performance across a wide range of datasets, including over 20% performance gains in specific cases. Project page: https://marigoldmonodepth.github.io.
Submitted 3 April, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
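The Marigold abstract describes the estimator as affine-invariant: depth is predicted only up to an unknown scale and shift. Such predictions must be aligned to ground truth before metrics are computed. The sketch below assumes the common convention of a least-squares fit of a scale s and shift t, followed by the absolute relative error (AbsRel). The function names are hypothetical; this is an illustration, not the authors' evaluation code.

```python
def align_scale_shift(pred, gt):
    """Least-squares scale s and shift t minimising sum((s * p + t - g)^2)."""
    n = len(pred)
    if n != len(gt) or n < 2:
        raise ValueError("pred and gt must have the same length >= 2")
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undefined")
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    s = cov / var_p            # closed-form slope of the 1D linear regression
    t = mean_g - s * mean_p    # intercept
    return s, t


def abs_rel_after_alignment(pred, gt):
    """AbsRel of the affine-aligned prediction (gt values assumed > 0)."""
    s, t = align_scale_shift(pred, gt)
    aligned = [s * p + t for p in pred]
    return sum(abs(a - g) / g for a, g in zip(aligned, gt)) / len(gt)
```

For example, a relative prediction [0.0, 0.5, 1.0] against ground truth [2.0, 3.0, 4.0] aligns with s = 2 and t = 2, giving zero error, since the prediction is correct up to an affine transform.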
-
Neural Fields with Thermal Activations for Arbitrary-Scale Super-Resolution
Authors:
Alexander Becker,
Rodrigo Caye Daudt,
Nando Metzger,
Jan Dirk Wegner,
Konrad Schindler
Abstract:
Recent approaches for arbitrary-scale single image super-resolution (ASSR) have used local neural fields to represent continuous signals that can be sampled at arbitrary rates. However, the point-wise query of the neural field does not naturally match the point spread function (PSF) of a given pixel, which may cause aliasing in the super-resolved image. We present a novel way to design neural fields such that points can be queried with an adaptive Gaussian PSF, so as to guarantee correct anti-aliasing at any desired output resolution. We achieve this with a novel activation function derived from Fourier theory. Querying points with a Gaussian PSF, compliant with sampling theory, does not incur any additional computational cost in our framework, unlike filtering in the image domain. With its theoretically guaranteed anti-aliasing, our method sets a new state of the art for ASSR, while being more parameter-efficient than previous methods. Notably, even a minimal version of our model still outperforms previous methods in most cases, while adding 2-4 orders of magnitude fewer parameters. Code and pretrained models are available at https://github.com/prs-eth/thera.
Submitted 14 March, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
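The abstract states that querying with a Gaussian PSF costs nothing extra because the activation is derived from Fourier theory. The identity behind this can be checked numerically. Filtering a sinusoid sin(ωx + φ) with a normalized Gaussian of standard deviation σ returns the same sinusoid, scaled by exp(-σ²ω²/2). A sine-based unit whose frequency ω is known can therefore be anti-aliased analytically by rescaling its amplitude. This is a 1D illustration of that identity only, not the Thera activation itself, and the function names are hypothetical.

```python
import math


def gaussian_filtered_sine(x, omega, phase, sigma):
    """Closed form: Gaussian(sigma) * sin(omega x + phase) = damped sine."""
    return math.exp(-0.5 * (sigma * omega) ** 2) * math.sin(omega * x + phase)


def numeric_filtered_sine(x, omega, phase, sigma, steps=4001, half_width=8.0):
    """Brute-force convolution with a normalized Gaussian (Riemann sum)."""
    lo = -half_width * sigma
    h = 2.0 * half_width * sigma / (steps - 1)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(steps):
        u = lo + i * h
        weight = norm * math.exp(-0.5 * (u / sigma) ** 2)
        total += weight * math.sin(omega * (x - u) + phase) * h
    return total
```

The closed form costs one exponential per unit, whereas the numeric version needs thousands of samples. This gap is the reason filtering in the frequency parameterisation can be cheaper than filtering in the image domain.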
-
High-resolution Population Maps Derived from Sentinel-1 and Sentinel-2
Authors:
Nando Metzger,
Rodrigo Caye Daudt,
Devis Tuia,
Konrad Schindler
Abstract:
Detailed population maps play an important role in diverse fields ranging from humanitarian action to urban planning. Generating such maps in a timely and scalable manner presents a challenge, especially in data-scarce regions. To address this, we have developed POPCORN, a population mapping method whose only inputs are free, globally available satellite images from Sentinel-1 and Sentinel-2, and a small number of aggregate population counts over coarse census districts for calibration. Despite the minimal data requirements, our approach surpasses the mapping accuracy of existing schemes, including several that rely on building footprints derived from high-resolution imagery. E.g., we were able to produce population maps for Rwanda with 100m GSD based on fewer than 400 regional census counts. In Kigali, those maps reach an $R^2$ score of 66% w.r.t. a ground truth reference map, with an average error of only $\pm$10 inhabitants/ha. Conveniently, POPCORN retrieves explicit maps of built-up areas and of local building occupancy rates, making the mapping process interpretable and offering additional insights, for instance about the distribution of built-up but unpopulated areas such as industrial warehouses. Moreover, we find that, once trained, the model can be applied repeatedly to track population changes, and that it can be transferred to geographically similar regions (e.g., from Uganda to Rwanda). With our work we aim to democratize access to up-to-date and high-resolution population maps, recognizing that some regions faced with particularly strong population dynamics may lack the resources for costly micro-census campaigns.
Submitted 23 November, 2023;
originally announced November 2023.
-
BiasBed -- Rigorous Texture Bias Evaluation
Authors:
Nikolai Kalischek,
Rodrigo C. Daudt,
Torben Peters,
Reinhard Furrer,
Jan D. Wegner,
Konrad Schindler
Abstract:
The well-documented presence of texture bias in modern convolutional neural networks has led to a plethora of algorithms that promote an emphasis on shape cues, often to support generalization to new domains. Yet common datasets, benchmarks and general model selection strategies are missing, and there is no agreed, rigorous evaluation protocol. In this paper, we investigate difficulties and limitations when training networks with reduced texture bias. In particular, we show that proper evaluation and meaningful comparisons between methods are not trivial. We introduce BiasBed, a testbed for texture- and style-biased training, including multiple datasets and a range of existing algorithms. It comes with an extensive evaluation protocol that includes rigorous hypothesis testing to gauge the significance of the results, despite the considerable training instability of some style bias methods. Our extensive experiments shed new light on the need for careful, statistically founded evaluation protocols for style bias (and beyond). E.g., we find that some algorithms proposed in the literature do not significantly mitigate the impact of style bias at all. With the release of BiasBed, we hope to foster a common understanding of consistent and meaningful comparisons, and consequently faster progress towards learning methods free of texture bias. Code is available at https://github.com/D1noFuzi/BiasBed
Submitted 24 March, 2023; v1 submitted 23 November, 2022;
originally announced November 2022.
-
Guided Depth Super-Resolution by Deep Anisotropic Diffusion
Authors:
Nando Metzger,
Rodrigo Caye Daudt,
Konrad Schindler
Abstract:
Performing super-resolution of a depth image using the guidance from an RGB image is a problem that concerns several fields, such as robotics, medical imaging, and remote sensing. While deep learning methods have achieved good results on this problem, recent work highlighted the value of combining modern methods with more formal frameworks. In this work, we propose a novel approach which combines guided anisotropic diffusion with a deep convolutional network and advances the state of the art for guided depth super-resolution. The edge transferring/enhancing properties of the diffusion are boosted by the contextual reasoning capabilities of modern networks, and a strict adjustment step guarantees perfect adherence to the source image. We achieve unprecedented results on three commonly used benchmarks for guided depth super-resolution. The performance gain compared to other methods is largest at large scaling factors, such as x32. Code for the proposed method is available at https://github.com/prs-eth/Diffusion-Super-Resolution to promote reproducibility of our results.
Submitted 28 March, 2023; v1 submitted 21 November, 2022;
originally announced November 2022.
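The "strict adjustment step" in the abstract guarantees that the super-resolved depth map agrees exactly with the low-resolution source. One way to obtain such a guarantee is sketched below under stated assumptions. Average pooling by the scaling factor r is assumed as the downsampling model. An additive per-block correction is also assumed, so that each r×r block of the output averages to its source pixel while keeping the block's internal detail. The paper's exact correction may differ, and all names are hypothetical.

```python
def block_mean(img, i, j, r):
    """Mean of the r x r block of img that maps to low-res pixel (i, j)."""
    return sum(img[i * r + a][j * r + b] for a in range(r) for b in range(r)) / (r * r)


def adjust_to_source(pred, source, r):
    """Shift each r x r block of pred so its mean equals the source pixel."""
    h, w = len(source), len(source[0])
    if len(pred) != h * r or len(pred[0]) != w * r:
        raise ValueError("pred must be exactly r times the size of source")
    out = [row[:] for row in pred]
    for i in range(h):
        for j in range(w):
            # Additive correction (assumption): keeps edges inside the block.
            delta = source[i][j] - block_mean(pred, i, j, r)
            for a in range(r):
                for b in range(r):
                    out[i * r + a][j * r + b] += delta
    return out


def avg_pool(img, r):
    """Assumed downsampling model: non-overlapping r x r average pooling."""
    h, w = len(img) // r, len(img[0]) // r
    return [[block_mean(img, i, j, r) for j in range(w)] for i in range(h)]
```

Because every pixel in a block receives the same offset, the relative structure transferred from the RGB guide survives the correction, while avg_pool(adjust_to_source(pred, source, r), r) reproduces source up to floating-point error.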
-
Fine-grained Population Mapping from Coarse Census Counts and Open Geodata
Authors:
Nando Metzger,
John E. Vargas-Muñoz,
Rodrigo C. Daudt,
Benjamin Kellenberger,
Thao Ton-That Whelan,
Ferda Ofli,
Muhammad Imran,
Konrad Schindler,
Devis Tuia
Abstract:
Fine-grained population maps are needed in several domains, like urban planning, environmental monitoring, public health, and humanitarian operations. Unfortunately, in many countries only aggregate census counts over large spatial units are collected; moreover, these are not always up-to-date. We present POMELO, a deep learning model that employs coarse census counts and open geodata to estimate fine-grained population maps with 100m ground sampling distance. The model can also estimate population numbers when no census counts at all are available, by generalizing across countries. In a series of experiments for several countries in sub-Saharan Africa, the maps produced with POMELO are in good agreement with the most detailed available reference counts: disaggregation of coarse census counts reaches $R^2$ values of 85-89%; unconstrained prediction in the absence of any counts reaches 48-69%.
Submitted 8 November, 2022;
originally announced November 2022.
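The POMELO results distinguish disaggregation of coarse census counts from unconstrained prediction, and report both as $R^2$ values. A minimal sketch of both ingredients follows. It assumes a simple proportional (dasymetric) reallocation: each district's census count is spread over its grid cells in proportion to predicted per-cell weights, and cells of a district with all-zero weights share the count uniformly. The abstract does not specify POMELO's exact procedure, so the reallocation rule and names here are assumptions. The $R^2$ function is the standard coefficient of determination.

```python
from collections import defaultdict


def disaggregate(weights, district_of, census):
    """Spread each district's census count over its cells, proportional to weights."""
    totals = defaultdict(float)
    n_cells = defaultdict(int)
    for w, d in zip(weights, district_of):
        totals[d] += w
        n_cells[d] += 1
    out = []
    for w, d in zip(weights, district_of):
        if totals[d] > 0:
            out.append(census[d] * w / totals[d])
        else:
            out.append(census[d] / n_cells[d])  # fallback: uniform split (assumption)
    return out


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant y_true")
    return 1.0 - ss_res / ss_tot
```

For instance, weights [1, 3, 2, 2] over districts ["A", "A", "B", "B"] with census counts {"A": 100, "B": 50} yield [25.0, 75.0, 25.0, 25.0]. The district totals are preserved exactly, which is what makes the disaggregated map consistent with the census.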
-
Satellite-based high-resolution maps of cocoa planted area for Côte d'Ivoire and Ghana
Authors:
Nikolai Kalischek,
Nico Lang,
Cécile Renier,
Rodrigo Caye Daudt,
Thomas Addoah,
William Thompson,
Wilma J. Blaser-Hart,
Rachael Garrett,
Konrad Schindler,
Jan D. Wegner
Abstract:
Côte d'Ivoire and Ghana, the world's largest producers of cocoa, account for two thirds of the global cocoa production. In both countries, cocoa is the primary perennial crop, providing income to almost two million farmers. Yet precise maps of cocoa planted area are missing, hindering accurate quantification of expansion in protected areas, production and yields, and limiting information available for improved sustainability governance. Here, we combine cocoa plantation data with publicly available satellite imagery in a deep learning framework and create high-resolution maps of cocoa plantations for both countries, validated in situ. Our results suggest that cocoa cultivation is an underlying driver of over 37% and 13% of forest loss in protected areas in Côte d'Ivoire and Ghana, respectively, and that official reports substantially underestimate the planted area, up to 40% in Ghana. These maps serve as a crucial building block to advance understanding of conservation and economic development in cocoa producing regions.
Submitted 9 May, 2023; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Recognition of Unseen Bird Species by Learning from Field Guides
Authors:
Andrés C. Rodríguez,
Stefano D'Aronco,
Rodrigo Caye Daudt,
Jan D. Wegner,
Konrad Schindler
Abstract:
We exploit field guides to learn bird species recognition, in particular zero-shot recognition of unseen species. Illustrations contained in field guides deliberately focus on discriminative properties of each species, and can serve as side information to transfer knowledge from seen to unseen bird species. We study two approaches: (1) a contrastive encoding of illustrations, which can be fed into standard zero-shot learning schemes; and (2) a novel method that leverages the fact that illustrations are also images and as such structurally more similar to photographs than other kinds of side information. Our results show that illustrations from field guides, which are readily available for a wide range of species, are indeed a competitive source of side information for zero-shot learning. On a subset of the iNaturalist2021 dataset with 749 seen and 739 unseen species, we obtain a classification accuracy of unseen bird species of $12\%$ @top-1 and $38\%$ @top-10, which shows the potential of field guides for challenging real-world scenarios with many species. Our code is available at https://github.com/ac-rodriguez/zsl_billow
Submitted 2 November, 2023; v1 submitted 3 June, 2022;
originally announced June 2022.
-
FiLM-Ensemble: Probabilistic Deep Learning via Feature-wise Linear Modulation
Authors:
Mehmet Ozgur Turkoglu,
Alexander Becker,
Hüseyin Anil Gündüz,
Mina Rezaei,
Bernd Bischl,
Rodrigo Caye Daudt,
Stefano D'Aronco,
Jan Dirk Wegner,
Konrad Schindler
Abstract:
The ability to estimate epistemic uncertainty is often crucial when deploying machine learning in the real world, but modern methods often produce overconfident, uncalibrated uncertainty predictions. A common approach to quantify epistemic uncertainty, usable across a wide class of prediction models, is to train a model ensemble. In a naive implementation, the ensemble approach has high computational cost and high memory demand. This is a particular challenge for modern deep learning, where even a single deep network is already demanding in terms of compute and memory, and it has given rise to a number of attempts to emulate the model ensemble without actually instantiating separate ensemble members. We introduce FiLM-Ensemble, a deep, implicit ensemble method based on the concept of Feature-wise Linear Modulation (FiLM). That technique was originally developed for multi-task learning, with the aim of decoupling different tasks. We show that the idea can be extended to uncertainty quantification: by modulating the network activations of a single deep network with FiLM, one obtains a model ensemble with high diversity, and consequently well-calibrated estimates of epistemic uncertainty, with comparatively low computational overhead. Empirically, FiLM-Ensemble outperforms other implicit ensemble methods, and comes very close to the upper bound of an explicit ensemble of networks (sometimes even beating it), at a fraction of the memory cost.
Submitted 19 December, 2022; v1 submitted 31 May, 2022;
originally announced June 2022.
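FiLM modulates a hidden activation h feature-wise as γ ⊙ h + β. In FiLM-Ensemble, as described in the abstract, each ensemble member has its own (γ, β) while all other weights are shared. Member predictions are then aggregated, and their spread serves as the epistemic uncertainty estimate. The toy sketch below assumes a two-layer network with a single FiLM layer and random FiLM initialisation around identity to create diversity. Network sizes, the initialisation scale and the class name are all hypothetical. Because the only modulation sits right after the first layer, the shared pre-FiLM activation is computed once and reused for every member.

```python
import random
import statistics


def linear(x, W, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def film(h, gamma, beta):
    """Feature-wise linear modulation: gamma * h + beta, element by element."""
    return [g * a + bb for g, a, bb in zip(gamma, h, beta)]


class ToyFiLMEnsemble:
    def __init__(self, n_in, n_hidden, n_members, seed=0):
        rng = random.Random(seed)
        # Shared weights (one copy for all members).
        self.W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
        # Per-member FiLM parameters, perturbed around identity (assumed init).
        self.gammas = [[1.0 + rng.gauss(0, 0.5) for _ in range(n_hidden)]
                       for _ in range(n_members)]
        self.betas = [[rng.gauss(0, 0.5) for _ in range(n_hidden)]
                      for _ in range(n_members)]

    def predict(self, x):
        """Return (ensemble mean, ensemble variance) for one input vector."""
        h = linear(x, self.W1, self.b1)  # shared computation, done once
        outputs = []
        for gamma, beta in zip(self.gammas, self.betas):
            z = relu(film(h, gamma, beta))
            outputs.append(sum(w * a for w, a in zip(self.w2, z)))
        return statistics.mean(outputs), statistics.pvariance(outputs)
```

The memory cost per extra member is only 2 × n_hidden FiLM parameters, which is the property the abstract contrasts with an explicit ensemble. If all members share identical (γ, β), the variance collapses to zero, so diversity must come from how the FiLM parameters differ.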
-
Urban Change Forecasting from Satellite Images
Authors:
Nando Metzger,
Mehmet Özgür Türkoglu,
Rodrigo Caye Daudt,
Jan Dirk Wegner,
Konrad Schindler
Abstract:
Forecasting where and when new buildings will emerge is a rather unexplored topic, but one that is very useful in many disciplines such as urban planning, agriculture, resource management, and even autonomous flying. In this work, we present a method that accomplishes this task with a deep neural network and a custom pretraining procedure. In Stage 1, a U-Net backbone is pretrained within a Siamese network architecture that aims to solve a (building) change detection task. In Stage 2, the backbone is repurposed to forecast the emergence of new buildings based solely on one image acquired before their construction. We also present a model that forecasts the time range within which the change will occur. We validate our approach using the SpaceNet7 dataset, which covers an area of 960 km^2 at 24 points in time across two years. In our experiments, our proposed pretraining method consistently outperforms traditional pretraining on the ImageNet dataset. We also show that it is, to some degree, possible to predict in advance when building changes will occur.
Submitted 18 September, 2023; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Weakly Supervised Change Detection Using Guided Anisotropic Diffusion
Authors:
Rodrigo Caye Daudt,
Bertrand Le Saux,
Alexandre Boulch,
Yann Gousseau
Abstract:
Large scale datasets created from crowdsourced labels or openly available data have become crucial to provide training data for large scale learning algorithms. While these datasets are easier to acquire, the data are frequently noisy and unreliable, which is motivating research on weakly supervised learning techniques. In this paper we propose original ideas that help us to leverage such datasets in the context of change detection. First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results using the input images as guides to perform edge-preserving filtering. We then show its potential in two weakly supervised learning strategies tailored for change detection. The first strategy is an iterative learning method that combines model optimisation and data cleansing using GAD to extract the useful information from a large scale change detection dataset generated from open vector data. The second one incorporates GAD within a novel spatial attention layer that increases the accuracy of weakly supervised networks trained to perform pixel-level predictions from image-level labels. Improvements with respect to the state of the art are demonstrated on 4 different public datasets.
Submitted 31 December, 2021;
originally announced December 2021.
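The GAD abstract describes edge-preserving filtering of a segmentation output, using the input images as guides. The core idea can be sketched in 1D with an explicit Perona-Malik-style scheme. The diffused quantity is the prediction, but the conductance between neighbours is computed from the guide signal, so smoothing stops at guide edges. This single-guide, 1D, explicit-scheme version and its parameter values (kappa, lam) are assumptions for illustration; the paper's 2D multi-image formulation may differ, and the function names are hypothetical.

```python
import math


def conductance(grad, kappa):
    """Perona-Malik exponential edge-stopping function."""
    return math.exp(-(grad / kappa) ** 2)


def guided_anisotropic_diffusion_1d(signal, guide, n_iter=50, kappa=0.1, lam=0.25):
    """Diffuse `signal` with conductances taken from `guide` (not from signal)."""
    if len(signal) != len(guide):
        raise ValueError("signal and guide must have the same length")
    u = list(signal)
    n = len(u)
    # The guide is fixed, so inter-pixel conductances are computed once.
    c = [conductance(guide[i + 1] - guide[i], kappa) for i in range(n - 1)]
    for _ in range(n_iter):
        flux = [c[i] * (u[i + 1] - u[i]) for i in range(n - 1)]
        new = u[:]
        for i in range(n - 1):
            new[i] += lam * flux[i]      # pixel i moves towards pixel i+1
            new[i + 1] -= lam * flux[i]  # and vice versa (mass is conserved)
        u = new
    return u
```

With a guide containing a single step edge, a noisy prediction is smoothed towards the mean of each side of the edge, but almost no mass crosses the edge. This is the behaviour that lets the guide images sharpen segmentation boundaries; lam ≤ 0.5 keeps this explicit 1D update stable.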
-
Mapping Vulnerable Populations with AI
Authors:
Benjamin Kellenberger,
John E. Vargas-Muñoz,
Devis Tuia,
Rodrigo C. Daudt,
Konrad Schindler,
Thao T-T Whelan,
Brenda Ayo,
Ferda Ofli,
Muhammad Imran
Abstract:
Humanitarian actions require accurate information to efficiently delegate support operations. Such information can be maps of building footprints, building functions, and population densities. While access to this information is comparatively easy in industrialized countries thanks to reliable census data and national geo-data infrastructures, this is not the case for developing countries, where such data are often incomplete or outdated. Building maps derived from remote sensing images may partially remedy this challenge in such countries, but are not always accurate due to different landscape configurations and lack of validation data. Even when they exist, building footprint layers usually do not reveal more fine-grained building properties, such as the number of stories or the building's function (e.g., office, residential, school). In this project we aim to automate building footprint and function mapping using heterogeneous data sources. In a first step, we intend to delineate buildings from satellite data, using deep learning models for semantic image segmentation. Building functions shall be retrieved by parsing social media data, such as tweets, as well as ground-based imagery, to automatically identify different building functions and retrieve further information such as the number of building stories. Building maps augmented with those additional attributes make it possible to derive more accurate population density maps, needed to support the targeted provision of humanitarian aid.
Submitted 29 July, 2021;
originally announced July 2021.
-
Guided Anisotropic Diffusion and Iterative Learning for Weakly Supervised Change Detection
Authors:
Rodrigo Caye Daudt,
Bertrand Le Saux,
Alexandre Boulch,
Yann Gousseau
Abstract:
Large scale datasets created from user labels or openly available data have become crucial to provide training data for large scale learning algorithms. While these datasets are easier to acquire, the data are frequently noisy and unreliable, which is motivating research on weakly supervised learning techniques. In this paper we propose an iterative learning method that extracts the useful information from a large scale change detection dataset generated from open vector data to train a fully convolutional network which surpasses the performance obtained by naive supervised learning. We also propose the guided anisotropic diffusion algorithm, which improves semantic segmentation results using the input images as guides to perform edge preserving filtering, and is used in conjunction with the iterative training method to improve results.
Submitted 17 April, 2019;
originally announced April 2019.
-
Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks
Authors:
Rodrigo Caye Daudt,
Bertrand Le Saux,
Alexandre Boulch,
Yann Gousseau
Abstract:
The Copernicus Sentinel-2 program now provides multispectral images at a global scale with a high revisit rate. In this paper we explore the usage of convolutional neural networks for urban change detection using such multispectral images. We first present the new change detection dataset that was used for training the proposed networks, which will be openly available to serve as a benchmark. The Onera Satellite Change Detection (OSCD) dataset is composed of pairs of multispectral aerial images, and the changes were manually annotated at pixel level. We then propose two architectures to detect changes, Siamese and Early Fusion, and compare the impact of using different numbers of spectral channels as inputs. These architectures are trained from scratch using the provided dataset.
Submitted 19 October, 2018;
originally announced October 2018.
-
Fully Convolutional Siamese Networks for Change Detection
Authors:
Rodrigo Caye Daudt,
Bertrand Le Saux,
Alexandre Boulch
Abstract:
This paper presents three fully convolutional neural network architectures which perform change detection using a pair of coregistered images. Most notably, we propose two Siamese extensions of fully convolutional networks which use heuristics about the current problem to achieve the best results in our tests on two open change detection datasets, using both RGB and multispectral images. We show that our system is able to learn from scratch using annotated change detection images. Our architectures achieve better performance than previously proposed methods, while being at least 500 times faster than related systems. This work is a step towards efficient processing of data from large scale Earth observation systems such as Copernicus or Landsat.
Submitted 19 October, 2018;
originally announced October 2018.
-
Multitask Learning for Large-scale Semantic Change Detection
Authors:
Rodrigo Caye Daudt,
Bertrand Le Saux,
Alexandre Boulch,
Yann Gousseau
Abstract:
Change detection is one of the main problems in remote sensing, and is essential to the accurate processing and understanding of the large scale Earth observation data available through programs such as Sentinel and Landsat. Most of the recently proposed change detection methods bring deep learning to this context, but openly available change detection datasets are still very scarce, which limits the methods that can be proposed and tested. In this paper we present the first large scale high resolution semantic change detection (HRSCD) dataset, which enables the usage of deep learning methods for semantic change detection. The dataset contains coregistered RGB image pairs, pixel-wise change information and land cover information. We then propose several methods using fully convolutional neural networks to perform semantic change detection. Most notably, we present a network architecture that performs change detection and land cover map** simultaneously, while using the predicted land cover information to help to predict changes. We also describe a sequential training scheme that allows this network to be trained without setting a hyperparameter that balances different loss functions and achieves the best overall results.
Submitted 28 August, 2019; v1 submitted 19 October, 2018;
originally announced October 2018.