Search | arXiv e-print repository

Better, Not Just More: Data-Centric Machine Learning for Earth Observation

Authors: Ribana Roscher, Marc Rußwurm, Caroline Gevaert, Michael Kampffmeyer, Jefersson A. dos Santos, Maria Vakalopoulou, Ronny Hänsch, Stine Hansen, Keiller Nogueira, Jonathan Prexl, Devis Tuia

Abstract: Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning architectures and models have been proposed, the majority of them have been solely developed on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We arg… ▽ More Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning architectures and models have been proposed, the majority of them have been solely developed on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that a shift from a model-centric view to a complementary data-centric perspective is necessary for further improvements in accuracy, generalization ability, and real impact on end-user applications. Furthermore, considering the entire machine learning cycle - from problem definition to model deployment with feedback - is crucial for enhancing machine learning models that can be reliable in unforeseen situations. This work presents a definition as well as a precise categorization and overview of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples. These examples provide concrete steps to act on geospatial data with data-centric machine learning approaches. △ Less

Submitted 22 June, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.09276 [pdf, other]

Leveraging Citizen Science for Flood Extent Detection using Machine Learning Benchmark Dataset

Authors: Muthukumaran Ramasubramanian, Iksha Gurung, Shubhankar Gahlot, Ronny Hänsch, Andrew L. Molthan, Manil Maskey

Abstract: Accurate detection of inundated water extents during flooding events is crucial in emergency response decisions and aids in recovery efforts. Satellite Remote Sensing data provides a global framework for detecting flooding extents. Specifically, Sentinel-1 C-Band Synthetic Aperture Radar (SAR) imagery has proven to be useful in detecting water bodies due to low backscatter of water features in bot… ▽ More Accurate detection of inundated water extents during flooding events is crucial in emergency response decisions and aids in recovery efforts. Satellite Remote Sensing data provides a global framework for detecting flooding extents. Specifically, Sentinel-1 C-Band Synthetic Aperture Radar (SAR) imagery has proven to be useful in detecting water bodies due to low backscatter of water features in both co-polarized and cross-polarized SAR imagery. However, increased backscatter can be observed in certain flooded regions such as presence of infrastructure and trees - rendering simple methods such as pixel intensity thresholding and time-series differencing inadequate. Machine Learning techniques has been leveraged to precisely capture flood extents in flooded areas with bumps in backscatter but needs high amounts of labelled data to work desirably. Hence, we created a labeled known water body extent and flooded area extents during known flooding events covering about 36,000 sq. kilometers of regions within mainland U.S and Bangladesh. Further, We also leveraged citizen science by open-sourcing the dataset and hosting an open competition based on the dataset to rapidly prototype flood extent detection using community generated models. In this paper we present the information about the dataset, the data processing pipeline, a baseline model and the details about the competition, along with discussion on winning approaches. We believe the dataset adds to already existing datasets based on Sentinel-1C SAR data and leads to more robust modeling of flood extents. We also hope the results from the competition pushes the research in flood extent detection further. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: 13 pages in AGU format, 7 figures

arXiv:2310.19231 [pdf, other]

doi 10.1109/MGRS.2023.3293459

There Are No Data Like More Data- Datasets for Deep Learning in Earth Observation

Authors: Michael Schmitt, Seyed Ali Ahmadi, Yonghao Xu, Gulsen Taskin, Ujjwal Verma, Francescopaolo Sica, Ronny Hansch

Abstract: Carefully curated and annotated datasets are the foundation of machine learning, with particularly data-hungry deep neural networks forming the core of what is often called Artificial Intelligence (AI). Due to the massive success of deep learning applied to Earth Observation (EO) problems, the focus of the community has been largely on the development of ever-more sophisticated deep neural network… ▽ More Carefully curated and annotated datasets are the foundation of machine learning, with particularly data-hungry deep neural networks forming the core of what is often called Artificial Intelligence (AI). Due to the massive success of deep learning applied to Earth Observation (EO) problems, the focus of the community has been largely on the development of ever-more sophisticated deep neural network architectures and training strategies largely ignoring the overall importance of datasets. For that purpose, numerous task-specific datasets have been created that were largely ignored by previously published review articles on AI for Earth observation. With this article, we want to change the perspective and put machine learning datasets dedicated to Earth observation data and applications into the spotlight. Based on a review of the historical developments, currently available resources are described and a perspective for future developments is formed. We hope to contribute to an understanding that the nature of our data is what distinguishes the Earth observation community from many other communities that apply deep learning techniques to image data, and that a detailed understanding of EO data peculiarities is among the core competencies of our discipline. △ Less

Submitted 29 October, 2023; originally announced October 2023.

Journal ref: Published in IEEE Geoscience and Remote Sensing Magazine, vol. 11, no. 3, pp. 63-97, Sept. 2023

arXiv:2209.12480 [pdf, other]

EOD: The IEEE GRSS Earth Observation Database

Authors: Michael Schmitt, Pedram Ghamisi, Naoto Yokoya, Ronny Hänsch

Abstract: In the era of deep learning, annotated datasets have become a crucial asset to the remote sensing community. In the last decade, a plethora of different datasets was published, each designed for a specific data type and with a specific task or application in mind. In the jungle of remote sensing datasets, it can be hard to keep track of what is available already. With this paper, we introduce EOD… ▽ More In the era of deep learning, annotated datasets have become a crucial asset to the remote sensing community. In the last decade, a plethora of different datasets was published, each designed for a specific data type and with a specific task or application in mind. In the jungle of remote sensing datasets, it can be hard to keep track of what is available already. With this paper, we introduce EOD - the IEEE GRSS Earth Observation Database (EOD) - an interactive online platform for cataloguing different types of datasets leveraging remote sensing imagery. △ Less

Submitted 26 September, 2022; originally announced September 2022.

Comments: This paper contains the description of the IEEE-GRSS Earth Observation Database

arXiv:2202.03498 [pdf, other]

doi 10.1109/TGRS.2021.3131418

Random Ferns for Semantic Segmentation of PolSAR Images

Authors: Pengchao Wei, Ronny Hänsch

Abstract: Random Ferns -- as a less known example of Ensemble Learning -- have been successfully applied in many Computer Vision applications ranging from keypoint matching to object detection. This paper extends the Random Fern framework to the semantic segmentation of polarimetric synthetic aperture radar images. By using internal projections that are defined over the space of Hermitian matrices, the prop… ▽ More Random Ferns -- as a less known example of Ensemble Learning -- have been successfully applied in many Computer Vision applications ranging from keypoint matching to object detection. This paper extends the Random Fern framework to the semantic segmentation of polarimetric synthetic aperture radar images. By using internal projections that are defined over the space of Hermitian matrices, the proposed classifier can be directly applied to the polarimetric covariance matrices without the need to explicitly compute predefined image features. Furthermore, two distinct optimization strategies are proposed: The first based on pre-selection and grou** of internal binary features before the creation of the classifier; and the second based on iteratively improving the properties of a given Random Fern. Both strategies are able to boost the performance by filtering features that are either redundant or have a low information content and by grou** correlated features to best fulfill the independence assumptions made by the Random Fern classifier. Experiments show that results can be achieved that are similar to a more complex Random Forest model and competitive to a deep learning baseline. △ Less

Submitted 7 February, 2022; originally announced February 2022.

Comments: This is the author's version of the article as accepted for publication in IEEE Transactions on Geoscience and Remote Sensing, 2021. Link to original: https://ieeexplore.ieee.org/document/9627989

arXiv:2112.11367 [pdf, other]

Deep Learning and Earth Observation to Support the Sustainable Development Goals

Authors: Claudio Persello, Jan Dirk Wegner, Ronny Hänsch, Devis Tuia, Pedram Ghamisi, Mila Koeva, Gustau Camps-Valls

Abstract: The synergistic combination of deep learning models and Earth observation promises significant advances to support the sustainable development goals (SDGs). New developments and a plethora of applications are already changing the way humanity will face the living planet challenges. This paper reviews current deep learning approaches for Earth observation data, along with their application towards… ▽ More The synergistic combination of deep learning models and Earth observation promises significant advances to support the sustainable development goals (SDGs). New developments and a plethora of applications are already changing the way humanity will face the living planet challenges. This paper reviews current deep learning approaches for Earth observation data, along with their application towards monitoring and achieving the SDGs most impacted by the rapid development of deep learning in Earth observation. We systematically review case studies to 1) achieve zero hunger, 2) sustainable cities, 3) deliver tenure security, 4) mitigate and adapt to climate change, and 5) preserve biodiversity. Important societal, economic and environmental implications are concerned. Exciting times ahead are coming where algorithms and Earth data can help in our endeavor to address the climate crisis and support more sustainable development. △ Less

Submitted 21 December, 2021; originally announced December 2021.

arXiv:2105.11726 [pdf, other]

There is no data like more data -- current status of machine learning datasets in remote sensing

Authors: Michael Schmitt, Seyed Ali Ahmadi, Ronny Hänsch

Abstract: Annotated datasets have become one of the most crucial preconditions for the development and evaluation of machine learning-based methods designed for the automated interpretation of remote sensing data. In this paper, we review the historic development of such datasets, discuss their features based on a few selected examples, and address open issues for future developments. Annotated datasets have become one of the most crucial preconditions for the development and evaluation of machine learning-based methods designed for the automated interpretation of remote sensing data. In this paper, we review the historic development of such datasets, discuss their features based on a few selected examples, and address open issues for future developments. △ Less

Submitted 17 June, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: accepted for the Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2021

arXiv:2004.06500 [pdf, other]

SpaceNet 6: Multi-Sensor All Weather Map** Dataset

Authors: Jacob Shermeyer, Daniel Hogan, Jason Brown, Adam Van Etten, Nicholas Weir, Fabio Pacifici, Ronny Haensch, Alexei Bastidas, Scott Soenen, Todd Bacastow, Ryan Lewis

Abstract: Within the remote sensing domain, a diverse set of acquisition modalities exist, each with their own unique strengths and weaknesses. Yet, most of the current literature and open datasets only deal with electro-optical (optical) data for different detection and segmentation tasks at high spatial resolutions. optical data is often the preferred choice for geospatial applications, but requires clear… ▽ More Within the remote sensing domain, a diverse set of acquisition modalities exist, each with their own unique strengths and weaknesses. Yet, most of the current literature and open datasets only deal with electro-optical (optical) data for different detection and segmentation tasks at high spatial resolutions. optical data is often the preferred choice for geospatial applications, but requires clear skies and little cloud cover to work well. Conversely, Synthetic Aperture Radar (SAR) sensors have the unique capability to penetrate clouds and collect during all weather, day and night conditions. Consequently, SAR data are particularly valuable in the quest to aid disaster response, when weather and cloud cover can obstruct traditional optical sensors. Despite all of these advantages, there is little open data available to researchers to explore the effectiveness of SAR for such applications, particularly at very-high spatial resolutions, i.e. <1m Ground Sample Distance (GSD). To address this problem, we present an open Multi-Sensor All Weather Map** (MSAW) dataset and challenge, which features two collection modalities (both SAR and optical). The dataset and challenge focus on map** and building footprint extraction using a combination of these data sources. MSAW covers 120 km^2 over multiple overlap** collects and is annotated with over 48,000 unique building footprints labels, enabling the creation and evaluation of map** algorithms for multi-modal data. We present a baseline and benchmark for building footprint extraction with SAR data and find that state-of-the-art segmentation models pre-trained on optical data, and then trained on SAR (F1 score of 0.21) outperform those trained on SAR data alone (F1 score of 0.135). △ Less

Submitted 14 April, 2020; originally announced April 2020.

Comments: To appear in CVPR EarthVision Proceedings, 10 pages, 7 figures

Showing 1–8 of 8 results for author: Hänsch, R