-
Towards localized accuracy assessment of remote-sensing derived built-up land layers across the rural-urban continuum
Authors:
Johannes H. Uhl,
Stefan Leyk
Abstract:
The accuracy assessment of remote-sensing derived built-up land data represents a specific case of binary map comparison, in which class imbalance varies considerably across rural-urban trajectories. Thus, the local accuracy characterization of such datasets requires specific strategies that are robust to low sample sizes and varying levels of class imbalance. Herein, we examine the suitability of commonly used spatial agreement measures for the localized accuracy characterization of built-up land layers across the rural-urban continuum, using the Global Human Settlement Layer and a reference database of built-up land derived from cadastral and building footprint data.
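To make the class-imbalance problem concrete, the following minimal Python sketch (our illustration, not the authors' code; all function and variable names are hypothetical) derives several commonly used binary agreement measures from the confusion matrix of a small test/reference window. In the sparsely built-up window used here, overall accuracy is high (8/9) while F1 is 0 and recall is undefined, which is why the choice of measure matters in rural areas.

```python
from typing import Dict, List, Optional

Grid = List[List[int]]  # 1 = built-up, 0 = not built-up


def confusion_counts(test: Grid, ref: Grid) -> Dict[str, int]:
    """Tally true/false positives/negatives for two equally sized binary grids."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for test_row, ref_row in zip(test, ref):
        for t, r in zip(test_row, ref_row):
            if t and r:
                counts["tp"] += 1
            elif t:
                counts["fp"] += 1  # commission error
            elif r:
                counts["fn"] += 1  # omission error
            else:
                counts["tn"] += 1
    return counts


def ratio(num: float, den: float) -> Optional[float]:
    """Return None instead of failing when a measure is undefined (den == 0)."""
    return num / den if den else None


def agreement_measures(c: Dict[str, int]) -> Dict[str, Optional[float]]:
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    n = tp + fp + fn + tn
    overall = ratio(tp + tn, n)
    # Expected chance agreement used by Cohen's kappa.
    p_e = ratio((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    if overall is None or p_e is None or p_e == 1:
        kappa = None
    else:
        kappa = (overall - p_e) / (1 - p_e)
    return {
        "overall_accuracy": overall,
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "kappa": kappa,
    }


# A sparsely built-up ("rural") 3 x 3 window: one false positive, no built-up reference cells.
test_window = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
ref_window = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
measures = agreement_measures(confusion_counts(test_window, ref_window))
print(measures)
```

Returning None rather than 0 for undefined measures keeps "no evidence" distinguishable from "complete disagreement", which matters when many local windows contain few or no built-up cells.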
Submitted 29 February, 2024;
originally announced March 2024.
-
Thematic agreement assessment of gridded, multi-modal geospatial datasets of different semantics and spatial granularities
Authors:
Johannes H. Uhl,
Stefan Leyk
Abstract:
This paper presents a method for the thematic agreement assessment of geospatial data products of different semantics and spatial granularities, which may be affected by spatial offsets between test and reference data. The proposed method uses a multi-scale framework that allows for a probabilistic evaluation of whether thematic disagreement between datasets is induced by spatial offsets arising from the different nature of the datasets. We test our method using real-estate derived settlement locations and remote-sensing derived building footprint data.
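The core multi-scale intuition can be sketched in a few lines of Python, assuming binary presence grids for both datasets. This is only an illustration of scale-dependent agreement, not the paper's probabilistic framework; the function names are hypothetical. A one-cell offset yields complete disagreement at the native resolution, but the disagreement disappears once both grids are coarsened to 2 × 2 blocks.

```python
from typing import Dict, List, Optional

Grid = List[List[int]]


def coarsen(grid: Grid, k: int) -> Grid:
    """Aggregate a binary grid into k x k blocks; a block is 1 if any cell in it is 1."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            int(any(grid[y][x]
                    for y in range(i, min(i + k, rows))
                    for x in range(j, min(j + k, cols))))
            for j in range(0, cols, k)
        ]
        for i in range(0, rows, k)
    ]


def f1(test: Grid, ref: Grid) -> Optional[float]:
    """F1 score of two binary grids; None if neither grid has any positive cell."""
    tp = fp = fn = 0
    for test_row, ref_row in zip(test, ref):
        for t, r in zip(test_row, ref_row):
            tp += int(bool(t and r))
            fp += int(bool(t and not r))
            fn += int(bool(r and not t))
    den = 2 * tp + fp + fn
    return 2 * tp / den if den else None


def agreement_by_scale(test: Grid, ref: Grid, factors: List[int]) -> Dict[int, Optional[float]]:
    """F1 between test and reference after coarsening both grids to each scale factor."""
    return {k: f1(coarsen(test, k), coarsen(ref, k)) for k in factors}


# The same single building, offset by one cell between the two datasets.
ref_grid = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
test_grid = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
scores = agreement_by_scale(test_grid, ref_grid, [1, 2, 4])
print(scores)
```

Note that fixed block boundaries make the result depend on grid alignment: an offset that straddles a block edge would still disagree at k = 2. Accounting for such effects is one reason a more careful, probabilistic treatment is needed.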
Submitted 29 February, 2024;
originally announced March 2024.
-
Spatially explicit accuracy assessment of deep learning-based, fine-resolution built-up land data in the United States
Authors:
Johannes H. Uhl,
Stefan Leyk
Abstract:
Geospatial datasets derived from remote sensing data by means of machine learning methods are often based on probabilistic outputs of an abstract nature, which are difficult to translate into interpretable measures. For example, the Global Human Settlement Layer GHS-BUILT-S2 product reports the probability of the presence of built-up areas in 2018 in a global 10 m × 10 m grid. However, practitioners typically require interpretable measures such as binary surfaces indicating the presence or absence of built-up areas or estimates of sub-pixel built-up surface fractions. Herein, we assess the relationship between the built-up probability in GHS-BUILT-S2 and reference built-up surface fractions derived from a highly reliable reference database for several regions in the United States. Furthermore, we identify a binarization threshold using an agreement maximization method that creates binary built-up land data from these built-up probabilities. These binary surfaces are input to a spatially explicit, scale-sensitive accuracy assessment that includes a novel visual-analytical tool we call focal precision-recall signature plots. Our analysis reveals that a threshold of 0.5 applied to GHS-BUILT-S2 maximizes the agreement with binarized built-up land data derived from the reference built-up area fraction. We find high levels of accuracy (i.e., county-level F-1 scores of almost 0.8 on average) in the derived built-up areas, and consistently high accuracy along the rural-urban gradient in our study area. These results reveal considerable accuracy improvements in human settlement models based on Sentinel-2 data and deep learning, in both rural and urban areas, as compared to earlier, Landsat-based versions of the Global Human Settlement Layer.
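An agreement-maximizing threshold search can be sketched as a simple sweep: binarize the probabilities at each candidate threshold, compare with the binarized reference, and keep the threshold with the highest agreement. The sketch below assumes F1 as the agreement measure and "reference fraction > 0" as the reference binarization; both are assumptions for illustration, and the values are toy numbers, not GHS-BUILT-S2 data.

```python
from typing import List, Sequence, Tuple


def f1_binary(pred: Sequence[int], ref: Sequence[int]) -> float:
    tp = sum(1 for p, r in zip(pred, ref) if p and r)
    fp = sum(1 for p, r in zip(pred, ref) if p and not r)
    fn = sum(1 for p, r in zip(pred, ref) if r and not p)
    den = 2 * tp + fp + fn
    return 2 * tp / den if den else 0.0


def best_threshold(prob: Sequence[float], ref_binary: Sequence[int],
                   thresholds: List[float]) -> Tuple[float, float]:
    """Return (threshold, F1) maximizing agreement; ties go to the lowest threshold."""
    scored = [(f1_binary([int(p >= t) for p in prob], ref_binary), t) for t in thresholds]
    best_f1, best_t = max(scored, key=lambda s: (s[0], -s[1]))
    return best_t, best_f1


# Toy per-cell built-up probabilities and reference built-up surface fractions.
probability = [0.20, 0.40, 0.55, 0.65, 0.80, 0.95]
ref_fraction = [0.0, 0.0, 0.0, 0.3, 0.6, 1.0]
ref_binary = [int(f > 0) for f in ref_fraction]  # assumed binarization of the reference
candidates = [round(0.05 * i, 2) for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
print(best_threshold(probability, ref_binary, candidates))
```

In practice the sweep would run over many cells and regions, and other agreement measures could be plugged into the same loop in place of `f1_binary`.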
Submitted 1 March, 2023;
originally announced March 2023.
-
Analyzing urban scaling laws in the United States over 115 years
Authors:
Keith Burghardt,
Johannes H. Uhl,
Kristina Lerman,
Stefan Leyk
Abstract:
The scaling relations between city attributes and population are emergent and ubiquitous aspects of urban growth. Quantifying these relations and understanding their theoretical foundation, however, is difficult due to the challenge of defining city boundaries and a lack of historical data to study city dynamics over time and space. To address this issue, we analyze scaling between city infrastructure and population across 857 United States metropolitan areas over an unprecedented 115 years using dasymetrically refined historical population estimates, historical urban road network models, and multi-temporal settlement data to define dynamic city boundaries based on settlement density. We demonstrate the clearest evidence that urban scaling exponents can closely match theoretical models over a century if cities are defined as dense settlement patches. Despite the close quantitative agreement with theory, the empirical scaling relations unexpectedly vary across regions. Our analysis of scaling coefficients, meanwhile, reveals that a city in 2015 uses more developed land and kilometers of road than a city with a similar population in 1900, which has serious implications for urban development and impacts on the local environment. Overall, our results offer a new way to study urban systems based on novel, geohistorical data.
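Urban scaling relations are usually written as Y = Y0 · N^β, where N is population and Y a city attribute (e.g., developed land or road length), and β is commonly estimated by least squares on log-transformed values. The sketch below shows only that basic estimation step on synthetic values; the paper's city definitions and estimation details are more involved. A fitted β < 1 indicates sublinear scaling, and a change in Y0 over time for similar β captures shifts like the one described above for 1900 versus 2015.

```python
import math
from typing import Sequence, Tuple


def fit_scaling_law(population: Sequence[float], attribute: Sequence[float]) -> Tuple[float, float]:
    """Fit Y = Y0 * N**beta by ordinary least squares on log Y = log Y0 + beta * log N.

    Returns (Y0, beta).
    """
    xs = [math.log(n) for n in population]
    ys = [math.log(y) for y in attribute]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx
    log_y0 = y_mean - beta * x_mean
    return math.exp(log_y0), beta


# Synthetic check: attribute values generated exactly from Y = 2 * N**0.85.
pops = [1_000, 10_000, 100_000, 1_000_000]
values = [2 * n ** 0.85 for n in pops]
y0, beta = fit_scaling_law(pops, values)
print(round(y0, 6), round(beta, 6))
```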
Submitted 29 January, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Uncertainty prediction of built-up areas from global human settlement data in the United States based on landscape metrics
Authors:
Johannes H. Uhl,
Stefan Leyk
Abstract:
The level of landscape heterogeneity may affect the performance of remote-sensing based land use / land cover classification. However, the relationship between the mapping accuracy of built-up surfaces and the morphological characteristics of built-up areas has not been analyzed explicitly, and previous studies typically rely on aggregated landscape metrics to quantify the morphology of built-up areas, neglecting the fine-grained spatial variation and scale dependency of such metrics. Herein, we aim to fill this gap by assessing the associations between focal landscape metrics, derived from binary built-up surfaces, and focal data accuracy estimates. We test our approach for built-up surfaces from the Global Human Settlement Layer (GHSL) for Massachusetts (USA), by examining the explanatory power of landscape metrics for predictive modeling of commission and omission errors in the GHS-BUILT R2018A data product. We find that the Landscape Shape Index (LSI) exhibits the highest levels of correlation with focal accuracy measures. These relationships are scale-dependent and strengthen with increasing spatial support. Our results are consistent across different regions within the U.S., and we find that the Recall measure has the strongest relationship to measures of built-up surface morphology across different temporal epochs and spatial resolutions. Regression analysis results (R² > 0.9) indicate that it is possible to estimate commission errors in the GHSL in the absence of reference data, and that omission errors in the GHSL can be modeled without accessing the data themselves. Lastly, we test the generalizability of our predictive accuracy models to a different version of the GHSL (i.e., the GHS-BUILT-S2) covering a study area in North Carolina. We find varying levels of model transferability that increase with the spatial support at which landscape metrics and accuracy estimates are calculated.
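For a binary built-up grid, one common raster formulation of the Landscape Shape Index divides the observed edge length of the built-up class by the minimum edge length that the same number of cells could have if packed as compactly as possible (comparable to the class-level LSI reported by landscape-metric software such as FRAGSTATS). The Python sketch below assumes that formulation and counts grid-boundary faces as edge; the exact variant used in the paper may differ. Computing it inside a moving window would yield a focal LSI surface.

```python
import math
from typing import List, Optional

Grid = List[List[int]]


def class_edge_count(grid: Grid) -> int:
    """Count built-up cell faces that border a non-built-up cell or the grid boundary."""
    rows, cols = len(grid), len(grid[0])
    edges = 0
    for i in range(rows):
        for j in range(cols):
            if not grid[i][j]:
                continue
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                outside = not (0 <= ni < rows and 0 <= nj < cols)
                if outside or not grid[ni][nj]:
                    edges += 1
    return edges


def min_edge_count(area: int) -> int:
    """Smallest possible perimeter (in cell faces) of `area` cells packed as compactly as possible."""
    n = math.isqrt(area)
    if area == n * n:
        return 4 * n
    if area <= n * (n + 1):
        return 4 * n + 2
    return 4 * n + 4


def landscape_shape_index(grid: Grid) -> Optional[float]:
    """LSI = observed class edge / minimum class edge; 1.0 means maximally compact."""
    area = sum(sum(row) for row in grid)
    return class_edge_count(grid) / min_edge_count(area) if area else None


compact = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
scattered = [[1, 0, 1, 0], [0, 0, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
print(landscape_shape_index(compact), landscape_shape_index(scattered))  # 1.0 2.0
```

Both examples contain four built-up cells, but the scattered pattern has twice the edge of the compact one, which is the kind of fragmentation the abstract links to omission and commission errors.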
Submitted 18 May, 2022;
originally announced May 2022.
-
A framework for scale-sensitive, spatially explicit accuracy assessment of binary built-up surface layers
Authors:
Johannes H. Uhl,
Stefan Leyk
Abstract:
To better understand the dynamics of human settlements, thorough knowledge of the uncertainty in geospatial built-up surface datasets is critical. While frameworks for localized accuracy assessments of categorical gridded data have been proposed to account for the spatial non-stationarity of classification accuracy, such approaches have not been applied to (binary) built-up land data. Such data differ from other thematic data, such as land cover data, due to considerable variations in built-up surface density across the rural-urban continuum, which result in switches of class imbalance and in sparsely populated confusion matrices based on small underlying sample sizes. In this paper, we aim to fill this gap by testing common agreement measures for their suitability and plausibility to measure the localized accuracy of built-up surface data. We examine the sensitivity of localized accuracy to the assessment support, as well as to the unit of analysis, and analyze the relationships between local accuracy and density- and structure-related properties of built-up areas, across rural-urban trajectories and over time. Our experiments are based on the multi-temporal Global Human Settlement Layer (GHSL) and a reference database for the state of Massachusetts (USA). We find strong variation of suitability among commonly used agreement measures, and varying levels of sensitivity to the assessment support. We then apply our framework to assess localized GHSL data accuracy over time from 1975 to 2014. Besides increasing accuracy along the rural-urban gradient, we find that accuracy generally increases over time, mainly driven by peri-urban densification processes in our study area. Moreover, we find that localized densification measures derived from the GHSL tend to overestimate peri-urban densification processes that occurred between 1975 and 2014, due to higher levels of omission errors in the 1975 GHSL epoch.
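Localized accuracy at different assessment supports can be illustrated with a moving-window computation: for every cell, build the confusion matrix within a square window of a given radius and derive a measure such as F1. The Python sketch below is a hypothetical illustration (square windows clipped at the grid edge, F1 only), not the framework from the paper. It shows how the same single omission error dominates a small window but is diluted as the support grows, and how windows without any built-up cells leave the measure undefined.

```python
from typing import List, Optional

Grid = List[List[int]]


def focal_f1(test: Grid, ref: Grid, radius: int) -> List[List[Optional[float]]]:
    """F1 score in a (2*radius+1)^2 window around every cell (windows clipped at grid edges).

    Cells whose window contains no built-up cell in either grid get None (undefined).
    """
    rows, cols = len(ref), len(ref[0])
    out: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            tp = fp = fn = 0
            for y in range(max(0, i - radius), min(rows, i + radius + 1)):
                for x in range(max(0, j - radius), min(cols, j + radius + 1)):
                    t, r = test[y][x], ref[y][x]
                    if t and r:
                        tp += 1
                    elif t:
                        fp += 1
                    elif r:
                        fn += 1
            den = 2 * tp + fp + fn
            out[i][j] = 2 * tp / den if den else None
    return out


# Built-up reference in the top two rows; the test layer misses one cell (omission).
ref_grid = [[1] * 5, [1] * 5, [0] * 5, [0] * 5, [0] * 5]
test_grid = [row[:] for row in ref_grid]
test_grid[1][4] = 0

for radius in (0, 1, 2):  # increasing assessment support
    surface = focal_f1(test_grid, ref_grid, radius)
    print(radius, surface[1][4], surface[2][2], surface[4][0])
```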
Submitted 29 March, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
MTBF-33: A multi-temporal building footprint dataset for 33 counties in the United States (1900-2015)
Authors:
Johannes H. Uhl,
Stefan Leyk
Abstract:
Despite abundant data on the spatial distribution of contemporary human settlements, historical data on the long-term evolution of human settlements at fine spatial and temporal granularity is scarce, limiting our quantitative understanding of long-term changes of built-up areas. This is because commonly used mapping methods (e.g., image classification) and suitable data sources (i.e., aerial imagery, multi-spectral remote sensing data, LiDAR) have only been available in recent decades. However, there are alternative data sources such as cadastral records that are digitally available, containing relevant information such as building age information, allowing for an approximate, digital reconstruction of past building distributions. We conducted a non-exhaustive search of open and publicly available data resources from administrative institutions in the United States and gathered, integrated, and harmonized cadastral parcel data, tax assessment data, and building footprint data for 33 counties, wherever building footprint geometries and building construction year information were available. The result of this effort is a unique dataset which we call the Multi-Temporal Building Footprint Dataset for 33 U.S. Counties (MTBF-33). MTBF-33 contains over 6.2 million building footprints, including their construction years, and can be used to derive retrospective depictions of built-up areas from 1900 to 2015 at fine spatial and temporal grain. It can also be used for data validation purposes, or to train statistical learning approaches aiming to extract historical information on human settlements from remote sensing data, historical maps, or similar data sources. MTBF-33 is available at http://doi.org/10.17632/w33vbvjtdy.
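Deriving a retrospective built-up surface from footprints with construction years amounts to a filter: for each epoch, keep the buildings constructed in or before that year and mark the cells they occupy. The Python sketch below assumes each footprint has already been assigned to a grid cell; the `Footprint` type and its fields are hypothetical and do not reflect the actual MTBF-33 schema. Because such records typically describe buildings that still exist, structures demolished before the record date are missing, which is one reason these reconstructions are approximate.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple

Cell = Tuple[int, int]


@dataclass(frozen=True)
class Footprint:
    cell: Cell                 # hypothetical: grid cell containing the footprint
    year_built: Optional[int]  # construction year from tax assessment data; None if missing


def built_up_cells(footprints: Iterable[Footprint], epoch: int) -> Set[Cell]:
    """Cells holding at least one building constructed in or before `epoch`."""
    return {f.cell for f in footprints if f.year_built is not None and f.year_built <= epoch}


def built_up_series(footprints: Iterable[Footprint], epochs: Iterable[int]) -> Dict[int, Set[Cell]]:
    fps = list(footprints)  # allow multiple passes even if a generator is passed in
    return {epoch: built_up_cells(fps, epoch) for epoch in epochs}


sample = [
    Footprint((0, 0), 1905),
    Footprint((0, 1), 1890),
    Footprint((0, 1), 1950),
    Footprint((2, 2), 2010),
    Footprint((3, 3), None),  # no construction year: excluded from every epoch
]
series = built_up_series(sample, [1900, 1950, 2015])
for epoch, cells in series.items():
    print(epoch, sorted(cells))
```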
Submitted 21 March, 2022;
originally announced March 2022.
-
Place-level urban-rural indices for the United States from 1930 to 2018
Authors:
Johannes H. Uhl,
Lori M. Hunter,
Stefan Leyk,
Dylan S. Connor,
Jeremiah J. Nieves,
Cyrus Hester,
Catherine B. Talbot,
Myron Gutmann
Abstract:
Rural-urban classifications are essential for analyzing geographic, demographic, environmental, and social processes across the rural-urban continuum. Most existing classifications are, however, only available at relatively aggregated spatial scales, such as at the county scale in the United States. The absence of rurality or urbanness measures at high spatial resolution poses significant problems when the process of interest is highly localized, as with the incorporation of rural towns and villages into encroaching metropolitan areas. Moreover, existing rural-urban classifications are often inconsistent over time, or require complex, multi-source input data (e.g., remote sensing observations or road network data), thus prohibiting the longitudinal analysis of rural-urban dynamics. Here, we develop a set of distance- and spatial-network-based methods for consistently estimating the remoteness and rurality of places at fine spatial resolution, over long periods of time. We demonstrate the utility of our approach by constructing indices of urbanness for 30,000 places in the United States from 1930 to 2018 and further test the plausibility of our results against a variety of evaluation datasets. We call these indices the place-level urban-rural index (PLURAL) and make the resulting datasets publicly available (https://doi.org/10.3886/E162941) so that other researchers can conduct long-term, fine-grained analyses of urban and rural change. In addition, due to the simple nature of the input data, these methods can be generalized to other time periods or regions of the world, particularly to data-scarce environments.
Submitted 18 February, 2022;
originally announced February 2022.
-
A fine-grained, versatile index of remoteness to characterize place-level rurality
Authors:
Johannes H. Uhl,
Stefan Leyk,
Lori M. Hunter,
Catherine B. Talbot,
Dylan S. Connor,
Jeremiah J. Nieves,
Myron Gutmann
Abstract:
Rural-urban classifications are essential for analyzing geographic, demographic, environmental, or socioeconomic processes across the rural-urban continuum. However, existing county-level classifications may ignore within-county variations of rurality, which can be problematic if the scale of interest is at the place level or finer. Moreover, existing rural-urban classifications are often inconsistent over time and thus impede the long-term analysis of rural-urban dynamics. We developed a distance-based method to generate place-level remoteness estimates from simple input data, and created our remoteness index from place-level population data for the U.S. from 1980 to 2010. The proposed index is based on the distances of a given place to the nearest places of different population sizes; it is generalizable to data-scarce environments and earlier time periods, and allows for fine-grained, temporally consistent analyses of rural-urban processes.
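The distance component of such an index can be sketched as follows: for each place, find the distance to the nearest place whose population meets each of several size thresholds, then aggregate those distances. The thresholds, the inclusion of the place itself (so a large city is 0 km from a place of its own size), the planar coordinates in km, and the unweighted mean used for aggregation are all hypothetical choices for illustration, not the published index definition.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (name, x_km, y_km, population); coordinates assumed to be in a projected CRS, in km.
Place = Tuple[str, float, float, int]


def distance_to_nearest(place: Place, places: List[Place], min_population: int) -> float:
    """Distance from `place` to the nearest place (itself included) with >= min_population."""
    _, x, y, _ = place
    candidates = [math.hypot(ox - x, oy - y) for _, ox, oy, pop in places if pop >= min_population]
    return min(candidates, default=math.inf)


def remoteness_index(places: List[Place], thresholds: Sequence[int]) -> Dict[str, float]:
    """Hypothetical aggregation: mean distance to the nearest place of each size class."""
    return {
        p[0]: sum(distance_to_nearest(p, places, t) for t in thresholds) / len(thresholds)
        for p in places
    }


towns = [
    ("A", 0.0, 0.0, 300_000),
    ("B", 30.0, 40.0, 20_000),
    ("C", 60.0, 80.0, 2_000),
]
print(remoteness_index(towns, thresholds=[10_000, 100_000]))
```

Since the only inputs are place coordinates and population counts, this kind of computation can be repeated for any census year in which those two pieces of information exist, which is what makes distance-based indices attractive for long, consistent time series.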
Submitted 17 February, 2022;
originally announced February 2022.
-
Towards the automated large-scale reconstruction of past road networks from historical maps
Authors:
Johannes H. Uhl,
Stefan Leyk,
Yao-Yi Chiang,
Craig A. Knoblock
Abstract:
Transportation infrastructure, such as road or railroad networks, represents a fundamental component of our civilization. For sustainable planning and informed decision making, a thorough understanding of the long-term evolution of transportation infrastructure such as road networks is crucial. However, spatially explicit, multi-temporal road network data covering large spatial extents are scarce and rarely available prior to the 2000s. Herein, we propose a framework that employs increasingly available scanned and georeferenced historical map series to reconstruct past road networks, by integrating abundant, contemporary road network data and color information extracted from historical maps. Specifically, our method uses contemporary road segments as analytical units and extracts historical roads by inferring their existence in historical map series based on image processing and clustering techniques. We tested our method on over 300,000 road segments representing more than 50,000 km of the road network in the United States, extending across three study areas that cover 53 historical topographic map sheets dated between 1890 and 1950. We evaluated our approach by comparison to other historical datasets and against manually created reference data, achieving F-1 scores of up to 0.95, and showed that the extracted road network statistics are highly plausible over time, i.e., they follow general growth patterns. We demonstrated that contemporary geospatial data integrated with information extracted from historical map series open up new avenues for the quantitative analysis of long-term urbanization processes and landscape changes far beyond the era of operational remote sensing and digital cartography.
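The segment-based idea can be sketched as follows, assuming that map pixels have already been sampled along each contemporary road segment from the georeferenced historical sheet. Each segment is summarized by the share of sampled pixels close to an assumed road-symbol color, and a 1-D two-cluster k-means separates segments drawn on the map from those that are not. The road color, the tolerance, and the single feature are hypothetical simplifications; the paper's image processing and clustering steps may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]


def road_color_fraction(pixels: Sequence[RGB], road_rgb: RGB, tol: float) -> float:
    """Share of sampled map pixels within `tol` (Euclidean RGB distance) of the road symbol color."""
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if math.dist(p, road_rgb) <= tol) / len(pixels)


def two_means_1d(values: List[float], max_iter: int = 100) -> Tuple[float, float]:
    """Plain 1-D k-means with k = 2, initialized at the minimum and maximum."""
    low, high = min(values), max(values)
    for _ in range(max_iter):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = sum(low_group) / len(low_group) if low_group else low
        new_high = sum(high_group) / len(high_group) if high_group else high
        if (new_low, new_high) == (low, high):
            break
        low, high = new_low, new_high
    return low, high


def infer_historical_roads(samples: Dict[str, List[RGB]], road_rgb: RGB = (0, 0, 0),
                           tol: float = 60.0) -> Dict[str, bool]:
    """Label each contemporary segment as present (True) or absent on the historical map."""
    fractions = {seg: road_color_fraction(px, road_rgb, tol) for seg, px in samples.items()}
    low, high = two_means_1d(list(fractions.values()))
    return {seg: abs(f - high) < abs(f - low) for seg, f in fractions.items()}


# Toy pixel samples along four segments (dark ink vs. light paper colors).
samples = {
    "s1": [(10, 10, 10)] * 8 + [(240, 230, 200)] * 2,
    "s2": [(20, 15, 10)] * 9 + [(250, 240, 210)],
    "s3": [(245, 235, 205)] * 9 + [(30, 30, 30)],
    "s4": [(250, 245, 220)] * 10,
}
print(infer_historical_roads(samples))
```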
Submitted 11 February, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
A Label Correction Algorithm Using Prior Information for Automatic and Accurate Geospatial Object Recognition
Authors:
Weiwei Duan,
Yao-Yi Chiang,
Stefan Leyk,
Johannes H. Uhl,
Craig A. Knoblock
Abstract:
Thousands of scanned historical topographic maps contain valuable information covering long periods of time, such as how the hydrography of a region has changed over time. Efficiently unlocking the information in these maps requires training a geospatial object recognition system, which needs a large amount of annotated data. Overlapping geo-referenced external vector data with topographic maps according to their coordinates can annotate the desired objects' locations in the maps automatically. However, directly overlapping the two datasets causes misaligned and false annotations because the publication years and coordinate projection systems of topographic maps differ from those of the external vector data. We propose a label correction algorithm, which leverages the color information of maps and the prior shape information of the external vector data to reduce misaligned and false annotations. The experiments show that the precision of annotations from the proposed algorithm is 10% higher than the annotations from a state-of-the-art algorithm. Consequently, recognition results using the proposed algorithm's annotations achieve 9% higher correctness than using the annotations from the state-of-the-art algorithm.
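A greatly simplified stand-in for this idea (not the authors' algorithm) keeps the vector annotation's shape rigid, searches small pixel offsets for the one that best overlaps map pixels of the target symbol color, and discards annotations that find too little color support, for example because the feature is not drawn in that map edition. The color mask, the maximum shift, and the support threshold are hypothetical inputs.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col) in map image coordinates


def best_shift(annotation: Set[Pixel], color_mask: Set[Pixel],
               max_shift: int) -> Tuple[Tuple[int, int], int]:
    """Rigid (dy, dx) offset that maximizes overlap with map pixels of the target color.

    Ties in overlap are broken in favor of the smallest shift.
    """
    candidates = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            hits = sum((y + dy, x + dx) in color_mask for y, x in annotation)
            candidates.append((hits, -(abs(dy) + abs(dx)), (dy, dx)))
    hits, _, offset = max(candidates)
    return offset, hits


def correct_annotation(annotation: Set[Pixel], color_mask: Set[Pixel],
                       max_shift: int = 3, min_support: float = 0.5) -> Set[Pixel]:
    """Shift a vector-derived annotation onto the map symbol; drop it if support is too low."""
    if not annotation:
        return set()
    (dy, dx), hits = best_shift(annotation, color_mask, max_shift)
    if hits / len(annotation) < min_support:
        return set()  # likely a false annotation, e.g., feature absent from this map edition
    return {(y + dy, x + dx) for y, x in annotation}


river_on_map = {(7, x) for x in range(10)}     # pixels with the (assumed) hydrography color
river_vector = {(5, x) for x in range(10)}     # vector annotation, offset by 2 rows
missing_feature = {(20, x) for x in range(5)}  # vector feature not drawn on this map

print(correct_annotation(river_vector, river_on_map) == river_on_map)  # True
print(correct_annotation(missing_feature, river_on_map))               # set()
```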
Submitted 10 December, 2021;
originally announced December 2021.
-
Guided Generative Models using Weak Supervision for Detecting Object Spatial Arrangement in Overhead Images
Authors:
Weiwei Duan,
Yao-Yi Chiang,
Stefan Leyk,
Johannes H. Uhl,
Craig A. Knoblock
Abstract:
The increasing availability and accessibility of numerous overhead images allow us to estimate and assess the spatial arrangement of groups of geospatial target objects, which can benefit many applications, such as traffic monitoring and agricultural monitoring. Spatial arrangement estimation is the process of identifying the areas which contain the desired objects in overhead images. Traditional supervised object detection approaches can estimate accurate spatial arrangement but require large amounts of bounding box annotations. Recent semi-supervised clustering approaches can reduce manual labeling but still require annotations for all object categories in the image. This paper presents the target-guided generative model (TGGM), under the Variational Auto-encoder (VAE) framework, which uses Gaussian Mixture Models (GMM) to estimate the distributions of both hidden and decoder variables in VAE. Modeling both hidden and decoder variables with GMMs significantly reduces the manual annotations required for spatial arrangement estimation. Unlike existing approaches, whose training process can only update the GMM as a whole in each optimization iteration (e.g., a "minibatch"), TGGM allows individual GMM components to be updated separately in the same optimization iteration. Optimizing GMM components separately allows TGGM to exploit the semantic relationships in spatial data and requires only a few labels to initiate and guide the generative process. Our experiments show that TGGM achieves results comparable to the state-of-the-art semi-supervised methods and outperforms unsupervised methods by 10% based on the $F_{1}$ scores, while requiring significantly less labeled data.
Submitted 10 December, 2021;
originally announced December 2021.
-
An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images
Authors:
Zekun Li,
Yao-Yi Chiang,
Sasan Tavakkol,
Basel Shbita,
Johannes H. Uhl,
Stefan Leyk,
Craig A. Knoblock
Abstract:
Historical maps contain detailed geographic information that is difficult to find elsewhere and covers long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches to making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at https://github.com/kartta-labs/Project.
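The word-to-phrase problem mentioned above ("Black" and "Mountain" vs. "Black Mountain") can be illustrated with a simple geometric heuristic: merge OCR words whose bounding boxes lie on the same line and are separated by a small horizontal gap. This axis-aligned sketch is a hypothetical simplification, not mapKurator's method; map labels are often curved or rotated, which such a heuristic does not handle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    x0: float  # bounding box in image pixels: left, top, right, bottom
    y0: float
    x1: float
    y1: float


def on_same_line(a: Word, b: Word) -> bool:
    """True if the boxes overlap vertically by at least half the smaller box height."""
    overlap = min(a.y1, b.y1) - max(a.y0, b.y0)
    return overlap >= 0.5 * min(a.y1 - a.y0, b.y1 - b.y0)


def group_into_phrases(words: List[Word], max_gap_ratio: float = 1.0) -> List[str]:
    """Greedily merge horizontally adjacent words on the same line into phrases."""
    phrases: List[str] = []
    current: List[Word] = []
    for w in sorted(words, key=lambda b: (b.y0, b.x0)):
        if current:
            last = current[-1]
            gap = w.x0 - last.x1
            height = min(last.y1 - last.y0, w.y1 - w.y0)
            if on_same_line(last, w) and 0 <= gap <= max_gap_ratio * height:
                current.append(w)
                continue
            phrases.append(" ".join(word.text for word in current))
        current = [w]
    if current:
        phrases.append(" ".join(word.text for word in current))
    return phrases


ocr_words = [
    Word("Mountain", 55, 0, 120, 10),
    Word("Black", 0, 0, 50, 10),
    Word("Road", 400, 0, 430, 10),
    Word("Creek", 300, 50, 340, 60),
]
print(group_into_phrases(ocr_words))  # ['Black Mountain', 'Road', 'Creek']
```

Once words are grouped into phrases, each phrase can be matched against a gazetteer or knowledge base and emitted as linked metadata.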
Submitted 2 December, 2021;
originally announced December 2021.