-
Can SAM recognize crops? Quantifying the zero-shot performance of a semantic segmentation foundation model on generating crop-type maps using satellite imagery for precision agriculture
Authors:
Rutuja Gurav,
Het Patel,
Zhuocheng Shang,
Ahmed Eldawy,
Jia Chen,
Elia Scudiero,
Evangelos Papalexakis
Abstract:
Climate change is increasingly disrupting worldwide agriculture, making global food production less reliable. To tackle the growing challenges in feeding the planet, cutting-edge management strategies, such as precision agriculture, empower farmers and decision-makers with rich and actionable information to increase the efficiency and sustainability of their farming practices. Crop-type maps are k…
▽ More
Climate change is increasingly disrupting worldwide agriculture, making global food production less reliable. To tackle the growing challenges in feeding the planet, cutting-edge management strategies, such as precision agriculture, empower farmers and decision-makers with rich and actionable information to increase the efficiency and sustainability of their farming practices. Crop-type maps are key information for decision-support tools but are challenging and costly to generate. We investigate the capabilities of Meta AI's Segment Anything Model (SAM) for crop-map prediction task, acknowledging its recent successes at zero-shot image segmentation. However, SAM being limited to up-to 3 channel inputs and its zero-shot usage being class-agnostic in nature pose unique challenges in using it directly for crop-type map**. We propose using clustering consensus metrics to assess SAM's zero-shot performance in segmenting satellite imagery and producing crop-type maps. Although direct crop-type map** is challenging using SAM in zero-shot setting, experiments reveal SAM's potential for swiftly and accurately outlining fields in satellite images, serving as a foundation for subsequent crop classification. This paper attempts to highlight a use-case of state-of-the-art image segmentation models like SAM for crop-type map** and related specific needs of the agriculture industry, offering a potential avenue for automatic, efficient, and cost-effective data products for precision agriculture practices.
△ Less
Submitted 4 December, 2023; v1 submitted 25 November, 2023;
originally announced November 2023.
-
Object Delineation in Satellite Images
Authors:
Zhuocheng Shang,
Ahmed Eldawy
Abstract:
Machine learning is being widely applied to analyze satellite data with problems such as classification and feature detection. Unlike traditional image processing algorithms, geospatial applications need to convert the detected objects from a raster form to a geospatial vector form to further analyze it. This gem delivers a simple and light-weight algorithm for delineating the pixels that are mark…
▽ More
Machine learning is being widely applied to analyze satellite data with problems such as classification and feature detection. Unlike traditional image processing algorithms, geospatial applications need to convert the detected objects from a raster form to a geospatial vector form to further analyze it. This gem delivers a simple and light-weight algorithm for delineating the pixels that are marked by ML algorithms to extract geospatial objects from satellite images. The proposed algorithm is exact and users can further apply simplification and approximation based on the application needs.
△ Less
Submitted 13 December, 2022;
originally announced December 2022.
-
Spatial Parquet: A Column File Format for Geospatial Data Lakes [Extended Version]
Authors:
Majid Saeedan,
Ahmed Eldawy
Abstract:
Modern data analytics applications prefer to use column-storage formats due to their improved storage efficiency through encoding and compression. Parquet is the most popular file format for column data storage that provides several of these benefits out of the box. However, geospatial data is not readily supported by Parquet. This paper introduces Spatial Parquet, a Parquet extension that efficie…
▽ More
Modern data analytics applications prefer to use column-storage formats due to their improved storage efficiency through encoding and compression. Parquet is the most popular file format for column data storage that provides several of these benefits out of the box. However, geospatial data is not readily supported by Parquet. This paper introduces Spatial Parquet, a Parquet extension that efficiently supports geospatial data. Spatial Parquet inherits all the advantages of Parquet for non-spatial data, such as rich data types, compression, and column/row filtering. Additionally, it adds three new features to accommodate geospatial data. First, it introduces a geospatial data type that can encode all standard spatial data types in a column format compatible with Parquet. Second, it adds a new lossless and efficient encoding method, termed FP-delta, that is customized to efficiently store geospatial coordinates stored in floating-point format. Third, it adds a light-weight spatial index that allows the reader to skip non-relevant parts of the file for increased read efficiency. Experiments on large-scale real data showed that SpatialParquet can reduce the data size by a factor of three, even without compression. Compression can further reduce the storage size. Additionally, Spatial Parquet can reduce the reading time by two orders of magnitude when the light-weight index is applied. This initial prototype can open new research directions to further improve geospatial data storage in column format.
△ Less
Submitted 5 September, 2022;
originally announced September 2022.
-
Spatial Data Generators
Authors:
Tin Vu,
Sara Migliorini,
Ahmed Eldawy,
Alberto Belussi
Abstract:
This gem describes a standard method for generating synthetic spatial data that can be used in benchmarking and scalability tests. The goal is to improve the reproducibility and increase the trust in experiments on synthetic data by using standard widely acceptable dataset distributions. In addition, this article describes how to assign a unique identifier to each synthetic dataset that can be sha…
▽ More
This gem describes a standard method for generating synthetic spatial data that can be used in benchmarking and scalability tests. The goal is to improve the reproducibility and increase the trust in experiments on synthetic data by using standard widely acceptable dataset distributions. In addition, this article describes how to assign a unique identifier to each synthetic dataset that can be shared in papers for reproducibility of results. Finally, this gem provides a supplementary material that gives a reference implementation for all the provided distributions.
△ Less
Submitted 27 September, 2021; v1 submitted 17 July, 2021;
originally announced July 2021.
-
Uncertainty Aware Wildfire Management
Authors:
Tina Diao,
Samriddhi Singla,
Ayan Mukhopadhyay,
Ahmed Eldawy,
Ross Shachter,
Mykel Kochenderfer
Abstract:
Recent wildfires in the United States have resulted in loss of life and billions of dollars, destroying countless structures and forests. Fighting wildfires is extremely complex. It is difficult to observe the true state of fires due to smoke and risk associated with ground surveillance. There are limited resources to be deployed over a massive area and the spread of the fire is challenging to pre…
▽ More
Recent wildfires in the United States have resulted in loss of life and billions of dollars, destroying countless structures and forests. Fighting wildfires is extremely complex. It is difficult to observe the true state of fires due to smoke and risk associated with ground surveillance. There are limited resources to be deployed over a massive area and the spread of the fire is challenging to predict. This paper proposes a decision-theoretic approach to combat wildfires. We model the resource allocation problem as a partially-observable Markov decision process. We also present a data-driven model that lets us simulate how fires spread as a function of relevant covariates. A major problem in using data-driven models to combat wildfires is the lack of comprehensive data sources that relate fires with relevant covariates. We present an algorithmic approach based on large-scale raster and vector analysis that can be used to create such a dataset. Our data with over 2 million data points is the first open-source dataset that combines existing fire databases with covariates extracted from satellite imagery. Through experiments using real-world wildfire data, we demonstrate that our forecasting model can accurately model the spread of wildfires. Finally, we use simulations to demonstrate that our response strategy can significantly reduce response times compared to baseline methods.
△ Less
Submitted 15 October, 2020;
originally announced October 2020.
-
Raptor Zonal Statistics: Fully Distributed Zonal Statistics of Big Raster + Vector Data [Pre-Print]
Authors:
Samriddhi Singla,
Ahmed Eldawy
Abstract:
Recent advancements in remote sensing technology have resulted in petabytes of data in raster format. This data is often processed in combination with high resolution vector data that represents, for example, city boundaries. One of the common operations that combine big raster and vector data is the zonal statistics which computes some statistics for each polygon in the vector dataset. This paper…
▽ More
Recent advancements in remote sensing technology have resulted in petabytes of data in raster format. This data is often processed in combination with high resolution vector data that represents, for example, city boundaries. One of the common operations that combine big raster and vector data is the zonal statistics which computes some statistics for each polygon in the vector dataset. This paper models the zonal statistics problem as a join problem and proposes a novel distributed system that can scale to petabytes of raster and vector data. The proposed method does not require any preprocessing or indexing which makes it perfect for ad-hoc queries that scientists usually want to run. We devise a theoretical cost model that proves the efficiency of our algorithm over the baseline method. Furthermore, we run an extensive experimental evaluation on large scale satellite data with up-to a trillion pixels, and big vector data with up-to hundreds of millions of edges, and we show that our method can perfectly scale to big data with up-to two orders of magnitude performance gain over Rasdaman and Google Earth Engine.
△ Less
Submitted 13 October, 2020;
originally announced October 2020.
-
DeepSampling: Selectivity Estimation with Predicted Error and Response Time
Authors:
Tin Vu,
Ahmed Eldawy
Abstract:
The rapid growth of spatial data urges the research community to find efficient processing techniques for interactive queries on large volumes of data. Approximate Query Processing (AQP) is the most prominent technique that can provide real-time answer for ad-hoc queries based on a random sample. Unfortunately, existing AQP methods provide an answer without providing any accuracy metrics due to th…
▽ More
The rapid growth of spatial data urges the research community to find efficient processing techniques for interactive queries on large volumes of data. Approximate Query Processing (AQP) is the most prominent technique that can provide real-time answer for ad-hoc queries based on a random sample. Unfortunately, existing AQP methods provide an answer without providing any accuracy metrics due to the complex relationship between the sample size, the query parameters, the data distribution, and the result accuracy. This paper proposes DeepSampling, a deep-learning-based model that predicts the accuracy of a sample-based AQP algorithm, specially selectivity estimation, given the sample size, the input distribution, and query parameters. The model can also be reversed to measure the sample size that would produce a desired accuracy. DeepSampling is the first system that provides a reliable tool for existing spatial databases to control the accuracy of AQP.
△ Less
Submitted 15 August, 2020;
originally announced August 2020.
-
R*-Grove: Balanced Spatial Partitioning for Large-scale Datasets
Authors:
Tin Vu,
Ahmed Eldawy
Abstract:
The rapid growth of big spatial data urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenges of big spatial partitioning are building high spatial quality partitions while simultaneously taking advantages of…
▽ More
The rapid growth of big spatial data urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenges of big spatial partitioning are building high spatial quality partitions while simultaneously taking advantages of distributed processing models by providing load balanced partitions. Previous works on big spatial partitioning are to reuse existing index search trees as-is, e.g., the R-tree family, STR, Kd-tree, and Quad-tree, by building a temporary tree for a sample of the input and use its leaf nodes as partition boundaries. However, we show in this paper that none of those techniques has addressed the mentioned challenges completely. This paper proposes a novel partitioning method, termed R*-Grove, which can partition very large spatial datasets into high quality partitions with excellent load balance and block utilization. This appealing property allows R*-Grove to outperform existing techniques in spatial query processing. R*-Grove can be easily integrated into any big data platforms such as Apache Spark or Apache Hadoop. Our experiments show that R*-Grove outperforms the existing partitioning techniques for big spatial data systems. With all the proposed work publicly available as open source, we envision that R*-Grove will be adopted by the community to better serve big spatial data research.
△ Less
Submitted 22 July, 2020;
originally announced July 2020.
-
Parallel Algorithms for Summing Floating-Point Numbers
Authors:
Michael T. Goodrich,
Ahmed Eldawy
Abstract:
The problem of exactly summing n floating-point numbers is a fundamental problem that has many applications in large-scale simulations and computational geometry. Unfortunately, due to the round-off error in standard floating-point operations, this problem becomes very challenging. Moreover, all existing solutions rely on sequential algorithms which cannot scale to the huge datasets that need to b…
▽ More
The problem of exactly summing n floating-point numbers is a fundamental problem that has many applications in large-scale simulations and computational geometry. Unfortunately, due to the round-off error in standard floating-point operations, this problem becomes very challenging. Moreover, all existing solutions rely on sequential algorithms which cannot scale to the huge datasets that need to be processed.
In this paper, we provide several efficient parallel algorithms for summing n floating point numbers, so as to produce a faithfully rounded floating-point representation of the sum. We present algorithms in PRAM, external-memory, and MapReduce models, and we also provide an experimental analysis of our MapReduce algorithms, due to their simplicity and practical efficiency.
△ Less
Submitted 18 May, 2016;
originally announced May 2016.
-
Secure Localization in Wireless Sensor Networks: A Survey
Authors:
Waleed Ammar,
Ahmed ElDawy,
Moustafa Youssef
Abstract:
Wireless sensor networks (WSNs) have gained researchers' attention in the last several years. Small sensors powered by miniaturized microprocessors are capable of supporting several applications for civil and military domains. Determining the location of sensors is a basic and essential knowledge for most WSN algorithms and protocols including data tagging, routing, node identification, among othe…
▽ More
Wireless sensor networks (WSNs) have gained researchers' attention in the last several years. Small sensors powered by miniaturized microprocessors are capable of supporting several applications for civil and military domains. Determining the location of sensors is a basic and essential knowledge for most WSN algorithms and protocols including data tagging, routing, node identification, among others. This paper surveys the different algorithms that have been proposed to securely determine the location of a sensor node. By "secure", we mean that adversaries cannot easily affect the accuracy of the localized sensors. In other words, the localization algorithm must be robust under several attacks. We provide a taxonomy for classifying different secure localization schemes and describe possible attacks that can harm localization. In addition, we survey different secure localization schemes and show how they map to the proposed taxonomy. We also give a comparison between the different schemes, showing the attacks addressed by each.
△ Less
Submitted 19 April, 2010;
originally announced April 2010.