Search | arXiv e-print repository

arXiv:2403.02059 [pdf, other]

Multi-Spectral Remote Sensing Image Retrieval Using Geospatial Foundation Models

Authors: Benedikt Blumenstiel, Viktoria Moor, Romeo Kienzler, Thomas Brunschwiler

Abstract: Image retrieval enables an efficient search through vast amounts of satellite imagery and returns similar images to a query. Deep learning models can identify images across various semantic concepts without the need for annotations. This work proposes to use Geospatial Foundation Models, like Prithvi, for remote sensing image retrieval with multiple benefits: i) the models encode multi-spectral satellite data and ii) generalize without further fine-tuning. We introduce two datasets to the retrieval task and observe a strong performance: Prithvi processes six bands and achieves a mean Average Precision of 97.62% on BigEarthNet-43 and 44.51% on ForestNet-12, outperforming other RGB-based models. Further, we evaluate three compression methods with binarized embeddings balancing retrieval speed and accuracy. They match the retrieval speed of much shorter hash codes while maintaining the same accuracy as floating-point embeddings but with a 32-fold compression. The code is available at https://github.com/IBM/remote-sensing-image-retrieval.

Submitted 22 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS)
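
The 32-fold compression mentioned in the abstract above follows from storing one bit per embedding dimension instead of a 32-bit float. The sketch below illustrates that arithmetic and a Hamming-distance ranking over the packed codes. It is not the authors' implementation: thresholding at zero, the 768-dimensional random embeddings, and the function names binarize and hamming_rank are assumptions made for illustration, and the abstract does not describe the three compression methods the paper evaluates.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold each dimension at zero (an assumed rule) and pack 8 bits per byte."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Return database indices sorted by Hamming distance to the query code."""
    xor = np.bitwise_xor(db_codes, query_code)
    distances = np.unpackbits(xor, axis=1).sum(axis=1)
    return np.argsort(distances, kind="stable")


# Hypothetical data: 1000 random float32 embeddings with 768 dimensions.
rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 768)).astype(np.float32)

codes = binarize(db)
# 32 bits per value become 1 bit per value.
compression = db.nbytes / codes.nbytes
# Rank the database against its own first entry.
ranking = hamming_rank(codes[0], codes)
```

Here compression evaluates to 32.0 and the query's own entry ranks first, since its Hamming distance to itself is zero.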

arXiv:2309.02094 [pdf]

TensorBank: Tensor Lakehouse for Foundation Model Training

Authors: Romeo Kienzler, Leonardo Pondian Tizzei, Benedikt Blumenstiel, Zoltan Arnold Nagy, S. Karthik Mukkavilli, Johannes Schmude, Marcus Freitag, Michael Behrendt, Daniel Salles Civitarese, Naomi Simumba, Daiki Kimura, Hendrik Hamann

Abstract: Storing and streaming high dimensional data for foundation model training became a critical requirement with the rise of foundation models beyond natural language. In this paper we introduce TensorBank, a petabyte scale tensor lakehouse capable of streaming tensors from Cloud Object Store (COS) to GPU memory at wire speed based on complex relational queries. We use Hierarchical Statistical Indices (HSI) for query acceleration. Our architecture allows tensors to be addressed directly at block level using HTTP range reads. Once in GPU memory, data can be transformed using PyTorch transforms. We provide a generic PyTorch dataset type with a corresponding dataset factory translating relational queries and requested transformations as an instance. By making use of the HSI, irrelevant blocks can be skipped without reading them, as those indices contain statistics on their content at different hierarchical resolution levels. This is an opinionated architecture powered by open standards and making heavy use of open-source technology. Although hardened for production use with geospatial-temporal data, this architecture generalizes to other use cases such as computer vision, computational neuroscience, biological sequence analysis and more.

Submitted 21 March, 2024; v1 submitted 5 September, 2023; originally announced September 2023.
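
The block-skipping idea in the TensorBank abstract above can be sketched as a small tree of summary statistics that is walked from coarse to fine. This is a minimal illustration, not TensorBank's actual index: the choice of min/max as the stored statistic, the two-level layout, and the names IndexNode and blocks_to_read are all assumptions. The abstract only states that the indices hold statistics at different hierarchical resolution levels.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexNode:
    """Summary of a value range; leaves point at one storage block (hypothetical layout)."""
    lo: float
    hi: float
    block_id: Optional[int] = None
    children: List["IndexNode"] = field(default_factory=list)


def blocks_to_read(node: IndexNode, q_lo: float, q_hi: float) -> List[int]:
    """Collect block ids whose [lo, hi] range overlaps the query range."""
    if node.hi < q_lo or node.lo > q_hi:
        # No overlap: the whole subtree is skipped without touching its blocks.
        return []
    if node.block_id is not None:
        return [node.block_id]
    selected: List[int] = []
    for child in node.children:
        selected.extend(blocks_to_read(child, q_lo, q_hi))
    return selected


# Hypothetical two-level index over four blocks.
leaves = [
    IndexNode(0, 10, block_id=0),
    IndexNode(12, 20, block_id=1),
    IndexNode(30, 40, block_id=2),
    IndexNode(45, 60, block_id=3),
]
root = IndexNode(0, 60, children=[
    IndexNode(0, 20, children=leaves[:2]),
    IndexNode(30, 60, children=leaves[2:]),
])

selected_blocks = blocks_to_read(root, 15, 35)
```

For the query range 15 to 35, only blocks 1 and 2 are selected. In a setup like the one the abstract describes, only the selected blocks would then be fetched, for example with HTTP range reads.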

arXiv:2307.06824 [pdf]

CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science

Authors: Romeo Kienzler, Rafflesia Khan, Jerome Nilmeier, Ivan Nesic, Ibrahim Haddad

Abstract: In modern data-driven science, reproducibility and reusability are key challenges. Scientists are well skilled in the process from data to publication. Although some publication channels require source code and data to be made accessible, rerunning and verifying experiments is usually hard due to a lack of standards. Therefore, reusing existing scientific data processing code from state-of-the-art research is hard as well. This is why we introduce CLAIMED, which has a proven track record in scientific research for addressing the repeatability and reusability issues in modern data-driven science. CLAIMED is a framework to build reusable operators and scalable scientific workflows by supporting the scientist to draw from previous work by re-composing workflows from existing libraries of coarse-grained scientific operators. Although various implementations exist, CLAIMED is programming language, scientific library, and execution environment agnostic.

Submitted 12 July, 2023; originally announced July 2023.

Comments: Received IEEE OSS Award 2023 - https://conferences.computer.org/services/2023/symposia/oss.html

arXiv:2103.03281 [pdf, other]

CLAIMED, a visual and scalable component library for Trusted AI

Authors: Romeo Kienzler, Ivan Nesic

Abstract: Deep Learning models are getting more and more popular but constraints on explainability, adversarial robustness and fairness are often major concerns for production deployment. Although the open source ecosystem is abundant on addressing those concerns, fully integrated, end-to-end systems are lacking in open source. Therefore we provide an entirely open source, reusable component framework, visual editor and execution engine for production grade machine learning on top of Kubernetes, a joint effort between IBM and the University Hospital Basel. It uses Kubeflow Pipelines, the AI Explainability360 toolkit, the AI Fairness360 toolkit and the Adversarial Robustness Toolkit on top of ElyraAI, Kubeflow, Kubernetes and JupyterLab. Using the Elyra pipeline editor, AI pipelines can be developed visually with a set of Jupyter notebooks.

Submitted 4 March, 2021; originally announced March 2021.

Showing 1–4 of 4 results for author: Kienzler, R