Search | arXiv e-print repository

Pretraining Billion-scale Geospatial Foundational Models on Frontier

Authors: Aristeidis Tsaris, Philipe Ambrozio Dias, Abhishek Potnis, Junqi Yin, Feiyi Wang, Dalton Lunga

Abstract: As AI workloads increase in scope, generalization capability becomes challenging for small task-specific models and their demand for large amounts of labeled training samples increases. On the contrary, Foundation Models (FMs) are trained with internet-scale unlabeled data via self-supervised learning and have been shown to adapt to various tasks with minimal fine-tuning. Although large FMs have d… ▽ More As AI workloads increase in scope, generalization capability becomes challenging for small task-specific models and their demand for large amounts of labeled training samples increases. On the contrary, Foundation Models (FMs) are trained with internet-scale unlabeled data via self-supervised learning and have been shown to adapt to various tasks with minimal fine-tuning. Although large FMs have demonstrated significant impact in natural language processing and computer vision, efforts toward FMs for geospatial applications have been restricted to smaller size models, as pretraining larger models requires very large computing resources equipped with state-of-the-art hardware accelerators. Current satellite constellations collect 100+TBs of data a day, resulting in images that are billions of pixels and multimodal in nature. Such geospatial data poses unique challenges opening up new opportunities to develop FMs. We investigate billion scale FMs and HPC training profiles for geospatial applications by pretraining on publicly available data. We studied from end-to-end the performance and impact in the solution by scaling the model size. Our larger 3B parameter size model achieves up to 30% improvement in top1 scene classification accuracy when comparing a 100M parameter model. Moreover, we detail performance experiments on the Frontier supercomputer, America's first exascale system, where we study different model and data parallel approaches using PyTorch's Fully Sharded Data Parallel library. Specifically, we study variants of the Vision Transformer architecture (ViT), conducting performance analysis for ViT models with size up to 15B parameters. By discussing throughput and performance bottlenecks under different parallelism configurations, we offer insights on how to leverage such leadership-class HPC resources when develo** large models for geospatial imagery applications. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2310.04610 [pdf, other]

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri , et al. (67 additional authors not shown)

Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique… ▽ More In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research. △ Less

Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2210.13207 [pdf]

GeoAI at ACM SIGSPATIAL: The New Frontier of Geospatial Artificial Intelligence Research

Authors: Dalton Lunga, Yingjie Hu, Shawn Newsam, Song Gao, Bruno Martins, Lexie Yang, Xueqing Deng

Abstract: Geospatial Artificial Intelligence (GeoAI) is an interdisciplinary field enjoying tremendous adoption. However, the efficient design and implementation of GeoAI systems face many open challenges. This is mainly due to the lack of non-standardized approaches to artificial intelligence tool development, inadequate platforms, and a lack of multidisciplinary engagements, which all motivate domain expe… ▽ More Geospatial Artificial Intelligence (GeoAI) is an interdisciplinary field enjoying tremendous adoption. However, the efficient design and implementation of GeoAI systems face many open challenges. This is mainly due to the lack of non-standardized approaches to artificial intelligence tool development, inadequate platforms, and a lack of multidisciplinary engagements, which all motivate domain experts to seek a shared stage with scientists and engineers to solve problems of significant impact on society. Since its inception in 2017, the GeoAI series of workshops has been co-located with the Association for Computing Machinery International Conference on Advances in Geographic Information Systems. The workshop series has fostered a nexus for geoscientists, computer scientists, engineers, entrepreneurs, and decision-makers, from academia, industry, and government to engage in artificial intelligence, spatiotemporal data computing, and geospatial data science research, motivated by various challenges. In this article, we revisit and discuss the state of GeoAI open research directions, the recent developments, and an emerging agenda calling for a continued cross-disciplinary community engagement. △ Less

Submitted 20 October, 2022; originally announced October 2022.

Comments: 12 pages, 1 figure, 1 table

arXiv:2109.00435

Proceedings of KDD 2020 Workshop on Data-driven Humanitarian Map**: Harnessing Human-Machine Intelligence for High-Stake Public Policy and Resilience Planning

Authors: Snehalkumar, S. Gaikwad, Shankar Iyer, Dalton Lunga, Yu-Ru Lin

Abstract: Humanitarian challenges, including natural disasters, food insecurity, climate change, racial and gender violence, environmental crises, the COVID-19 coronavirus pandemic, human rights violations, and forced displacements, disproportionately impact vulnerable communities worldwide. According to UN OCHA, 235 million people will require humanitarian assistance in 2021 . Despite these growing perils,… ▽ More Humanitarian challenges, including natural disasters, food insecurity, climate change, racial and gender violence, environmental crises, the COVID-19 coronavirus pandemic, human rights violations, and forced displacements, disproportionately impact vulnerable communities worldwide. According to UN OCHA, 235 million people will require humanitarian assistance in 2021 . Despite these growing perils, there remains a notable paucity of data science research to scientifically inform equitable public policy decisions for improving the livelihood of at-risk populations. Scattered data science efforts exist to address these challenges, but they remain isolated from practice and prone to algorithmic harms concerning lack of privacy, fairness, interpretability, accountability, transparency, and ethics. Biases in data-driven methods carry the risk of amplifying inequalities in high-stakes policy decisions that impact the livelihood of millions of people. Consequently, proclaimed benefits of data-driven innovations remain inaccessible to policymakers, practitioners, and marginalized communities at the core of humanitarian actions and global development. To help fill this gap, we propose the Data-driven Humanitarian Map** Research Program, which focuses on develo** novel data science methodologies that harness human-machine intelligence for high-stakes public policy and resilience planning. The proceedings of the 1st Data-driven Humanitarian Map** workshop at the 26th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, August 24th, 2020. △ Less

Submitted 7 September, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

Comments: The proceedings of the 1st Data-driven Humanitarian Map** workshop at the 26th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

arXiv:2109.00100

Proceedings of KDD 2021 Workshop on Data-driven Humanitarian Map**: Harnessing Human-Machine Intelligence for High-Stake Public Policy and Resilience Planning

Authors: Snehalkumar, S. Gaikwad, Shankar Iyer, Dalton Lunga, Elizabeth Bondi

Abstract: Humanitarian challenges, including natural disasters, food insecurity, climate change, racial and gender violence, environmental crises, the COVID-19 coronavirus pandemic, human rights violations, and forced displacements, disproportionately impact vulnerable communities worldwide. According to UN OCHA, 235 million people will require humanitarian assistance in 2021. Despite these growing perils,… ▽ More Humanitarian challenges, including natural disasters, food insecurity, climate change, racial and gender violence, environmental crises, the COVID-19 coronavirus pandemic, human rights violations, and forced displacements, disproportionately impact vulnerable communities worldwide. According to UN OCHA, 235 million people will require humanitarian assistance in 2021. Despite these growing perils, there remains a notable paucity of data science research to scientifically inform equitable public policy decisions for improving the livelihood of at-risk populations. Scattered data science efforts exist to address these challenges, but they remain isolated from practice and prone to algorithmic harms concerning lack of privacy, fairness, interpretability, accountability, transparency, and ethics. Biases in data-driven methods carry the risk of amplifying inequalities in high-stakes policy decisions that impact the livelihood of millions of people. Consequently, proclaimed benefits of data-driven innovations remain inaccessible to policymakers, practitioners, and marginalized communities at the core of humanitarian actions and global development. To help fill this gap, we propose the Data-driven Humanitarian Map** Research Program, which focuses on develo** novel data science methodologies that harness human-machine intelligence for high-stakes public policy and resilience planning. The proceedings of the 2nd Data-driven Humanitarian Map** workshop at the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. August 15th, 2021 △ Less

Submitted 7 September, 2021; v1 submitted 31 August, 2021; originally announced September 2021.

Comments: The proceedings of the 2nd Data-driven Humanitarian Map** workshop at the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. August 15th, 2021

arXiv:2012.10564 [pdf, other]

Computer-aided abnormality detection in chest radiographs in a clinical setting via domain-adaptation

Authors: Abhishek K Dubey, Michael T Young, Christopher Stanley, Dalton Lunga, Jacob Hinkle

Abstract: Deep learning (DL) models are being deployed at medical centers to aid radiologists for diagnosis of lung conditions from chest radiographs. Such models are often trained on a large volume of publicly available labeled radiographs. These pre-trained DL models' ability to generalize in clinical settings is poor because of the changes in data distributions between publicly available and privately he… ▽ More Deep learning (DL) models are being deployed at medical centers to aid radiologists for diagnosis of lung conditions from chest radiographs. Such models are often trained on a large volume of publicly available labeled radiographs. These pre-trained DL models' ability to generalize in clinical settings is poor because of the changes in data distributions between publicly available and privately held radiographs. In chest radiographs, the heterogeneity in distributions arises from the diverse conditions in X-ray equipment and their configurations used for generating the images. In the machine learning community, the challenges posed by the heterogeneity in the data generation source is known as domain shift, which is a mode shift in the generative model. In this work, we introduce a domain-shift detection and removal method to overcome this problem. Our experimental results show the proposed method's effectiveness in deploying a pre-trained DL model for abnormality detection in chest radiographs in a clinical setting. △ Less

Submitted 18 December, 2020; originally announced December 2020.

arXiv:1908.04383 [pdf, other]

Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics

Authors: Dalton Lunga, Jonathan Gerrand, Hsiuhan Lexie Yang, Christopher Layton, Robert Stewart

Abstract: The shear volumes of data generated from earth observation and remote sensing technologies continue to make major impact; lea** key geospatial applications into the dual data and compute intensive era. As a consequence, this rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for advanced machine learning and computing… ▽ More The shear volumes of data generated from earth observation and remote sensing technologies continue to make major impact; lea** key geospatial applications into the dual data and compute intensive era. As a consequence, this rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for advanced machine learning and computing with massive amounts of remotely sensed imagery. The core contribution is partitioning massive amount of data based on the spectral and semantic characteristics for distributed imagery analysis. RESFlow takes advantage of both a unified analytics engine for large-scale data processing and the availability of modern computing hardware to harness the acceleration of deep learning inference on expansive remote sensing imagery. The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment across computationally and data-intensive on pixel-level labeling workloads. The pipeline invokes deep learning inference at three stages; during deep feature extraction, deep metric map**, and deep semantic segmentation. The tasks impose compute intensive and GPU resource sharing challenges motivating for a parallelized pipeline for all execution steps. By taking advantage of Apache Spark, Nvidia DGX1, and DGX2 computing platforms, we demonstrate unprecedented compute speed-ups for deep learning inference on pixel labeling workloads; processing 21,028~Terrabytes of imagery data and delivering an output maps at area rate of 5.245sq.km/sec, amounting to 453,168 sq.km/day - reducing a 28 day workload to 21~hours. △ Less

Submitted 8 August, 2019; originally announced August 2019.

arXiv:1805.08946 [pdf, other]

Building Extraction at Scale using Convolutional Neural Network: Map** of the United States

Authors: Hsiuhan Lexie Yang, Jiangye Yuan, Dalton Lunga, Melanie Laverdiere, Amy Rose, Budhendra Bhaduri

Abstract: Establishing up-to-date large scale building maps is essential to understand urban dynamics, such as estimating population, urban planning and many other applications. Although many computer vision tasks has been successfully carried out with deep convolutional neural networks, there is a growing need to understand their large scale impact on building map** with remote sensing imagery. Taking ad… ▽ More Establishing up-to-date large scale building maps is essential to understand urban dynamics, such as estimating population, urban planning and many other applications. Although many computer vision tasks has been successfully carried out with deep convolutional neural networks, there is a growing need to understand their large scale impact on building map** with remote sensing imagery. Taking advantage of the scalability of CNNs and using only few areas with the abundance of building footprints, for the first time we conduct a comparative analysis of four state-of-the-art CNNs for extracting building footprints across the entire continental United States. The four CNN architectures namely: branch-out CNN, fully convolutional neural network (FCN), conditional random field as recurrent neural network (CRFasRNN), and SegNet, support semantic pixel-wise labeling and focus on capturing textural information at multi-scale. We use 1-meter resolution aerial images from National Agriculture Imagery Program (NAIP) as the test-bed, and compare the extraction results across the four methods. In addition, we propose to combine signed-distance labels with SegNet, the preferred CNN architecture identified by our extensive evaluations, to advance building extraction results to instance level. We further demonstrate the usefulness of fusing additional near IR information into the building extraction framework. Large scale experimental evaluations are conducted and reported using metrics that include: precision, recall rate, intersection over union, and the number of buildings extracted. With the improved CNN model and no requirement of further post-processing, we have generated building maps for the United States. The quality of extracted buildings and processing time demonstrated the proposed CNN-based framework fits the need of building extraction at scale. △ Less

Submitted 22 May, 2018; originally announced May 2018.

Comments: Accepted by IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

arXiv:1707.05685 [pdf, other]

Hashed Binary Search Sampling for Convolutional Network Training with Large Overhead Image Patches

Authors: Dalton Lunga, Lexie Yang, Budhendra Bhaduri

Abstract: Very large overhead imagery associated with ground truth maps has the potential to generate billions of training image patches for machine learning algorithms. However, random sampling selection criteria often leads to redundant and noisy-image patches for model training. With minimal research efforts behind this challenge, the current status spells missed opportunities to develop supervised learn… ▽ More Very large overhead imagery associated with ground truth maps has the potential to generate billions of training image patches for machine learning algorithms. However, random sampling selection criteria often leads to redundant and noisy-image patches for model training. With minimal research efforts behind this challenge, the current status spells missed opportunities to develop supervised learning algorithms that generalize over wide geographical scenes. In addition, much of the computational cycles for large scale machine learning are poorly spent crunching through noisy and redundant image patches. We demonstrate a potential framework to address these challenges specifically, while evaluating a human settlement detection task. A novel binary search tree sampling scheme is fused with a kernel based hashing procedure that maps image patches into hash-buckets using binary codes generated from image content. The framework exploits inherent redundancy within billions of image patches to promote mostly high variance preserving samples for accelerating algorithmic training and increasing model generalization. △ Less

Submitted 18 July, 2017; originally announced July 2017.

Comments: 4 pages, 5 figures, IGARSS 2017

arXiv:1707.05683 [pdf, other]

Exploiting Convolutional Representations for Multiscale Human Settlement Detection

Authors: Dalton Lunga, Dilip Patlolla, Lexie Yang, Jeanette Weaver, Budhendra Bhadhuri

Abstract: We test this premise and explore representation spaces from a single deep convolutional network and their visualization to argue for a novel unified feature extraction framework. The objective is to utilize and re-purpose trained feature extractors without the need for network retraining on three remote sensing tasks i.e. superpixel map**, pixel-level segmentation and semantic based image visual… ▽ More We test this premise and explore representation spaces from a single deep convolutional network and their visualization to argue for a novel unified feature extraction framework. The objective is to utilize and re-purpose trained feature extractors without the need for network retraining on three remote sensing tasks i.e. superpixel map**, pixel-level segmentation and semantic based image visualization. By leveraging the same convolutional feature extractors and viewing them as visual information extractors that encode different image representation spaces, we demonstrate a preliminary inductive transfer learning potential on multiscale experiments that incorporate edge-level details up to semantic-level information. △ Less

Submitted 18 July, 2017; originally announced July 2017.

Comments: 4 pages, 5 figures, IGARSS 2017

arXiv:1105.2965 [pdf, other]

Generating Similar Graphs From Spherical Features

Authors: Dalton Lunga, Sergey Kirshner

Abstract: We propose a novel model for generating graphs similar to a given example graph. Unlike standard approaches that compute features of graphs in Euclidean space, our approach obtains features on a surface of a hypersphere. We then utilize a von Mises-Fisher distribution, an exponential family distribution on the surface of a hypersphere, to define a model over possible feature values. While our appr… ▽ More We propose a novel model for generating graphs similar to a given example graph. Unlike standard approaches that compute features of graphs in Euclidean space, our approach obtains features on a surface of a hypersphere. We then utilize a von Mises-Fisher distribution, an exponential family distribution on the surface of a hypersphere, to define a model over possible feature values. While our approach bears similarity to a popular exponential random graph model (ERGM), unlike ERGMs, it does not suffer from degeneracy, a situation when a significant probability mass is placed on unrealistic graphs. We propose a parameter estimation approach for our model, and a procedure for drawing samples from the distribution. We evaluate the performance of our approach both on the small domain of all 8-node graphs as well as larger real-world social networks. △ Less

Submitted 18 May, 2011; v1 submitted 15 May, 2011; originally announced May 2011.

Comments: 29 pages, 14 figures, 1 table

Showing 1–11 of 11 results for author: Lunga, D