Search | arXiv e-print repository

GEOBIND: Binding Text, Image, and Audio through Satellite Images

Authors: Aayush Dhakal, Subash Khanal, Srikumar Sastry, Adeel Ahmad, Nathan Jacobs

Abstract: In remote sensing, we are interested in modeling various modalities for some geographic location. Several works have focused on learning the relationship between a location and type of landscape, habitability, audio, textual descriptions, etc. Recently, a common way to approach these problems is to train a deep-learning model that uses satellite images to infer some unique characteristics of the l… ▽ More In remote sensing, we are interested in modeling various modalities for some geographic location. Several works have focused on learning the relationship between a location and type of landscape, habitability, audio, textual descriptions, etc. Recently, a common way to approach these problems is to train a deep-learning model that uses satellite images to infer some unique characteristics of the location. In this work, we present a deep-learning model, GeoBind, that can infer about multiple modalities, specifically text, image, and audio, from satellite imagery of a location. To do this, we use satellite images as the binding element and contrastively align all other modalities to the satellite image data. Our training results in a joint embedding space with multiple types of data: satellite image, ground-level image, audio, and text. Furthermore, our approach does not require a single complex dataset that contains all the modalities mentioned above. Rather it only requires multiple satellite-image paired data. While we only align three modalities in this paper, we present a general framework that can be used to create an embedding space with any number of modalities by using satellite images as the binding element. Our results show that, unlike traditional unimodal models, GeoBind is versatile and can reason about multiple modalities for a given satellite image input. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: 2024 IEEE International Geoscience and Remote Sensing Symposium

arXiv:2404.06637 [pdf, other]

GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis

Authors: Srikumar Sastry, Subash Khanal, Aayush Dhakal, Nathan Jacobs

Abstract: We present GeoSynth, a model for synthesizing satellite images with global style and image-driven layout control. The global style control is via textual prompts or geographic location. These enable the specification of scene semantics or regional appearance respectively, and can be used together. We train our model on a large dataset of paired satellite imagery, with automatically generated capti… ▽ More We present GeoSynth, a model for synthesizing satellite images with global style and image-driven layout control. The global style control is via textual prompts or geographic location. These enable the specification of scene semantics or regional appearance respectively, and can be used together. We train our model on a large dataset of paired satellite imagery, with automatically generated captions, and OpenStreetMap data. We evaluate various combinations of control inputs, including different types of layout controls. Results demonstrate that our model can generate diverse, high-quality images and exhibits excellent zero-shot generalization. The code and model checkpoints are available at https://github.com/mvrl/GeoSynth. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.03606 [pdf, other]

Analyzing Musical Characteristics of National Anthems in Relation to Global Indices

Authors: S M Rakib Hasan, Aakar Dhakal, Ms. Ayesha Siddiqua, Mohammad Mominur Rahman, Md Maidul Islam, Mohammed Arfat Raihan Chowdhury, S M Masfequier Rahman Swapno, SM Nuruzzaman Nobel

Abstract: Music plays a huge part in sha** peoples' psychology and behavioral patterns. This paper investigates the connection between national anthems and different global indices with computational music analysis and statistical correlation analysis. We analyze national anthem musical data to determine whether certain musical characteristics are associated with peace, happiness, suicide rate, crime rate… ▽ More Music plays a huge part in sha** peoples' psychology and behavioral patterns. This paper investigates the connection between national anthems and different global indices with computational music analysis and statistical correlation analysis. We analyze national anthem musical data to determine whether certain musical characteristics are associated with peace, happiness, suicide rate, crime rate, etc. To achieve this, we collect national anthems from 169 countries and use computational music analysis techniques to extract pitch, tempo, beat, and other pertinent audio features. We then compare these musical characteristics with data on different global indices to ascertain whether a significant correlation exists. Our findings indicate that there may be a correlation between the musical characteristics of national anthems and the indices we investigated. The implications of our findings for music psychology and policymakers interested in promoting social well-being are discussed. This paper emphasizes the potential of musical data analysis in social research and offers a novel perspective on the relationship between music and social indices. The source code and data are made open-access for reproducibility and future research endeavors. It can be accessed at http://bit.ly/na_code. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.02375 [pdf, other]

Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach

Authors: S M Rakib Hasan, Aakar Dhakal, Md Humaion Kabir Mehedi, Annajiat Alim Rasel

Abstract: Efforts on the research and development of OCR systems for Low-Resource Languages are relatively new. Low-resource languages have little training data available for training Machine Translation systems or other systems. Even though a vast amount of text has been digitized and made available on the internet the text is still in PDF and Image format, which are not instantly accessible. This paper di… ▽ More Efforts on the research and development of OCR systems for Low-Resource Languages are relatively new. Low-resource languages have little training data available for training Machine Translation systems or other systems. Even though a vast amount of text has been digitized and made available on the internet the text is still in PDF and Image format, which are not instantly accessible. This paper discusses text recognition for two scripts: Bengali and Nepali; there are about 300 and 40 million Bengali and Nepali speakers respectively. In this study, using encoder-decoder transformers, a model was developed, and its efficacy was assessed using a collection of optical text images, both handwritten and printed. The results signify that the suggested technique corresponds with current approaches and achieves high precision in recognizing text in Bengali and Nepali. This study can pave the way for the advanced and accessible study of linguistics in South East Asia. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted and Presented at ICAECC 2023, Bengaluru, India

arXiv:2404.02372 [pdf, other]

Obfuscated Malware Detection: Investigating Real-world Scenarios through Memory Analysis

Authors: S M Rakib Hasan, Aakar Dhakal

Abstract: In the era of the internet and smart devices, the detection of malware has become crucial for system security. Malware authors increasingly employ obfuscation techniques to evade advanced security solutions, making it challenging to detect and eliminate threats. Obfuscated malware, adept at hiding itself, poses a significant risk to various platforms, including computers, mobile devices, and IoT d… ▽ More In the era of the internet and smart devices, the detection of malware has become crucial for system security. Malware authors increasingly employ obfuscation techniques to evade advanced security solutions, making it challenging to detect and eliminate threats. Obfuscated malware, adept at hiding itself, poses a significant risk to various platforms, including computers, mobile devices, and IoT devices. Conventional methods like heuristic-based or signature-based systems struggle against this type of malware, as it leaves no discernible traces on the system. In this research, we propose a simple and cost-effective obfuscated malware detection system through memory dump analysis, utilizing diverse machine-learning algorithms. The study focuses on the CIC-MalMem-2022 dataset, designed to simulate real-world scenarios and assess memory-based obfuscated malware detection. We evaluate the effectiveness of machine learning algorithms, such as decision trees, ensemble methods, and neural networks, in detecting obfuscated malware within memory dumps. Our analysis spans multiple malware categories, providing insights into algorithmic strengths and limitations. By offering a comprehensive assessment of machine learning algorithms for obfuscated malware detection through memory analysis, this paper contributes to ongoing efforts to enhance cybersecurity and fortify digital ecosystems against evolving and sophisticated malware threats. The source code is made open-access for reproducibility and future research endeavours. It can be accessed at https://bit.ly/MalMemCode. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted and Presented at IEEE-ICTP2023, Dhaka, Bangladesh

arXiv:2312.08334 [pdf, other]

LD-SDM: Language-Driven Hierarchical Species Distribution Modeling

Authors: Srikumar Sastry, Xin Xing, Aayush Dhakal, Subash Khanal, Adeel Ahmad, Nathan Jacobs

Abstract: We focus on the problem of species distribution modeling using global-scale presence-only data. Most previous studies have mapped the range of a given species using geographical and environmental features alone. To capture a stronger implicit relationship between species, we encode the taxonomic hierarchy of species using a large language model. This enables range map** for any taxonomic rank an… ▽ More We focus on the problem of species distribution modeling using global-scale presence-only data. Most previous studies have mapped the range of a given species using geographical and environmental features alone. To capture a stronger implicit relationship between species, we encode the taxonomic hierarchy of species using a large language model. This enables range map** for any taxonomic rank and unseen species without additional supervision. Further, we propose a novel proximity-aware evaluation metric that enables evaluating species distribution models using any pixel-level representation of ground-truth species range map. The proposed metric penalizes the predictions of a model based on its proximity to the ground truth. We describe the effectiveness of our model by systematically evaluating on the task of species range prediction, zero-shot prediction and geo-feature regression against the state-of-the-art. Results show our model outperforms the strong baselines when trained with a variety of multi-label learning losses. △ Less

Submitted 13 December, 2023; originally announced December 2023.

Comments: 17 pages, 9 figures

arXiv:2312.07389 [pdf]

Eroding Trust In Aerial Imagery: Comprehensive Analysis and Evaluation Of Adversarial Attacks In Geospatial Systems

Authors: Michael Lanier, Aayush Dhakal, Zhexiao Xiong, Arthur Li, Nathan Jacobs, Yevgeniy Vorobeychik

Abstract: In critical operations where aerial imagery plays an essential role, the integrity and trustworthiness of data are paramount. The emergence of adversarial attacks, particularly those that exploit control over labels or employ physically feasible trojans, threatens to erode that trust, making the analysis and mitigation of these attacks a matter of urgency. We demonstrate how adversarial attacks ca… ▽ More In critical operations where aerial imagery plays an essential role, the integrity and trustworthiness of data are paramount. The emergence of adversarial attacks, particularly those that exploit control over labels or employ physically feasible trojans, threatens to erode that trust, making the analysis and mitigation of these attacks a matter of urgency. We demonstrate how adversarial attacks can degrade confidence in geospatial systems, specifically focusing on scenarios where the attacker's control over labels is restricted and the use of realistic threat vectors. Proposing and evaluating several innovative attack methodologies, including those tailored to overhead images, we empirically show their threat to remote sensing systems using high-quality SpaceNet datasets. Our experimentation reflects the unique challenges posed by aerial imagery, and these preliminary results not only reveal the potential risks but also highlight the non-trivial nature of the problem compared to recent works. △ Less

Submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted at IEEE AIRP 2023

arXiv:2311.16140 [pdf]

Adapting Segment Anything Model (SAM) through Prompt-based Learning for Enhanced Protein Identification in Cryo-EM Micrographs

Authors: Fei He, Zhiyuan Yang, Mingyue Gao, Biplab Poudel, Newgin Sam Ebin Sam Dhas, Rajan Gyawali, Ashwin Dhakal, Jianlin Cheng, Dong Xu

Abstract: Cryo-electron microscopy (cryo-EM) remains pivotal in structural biology, yet the task of protein particle picking, integral for 3D protein structure construction, is laden with manual inefficiencies. While recent AI tools such as Topaz and crYOLO are advancing the field, they do not fully address the challenges of cryo-EM images, including low contrast, complex shapes, and heterogeneous conformat… ▽ More Cryo-electron microscopy (cryo-EM) remains pivotal in structural biology, yet the task of protein particle picking, integral for 3D protein structure construction, is laden with manual inefficiencies. While recent AI tools such as Topaz and crYOLO are advancing the field, they do not fully address the challenges of cryo-EM images, including low contrast, complex shapes, and heterogeneous conformations. This study explored prompt-based learning to adapt the state-of-the-art image segmentation foundation model Segment Anything Model (SAM) for cryo-EM. This focus was driven by the desire to optimize model performance with a small number of labeled data without altering pre-trained parameters, aiming for a balance between adaptability and foundational knowledge retention. Through trials with three prompt-based learning strategies, namely head prompt, prefix prompt, and encoder prompt, we observed enhanced performance and reduced computational requirements compared to the fine-tuning approach. This work not only highlights the potential of prompting SAM in protein identification from cryo-EM micrographs but also suggests its broader promise in biomedical image segmentation and object detection. △ Less

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2310.19168 [pdf, other]

BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Map**

Authors: Srikumar Sastry, Subash Khanal, Aayush Dhakal, Di Huang, Nathan Jacobs

Abstract: We propose a metadata-aware self-supervised learning~(SSL)~framework useful for fine-grained classification and ecological map** of bird species around the world. Our framework unifies two SSL strategies: Contrastive Learning~(CL) and Masked Image Modeling~(MIM), while also enriching the embedding space with metadata available with ground-level imagery of birds. We separately train uni-modal and… ▽ More We propose a metadata-aware self-supervised learning~(SSL)~framework useful for fine-grained classification and ecological map** of bird species around the world. Our framework unifies two SSL strategies: Contrastive Learning~(CL) and Masked Image Modeling~(MIM), while also enriching the embedding space with metadata available with ground-level imagery of birds. We separately train uni-modal and cross-modal ViT on a novel cross-view global bird species dataset containing ground-level imagery, metadata (location, time), and corresponding satellite imagery. We demonstrate that our models learn fine-grained and geographically conditioned features of birds, by evaluating on two downstream tasks: fine-grained visual classification~(FGVC) and cross-modal retrieval. Pre-trained models learned using our framework achieve SotA performance on FGVC of iNAT-2021 birds and in transfer learning settings for CUB-200-2011 and NABirds datasets. Moreover, the impressive cross-modal retrieval performance of our model enables the creation of species distribution maps across any geographic region. The dataset and source code will be released at https://github.com/mvrl/BirdSAT}. △ Less

Submitted 29 October, 2023; originally announced October 2023.

Comments: Accepted at WACV 2024

arXiv:2310.02903 [pdf, other]

FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning

Authors: Oscar Skean, Aayush Dhakal, Nathan Jacobs, Luis Gonzalo Sanchez Giraldo

Abstract: Self-supervised learning (SSL) is a popular paradigm for representation learning. Recent multiview methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family having its own approach to avoiding informational collapse. While these families converge to solutions of similar quality, it can be empirically shown that some methods are epoch-inef… ▽ More Self-supervised learning (SSL) is a popular paradigm for representation learning. Recent multiview methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family having its own approach to avoiding informational collapse. While these families converge to solutions of similar quality, it can be empirically shown that some methods are epoch-inefficient and require longer training to reach a target performance. Two main approaches to improving efficiency are covariance eigenvalue regularization and using more views. However, these two approaches are difficult to combine due to the computational complexity of computing eigenvalues. We present the objective function FroSSL which reconciles both approaches while avoiding eigendecomposition entirely. FroSSL works by minimizing covariance Frobenius norms to avoid collapse and minimizing mean-squared error to achieve augmentation invariance. We show that FroSSL reaches competitive accuracies more quickly than any other SSL method and provide theoretical and empirical support that this faster convergence is due to how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations on linear probe evaluation when used to train a ResNet18 on several datasets, including STL-10, Tiny Imagenet, and Imagenet-100. △ Less

Submitted 19 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Updated to reflect ECCV submission

arXiv:2309.10667 [pdf, other]

Learning Tri-modal Embeddings for Zero-Shot Soundscape Map**

Authors: Subash Khanal, Srikumar Sastry, Aayush Dhakal, Nathan Jacobs

Abstract: We focus on the task of soundscape map**, which involves predicting the most probable sounds that could be perceived at a particular geographic location. We utilise recent state-of-the-art models to encode geotagged audio, a textual description of the audio, and an overhead image of its capture location using contrastive pre-training. The end result is a shared embedding space for the three moda… ▽ More We focus on the task of soundscape map**, which involves predicting the most probable sounds that could be perceived at a particular geographic location. We utilise recent state-of-the-art models to encode geotagged audio, a textual description of the audio, and an overhead image of its capture location using contrastive pre-training. The end result is a shared embedding space for the three modalities, which enables the construction of soundscape maps for any geographic region from textual or audio queries. Using the SoundingEarth dataset, we find that our approach significantly outperforms the existing SOTA, with an improvement of image-to-audio Recall@100 from 0.256 to 0.450. Our code is available at https://github.com/mvrl/geoclap. △ Less

Submitted 19 September, 2023; originally announced September 2023.

Comments: Accepted at BMVC 2023

arXiv:2307.15904 [pdf, other]

Sat2Cap: Map** Fine-Grained Textual Descriptions from Satellite Images

Authors: Aayush Dhakal, Adeel Ahmad, Subash Khanal, Srikumar Sastry, Hannah Kerner, Nathan Jacobs

Abstract: We propose a weakly supervised approach for creating maps using free-form textual descriptions. We refer to this work of creating textual maps as zero-shot map**. Prior works have approached map** tasks by develo** models that predict a fixed set of attributes using overhead imagery. However, these models are very restrictive as they can only solve highly specific tasks for which they were t… ▽ More We propose a weakly supervised approach for creating maps using free-form textual descriptions. We refer to this work of creating textual maps as zero-shot map**. Prior works have approached map** tasks by develo** models that predict a fixed set of attributes using overhead imagery. However, these models are very restrictive as they can only solve highly specific tasks for which they were trained. Map** text, on the other hand, allows us to solve a large variety of map** problems with minimal restrictions. To achieve this, we train a contrastive learning framework called Sat2Cap on a new large-scale dataset with 6.1M pairs of overhead and ground-level images. For a given location and overhead image, our model predicts the expected CLIP embeddings of the ground-level scenery. The predicted CLIP embeddings are then used to learn about the textual space associated with that location. Sat2Cap is also conditioned on date-time information, allowing it to model temporally varying concepts over a location. Our experimental results demonstrate that our models successfully capture ground-level concepts and allow large-scale map** of fine-grained textual queries. Our approach does not require any text-labeled data, making the training easily scalable. The code, dataset, and models will be made publicly available. △ Less

Submitted 11 April, 2024; v1 submitted 29 July, 2023; originally announced July 2023.

Comments: 16 pages

arXiv:2304.13541 [pdf, other]

D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs

Authors: Aditya Dhakal, Sameer G. Kulkarni, K. K. Ramakrishnan

Abstract: Hardware accelerators such as GPUs are required for real-time, low-latency inference with Deep Neural Networks (DNN). However, due to the inherent limits to the parallelism they can exploit, DNNs often under-utilize the capacity of today's high-end accelerators. Although spatial multiplexing of the GPU, leads to higher GPU utilization and higher inference throughput, there remain a number of chall… ▽ More Hardware accelerators such as GPUs are required for real-time, low-latency inference with Deep Neural Networks (DNN). However, due to the inherent limits to the parallelism they can exploit, DNNs often under-utilize the capacity of today's high-end accelerators. Although spatial multiplexing of the GPU, leads to higher GPU utilization and higher inference throughput, there remain a number of challenges. Finding the GPU percentage for right-sizing the GPU for each DNN through profiling, determining an optimal batching of requests to balance throughput improvement while meeting application-specific deadlines and service level objectives (SLOs), and maximizing throughput by appropriately scheduling DNNs are still significant challenges. This paper introduces a dynamic and fair spatio-temporal scheduler (D-STACK) that enables multiple DNNs to run in the GPU concurrently. To help allocate the appropriate GPU percentage (we call it the "Knee"), we develop and validate a model that estimates the parallelism each DNN can utilize. We also develop a lightweight optimization formulation to find an efficient batch size for each DNN operating with D-STACK. We bring together our optimizations and our spatio-temporal scheduler to provide a holistic inference framework. We demonstrate its ability to provide high throughput while meeting application SLOs. We compare D-STACK with an ideal scheduler that can allocate the right GPU percentage for every DNN kernel. D-STACK gets higher than 90 percent throughput and GPU utilization compared to the ideal scheduler. We also compare D-STACK with other GPU multiplexing and scheduling methods (e.g., NVIDIA Triton, Clipper, Nexus), using popular DNN models. Our controlled experiments with multiplexing several popular DNN models achieve up to 1.6X improvement in GPU utilization and up to 4X improvement in inference throughput. △ Less

Submitted 31 March, 2023; originally announced April 2023.

arXiv:2304.09248 [pdf]

doi 10.1109/AIPR60534.2023.10440713

Real-Time Helmet Violation Detection in AI City Challenge 2023 with Genetic Algorithm-Enhanced YOLOv5

Authors: Elham Soltanikazemi, Ashwin Dhakal, Bijaya Kumar Hatuwal, Imad Eddine Toubal, Armstrong Aboah, Kannappan Palaniappan

Abstract: This research focuses on real-time surveillance systems as a means for tackling the issue of non-compliance with helmet regulations, a practice that considerably amplifies the risk for motorcycle drivers or riders. Despite the well-established advantages of helmet usage, achieving widespread compliance remains challenging due to diverse contributing factors. To effectively address this concern, re… ▽ More This research focuses on real-time surveillance systems as a means for tackling the issue of non-compliance with helmet regulations, a practice that considerably amplifies the risk for motorcycle drivers or riders. Despite the well-established advantages of helmet usage, achieving widespread compliance remains challenging due to diverse contributing factors. To effectively address this concern, real-time monitoring and enforcement of helmet laws have been proposed as a plausible solution. However, previous attempts at real-time helmet violation detection have been hindered by their limited ability to operate in real-time. To overcome this limitation, the current paper introduces a novel real-time helmet violation detection system that utilizes the YOLOv5 single-stage object detection model. This model is trained on the 2023 NVIDIA AI City Challenge 2023 Track 5 dataset. The optimal hyperparameters for training the model are determined using genetic algorithms. Additionally, data augmentation and various sampling techniques are implemented to enhance the model's performance. The efficacy of the models is evaluated using precision, recall, and mean Average Precision (mAP) metrics. The results demonstrate impressive precision, recall, and mAP scores of 0.848, 0.599, and 0.641, respectively for the training data. Furthermore, the model achieves notable mAP score of 0.6667 for the test datasets, leading to a commendable 4th place rank in the public leaderboard. This innovative approach represents a notable breakthrough in the field and holds immense potential to substantially enhance motorcycle safety. By enabling real-time monitoring and enforcement capabilities, this system has the capacity to contribute towards increased compliance with helmet laws, thereby effectively reducing the risks faced by motorcycle riders and passengers. △ Less

Submitted 20 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Journal ref: 2023 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)

arXiv:2206.01696 [pdf]

Deep Learning Prediction of Severe Health Risks for Pediatric COVID-19 Patients with a Large Feature Set in 2021 BARDA Data Challenge

Authors: Sajid Mahmud, Elham Soltanikazemi, Frimpong Boadu, Ashwin Dhakal, Jianlin Cheng

Abstract: Most children infected with COVID-19 have no or mild symptoms and can recover automatically by themselves, but some pediatric COVID-19 patients need to be hospitalized or even to receive intensive medical care (e.g., invasive mechanical ventilation or cardiovascular support) to recover from the illnesses. Therefore, it is critical to predict the severe health risk that COVID-19 infection poses to… ▽ More Most children infected with COVID-19 have no or mild symptoms and can recover automatically by themselves, but some pediatric COVID-19 patients need to be hospitalized or even to receive intensive medical care (e.g., invasive mechanical ventilation or cardiovascular support) to recover from the illnesses. Therefore, it is critical to predict the severe health risk that COVID-19 infection poses to children to provide precise and timely medical care for vulnerable pediatric COVID-19 patients. However, predicting the severe health risk for COVID-19 patients including children remains a significant challenge because many underlying medical factors affecting the risk are still largely unknown. In this work, instead of searching for a small number of most useful features to make prediction, we design a novel large-scale bag-of-words like method to represent various medical conditions and measurements of COVID-19 patients. After some simple feature filtering based on logistical regression, the large set of features is used with a deep learning method to predict both the hospitalization risk for COVID-19 infected children and the severe complication risk for the hospitalized pediatric COVID-19 patients. The method was trained and tested the datasets of the Biomedical Advanced Research and Development Authority (BARDA) Pediatric COVID-19 Data Challenge held from Sept. 15 to Dec. 17, 2021. The results show that the approach can rather accurately predict the risk of hospitalization and severe complication for pediatric COVID-19 patients and deep learning is more accurate than other machine learning methods. △ Less

Submitted 6 June, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: Acknowledgment updated, minor typos fixed

arXiv:2008.03602 [pdf, other]

Spatial Sharing of GPU for Autotuning DNN models

Authors: Aditya Dhakal, Junguk Cho, Sameer G. Kulkarni, K. K. Ramakrishnan, Puneet Sharma

Abstract: GPUs are used for training, inference, and tuning the machine learning models. However, Deep Neural Network (DNN) vary widely in their ability to exploit the full power of high-performance GPUs. Spatial sharing of GPU enables multiplexing several DNNs on the GPU and can improve GPU utilization, thus improving throughput and lowering latency. DNN models given just the right amount of GPU resources… ▽ More GPUs are used for training, inference, and tuning the machine learning models. However, Deep Neural Network (DNN) vary widely in their ability to exploit the full power of high-performance GPUs. Spatial sharing of GPU enables multiplexing several DNNs on the GPU and can improve GPU utilization, thus improving throughput and lowering latency. DNN models given just the right amount of GPU resources can still provide low inference latency, just as much as dedicating all of the GPU for their inference task. An approach to improve DNN inference is tuning of the DNN model. Autotuning frameworks find the optimal low-level implementation for a certain target device based on the trained machine learning model, thus reducing the DNN's inference latency and increasing inference throughput. We observe an interdependency between the tuned model and its inference latency. A DNN model tuned with specific GPU resources provides the best inference latency when inferred with close to the same amount of GPU resources. While a model tuned with the maximum amount of the GPU's resources has poorer inference latency once the GPU resources are limited for inference. On the other hand, a model tuned with an appropriate amount of GPU resources still achieves good inference latency across a wide range of GPU resource availability. We explore the causes that impact the tuning of a model at different amounts of GPU resources. We present many techniques to maximize resource utilization and improve tuning performance. We enable controlled spatial sharing of GPU to multiplex several tuning applications on the GPU. We scale the tuning server instances and shard the tuning model across multiple client instances for concurrent tuning of different operators of a model, achieving better GPU multiplexing. With our improvements, we decrease DNN autotuning time by up to 75 percent and increase throughput by a factor of 5. △ Less

Submitted 8 August, 2020; originally announced August 2020.

Showing 1–16 of 16 results for author: Dhakal, A