-
The infrastructure powering IBM's Gen AI model development
Authors:
Talia Gershon,
Seetharami Seelam,
Brian Belgodere,
Milton Bonilla,
Lan Hoang,
Danny Barnett,
I-Hsin Chung,
Apoorve Mohan,
Ming-Hung Chen,
Lixiang Luo,
Robert Walkup,
Constantinos Evangelinos,
Shweta Salaria,
Marc Dombrowa,
Yoonho Park,
Apo Kayi,
Liran Schour,
Alim Alim,
Ali Sydney,
Pavlos Maniotis,
Laurent Schares,
Bernard Metzler,
Bengi Karacali-Akyamac,
Sophia Wen,
Tatsuhiro Chiba
, et al. (121 additional authors not shown)
Abstract:
AI Infrastructure plays a key role in the speed and cost-competitiveness of develo** and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi…
▽ More
AI Infrastructure plays a key role in the speed and cost-competitiveness of develo** and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Efficacy of Machine-Generated Instructions
Authors:
Samaksh Gulati,
Anshit Verma,
Manoj Parmar,
Palash Chaudhary
Abstract:
Large "instruction-tuned" language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We conducted a quantitative study to figure out the…
▽ More
Large "instruction-tuned" language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We conducted a quantitative study to figure out the efficacy of machine-generated annotations, where we compare the results of a fine-tuned BERT model with human v/s machine-generated annotations. Applying our methods to the vanilla GPT-3 model, we saw that machine generated annotations were 78.54% correct and the fine-tuned model achieved a 96.01% model performance compared to the performance with human-labelled annotations. This result shows that machine-generated annotations are a resource and cost effective way to fine-tune down-stream models.
△ Less
Submitted 21 December, 2023;
originally announced December 2023.
-
Towards Enhanced Human Activity Recognition through Natural Language Generation and Pose Estimation
Authors:
Nikhil Kashyap,
Manas Satish Bedmutha,
Prerit Chaudhary,
Brian Wood,
Wanda Pratt,
Janice Sabin,
Andrea Hartzler,
Nadir Weibel
Abstract:
Vision-based human activity recognition (HAR) has made substantial progress in recognizing predefined gestures but lacks adaptability for emerging activities. This paper introduces a paradigm shift by harnessing generative modeling and large language models (LLMs) to enhance vision-based HAR. We propose utilizing LLMs to generate descriptive textual representations of activities using pose keypoin…
▽ More
Vision-based human activity recognition (HAR) has made substantial progress in recognizing predefined gestures but lacks adaptability for emerging activities. This paper introduces a paradigm shift by harnessing generative modeling and large language models (LLMs) to enhance vision-based HAR. We propose utilizing LLMs to generate descriptive textual representations of activities using pose keypoints as an intermediate representation. Incorporating pose keypoints adds contextual depth to the recognition process, allowing for sequences of vectors resembling text chunks, compatible with LLMs. This innovative fusion of computer vision and natural language processing holds significant potential for revolutionizing activity recognition. A proof of concept study on a Kinetics700 dataset subset validates the approach's efficacy, highlighting improved accuracy and interpretability. Future implications encompass enhanced accuracy, novel research avenues, model generalization, and ethical considerations for transparency. This framework has real-world applications, including personalized gym workout feedback and nuanced sports training insights. By connecting visual cues to interpretable textual descriptions, the proposed framework advances HAR accuracy and applicability, sha** the landscape of pervasive computing and activity recognition research. As this approach evolves, it promises a more insightful understanding of human activities across diverse contexts, marking a significant step towards a better world.
△ Less
Submitted 11 December, 2023;
originally announced December 2023.
-
Detect and Classify IoT Camera Traffic
Authors:
Priyanka Rushikesh Chaudhary,
Rajib Ranjan Maiti
Abstract:
Deployment of IoT cameras in an organization threatens security and privacy policies, and the classification of network traffic without using IP addresses and port numbers has been challenging. In this paper, we have designed, implemented and deployed a system called iCamInspector to classify network traffic arising from IoT camera in a mixed networking environment. We have collected a total of ab…
▽ More
Deployment of IoT cameras in an organization threatens security and privacy policies, and the classification of network traffic without using IP addresses and port numbers has been challenging. In this paper, we have designed, implemented and deployed a system called iCamInspector to classify network traffic arising from IoT camera in a mixed networking environment. We have collected a total of about 36GB of network traffic containing video data from three different types of applications (four online audio/video conferencing applications, two video sharing applications and six IoT camera from different manufacturers) in our IoT laboratory. We show that with the help of a limited number of flow-based features, iCamInspector achieves an average accuracy of more than 98% in a 10-fold cross-validation with a false rate of about 1.5% in testing phase of the system. A real deployment of our system in an unseen environment achieves a commendable performance of detecting IoT camera with an average detection probability higher than 0.9.
△ Less
Submitted 17 October, 2022;
originally announced October 2022.
-
Search Disaster Victims using Sound Source Localization
Authors:
Abhish Khanal,
Deepak Chand,
Prakash Chaudhary,
Subash Timilsina,
Sanjeeb Prasad Panday,
Aman Shakya,
Rom Kant Pandey
Abstract:
Sound Source Localization (SSL) are used to estimate the position of sound sources. Various methods have been used for detecting sound and its localization. This paper presents a system for stationary sound source localization by cubical microphone array consisting of eight microphones placed on four vertical adjacent faces which is mounted on three wheel omni-directional drive for the inspection…
▽ More
Sound Source Localization (SSL) are used to estimate the position of sound sources. Various methods have been used for detecting sound and its localization. This paper presents a system for stationary sound source localization by cubical microphone array consisting of eight microphones placed on four vertical adjacent faces which is mounted on three wheel omni-directional drive for the inspection and monitoring of the disaster victims in disaster areas. The proposed method localizes sound source on a 3D space by grid search method using Generalized Cross Correlation Phase Transform (GCC-PHAT) which is robust when operating in real life scenario where there is lack of visibility. The computed azimuth and elevation angle of victimized human voice are fed to embedded omni-directional drive system which navigates the vehicle automatically towards the stationary sound source.
△ Less
Submitted 10 March, 2021;
originally announced March 2021.
-
Water level prediction from social media images with a multi-task ranking approach
Authors:
P. Chaudhary,
S. D'Aronco,
J. P. Leitao,
K. Schindler,
J. D. Wegner
Abstract:
Floods are among the most frequent and catastrophic natural disasters and affect millions of people worldwide. It is important to create accurate flood maps to plan (offline) and conduct (real-time) flood mitigation and flood rescue operations. Arguably, images collected from social media can provide useful information for that task, which would otherwise be unavailable. We introduce a computer vi…
▽ More
Floods are among the most frequent and catastrophic natural disasters and affect millions of people worldwide. It is important to create accurate flood maps to plan (offline) and conduct (real-time) flood mitigation and flood rescue operations. Arguably, images collected from social media can provide useful information for that task, which would otherwise be unavailable. We introduce a computer vision system that estimates water depth from social media images taken during flooding events, in order to build flood maps in (near) real-time. We propose a multi-task (deep) learning approach, where a model is trained using both a regression and a pairwise ranking loss. Our approach is motivated by the observation that a main bottleneck for image-based flood level estimation is training data: it is diffcult and requires a lot of effort to annotate uncontrolled images with the correct water depth. We demonstrate how to effciently learn a predictor from a small set of annotated water levels and a larger set of weaker annotations that only indicate in which of two images the water level is higher, and are much easier to obtain. Moreover, we provide a new dataset, named DeepFlood, with 8145 annotated ground-level images, and show that the proposed multi-task approach can predict the water level from a single, crowd-sourced image with ~11 cm root mean square error.
△ Less
Submitted 13 July, 2020;
originally announced July 2020.
-
Achieving Starvation-Freedom in Multi-Version Transactional Memory Systems
Authors:
Ved Prakash Chaudhary,
Chirag Juyal,
Sandeep Kulkarni,
Sweta Kumari,
Sathya Peri
Abstract:
Software Transactional Memory systems (STMs) have garnered significant interest as an elegant alternative for addressing synchronization and concurrency issues with multi-threaded programming in multi-core systems. Client programs use STMs by issuing transactions. STM ensures that transaction either commits or aborts. A transaction aborted due to conflicts is typically re-issued with the expectati…
▽ More
Software Transactional Memory systems (STMs) have garnered significant interest as an elegant alternative for addressing synchronization and concurrency issues with multi-threaded programming in multi-core systems. Client programs use STMs by issuing transactions. STM ensures that transaction either commits or aborts. A transaction aborted due to conflicts is typically re-issued with the expectation that it will complete successfully in a subsequent incarnation. However, many existing STMs fail to provide starvation freedom, i.e., in these systems, it is possible that concurrency conflicts may prevent an incarnated transaction from committing. To overcome this limitation, we systematically derive a novel starvation free algorithm for multi-version STM. Our algorithm can be used either with the case where the number of versions is unbounded and garbage collection is used or where only the latest K versions are maintained, KSFTM. We have demonstrated that our proposed algorithm performs better than existing state-of-the-art STMs.
△ Less
Submitted 22 March, 2019; v1 submitted 4 September, 2017;
originally announced September 2017.