-
The infrastructure powering IBM's Gen AI model development
Authors:
Talia Gershon,
Seetharami Seelam,
Brian Belgodere,
Milton Bonilla,
Lan Hoang,
Danny Barnett,
I-Hsin Chung,
Apoorve Mohan,
Ming-Hung Chen,
Lixiang Luo,
Robert Walkup,
Constantinos Evangelinos,
Shweta Salaria,
Marc Dombrowa,
Yoonho Park,
Apo Kayi,
Liran Schour,
Alim Alim,
Ali Sydney,
Pavlos Maniotis,
Laurent Schares,
Bernard Metzler,
Bengi Karacali-Akyamac,
Sophia Wen,
Tatsuhiro Chiba,
et al. (121 additional authors not shown)
Abstract:
AI infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings.
Submitted 7 July, 2024;
originally announced July 2024.
-
Viewport-Aware Deep Reinforcement Learning Approach for 360$^o$ Video Caching
Authors:
Pantelis Maniotis,
Nikolaos Thomos
Abstract:
360$^o$ video is an essential component of VR/AR/MR systems that provides an immersive experience to users. However, 360$^o$ video is associated with high bandwidth requirements. The required bandwidth can be reduced by exploiting the fact that users are interested in viewing only a part of the video scene and that users request viewports that overlap with each other. Motivated by the findings of recent works where the benefits of caching video tiles at edge servers instead of caching entire 360$^o$ videos were shown, in this paper, we introduce the concept of virtual viewports that have the same number of tiles as the original viewports. The tiles forming these viewports are the most popular ones for each video and are determined by the users' requests. Then, we propose a proactive caching scheme that assumes that the popularity of videos and viewports is unknown. Our scheme determines which videos to cache as well as the optimal virtual viewport for each video. Virtual viewports make it possible to lower the dimensionality of the cache optimization problem. To solve the problem, we first formulate the content placement of 360$^o$ videos in edge cache networks as a Markov Decision Process (MDP), and then we determine the optimal caching placement using the Deep Q-Network (DQN) algorithm. The proposed solution aims at maximizing the overall quality of the 360$^o$ videos delivered to the end-users by caching the most popular 360$^o$ videos at base quality along with a virtual viewport in high quality. We extensively evaluate the performance of the proposed system and compare it with that of known systems such as LFU, LRU, and FIFO over both synthetic and real 360$^o$ video traces. The results reveal the large benefits coming from proactive caching of virtual viewports instead of the original ones in terms of the overall quality of the rendered viewports, the cache hit ratio, and the servicing cost.
Submitted 10 April, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.
-
Tile-Based Joint Caching and Delivery of $360^o$ Videos in Heterogeneous Networks
Authors:
Pantelis Maniotis,
Eirina Bourtsoulatze,
Nikolaos Thomos
Abstract:
The recent surge of applications involving the use of $360^o$ video challenges mobile network infrastructure, as $360^o$ video files are of significant size, and current delivery and edge caching architectures are unable to guarantee their timely delivery. In this paper, we investigate the problem of joint collaborative content-aware caching and delivery of $360^o$ videos in a video on demand setting. The proposed scheme takes advantage of $360^o$ video encoding in multiple tiles and layers to make fine-grained decisions regarding which tiles to cache in each Small Base Station (SBS), and from where to deliver them to the end users, as users may reside in the coverage area of multiple SBSs. This makes it possible to cache the most popular tiles in the SBSs, while the remaining tiles may be obtained through the backhaul. In addition, we explicitly consider the delivery time constraints to ensure continuous video playback. To reduce the computational complexity of the optimization problem, we simplify it by introducing a fairness constraint. This allows us to split the original problem into subproblems corresponding to Groups of Pictures (GoP). Each of the subproblems is then solved with the method of Lagrange partial relaxation. Finally, we evaluate the performance of the proposed method for various system parameters and compare it with schemes that do not consider $360^o$ video encoding into multiple tiles and quality layers, as well as with two variants of the proposed method: one that considers layered encoding and SBS collaboration, and another that uses tile encoding but no SBS collaboration. The results showcase the benefits coming from caching and delivery decisions on a per-tile basis and the importance of exploiting SBS collaboration.
Submitted 25 October, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.