Search | arXiv e-print repository

Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision

Authors: Nathaniel Hudson, J. Gregory Pauloski, Matt Baughman, Alok Kamatar, Mansi Sakarvadia, Logan Ward, Ryan Chard, André Bauer, Maksim Levental, Wenyi Wang, Will Engler, Owen Price Skelly, Ben Blaiszik, Rick Stevens, Kyle Chard, Ian Foster

Abstract: Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters -- such as Huawei's PanGu-$Σ$. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 10 pages, 3 figures, accepted for publication in the proceedings of the 10th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT2023)

arXiv:2305.09593 [pdf, other]

Accelerating Communications in Federated Applications with Transparent Object Proxies

Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Nathaniel Hudson, Charlie Sabino, Matt Baughman, Kyle Chard, Ian Foster

Abstract: Advances in networks, accelerators, and cloud services encourage programmers to reconsider where to compute -- such as when fast networks make it cost-effective to compute on remote accelerators despite added latency. Workflow and cloud-hosted serverless computing frameworks can manage multi-step computations spanning federated collections of cloud, high-performance computing (HPC), and edge systems, but passing data among computational steps via cloud storage can incur high costs. Here, we overcome this obstacle with a new programming paradigm that decouples control flow from data flow by extending the pass-by-reference model to distributed applications. We describe ProxyStore, a system that implements this paradigm by providing object proxies that act as wide-area object references with just-in-time resolution. This proxy model enables data producers to communicate data unilaterally, transparently, and efficiently to both local and remote consumers. We demonstrate the benefits of this model with synthetic benchmarks and real-world scientific applications, running across various computing platforms.

Submitted 29 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted for publication at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC23)

arXiv:2304.14982 [pdf]

Hierarchical and Decentralised Federated Learning

Authors: Omer Rana, Theodoros Spyridopoulos, Nathaniel Hudson, Matt Baughman, Kyle Chard, Ian Foster, Aftab Khan

Abstract: Federated learning has shown enormous promise as a way of training ML models in distributed environments while reducing communication costs and protecting data privacy. However, the rise of complex cyber-physical systems, such as the Internet-of-Things, presents new challenges that are not met with traditional FL methods. Hierarchical Federated Learning extends the traditional FL process to enable more efficient model aggregation based on application needs or characteristics of the deployment environment (e.g., resource capabilities and/or network connectivity). It illustrates the benefits of balancing processing across the cloud-edge continuum. Hierarchical Federated Learning is likely to be a key enabler for a wide range of applications, such as smart farming and smart energy management, as it can improve performance and reduce costs, whilst also enabling FL workflows to be deployed in environments that are not well-suited to traditional FL. Model aggregation algorithms, software frameworks, and infrastructures will need to be designed and implemented to make such solutions accessible to researchers and engineers across a growing set of domains. H-FL also introduces a number of new challenges. For instance, there are implicit infrastructural challenges. There is also a trade-off between having generalised models and personalised models. If there exist geographical patterns for data (e.g., soil conditions in a smart farm likely are related to the geography of the region itself), then it is crucial that models used locally can consider their own locality in addition to a globally-learned model. H-FL will be crucial to future FL solutions as it can aggregate and distribute models at multiple levels to optimally serve the trade-off between locality dependence and global anomaly robustness.

Submitted 28 April, 2023; originally announced April 2023.

Comments: 11 pages, 6 figures, 25 references

ACM Class: C.2.4; I.2.11

arXiv:1612.04891 [pdf]

Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration

Authors: Cecilia S. Lee, Doug M. Baughman, Aaron Y. Lee

Abstract: Objective: The advent of Electronic Medical Records (EMR) with large electronic imaging databases along with advances in deep neural networks with machine learning has provided a unique opportunity to achieve milestones in automated image analysis. Optical coherence tomography (OCT) is the most commonly obtained imaging modality in ophthalmology and represents a dense and rich dataset when combined with labels derived from the EMR. We sought to determine if deep learning could be utilized to distinguish normal OCT images from images from patients with Age-related Macular Degeneration (AMD). Methods: Automated extraction of an OCT imaging database was performed and linked to clinical endpoints from the EMR. OCT macula scans were obtained by Heidelberg Spectralis, and each OCT scan was linked to EMR clinical endpoints extracted from EPIC. The central 11 images were selected from each OCT scan of two cohorts of patients: normal and AMD. Cross-validation was performed using a random subset of patients. Area under receiver operator curves (auROC) were constructed at an independent image level, macular OCT level, and patient level. Results: Of an extraction of 2.6 million OCT images linked to clinical datapoints from the EMR, 52,690 normal and 48,312 AMD macular OCT images were selected. A deep neural network was trained to categorize images as either normal or AMD. At the image level, we achieved an auROC of 92.78% with an accuracy of 87.63%. At the macula level, we achieved an auROC of 93.83% with an accuracy of 88.98%. At a patient level, we achieved an auROC of 97.45% with an accuracy of 93.45%. Peak sensitivity and specificity with optimal cutoffs were 92.64% and 93.69% respectively. Conclusions: Deep learning techniques are effective for classifying OCT images. These findings have important implications in utilizing OCT in automated screening and computer aided diagnosis tools.

Submitted 14 December, 2016; originally announced December 2016.

Comments: 4 Figures, 1 Table

arXiv:1010.3172 [pdf, ps, other]

doi 10.1016/j.astropartphys.2010.07.002

CRT: A numerical tool for propagating ultra-high energy cosmic rays through Galactic magnetic field models

Authors: Michael S. Sutherland, Brian M. Baughman, James J. Beatty

Abstract: Deflection of ultra high energy cosmic rays (UHECRs) by the Galactic magnetic field (GMF) may be sufficiently strong to hinder identification of the UHECR source distribution. A common method for determining the effect of GMF models on source identification efforts is backtracking cosmic rays. We present the public numerical tool CRT for propagating charged particles through Galactic magnetic field models by numerically integrating the relativistic equation of motion. It is capable of both forward- and back-tracking particles with varying compositions through pre-defined and custom user-created magnetic fields. These particles are injected from various types of sources specified and distributed according to the user. Here, we present a description of some source and magnetic field model implementations, as well as validation of the integration routines.

Submitted 15 October, 2010; originally announced October 2010.

Comments: 12 pages, 9 figures

Journal ref: Astroparticle Physics, Volume 34, Issue 4, p. 198-204. (2010)

Showing 1–5 of 5 results for author: Baughman, M