-
Training Next Generation AI Users and Developers at NCSA
Authors:
Daniel S. Katz,
Volodymyr Kindratenko,
Olena Kindratenko,
Priyam Mazumdar
Abstract:
This article focuses on training work carried out in artificial intelligence (AI) at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign via a research experience for undergraduates (REU) program named FoDOMMaT. It also describes why we are interested in AI, and concludes by discussing what we've learned from running this program and its predec…
▽ More
This article focuses on training work carried out in artificial intelligence (AI) at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign via a research experience for undergraduates (REU) program named FoDOMMaT. It also describes why we are interested in AI, and concludes by discussing what we've learned from running this program and its predecessor over six years.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Secure Federated Learning Across Heterogeneous Cloud and High-Performance Computing Resources -- A Case Study on Federated Fine-tuning of LLaMA 2
Authors:
Zilinghan Li,
Shilan He,
Pranshu Chaturvedi,
Volodymyr Kindratenko,
Eliu A Huerta,
Kibaek Kim,
Ravi Madduri
Abstract:
Federated learning enables multiple data owners to collaboratively train robust machine learning models without transferring large or sensitive local datasets by only sharing the parameters of the locally trained models. In this paper, we elaborate on the design of our Advanced Privacy-Preserving Federated Learning (APPFL) framework, which streamlines end-to-end secure and reliable federated learn…
▽ More
Federated learning enables multiple data owners to collaboratively train robust machine learning models without transferring large or sensitive local datasets by only sharing the parameters of the locally trained models. In this paper, we elaborate on the design of our Advanced Privacy-Preserving Federated Learning (APPFL) framework, which streamlines end-to-end secure and reliable federated learning experiments across cloud computing facilities and high-performance computing resources by leveraging Globus Compute, a distributed function as a service platform, and Amazon Web Services. We further demonstrate the use case of APPFL in fine-tuning a LLaMA 2 7B model using several cloud resources and supercomputers.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
Leveraging Large Language Models to Build and Execute Computational Workflows
Authors:
Alejandro Duque,
Abdullah Syed,
Kastan V. Day,
Matthew J. Berry,
Daniel S. Katz,
Volodymyr V. Kindratenko
Abstract:
The recent development of large language models (LLMs) with multi-billion parameters, coupled with the creation of user-friendly application programming interfaces (APIs), has paved the way for automatically generating and executing code in response to straightforward human queries. This paper explores how these emerging capabilities can be harnessed to facilitate complex scientific workflows, eli…
▽ More
The recent development of large language models (LLMs) with multi-billion parameters, coupled with the creation of user-friendly application programming interfaces (APIs), has paved the way for automatically generating and executing code in response to straightforward human queries. This paper explores how these emerging capabilities can be harnessed to facilitate complex scientific workflows, eliminating the need for traditional coding methods. We present initial findings from our attempt to integrate Phyloflow with OpenAI's function-calling API, and outline a strategy for develo** a comprehensive workflow management system based on these concepts.
△ Less
Submitted 12 December, 2023;
originally announced December 2023.
-
FedCompass: Efficient Cross-Silo Federated Learning on Heterogeneous Client Devices using a Computing Power Aware Scheduler
Authors:
Zilinghan Li,
Pranshu Chaturvedi,
Shilan He,
Han Chen,
Gagandeep Singh,
Volodymyr Kindratenko,
E. A. Huerta,
Kibaek Kim,
Ravi Madduri
Abstract:
Cross-silo federated learning offers a promising solution to collaboratively train robust and generalized AI models without compromising the privacy of local datasets, e.g., healthcare, financial, as well as scientific projects that lack a centralized data facility. Nonetheless, because of the disparity of computing resources among different clients (i.e., device heterogeneity), synchronous federa…
▽ More
Cross-silo federated learning offers a promising solution to collaboratively train robust and generalized AI models without compromising the privacy of local datasets, e.g., healthcare, financial, as well as scientific projects that lack a centralized data facility. Nonetheless, because of the disparity of computing resources among different clients (i.e., device heterogeneity), synchronous federated learning algorithms suffer from degraded efficiency when waiting for straggler clients. Similarly, asynchronous federated learning algorithms experience degradation in the convergence rate and final model accuracy on non-identically and independently distributed (non-IID) heterogeneous datasets due to stale local models and client drift. To address these limitations in cross-silo federated learning with heterogeneous clients and data, we propose FedCompass, an innovative semi-asynchronous federated learning algorithm with a computing power-aware scheduler on the server side, which adaptively assigns varying amounts of training tasks to different clients using the knowledge of the computing power of individual clients. FedCompass ensures that multiple locally trained models from clients are received almost simultaneously as a group for aggregation, effectively reducing the staleness of local models. At the same time, the overall training process remains asynchronous, eliminating prolonged waiting periods from straggler clients. Using diverse non-IID heterogeneous distributed datasets, we demonstrate that FedCompass achieves faster convergence and higher accuracy than other asynchronous algorithms while remaining more efficient than synchronous algorithms when performing federated learning on heterogeneous clients. The source code for FedCompass is available at https://github.com/APPFL/FedCompass.
△ Less
Submitted 11 March, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Self-Supervised Masked Digital Elevation Models Encoding for Low-Resource Downstream Tasks
Authors:
Priyam Mazumdar,
Aiman Soliman,
Volodymyr Kindratenko,
Luigi Marini,
Kenton McHenry
Abstract:
The lack of quality labeled data is one of the main bottlenecks for training Deep Learning models. As the task increases in complexity, there is a higher penalty for overfitting and unstable learning. The typical paradigm employed today is Self-Supervised learning, where the model attempts to learn from a large corpus of unstructured and unlabeled data and then transfer that knowledge to the requi…
▽ More
The lack of quality labeled data is one of the main bottlenecks for training Deep Learning models. As the task increases in complexity, there is a higher penalty for overfitting and unstable learning. The typical paradigm employed today is Self-Supervised learning, where the model attempts to learn from a large corpus of unstructured and unlabeled data and then transfer that knowledge to the required task. Some notable examples of self-supervision in other modalities are BERT for Large Language Models, Wav2Vec for Speech Recognition, and the Masked AutoEncoder for Vision, which all utilize Transformers to solve a masked prediction task. GeoAI is uniquely poised to take advantage of the self-supervised methodology due to the decades of data collected, little of which is precisely and dependably annotated. Our goal is to extract building and road segmentations from Digital Elevation Models (DEM) that provide a detailed topography of the earths surface. The proposed architecture is the Masked Autoencoder pre-trained on ImageNet (with the limitation that there is a large domain discrepancy between ImageNet and DEM) with an UperNet Head for decoding segmentations. We tested this model with 450 and 50 training images only, utilizing roughly 5% and 0.5% of the original data respectively. On the building segmentation task, this model obtains an 82.1% Intersection over Union (IoU) with 450 Images and 69.1% IoU with only 50 images. On the more challenging road detection task the model obtains an 82.7% IoU with 450 images and 73.2% IoU with only 50 images. Any hand-labeled dataset made today about the earths surface will be immediately obsolete due to the constantly changing nature of the landscape. This motivates the clear necessity for data-efficient learners that can be used for a wide variety of downstream tasks.
△ Less
Submitted 6 September, 2023;
originally announced September 2023.
-
APPFLx: Providing Privacy-Preserving Cross-Silo Federated Learning as a Service
Authors:
Zilinghan Li,
Shilan He,
Pranshu Chaturvedi,
Trung-Hieu Hoang,
Minseok Ryu,
E. A. Huerta,
Volodymyr Kindratenko,
Jordan Fuhrman,
Maryellen Giger,
Ryan Chard,
Kibaek Kim,
Ravi Madduri
Abstract:
Cross-silo privacy-preserving federated learning (PPFL) is a powerful tool to collaboratively train robust and generalized machine learning (ML) models without sharing sensitive (e.g., healthcare of financial) local data. To ease and accelerate the adoption of PPFL, we introduce APPFLx, a ready-to-use platform that provides privacy-preserving cross-silo federated learning as a service. APPFLx empl…
▽ More
Cross-silo privacy-preserving federated learning (PPFL) is a powerful tool to collaboratively train robust and generalized machine learning (ML) models without sharing sensitive (e.g., healthcare of financial) local data. To ease and accelerate the adoption of PPFL, we introduce APPFLx, a ready-to-use platform that provides privacy-preserving cross-silo federated learning as a service. APPFLx employs Globus authentication to allow users to easily and securely invite trustworthy collaborators for PPFL, implements several synchronous and asynchronous FL algorithms, streamlines the FL experiment launch process, and enables tracking and visualizing the life cycle of FL experiments, allowing domain experts and ML practitioners to easily orchestrate and evaluate cross-silo FL under one platform. APPFLx is available online at https://appflx.link
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
One-shot Generative Data Augmentation with Bounded Divergence for UAV Identification in Limited RF Environments
Authors:
Amir Kazemi,
Salar Basiri,
Volodymyr Kindratenko,
Srinivasa Salapaka
Abstract:
This work addresses the pressing need for cybersecurity in Unmanned Aerial Vehicles (UAVs), particularly focusing on the challenges of identifying UAVs using radiofrequency (RF) fingerprinting in constrained environments. The complexity and variability of RF signals, influenced by environmental interference and hardware imperfections, often render traditional RF-based identification methods ineffe…
▽ More
This work addresses the pressing need for cybersecurity in Unmanned Aerial Vehicles (UAVs), particularly focusing on the challenges of identifying UAVs using radiofrequency (RF) fingerprinting in constrained environments. The complexity and variability of RF signals, influenced by environmental interference and hardware imperfections, often render traditional RF-based identification methods ineffective. To address these complications, the study introduces the rigorous use of one-shot generative methods for augmenting transformed RF signals, offering a significant improvement in UAV identification. This approach shows promise in low-data regimes, outperforming deep generative methods like conditional generative adversarial networks (GANs) and variational autoencoders (VAEs). The paper provides a theoretical guarantee for the effectiveness of one-shot generative models in augmenting limited data, setting a precedent for their application in limited RF environments. This research not only contributes to the cybersecurity of UAVs but also rigorously broadens the scope of machine learning techniques in data-constrained scenarios, which may include atypical complex sequences beyond images and videos.
△ Less
Submitted 14 May, 2024; v1 submitted 19 January, 2023;
originally announced January 2023.
-
FAIR AI Models in High Energy Physics
Authors:
Javier Duarte,
Haoyang Li,
Avik Roy,
Ruike Zhu,
E. A. Huerta,
Daniel Diaz,
Philip Harris,
Raghav Kansal,
Daniel S. Katz,
Ishaan H. Kavoori,
Volodymyr V. Kindratenko,
Farouk Mokhtar,
Mark S. Neubauer,
Sang Eon Park,
Melissa Quinnan,
Roger Rusack,
Zhizhen Zhao
Abstract:
The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly…
▽ More
The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly programmed -- and more generally, artificial intelligence (AI) models, are an important target for this because of the ever-increasing pace with which AI is transforming scientific domains, such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for the application of these principles. We demonstrate the template's use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability.
△ Less
Submitted 29 December, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
FAIR for AI: An interdisciplinary and international community building perspective
Authors:
E. A. Huerta,
Ben Blaiszik,
L. Catherine Brinson,
Kristofer E. Bouchard,
Daniel Diaz,
Caterina Doglioni,
Javier M. Duarte,
Murali Emani,
Ian Foster,
Geoffrey Fox,
Philip Harris,
Lukas Heinrich,
Shantenu Jha,
Daniel S. Katz,
Volodymyr Kindratenko,
Christine R. Kirkpatrick,
Kati Lassila-Perini,
Ravi K. Madduri,
Mark S. Neubauer,
Fotis E. Psomopoulos,
Avik Roy,
Oliver Rübel,
Zhizhen Zhao,
Ruike Zhu
Abstract:
A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to i…
▽ More
A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted in the context of AI models and datasets. Here, we present the perspectives, vision, and experiences of researchers from different countries, disciplines, and backgrounds who are leading the definition and adoption of FAIR principles in their communities of practice, and discuss outcomes that may result from pursuing and incentivizing FAIR AI research. The material for this report builds on the FAIR for AI Workshop held at Argonne National Laboratory on June 7, 2022.
△ Less
Submitted 1 August, 2023; v1 submitted 30 September, 2022;
originally announced October 2022.
-
Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model
Authors:
Zhenting Qi,
Ruike Zhu,
Zheyu Fu,
Wenhao Chai,
Volodymyr Kindratenko
Abstract:
Fight detection in videos is an emerging deep learning application with today's prevalence of surveillance systems and streaming media. Previous work has largely relied on action recognition techniques to tackle this problem. In this paper, we propose a simple but effective method that solves the task from a new perspective: we design the fight detection model as a composition of an action-aware f…
▽ More
Fight detection in videos is an emerging deep learning application with today's prevalence of surveillance systems and streaming media. Previous work has largely relied on action recognition techniques to tackle this problem. In this paper, we propose a simple but effective method that solves the task from a new perspective: we design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator. Also, considering that collecting frame-level labels for videos is too laborious, we design a weakly supervised two-stage training scheme, where we utilize multiple-instance-learning loss calculated on video-level labels to train the score generator, and adopt the self-training technique to further improve its performance. Extensive experiments on a publicly available large-scale dataset, UBI-Fights, demonstrate the effectiveness of our method, and the performance on the dataset exceeds several previous state-of-the-art approaches. Furthermore, we collect a new dataset, VFD-2000, that specializes in video fight detection, with a larger scale and more scenarios than existing datasets. The implementation of our method and the proposed dataset will be publicly available at https://github.com/Hepta-Col/VideoFightDetection.
△ Less
Submitted 23 September, 2022;
originally announced September 2022.
-
AGNet: Weighing Black Holes with Deep Learning
Authors:
Joshua Yao-Yu Lin,
Sneh Pandya,
Devanshi Pratap,
Xin Liu,
Matias Carrasco Kind,
Volodymyr Kindratenko
Abstract:
Supermassive black holes (SMBHs) are ubiquitously found at the centers of most massive galaxies. Measuring SMBH mass is important for understanding the origin and evolution of SMBHs. However, traditional methods require spectroscopic data which is expensive to gather. We present an algorithm that weighs SMBHs using quasar light time series, circumventing the need for expensive spectra. We train, v…
▽ More
Supermassive black holes (SMBHs) are ubiquitously found at the centers of most massive galaxies. Measuring SMBH mass is important for understanding the origin and evolution of SMBHs. However, traditional methods require spectroscopic data which is expensive to gather. We present an algorithm that weighs SMBHs using quasar light time series, circumventing the need for expensive spectra. We train, validate, and test neural networks that directly learn from the Sloan Digital Sky Survey (SDSS) Stripe 82 light curves for a sample of $38,939$ spectroscopically confirmed quasars to map out the nonlinear encoding between SMBH mass and multi-color optical light curves. We find a 1$σ$ scatter of 0.37 dex between the predicted SMBH mass and the fiducial virial mass estimate based on SDSS single-epoch spectra, which is comparable to the systematic uncertainty in the virial mass estimate. Our results have direct implications for more efficient applications with future observations from the Vera C. Rubin Observatory. Our code, \textsf{AGNet}, is publicly available at \url{https://github.com/snehjp2/AGNet}.
△ Less
Submitted 21 November, 2022; v1 submitted 17 August, 2021;
originally announced August 2021.
-
A FAIR and AI-ready Higgs boson decay dataset
Authors:
Yifan Chen,
E. A. Huerta,
Javier Duarte,
Philip Harris,
Daniel S. Katz,
Mark S. Neubauer,
Daniel Diaz,
Farouk Mokhtar,
Raghav Kansal,
Sang Eon Park,
Volodymyr V. Kindratenko,
Zhizhen Zhao,
Roger Rusack
Abstract:
To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate…
▽ More
To enable the reusability of massive scientific datasets by humans and machines, researchers aim to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR) for data and artificial intelligence (AI) models. This article provides a domain-agnostic, step-by-step assessment guide to evaluate whether or not a given dataset meets these principles. We demonstrate how to use this guide to evaluate the FAIRness of an open simulated dataset produced by the CMS Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs boson decays and quark and gluon background, and is available through the CERN Open Data Portal. We use additional available tools to assess the FAIRness of this dataset, and incorporate feedback from members of the FAIR community to validate our results. This article is accompanied by a Jupyter notebook to visualize and explore this dataset. This study marks the first in a planned series of articles that will guide scientists in the creation of FAIR AI models and datasets in high energy particle physics.
△ Less
Submitted 16 February, 2022; v1 submitted 4 August, 2021;
originally announced August 2021.
-
Accelerated, Scalable and Reproducible AI-driven Gravitational Wave Detection
Authors:
E. A. Huerta,
Asad Khan,
Xiaobo Huang,
Minyang Tian,
Maksim Levental,
Ryan Chard,
Wei Wei,
Maeve Heflin,
Daniel S. Katz,
Volodymyr Kindratenko,
Dawei Mu,
Ben Blaiszik,
Ian Foster
Abstract:
The development of reusable artificial intelligence (AI) models for wider use and rigorous validation by the community promises to unlock new opportunities in multi-messenger astrophysics. Here we develop a workflow that connects the Data and Learning Hub for Science, a repository for publishing AI models, with the Hardware Accelerated Learning (HAL) cluster, using funcX as a universal distributed…
▽ More
The development of reusable artificial intelligence (AI) models for wider use and rigorous validation by the community promises to unlock new opportunities in multi-messenger astrophysics. Here we develop a workflow that connects the Data and Learning Hub for Science, a repository for publishing AI models, with the Hardware Accelerated Learning (HAL) cluster, using funcX as a universal distributed computing service. Using this workflow, an ensemble of four openly available AI models can be run on HAL to process an entire month's worth (August 2017) of advanced Laser Interferometer Gravitational-Wave Observatory data in just seven minutes, identifying all four all four binary black hole mergers previously identified in this dataset and reporting no misclassifications. This approach combines advances in AI, distributed computing, and scientific data infrastructure to open new pathways to conduct reproducible, accelerated, data-driven discovery.
△ Less
Submitted 9 July, 2021; v1 submitted 15 December, 2020;
originally announced December 2020.
-
Convergence of Artificial Intelligence and High Performance Computing on NSF-supported Cyberinfrastructure
Authors:
E. A. Huerta,
Asad Khan,
Edward Davis,
Colleen Bushell,
William D. Gropp,
Daniel S. Katz,
Volodymyr Kindratenko,
Seid Koric,
William T. C. Kramer,
Brendan McGinty,
Kenton McHenry,
Aaron Saxton
Abstract:
Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches to enable scientific and engineering breakthroughs in the big data era. Innovative Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology that now drive a…
▽ More
Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches to enable scientific and engineering breakthroughs in the big data era. Innovative Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology that now drive a multi-billion dollar industry, and which play an ever increasing role sha** human social patterns. As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient for computational grand challenges brought about by scientific facilities that produce data at a rate and volume that outstrip the computing capabilities of available cyberinfrastructure platforms. This realization has been driving the confluence of AI and high performance computing (HPC) to reduce time-to-insight, and to enable a systematic study of domain-inspired AI architectures and optimization schemes to enable data-driven discovery. In this article we present a summary of recent developments in this field, and describe specific advances that authors in this article are spearheading to accelerate and streamline the use of HPC platforms to design and apply accelerated AI algorithms in academia and industry.
△ Less
Submitted 19 October, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.
-
Enabling real-time multi-messenger astrophysics discoveries with deep learning
Authors:
E. A. Huerta,
Gabrielle Allen,
Igor Andreoni,
Javier M. Antelis,
Etienne Bachelet,
Bruce Berriman,
Federica Bianco,
Rahul Biswas,
Matias Carrasco,
Kyle Chard,
Minsik Cho,
Philip S. Cowperthwaite,
Zachariah B. Etienne,
Maya Fishbach,
Francisco Förster,
Daniel George,
Tom Gibbs,
Matthew Graham,
William Gropp,
Robert Gruendl,
Anushri Gupta,
Roland Haas,
Sarah Habib,
Elise Jennings,
Margaret W. G. Johnson
, et al. (35 additional authors not shown)
Abstract:
Multi-messenger astrophysics is a fast-growing, interdisciplinary field that combines data, which vary in volume and speed of data processing, from many different instruments that probe the Universe using different cosmic messengers: electromagnetic waves, cosmic rays, gravitational waves and neutrinos. In this Expert Recommendation, we review the key challenges of real-time observations of gravit…
▽ More
Multi-messenger astrophysics is a fast-growing, interdisciplinary field that combines data, which vary in volume and speed of data processing, from many different instruments that probe the Universe using different cosmic messengers: electromagnetic waves, cosmic rays, gravitational waves and neutrinos. In this Expert Recommendation, we review the key challenges of real-time observations of gravitational wave sources and their electromagnetic and astroparticle counterparts, and make a number of recommendations to maximize their potential for scientific discovery. These recommendations refer to the design of scalable and computationally efficient machine learning algorithms; the cyber-infrastructure to numerically simulate astrophysical sources, and to process and interpret multi-messenger astrophysics data; the management of gravitational wave detections to trigger real-time alerts for electromagnetic and astroparticle follow-ups; a vision to harness future developments of machine learning and cyber-infrastructure resources to cope with the big-data requirements; and the need to build a community of experts to realize the goals of multi-messenger astrophysics.
△ Less
Submitted 26 November, 2019;
originally announced November 2019.
-
Deep Learning for Multi-Messenger Astrophysics: A Gateway for Discovery in the Big Data Era
Authors:
Gabrielle Allen,
Igor Andreoni,
Etienne Bachelet,
G. Bruce Berriman,
Federica B. Bianco,
Rahul Biswas,
Matias Carrasco Kind,
Kyle Chard,
Minsik Cho,
Philip S. Cowperthwaite,
Zachariah B. Etienne,
Daniel George,
Tom Gibbs,
Matthew Graham,
William Gropp,
Anushri Gupta,
Roland Haas,
E. A. Huerta,
Elise Jennings,
Daniel S. Katz,
Asad Khan,
Volodymyr Kindratenko,
William T. C. Kramer,
Xin Liu,
Ashish Mahabal
, et al. (23 additional authors not shown)
Abstract:
This report provides an overview of recent work that harnesses the Big Data Revolution and Large Scale Computing to address grand computational challenges in Multi-Messenger Astrophysics, with a particular emphasis on real-time discovery campaigns. Acknowledging the transdisciplinary nature of Multi-Messenger Astrophysics, this document has been prepared by members of the physics, astronomy, compu…
▽ More
This report provides an overview of recent work that harnesses the Big Data Revolution and Large Scale Computing to address grand computational challenges in Multi-Messenger Astrophysics, with a particular emphasis on real-time discovery campaigns. Acknowledging the transdisciplinary nature of Multi-Messenger Astrophysics, this document has been prepared by members of the physics, astronomy, computer science, data science, software and cyberinfrastructure communities who attended the NSF-, DOE- and NVIDIA-funded "Deep Learning for Multi-Messenger Astrophysics: Real-time Discovery at Scale" workshop, hosted at the National Center for Supercomputing Applications, October 17-19, 2018. Highlights of this report include unanimous agreement that it is critical to accelerate the development and deployment of novel, signal-processing algorithms that use the synergy between artificial intelligence (AI) and high performance computing to maximize the potential for scientific discovery with Multi-Messenger Astrophysics. We discuss key aspects to realize this endeavor, namely (i) the design and exploitation of scalable and computationally efficient AI algorithms for Multi-Messenger Astrophysics; (ii) cyberinfrastructure requirements to numerically simulate astrophysical sources, and to process and interpret Multi-Messenger Astrophysics data; (iii) management of gravitational wave detections and triggers to enable electromagnetic and astro-particle follow-ups; (iv) a vision to harness future developments of machine and deep learning and cyberinfrastructure resources to cope with the scale of discovery in the Big Data Era; (v) and the need to build a community that brings domain experts together with data scientists on equal footing to maximize and accelerate discovery in the nascent field of Multi-Messenger Astrophysics.
△ Less
Submitted 1 February, 2019;
originally announced February 2019.