-
Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks
Authors:
Christopher W. F. Parsonson,
Zacharaya Shabka,
Alessandro Ottino,
Georgios Zervas
Abstract:
From natural language processing to genome sequencing, large-scale machine learning models are bringing advances to a broad range of fields. Many of these models are too large to be trained on a single machine, and instead must be distributed across multiple devices. This has motivated the research of new compute and network systems capable of handling such tasks. In particular, recent work has focused on developing management schemes which decide how to allocate distributed resources such that some overall objective, such as minimising the job completion time (JCT), is optimised. However, such studies omit explicit consideration of how much a job should be distributed, usually assuming that maximum distribution is desirable. In this work, we show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate. To address this, we propose PAC-ML (partitioning for asynchronous computing with machine learning). PAC-ML leverages a graph neural network and reinforcement learning to learn how much to partition computation graphs such that the number of jobs which meet arbitrary user-defined JCT requirements is maximised. In experiments with five real deep learning computation graphs on a recently proposed optical architecture across four user-defined JCT requirement distributions, we demonstrate PAC-ML achieving up to 56.2% lower blocking rates in dynamic job arrival settings than the canonical maximum parallelisation strategy used by most prior works.
Submitted 31 January, 2023;
originally announced January 2023.
-
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Authors:
Alessandro Ottino,
Joshua Benjamin,
Georgios Zervas
Abstract:
Distributed deep learning (DDL) systems strongly depend on network performance. Current electronic packet switched (EPS) network architectures and technologies suffer from variable diameter topologies, low-bisection bandwidth and over-subscription affecting completion time of communication and collective operations.
We introduce a near-exascale, full-bisection bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration called RAMP, which supports large-scale distributed and parallel computing systems (12.8 Tbps per node for up to 65,536 nodes).
For the first time, a custom RAMP-x MPI strategy and a network transcoder are proposed to run MPI collective operations across the optical circuit switched (OCS) network in a schedule-less and contention-less manner. RAMP achieves 7.6-171$\times$ speed-up in completion time across all MPI operations compared to realistic EPS and OCS counterparts. It can also deliver a 1.3-16$\times$ and 7.8-58$\times$ reduction in Megatron and DLRM training time respectively, while offering 42-53$\times$ and 3.3-12.4$\times$ improvement in energy consumption and cost respectively.
Submitted 24 February, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
Random Walk for modelling Multi Core Fiber cross-talk and step distribution characterisation
Authors:
Alessandro Ottino,
Hui Yuan,
Yunnuo Xu,
Eric Sillekens,
Georgios Zervas
Abstract:
A novel random walk based model for inter-core cross-talk (IC-XT) characterization of multi-core fibres, capable of accurately representing both the time-domain distribution and the frequency-domain representation of experimental IC-XT, has been proposed. It was demonstrated that this model is a generalization of the most widely used model in the literature, to which it converges when the number of samples and the measurement time-window tend to infinity. In addition, this model is consistent with statistical analysis such as short term average crosstalk (STAXT), keeping the same convergence properties, and it was shown to be almost independent of the time-window. To validate this model, a new type of characterization of the IC-XT in the dB domain (based on a pseudo random walk) has been proposed and the statistical properties of its step distribution have been evaluated. The performed analysis showed that this characterization is capable of fitting every type of signal source with an accuracy above 99.3%. It also proved to be very robust to time-window length, temperature and other signal properties such as symbol rate and pseudo-random bit stream (PRBS) length. The obtained results suggest that the model was able to communicate most of the relevant information using a short observation time, making it suitable for IC-XT characterization and core-pair source signal classification. Using machine-learning (ML) techniques for source-signal classification, we empirically demonstrated that this technique carries more information regarding IC-XT than traditional statistical methods.
Submitted 31 August, 2020;
originally announced August 2020.
-
Experimental Analysis on Variations and Accuracy of Crosstalk in Trench-Assisted Multi-core Fibers
Authors:
Hui Yuan,
Alessandro Ottino,
Yunnuo Xu,
Arsalan Saljoghei,
Tetsuya Hayashi,
Tetsuya Nakanishi,
Eric Sillekens,
Lidia Galdino,
Polina Bayvel,
Zhixin Liu,
Georgios Zervas
Abstract:
Space division multiplexing using multi-core fiber (MCF) is a promising solution to cope with the capacity crunch in standard single-mode fiber based optical communication systems. Nevertheless, the achievable capacity of MCF is limited by inter-core crosstalk (IC-XT). Many existing studies treat IC-XT as static interference; however, recent research shows that IC-XT varies with time, wavelength and baud rate. This inherent stochastic feature requires a comprehensive characterization of the behaviour of MCF for its application in practical transmission systems and for the theoretical understanding of the IC-XT phenomenon. In this paper, we experimentally investigate the IC-XT behaviour of an 8-core trench-assisted MCF in a temperature-controlled environment, using popular modulation formats. We compare the measured results with the theoretical prediction to validate the analytical IC-XT models previously developed. Moreover, we explore the effects of the measurement configurations on the IC-XT accuracy and present an analysis of the IC-XT step distribution. Our results indicate that a number of transmission parameters have significant influence on the strength and volatility of IC-XT. Moreover, the averaging time of the power meter and the observation time window can affect the value of the observed IC-XT, and the degree of these effects varies with the type of source signal.
Submitted 7 August, 2020;
originally announced August 2020.
-
Experimental Demonstration of Learned Time-Domain Digital Back-Propagation
Authors:
Eric Sillekens,
Wenting Yi,
Daniel Semrau,
Alessandro Ottino,
Boris Karanov,
Sujie Zhou,
Kevin Law,
Jack Chen,
Domanic Lavery,
Lidia Galdino,
Polina Bayvel,
Robert I. Killey
Abstract:
We present the first experimental demonstration of learned time-domain digital back-propagation (DBP), in 64-GBd dual-polarization 64-QAM signal transmission over 1014 km. Performance gains were comparable to those obtained with conventional, higher complexity, frequency-domain DBP.
Submitted 23 December, 2019;
originally announced December 2019.