-
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Authors:
Alessandro Ottino,
Joshua Benjamin,
Georgios Zervas
Abstract:
Distributed deep learning (DDL) systems strongly depend on network performance. Current electronic packet switched (EPS) network architectures and technologies suffer from variable diameter topologies, low-bisection bandwidth and over-subscription affecting completion time of communication and collective operations.
We introduce a near-exascale, full-bisection bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration called RAMP, which supports large-scale distributed and parallel computing systems (12.8 Tbps per node for up to 65,536 nodes).
For the first time, a custom RAMP-x MPI strategy and a network transcoder are proposed to run MPI collective operations across the optical circuit switched (OCS) network in a schedule-less and contention-less manner. RAMP achieves 7.6-171$\times$ speed-up in completion time across all MPI operations compared to realistic EPS and OCS counterparts. It can also deliver a 1.3-16$\times$ and 7.8-58$\times$ reduction in Megatron and DLRM training time respectively, while offering 42-53$\times$ and 3.3-12.4$\times$ improvement in energy consumption and cost respectively.
Submitted 24 February, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
One-shot, Offline and Production-Scalable PID Optimisation with Deep Reinforcement Learning
Authors:
Zacharaya Shabka,
Michael Enrico,
Nick Parsons,
Georgios Zervas
Abstract:
Proportional-integral-derivative (PID) control underlies more than $97\%$ of automated industrial processes. Controlling these processes effectively with respect to some specified set of performance goals requires finding an optimal set of PID parameters to moderate the PID loop. Tuning these parameters is a long and exhaustive process. A method (patent pending) based on deep reinforcement learning is presented that learns a relationship between generic system properties (e.g. resonance frequency), a multi-objective performance goal and optimal PID parameter values. Performance is demonstrated in the context of a real optical switching product of the foremost manufacturer of such devices globally. Switching is handled by piezoelectric actuators where switching time and optical loss are derived from the speed and stability of actuator-control processes respectively. The method achieves a $5\times$ improvement in the number of actuators that fall within the most challenging target switching speed, $\geq 20\%$ improvement in mean switching speed at the same optical loss and $\geq 75\%$ reduction in performance inconsistency when temperature varies between 5 and 73 degrees Celsius. Furthermore, once trained (which takes $\mathcal{O}(hours)$), the model generates actuator-unique PID parameters in a one-shot inference process that takes $\mathcal{O}(ms)$ in comparison to up to $\mathcal{O}(week)$ required for conventional tuning methods, therefore accomplishing these performance improvements whilst achieving up to a $10^6\times$ speed-up. After training, the method can be applied entirely offline, incurring effectively zero optimisation overhead in production.
Submitted 25 October, 2022;
originally announced October 2022.
-
Random Walk for modelling Multi Core Fiber cross-talk and step distribution characterisation
Authors:
Alessandro Ottino,
Hui Yuan,
Yunnuo Xu,
Eric Sillekens,
Georgios Zervas
Abstract:
A novel random walk based model for inter-core cross-talk (IC-XT) characterization of multi-core fibres capable of accurately representing both time-domain distribution and frequency-domain representation of experimental IC-XT has been proposed. It was demonstrated that this model is a generalization of the most widely used model in literature, to which it will converge when the number of samples and measurement time-window tend to infinity. In addition, this model is consistent with statistical analysis such as short term average crosstalk (STAXT), keeping the same convergence properties, and it was shown to be almost independent of the time-window. To validate this model, a new type of characterization of the IC-XT in the dB domain (based on a pseudo random walk) has been proposed and the statistical properties of its step distribution have been evaluated. The performed analysis showed that this characterization is capable of fitting every type of signal source with an accuracy above 99.3%. It also proved to be very robust to time-window length, temperature and other signal properties such as symbol rate and pseudo-random bit stream (PRBS) length. The obtained results suggest that the model was able to communicate most of the relevant information using a short observation time, making it suitable for IC-XT characterization and core-pair source signal classification. Using machine-learning (ML) techniques for source-signal classification, we empirically demonstrated that this technique carries more information regarding IC-XT than traditional statistical methods.
Submitted 31 August, 2020;
originally announced August 2020.
-
Experimental Analysis on Variations and Accuracy of Crosstalk in Trench-Assisted Multi-core Fibers
Authors:
Hui Yuan,
Alessandro Ottino,
Yunnuo Xu,
Arsalan Saljoghei,
Tetsuya Hayashi,
Tetsuya Nakanishi,
Eric Sillekens,
Lidia Galdino,
Polina Bayvel,
Zhixin Liu,
Georgios Zervas
Abstract:
Space division multiplexing using multi-core fiber (MCF) is a promising solution to cope with the capacity crunch in standard single-mode fiber based optical communication systems. Nevertheless, the achievable capacity of MCF is limited by inter-core crosstalk (IC-XT). Many existing studies treat IC-XT as static interference; however, recent research shows that IC-XT varies with time, wavelength and baud rate. This inherently stochastic behaviour calls for a comprehensive characterization of MCF, both for its application in practical transmission systems and for the theoretical understanding of the IC-XT phenomenon. In this paper, we experimentally investigate the IC-XT behaviour of an 8-core trench-assisted MCF in a temperature-controlled environment, using popular modulation formats. We compare the measured results with the theoretical prediction to validate the analytical IC-XT models previously developed. Moreover, we explore the effects of the measurement configurations on the IC-XT accuracy and present an analysis of the IC-XT step distribution. Our results indicate that a number of transmission parameters have significant influence on the strength and volatility of IC-XT. Moreover, the averaging time of the power meter and the observation time window can affect the value of the observed IC-XT, and the degree of these effects varies with the type of source signal.
Submitted 7 August, 2020;
originally announced August 2020.
-
On the Relationship Between Network Topology and Throughput in Mesh Optical Networks
Authors:
Daniel Semrau,
Shahzaib Durrani,
Georgios Zervas,
Robert I. Killey,
Polina Bayvel
Abstract:
The relationship between topology and network throughput of arbitrarily-connected mesh networks is studied. Taking into account nonlinear channel properties, it is shown that throughput decreases logarithmically with physical network size with minor dependence on network ellipticity.
Submitted 15 August, 2020;
originally announced August 2020.
-
Optimal Control of SOAs with Artificial Intelligence for Sub-Nanosecond Optical Switching
Authors:
Christopher W. F. Parsonson,
Zacharaya Shabka,
W. Konrad Chlupka,
Bawang Goh,
Georgios Zervas
Abstract:
Novel approaches to switching ultra-fast semiconductor optical amplifiers using artificial intelligence algorithms (particle swarm optimisation, ant colony optimisation, and a genetic algorithm) are developed and applied both in simulation and experiment. Effective off-on switching (settling) times of 542 ps are demonstrated with just 4.8% overshoot, achieving an order of magnitude improvement over previous attempts described in the literature and standard damping techniques from control theory.
Submitted 22 June, 2020;
originally announced June 2020.
-
SWIFT: Scalable Ultra-Wideband Sub-Nanosecond Wavelength Switching for Data Centre Networks
Authors:
Thomas Gerard,
Christopher Parsonson,
Zacharaya Shabka,
Polina Bayvel,
Domaniç Lavery,
Georgios Zervas
Abstract:
We propose a time-multiplexed DS-DBR/SOA-gated system to deliver low-power fast tuning across S-/C-/L-bands. Sub-ns switching is demonstrated, supporting 122$\times$50 GHz channels over 6.05 THz using AI techniques.
Submitted 11 March, 2020;
originally announced March 2020.