-
Experimental Observation of Space-Charge Field Screening of a Relativistic Particle Bunch in Plasma
Authors:
L. Verra,
M. Galletti,
R. Pompili,
A. Biagioni,
M. Carillo,
A. Cianchi,
L. Crincoli,
A. Curcio,
F. Demurtas,
G. Di Pirro,
V. Lollo,
G. Parise,
D. Pellegrini,
S. Romeo,
G. J. Silvi,
F. Villa,
M. Ferrario
Abstract:
The space-charge field of a relativistic charged bunch propagating in plasma is screened due to the presence of mobile charge carriers. We experimentally investigate such screening by measuring the effect of dielectric wakefields driven by the bunch in an uncoated dielectric capillary where the plasma is confined. We show that the plasma screens the space-charge field and therefore suppresses the dielectric wakefields when the distance between the bunch and the dielectric surface is much larger than the plasma skin depth. Before full screening is reached, the effects of dielectric and plasma wakefields are present simultaneously.
Submitted 17 June, 2024;
originally announced June 2024.
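The screening condition in the abstract above compares a distance with the plasma skin depth. As a minimal sketch, the cold-plasma skin depth c/ω_p can be evaluated as below. The function names, the example density, and the factor of 10 used to stand in for "much larger" are illustrative assumptions, not values taken from the paper.

```python
import math

# Physical constants in SI units.
ELECTRON_CHARGE = 1.602176634e-19        # C
ELECTRON_MASS = 9.1093837015e-31         # kg
VACUUM_PERMITTIVITY = 8.8541878128e-12   # F/m
SPEED_OF_LIGHT = 299792458.0             # m/s


def plasma_skin_depth(electron_density_m3: float) -> float:
    """Return the collisionless plasma skin depth c / omega_p in metres.

    omega_p = sqrt(n_e * e^2 / (eps_0 * m_e)) is the electron plasma
    angular frequency for an electron density n_e given in m^-3.
    """
    if electron_density_m3 <= 0:
        raise ValueError("electron density must be positive")
    omega_p = math.sqrt(
        electron_density_m3 * ELECTRON_CHARGE ** 2
        / (VACUUM_PERMITTIVITY * ELECTRON_MASS)
    )
    return SPEED_OF_LIGHT / omega_p


def is_well_screened(distance_m: float, electron_density_m3: float,
                     factor: float = 10.0) -> bool:
    """Hypothetical helper: treat 'much larger' as 'more than `factor` skin depths'.

    The threshold factor is an assumption chosen for illustration only.
    """
    return distance_m > factor * plasma_skin_depth(electron_density_m3)


if __name__ == "__main__":
    density = 1e22  # m^-3 (1e16 cm^-3), an illustrative value only
    depth = plasma_skin_depth(density)
    print(f"skin depth: {depth * 1e6:.1f} um")
    print("1 mm gap well screened:", is_well_screened(1e-3, density))
```

Because the skin depth scales as the inverse square root of the density, raising the density by a factor of 100 shortens the skin depth by a factor of 10.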
-
APEIRON: composing smart TDAQ systems for high energy physics experiments
Authors:
Roberto Ammendola,
Andrea Biagioni,
Carlotta Chiarini,
Andrea Ciardiello,
Paolo Cretaro,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Michele Martinelli,
Pier Stanislao Paolucci,
Cristian Rossi,
Francesco Simula,
Matteo Turisini,
Piero Vicini
Abstract:
APEIRON is a framework encompassing the general architecture of a distributed heterogeneous processing platform and the corresponding software stack, from the low-level device drivers up to the high-level programming model. The framework is designed to be efficiently used for studying, prototyping and deploying smart trigger and data acquisition (TDAQ) systems for high energy physics experiments.
Submitted 3 July, 2023;
originally announced July 2023.
-
Design, optimization and experimental characterization of RF injectors for high brightness electron beams and plasma acceleration
Authors:
V. Shpakov,
D. Alesini,
M. P. Anania,
M. Behtouei,
B. Buonomo,
M. Bellaveglia,
A. Biagioni,
F. Cardelli,
M. Carillo,
E. Chiadroni,
A. Cianchi,
G. Costa,
M. Del Giorno,
L. Faillace,
M. Ferrario,
M. del Franco,
G. Franzini,
M. Galletti,
L. Giannessi,
A. Giribono,
A. Liedl,
V. Lollo,
A. Mostacci,
G. Di Pirro,
L. Piersanti
, et al. (8 additional authors not shown)
Abstract:
In this article, we share our experience with the commissioning of the new photo-injector at the SPARC_LAB test facility. The new photo-injector was installed into an existing machine, and our goal was not only to improve the final beam parameters themselves but also to improve machine handling in day-to-day operations. Thus, besides the pure beam characterization, this article contains information about the improvements that were introduced into the new photo-injector design from the machine maintenance point of view, and the benefits that we gained by using the new technique to assemble the gun itself.
Submitted 12 December, 2022;
originally announced December 2022.
-
HIKE, High Intensity Kaon Experiments at the CERN SPS
Authors:
E. Cortina Gil,
J. Jerhot,
N. Lurkin,
T. Numao,
B. Velghe,
V. W. S. Wong,
D. Bryman,
L. Bician,
Z. Hives,
T. Husek,
K. Kampf,
M. Koval,
A. T. Akmete,
R. Aliberti,
V. Büscher,
L. Di Lella,
N. Doble,
L. Peruzzo,
M. Schott,
H. Wahl,
R. Wanke,
B. Döbrich,
L. Montalto,
D. Rinaldi,
F. Dettori
, et al. (154 additional authors not shown)
Abstract:
A timely and long-term programme of kaon decay measurements at a new level of precision is presented, leveraging the capabilities of the CERN Super Proton Synchrotron (SPS). The proposed programme is firmly anchored on the experience built up studying kaon decays at the SPS over the past four decades, and includes rare processes, CP violation, dark sectors, symmetry tests and other tests of the Standard Model. The experimental programme is based on a staged approach involving experiments with charged and neutral kaon beams, as well as operation in beam-dump mode. The various phases will rely on a common infrastructure and set of detectors.
Submitted 29 November, 2022;
originally announced November 2022.
-
Progress report on the online processing upgrade at the NA62 experiment
Authors:
M. Turisini,
R. Ammendola,
A. Biagioni,
A. Ciardiello,
P. Cretaro,
O. Frezza,
G. Lamanna,
F. Lo Cicero,
A. Lonardo,
M. Martinelli,
R. Piandani,
D. Soldi,
P. Vicini
Abstract:
A new FPGA-based low-level trigger processor has been installed at the NA62 experiment. It is intended to extend the features of its predecessor due to a faster interconnection technology and additional logic resources available on the new platform. With the aim of improving trigger selectivity and exploring new architectures for complex trigger computation, a GPU system has been developed and a neural network on FPGA is in progress. They both process data streams from the Ring Imaging Cherenkov detector of the experiment to extract in real time high level features for the trigger logic. Description of the systems, latest developments and design flows are reported in this paper.
Submitted 8 February, 2022;
originally announced February 2022.
-
Architectural improvements and technological enhancements for the APEnet+ interconnect system
Authors:
R. Ammendola,
A. Biagioni,
O. Frezza,
A. Lonardo,
F. Lo Cicero,
M. Martinelli,
P. S. Paolucci,
E. Pastorelli,
D. Rossetti,
F. Simula,
L. Tosoratto,
P. Vicini
Abstract:
The APEnet+ board delivers a point-to-point, low-latency, 3D torus network interface card. In this paper we describe the latest generation of APEnet NIC, APEnet v5, integrated in a PCIe Gen3 board based on a state-of-the-art, 28 nm Altera Stratix V FPGA. The NIC features a network architecture designed following the Remote DMA paradigm and tailored to tightly bind the computing power of modern GPUs to the communication fabric. For the APEnet v5 board we show characterizing figures, such as achieved bandwidth and BER, obtained by exploiting new high-performance Altera transceivers and PCIe Gen3 compliance.
Submitted 4 January, 2022;
originally announced January 2022.
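The 3D torus topology mentioned in the abstract above connects each node to six neighbours, with links wrapping around at the edges of the grid. The following is a generic sketch of that addressing scheme only. It is not the APEnet routing logic, and the function names and grid sizes are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def _check(node: Coord, dims: Coord) -> None:
    """Validate that the node lies inside a torus of the given dimensions."""
    for value, size in zip(node, dims):
        if size < 1 or not 0 <= value < size:
            raise ValueError(f"node {node} is outside torus of size {dims}")


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six neighbours of `node` in a 3D torus of size `dims`.

    Neighbours are listed per axis (x, y, z), minus direction first, and
    coordinates wrap around using modular arithmetic.
    """
    _check(node, dims)
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % dims[axis]
            neighbors.append((coords[0], coords[1], coords[2]))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Return the minimal hop count between two nodes of the torus.

    Along each axis the shorter of the direct and the wrap-around path is used.
    """
    _check(a, dims)
    _check(b, dims)
    total = 0
    for x, y, size in zip(a, b, dims):
        direct = abs(x - y)
        total += min(direct, size - direct)
    return total


if __name__ == "__main__":
    print(torus_neighbors((0, 0, 0), (4, 4, 4)))
    print(torus_hops((0, 0, 0), (3, 3, 3), (4, 4, 4)))
```

The wrap-around links are what keep the hop count low: in a 4x4x4 grid, opposite corners are three hops apart rather than nine.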
-
First emittance measurement of the beam-driven plasma wakefield accelerated electron beam
Authors:
V. Shpakov,
M. P. Anania,
M. Behtouei,
M. Bellaveglia,
A. Biagioni,
M. Cesarini,
E. Chiadroni,
A. Cianchi,
G. Costa,
M. Croia,
A. Del Dotto,
M. Diomede,
F. Dipace,
M. Ferrario,
M. Galletti,
A. Giribono,
A. Liedl,
V. Lollo,
L. Magnisi,
A. Mostacci,
G. Di Pirro,
L. Piersanti,
R. Pompili,
S. Romeo,
A. R. Rossi
, et al. (4 additional authors not shown)
Abstract:
Next-generation plasma-based accelerators can push electron beams to GeV energies within centimetre distances. The plasma, excited by a driver pulse, is indeed able to sustain huge electric fields that can efficiently accelerate a trailing witness bunch, as has been experimentally demonstrated on multiple occasions. Thus, the main focus of current research is shifting towards achieving a high beam quality after plasma acceleration. In this letter we present a beam-driven plasma wakefield acceleration experiment in which an initially preformed high-quality witness beam was accelerated inside the plasma and characterized. In this experiment the witness beam quality was maintained at a high level after acceleration, with a final energy spread of $0.2\%$ and a resulting normalized transverse emittance of $3.8~μm$. To our knowledge, this is the first direct measurement of the emittance of a PWFA beam.
Submitted 9 April, 2021;
originally announced April 2021.
-
Energy spread minimization in a beam-driven plasma wakefield accelerator
Authors:
R. Pompili,
M. P. Anania,
M. Behtouei,
M. Bellaveglia,
A. Biagioni,
F. G. Bisesto,
M. Cesarini,
E. Chiadroni,
A. Cianchi,
G. Costa,
M. Croia,
A. Del Dotto,
D. Di Giovenale,
M. Diomede,
F. Dipace,
M. Ferrario,
A. Giribono,
V. Lollo,
L. Magnisi,
M. Marongiu,
A. Mostacci,
G. Di Pirro,
S. Romeo,
A. R. Rossi,
J. Scifo
, et al. (4 additional authors not shown)
Abstract:
Next-generation plasma-based accelerators can push electron bunches to gigaelectronvolt energies within centimetre distances. The plasma, excited by a driver pulse, generates large electric fields that can efficiently accelerate a trailing witness bunch, making possible the realization of laboratory-scale applications ranging from high-energy colliders to ultra-bright light sources. So far several experiments have demonstrated a significant acceleration, but the resulting beam quality, especially the energy spread, is still far from that of state-of-the-art conventional accelerators. Here we show the results of a beam-driven plasma acceleration experiment where we used an electron bunch as a driver followed by an ultra-short witness. The experiment demonstrates, for the first time, an innovative method to achieve an ultra-low energy spread of the accelerated witness of about 0.1%. This is an order of magnitude smaller than what has been obtained so far. The result can lead to a major breakthrough toward the optimization of the plasma acceleration process and its implementation in forthcoming compact machines for user-oriented applications.
Submitted 2 June, 2020;
originally announced June 2020.
-
Event reconstruction for KM3NeT/ORCA using convolutional neural networks
Authors:
Sebastiano Aiello,
Arnauld Albert,
Sergio Alves Garre,
Zineb Aly,
Fabrizio Ameli,
Michel Andre,
Giorgos Androulakis,
Marco Anghinolfi,
Mancia Anguita,
Gisela Anton,
Miquel Ardid,
Julien Aublin,
Christos Bagatelas,
Giancarlo Barbarino,
Bruny Baret,
Suzan Basegmez du Pree,
Meriem Bendahman,
Edward Berbee,
Vincent Bertin,
Simone Biagi,
Andrea Biagioni,
Matthias Bissinger,
Markus Boettcher,
Jihad Boumaaza,
Mohammed Bouta
, et al. (207 additional authors not shown)
Abstract:
The KM3NeT research infrastructure is currently under construction at two locations in the Mediterranean Sea. The KM3NeT/ORCA water-Cherenkov neutrino detector off the French coast will instrument several megatons of seawater with photosensors. Its main objective is the determination of the neutrino mass ordering. This work aims at demonstrating the general applicability of deep convolutional neural networks to neutrino telescopes, using simulated datasets for the KM3NeT/ORCA detector as an example. To this end, the networks are employed to achieve reconstruction and classification tasks that constitute an alternative to the analysis pipeline presented for KM3NeT/ORCA in the KM3NeT Letter of Intent. They are used to infer event reconstruction estimates for the energy, the direction, and the interaction point of incident neutrinos. The spatial distribution of Cherenkov light generated by charged particles induced in neutrino interactions is classified as shower- or track-like, and the main background processes associated with the detection of atmospheric neutrinos are recognized. Performance comparisons to machine-learning classification and maximum-likelihood reconstruction algorithms previously developed for KM3NeT/ORCA are provided. It is shown that this application of deep convolutional neural networks to simulated datasets for a large-volume neutrino telescope yields competitive reconstruction results and performance improvements with respect to classical approaches.
Submitted 17 April, 2020;
originally announced April 2020.
-
gSeaGen: the KM3NeT GENIE-based code for neutrino telescopes
Authors:
Sebastiano Aiello,
Arnauld Albert,
Sergio Alves Garre,
Zineb Aly,
Fabrizio Ameli,
Michel Andre,
Giorgos Androulakis,
Marco Anghinolfi,
Mancia Anguita,
Gisela Anton,
Miquel Ardid,
Julien Aublin,
Christos Bagatelas,
Giancarlo Barbarino,
Bruny Baret,
Suzan Basegmez du Pree,
Meriem Bendahman,
Edward Berbee,
Vincent Bertin,
Simone Biagi,
Andrea Biagioni,
Matthias Bissinger,
Markus Boettcher,
Jihad Boumaaza,
Simon Bourret
, et al. (211 additional authors not shown)
Abstract:
The gSeaGen code is a GENIE-based application developed to efficiently generate high statistics samples of events, induced by neutrino interactions, detectable in a neutrino telescope. The gSeaGen code is able to generate events induced by all neutrino flavours, considering topological differences between track-type and shower-like events. Neutrino interactions are simulated taking into account the density and the composition of the media surrounding the detector. The main features of gSeaGen are presented together with some examples of its application within the KM3NeT project.
Submitted 31 March, 2020;
originally announced March 2020.
-
The Control Unit of the KM3NeT Data Acquisition System
Authors:
S. Aiello,
F. Ameli,
M. Andre,
G. Androulakis,
M. Anghinolfi,
G. Anton,
M. Ardid,
J. Aublin,
C. Bagatelas,
G. Barbarino,
B. Baret,
S. Basegmez du Pree,
M. Bendahman,
E. Berbee,
A. M. van den Berg,
V. Bertin,
S. Biagi,
A. Biagioni,
M. Bissinger,
J. Boumaaza,
S. Bourret,
M. Bouta,
G. Bouvet,
M. Bouwhuis,
C. Bozza
, et al. (195 additional authors not shown)
Abstract:
The KM3NeT Collaboration runs a multi-site neutrino observatory in the Mediterranean Sea. Water Cherenkov particle detectors, deep in the sea and far off the coasts of France and Italy, are already taking data while incremental construction progresses. Data Acquisition Control software is operating off-shore detectors as well as testing and qualification stations for their components. The software, named Control Unit, is highly modular. It can undergo upgrades and reconfiguration with the acquisition running. Interplay with the central database of the Collaboration is obtained in a way that allows for data taking even if Internet links fail. In order to simplify the management of computing resources in the long term, and to cope with possible hardware failures of one or more computers, the KM3NeT Control Unit software features a custom dynamic resource provisioning and failover technology, which is especially important for ensuring continuity in case of rare transient events in multi-messenger astronomy. The software architecture relies on ubiquitous tools and broadly adopted technologies and has been successfully tested on several operating systems.
Submitted 30 September, 2019;
originally announced October 2019.
-
KM3NeT front-end and readout electronics system: hardware, firmware and software
Authors:
The KM3NeT Collaboration,
S. Aiello,
F. Ameli,
M. Andre,
G. Androulakis,
M. Anghinolfi,
G. Anton,
M. Ardid,
J. Aublin,
C. Bagatelas,
G. Barbarino,
B. Baret,
S. Basegmez du Pree,
A. Belias,
E. Berbee,
A. M. van den Berg,
V. Bertin,
V. van Beveren,
S. Biagi,
A. Biagioni,
S. Bianucci,
M. Billault,
M. Bissinger,
P. Bos,
J. Boumaaza
, et al. (215 additional authors not shown)
Abstract:
The KM3NeT research infrastructure being built at the bottom of the Mediterranean Sea will host water-Cherenkov telescopes for the detection of cosmic neutrinos. The neutrino telescopes will consist of large volume three-dimensional grids of optical modules to detect the Cherenkov light from charged particles produced by neutrino-induced interactions. Each optical module houses 31 3-inch photomultiplier tubes, instrumentation for calibration of the photomultiplier signal and positioning of the optical module, and all associated electronics boards. By design, the total electrical power consumption of an optical module has been capped at seven watts. This paper presents an overview of the front-end and readout electronics system inside the optical module, which has been designed for a 1 ns synchronization between the clocks of all optical modules in the grid during a lifetime of at least 20 years.
Submitted 29 July, 2019; v1 submitted 15 July, 2019;
originally announced July 2019.
-
Dependence of atmospheric muon flux on seawater depth measured with the first KM3NeT detection units
Authors:
KM3NeT Collaboration,
M. Ageron,
S. Aiello,
F. Ameli,
M. Andre,
G. Androulakis,
M. Anghinolfi,
G. Anton,
M. Ardid,
J. Aublin,
C. Bagatelas,
G. Barbarino,
B. Baret,
S. Basegmez du Pree,
A. Belias,
E. Berbee,
A. M. van den Berg,
V. Bertin,
V. van Beveren,
S. Biagi,
A. Biagioni,
S. Bianucci,
M. Billault,
M. Bissinger,
R. de Boer
, et al. (240 additional authors not shown)
Abstract:
KM3NeT is a research infrastructure located in the Mediterranean Sea that will consist of two deep-sea Cherenkov neutrino detectors. With one detector (ARCA), the KM3NeT Collaboration aims at identifying and studying TeV-PeV astrophysical neutrino sources. With the other detector (ORCA), the neutrino mass ordering will be determined by studying GeV-scale atmospheric neutrino oscillations. The first KM3NeT detection units were deployed at the Italian and French sites between 2015 and 2017. In this paper, a description of the detector is presented, together with a summary of the procedures used to calibrate the detector in-situ. Finally, the measurement of the atmospheric muon flux between 2232 and 3386 m seawater depth is obtained.
Submitted 4 February, 2020; v1 submitted 6 June, 2019;
originally announced June 2019.
-
The integrated low-level trigger and readout system of the CERN NA62 experiment
Authors:
R. Ammendola,
B. Angelucci,
M. Barbanera,
A. Biagioni,
V. Cerny,
B. Checcucci,
R. Fantechi,
F. Gonnella,
M. Koval,
M. Krivda,
G. Lamanna,
M. Lupi,
A. Lonardo,
A. Papi,
C. Parkinson,
E. Pedreschi,
P. Petrov,
R. Piandani,
J. Pinzino,
L. Pontisso,
M. Raggi,
D. Soldi,
M. S. Sozzi,
F. Spinella,
S. Venditti
, et al. (1 additional author not shown)
Abstract:
The integrated low-level trigger and data acquisition (TDAQ) system of the NA62 experiment at CERN is described. The requirements of a large and fast data reduction in a high-rate environment for a medium-scale, distributed ensemble of many different sub-detectors led to the concept of a fully digital integrated system with good scaling capabilities. The NA62 TDAQ system is rather unique in offering full flexibility on this scale, allowing in principle any information available from the detector to be used for triggering. The design concept, implementation and performance from the first years of running are illustrated.
Submitted 25 March, 2019;
originally announced March 2019.
-
Longitudinal phase-space manipulation with beam-driven plasma wakefields
Authors:
V. Shpakov,
M. P. Anania,
M. Bellaveglia,
A. Biagioni,
F. Bisesto,
F. Cardelli,
M. Cesarini,
E. Chiadroni,
A. Cianchi,
G. Costa,
M. Croia,
A. Del Dotto,
D. Di Giovenale,
M. Diomede,
M. Ferrario,
F. Filippi,
A. Giribono,
V. Lollo,
M. Marongiu,
V. Martinelli,
A. Mostacci,
L. Piersanti,
G. Di Pirro,
R. Pompili,
S. Romeo
, et al. (4 additional authors not shown)
Abstract:
The development of compact accelerator facilities providing high-brightness beams is one of the most challenging tasks in the field of next-generation compact and cost-affordable particle accelerators, to be used in many fields for industrial, medical and research applications. The ability to shape the beam longitudinal phase-space, in particular, plays a key role in achieving high peak brightness. Here we present a new approach that allows tuning the longitudinal phase-space of a high-brightness beam by means of plasma wakefields. The electron beam passing through the plasma drives large wakefields that are used to manipulate the time-energy correlation of particles along the beam itself. We experimentally demonstrate that such a solution is highly tunable by simply adjusting the density of the plasma and can be used to imprint or remove any correlation onto the beam. This is a fundamental requirement when dealing with largely time-energy correlated beams coming from future plasma accelerators.
Submitted 21 February, 2019;
originally announced February 2019.
-
Real-time cortical simulations: energy and interconnect scaling on distributed systems
Authors:
Francesco Simula,
Elena Pastorelli,
Pier Stanislao Paolucci,
Michele Martinelli,
Alessandro Lonardo,
Andrea Biagioni,
Cristiano Capone,
Fabrizio Capuani,
Paolo Cretaro,
Giulia De Bonis,
Francesca Lo Cicero,
Luca Pontisso,
Piero Vicini,
Roberto Ammendola
Abstract:
We profile the impact of computation and inter-processor communication on the energy consumption and on the scaling of cortical simulations approaching the real-time regime on distributed computing platforms. Also, the speed and energy consumption of processor architectures typical of standard HPC and embedded platforms are compared. We demonstrate the importance of the design of low-latency interconnect for speed and energy consumption. The cost of cortical simulations is quantified using the Joule per synaptic event metric on both architectures. Reaching efficient real-time performance on large scale cortical simulations is of increasing relevance both for future bio-inspired artificial intelligence applications and for understanding the cognitive functions of the brain, a scientific quest that will require embedding large scale simulations into highly complex virtual or real worlds. This work stands at the crossroads between the WaveScalES experiment in the Human Brain Project (HBP), which includes the objective of large scale thalamo-cortical simulations of brain states and their transitions, and the ExaNeSt and EuroExa projects, which investigate the design of an ARM-based, low-power High Performance Computing (HPC) architecture with a dedicated interconnect scalable to millions of cores; simulations of deep sleep Slow Wave Activity (SWA) and Asynchronous aWake (AW) regimes expressed by thalamo-cortical models are among their benchmarks.
Submitted 26 November, 2019; v1 submitted 12 December, 2018;
originally announced December 2018.
-
Sensitivity of the KM3NeT/ARCA neutrino telescope to point-like neutrino sources
Authors:
The KM3NeT Collaboration,
S. Aiello,
S. E. Akrame,
F. Ameli,
E. G. Anassontzis,
M. Andre,
G. Androulakis,
M. Anghinolfi,
G. Anton,
M. Ardid,
J. Aublin,
T. Avgitas,
C. Bagatelas,
G. Barbarino,
B. Baret,
J. Barrios-Martí,
A. Belias,
E. Berbee,
A. van den Berg,
V. Bertin,
S. Biagi,
A. Biagioni,
C. Biernoth,
J. Boumaaza,
S. Bourret
, et al. (197 additional authors not shown)
Abstract:
KM3NeT will be a network of deep-sea neutrino telescopes in the Mediterranean Sea. The KM3NeT/ARCA detector, to be installed at the Capo Passero site (Italy), is optimised for the detection of high-energy neutrinos of cosmic origin. Thanks to its geographical location in the Northern Hemisphere, KM3NeT/ARCA can observe upgoing neutrinos from most of the Galactic Plane, including the Galactic Centre. Given its effective area and excellent pointing resolution, KM3NeT/ARCA will measure or significantly constrain the neutrino flux from potential astrophysical neutrino sources. At the same time, it will test flux predictions based on gamma-ray measurements and the assumption that the gamma-ray flux is of hadronic origin. Assuming this scenario, discovery potentials and sensitivities for a selected list of Galactic sources and for generic point sources with an $E^{-2}$ spectrum are presented. These spectra are assumed to be time independent. The results indicate that an observation with $3σ$ significance is possible in about six years of operation for the most intense sources, such as the supernova remnants RX J1713.7-3946 and Vela Jr. If no signal is found during this time, the fraction of the gamma-ray flux coming from hadronic processes can be constrained to be below 50% for these two objects.
Submitted 2 April, 2019; v1 submitted 19 October, 2018;
originally announced October 2018.
-
Search for $K^{+}\rightarrow\pi^{+}\nu\overline{\nu}$ at NA62
Authors:
NA62 Collaboration,
G. Aglieri Rinella,
R. Aliberti,
F. Ambrosino,
R. Ammendola,
B. Angelucci,
A. Antonelli,
G. Anzivino,
R. Arcidiacono,
I. Azhinenko,
S. Balev,
M. Barbanera,
J. Bendotti,
A. Biagioni,
L. Bician,
C. Biino,
A. Bizzeti,
T. Blazek,
A. Blik,
B. Bloch-Devaux,
V. Bolotov,
V. Bonaiuto,
M. Boretto,
M. Bragadireanu,
D. Britton
, et al. (227 additional authors not shown)
Abstract:
$K^{+}\rightarrow\pi^{+}\nu\overline{\nu}$ is one of the theoretically cleanest meson decays in which to look for indirect effects of new physics complementary to LHC searches. The NA62 experiment at the CERN SPS is designed to measure the branching ratio of this decay with 10% precision. NA62 took data in pilot runs in 2014 and 2015, reaching the final design beam intensity. The quality of the data acquired in 2015, in view of the final measurement, will be presented.
Submitted 24 July, 2018;
originally announced July 2018.
-
Large Scale Low Power Computing System - Status of Network Design in ExaNeSt and EuroExa Projects
Authors:
Roberto Ammendola,
Andrea Biagioni,
Fabrizio Capuani,
Paolo Cretaro,
Giulia De Bonis,
Francesca Lo Cicero,
Alessandro Lonardo,
Michele Martinelli,
Pier Stanislao Paolucci,
Elena Pastorelli,
Luca Pontisso,
Francesco Simula,
Piero Vicini
Abstract:
The deployment of the next generation computing platform at ExaFlops scale requires solving new technological challenges, mainly related to the impressive number (up to 10^6) of compute elements required. This impacts system power consumption, in terms of feasibility and costs, and system scalability and computing efficiency. In this perspective, analysis, exploration and evaluation of technologies characterized by low power, high efficiency and a high degree of customization are strongly needed. Among the various European initiatives targeting the design of ExaFlops systems, ExaNeSt and EuroExa are EU-H2020 funded initiatives leveraging high-end MPSoC FPGAs. Last generation MPSoC FPGAs can be seen as non-mainstream but powerful HPC Exascale enabling components thanks to the integration of embedded multi-core, ARM-based low power CPUs and a huge number of hardware resources usable to co-design application oriented accelerators and to develop a low latency, high bandwidth network architecture. In this paper we introduce ExaNet, the FPGA-based, scalable, direct network architecture of the ExaNeSt system. ExaNet allows us to explore different interconnection topologies, to evaluate advanced routing functions for congestion control and fault tolerance, and to design specific hardware components for the acceleration of collective operations. After a brief introduction to the motivations and goals of the ExaNeSt and EuroExa projects, we report on the status of the network architecture design and its hardware/software testbed, adding preliminary bandwidth and latency achievements.
Submitted 11 April, 2018;
originally announced April 2018.
-
The Brain on Low Power Architectures - Efficient Simulation of Cortical Slow Waves and Asynchronous States
Authors:
Roberto Ammendola,
Andrea Biagioni,
Fabrizio Capuani,
Paolo Cretaro,
Giulia De Bonis,
Francesca Lo Cicero,
Alessandro Lonardo,
Michele Martinelli,
Pier Stanislao Paolucci,
Elena Pastorelli,
Luca Pontisso,
Francesco Simula,
Piero Vicini
Abstract:
Efficient brain simulation is a scientific grand challenge, a parallel/distributed coding challenge and a source of requirements and suggestions for future computing architectures. Indeed, the human brain includes about 10^15 synapses and 10^11 neurons activated at a mean rate of several Hz. Full brain simulation poses Exascale challenges even if simulated at the highest abstraction level. The WaveScalES experiment in the Human Brain Project (HBP) has the goal of matching experimental measures and simulations of slow waves during deep sleep and anesthesia and the transition to other brain states. The focus is the development of dedicated large-scale parallel/distributed simulation technologies. The ExaNeSt project designs an ARM-based, low-power HPC architecture scalable to millions of cores, developing a dedicated scalable interconnect system, and SWA/AW simulations are included among the driving benchmarks. At the junction between the two projects is the INFN proprietary Distributed and Plastic Spiking Neural Networks (DPSNN) simulation engine. DPSNN can be configured to stress either the networking or the computation features available on the execution platforms. The simulation stresses the networking component when the neural net - composed of a relatively low number of neurons, each one projecting thousands of synapses - is distributed over a large number of hardware cores. When the number of neurons per core grows, computation becomes the dominating component for short range connections. This paper reports preliminary performance results obtained on an ARM-based HPC prototype developed in the framework of the ExaNeSt project. Furthermore, a comparison is given of instantaneous power, total energy consumption, execution time and energetic cost per synaptic event of SWA/AW DPSNN simulations when executed on either ARM- or Intel-based server platforms.
Submitted 10 April, 2018;
originally announced April 2018.
-
Gaussian and exponential lateral connectivity on distributed spiking neural network simulation
Authors:
Elena Pastorelli,
Pier Stanislao Paolucci,
Francesco Simula,
Andrea Biagioni,
Fabrizio Capuani,
Paolo Cretaro,
Giulia De Bonis,
Francesca Lo Cicero,
Alessandro Lonardo,
Michele Martinelli,
Luca Pontisso,
Piero Vicini,
Roberto Ammendola
Abstract:
We measured the impact of long-range exponentially decaying intra-areal lateral connectivity on the scaling and memory occupation of a distributed spiking neural network simulator compared to that of short-range Gaussian decays. While previous studies adopted short-range connectivity, recent experimental neuroscience studies are pointing out the role of longer-range intra-areal connectivity, with implications for neural simulation platforms. Two-dimensional grids of cortical columns composed of up to 11 M point-like spiking neurons with spike frequency adaptation were connected by up to 30 G synapses using short- and long-range connectivity models. The MPI processes composing the distributed simulator were run on up to 1024 hardware cores, hosted on a 64-node server platform. The hardware platform was a cluster of IBM NX360 M5 16-core compute nodes, each one containing two Intel Xeon Haswell 8-core E5-2630 v3 processors with a clock of 2.40 GHz, interconnected through an InfiniBand network equipped with 4x QDR switches.
Submitted 19 February, 2019; v1 submitted 23 March, 2018;
originally announced March 2018.
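The two connectivity laws compared in the abstract above can be illustrated with a minimal sketch. The functional forms are the standard Gaussian and exponential decays; the amplitude, `sigma` and `decay_length` values are hypothetical placeholders, not parameters taken from the paper. The last lines only divide the quoted upper bounds (30 G synapses, 11 M neurons) to get a rough mean fan-out.

```python
import math

def gaussian_prob(r, amplitude=1.0, sigma=1.0):
    """Short-range law: connection probability decays as exp(-r^2 / (2 sigma^2))."""
    return amplitude * math.exp(-(r * r) / (2.0 * sigma * sigma))

def exponential_prob(r, amplitude=1.0, decay_length=1.0):
    """Long-range law: connection probability decays as exp(-r / decay_length)."""
    return amplitude * math.exp(-r / decay_length)

# Hypothetical distances in units of the (assumed) decay constants.
for r in (0.0, 1.0, 2.0, 5.0):
    print(f"r={r:4.1f}  gaussian={gaussian_prob(r):.3e}  exponential={exponential_prob(r):.3e}")

# Rough mean number of synapses per neuron from the figures quoted in the abstract.
MAX_SYNAPSES = 30e9   # "up to 30 G synapses"
MAX_NEURONS = 11e6    # "up to 11 M ... neurons"
mean_synapses_per_neuron = MAX_SYNAPSES / MAX_NEURONS
print(f"mean synapses per neuron ~ {mean_synapses_per_neuron:.0f}")
```

With equal unit constants the two curves cross at r = 2; beyond that distance the exponential tail dominates, which is why the long-range model places more remote synapses.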
-
Plasma ramps caused by outflow in gas-filled capillaries
Authors:
F. Filippi,
M. P. Anania,
A. Biagioni,
E. Brentegani,
E. Chiadroni,
A. Cianchi,
M. Ferrario,
A. Marocchino,
A. Zigler
Abstract:
Plasma confinement inside capillaries has been developed in the past years for plasma-based acceleration to ensure a stable and repeatable plasma density distribution during the interaction with either particles or laser beams. In particular, gas-filled capillaries allow a stable and almost predictable plasma distribution along the interaction with the particles. However, the plasma ejected through the ends of the capillary interacts with the beam before the inner plasma, affecting the quality of the beam. In this article we report the measurements on the evolution of the plasma flow at the two ends of a 1 cm long, 1 mm diameter capillary filled with hydrogen. In particular, we measured the longitudinal density distribution and the expansion velocity of the plasma outside the capillary. This study will allow a better understanding of the beam-plasma interaction for future plasma-based experiments.
Submitted 27 February, 2018;
originally announced February 2018.
-
Overview of Plasma Lens Experiments and Recent Results at SPARC_LAB
Authors:
E. Chiadroni,
M. P. Anania,
M. Bellaveglia,
A. Biagioni,
F. Bisesto,
E. Brentegani,
F. Cardelli,
A. Cianchi,
G. Costa,
D. Di Giovenale,
G. Di Pirro,
M. Ferrario,
F. Filippi,
A. Gallo,
A. Giribono,
A. Marocchino,
A. Mostacci,
L. Piersanti,
R. Pompili,
J. B. Rosenzweig,
A. R. Rossi,
J. Scifo,
V. Shpakov,
C. Vaccarezza,
F. Villa
, et al. (1 additional authors not shown)
Abstract:
Beam injection into and extraction from a plasma module is still one of the crucial aspects to solve in order to produce high quality electron beams with a plasma accelerator. Proper matching conditions require focusing the incoming high brightness beam down to a size of a few microns and capturing a highly divergent beam at the exit without loss of beam quality. Plasma-based lenses have proven to provide focusing gradients of the order of kT/m with radially symmetric focusing, thus promising a compact and affordable alternative to permanent magnets in the design of transport lines. In this paper an overview of recent experiments and future perspectives of plasma lenses is reported.
Submitted 1 February, 2018;
originally announced February 2018.
-
EuPRAXIA@SPARC_LAB Design study towards a compact FEL facility at LNF
Authors:
M. Ferrario,
D. Alesini,
M. P. Anania,
M. Artioli,
A. Bacci,
S. Bartocci,
R. Bedogni,
M. Bellaveglia,
A. Biagioni,
F. Bisesto,
F. Brandi,
E. Brentegani,
F. Broggi,
B. Buonomo,
P. L. Campana,
G. Campogiani,
C. Cannaos,
S. Cantarella,
F. Cardelli,
M. Carpanese,
M. Castellano,
G. Castorina,
N. Catalan Lasheras,
E. Chiadroni,
A. Cianchi
, et al. (95 additional authors not shown)
Abstract:
In the wake of the results obtained so far at the SPARC\_LAB test-facility at the Laboratori Nazionali di Frascati (Italy), we are currently investigating the possibility of designing and building a new multi-disciplinary user-facility, equipped with a soft X-ray Free Electron Laser (FEL) driven by a $\sim$1 GeV high brightness linac based on plasma accelerator modules. This design study is performed in synergy with the EuPRAXIA design study. In this paper we report on recent progress in the ongoing design study of the new facility.
Submitted 26 January, 2018;
originally announced January 2018.
-
Recent results at SPARC_LAB
Authors:
R. Pompili,
M. P. Anania,
M. Bellaveglia,
A. Biagioni,
S. Bini,
F. Bisesto,
E. Chiadroni,
A. Cianchi,
G. Costa,
D. Di Giovenale,
M. Ferrario,
F. Filippi,
A. Gallo,
A. Giribono,
V. Lollo,
A. Marocchino,
V. Martinelli,
A. Mostacci,
G. Di Pirro,
S. Romeo,
J. Scifo,
V. Shpakov,
C. Vaccarezza,
F. Villa,
A. Zigler
Abstract:
The current activity of the SPARC_LAB test-facility is focused on the realization of plasma-based acceleration experiments with the aim of providing accelerating fields of the order of several GV/m while maintaining the overall quality (in terms of energy spread and emittance) of the accelerated electron bunch. In the following, the current status of this activity is presented. We also show results related to the usability of plasmas as focusing lenses in view of a complete plasma-based focusing and accelerating system.
Submitted 18 January, 2018;
originally announced January 2018.
-
Wake fields effects in dielectric capillary
Authors:
A. Biagioni,
M. P. Anania,
M. Bellaveglia,
E. Brentegani,
G. Castorina,
E. Chiadroni,
A. Cianchi,
D. Di Giovenale,
G. Di Pirro,
H. Fares,
L. Ficcadenti,
F. Filippi,
M. Ferrario,
A. Mostacci,
R. Pompili,
J. Scifo,
B. Spataro,
C. Vaccarezza,
F. Villa,
A. Zigler
Abstract:
Plasma wake-field acceleration experiments are performed at the SPARC_LAB test facility by using a gas-filled capillary plasma source composed of a dielectric capillary. The electrons can reach GeV energy in a few centimeters, with an accelerating gradient orders of magnitude larger than that provided by conventional techniques. In this acceleration scheme, wake fields produced by passing electron beams through dielectric structures can cause a strong beam instability that represents an important hurdle towards the capability to focus high-current electron beams in the transverse plane. For these reasons, the estimation of the transverse wakefield amplitudes assumes a fundamental role in the implementation of plasma wake-field acceleration. In this work, we present a study of which parameters affect the wake-field formation inside a cylindrical dielectric structure, covering both the capillary dimensions and the beam parameters, and we introduce a quantitative evaluation of the longitudinal and transverse electric fields.
Submitted 12 January, 2018;
originally announced January 2018.
-
Nano-machining, surface analysis and emittance measurements of a copper photocathode at SPARC_LAB
Authors:
J. Scifo,
D. Alesini,
M. P. Anania,
M. Bellaveglia,
S. Bellucci,
A. Biagioni,
F. Bisesto,
F. Cardelli,
E. Chiadroni,
A. Cianchi,
G. Costa,
D. Di Giovenale,
G. Di Pirro,
R. Di Raddo,
D. H. Dowell,
M. Ferrario,
A. Giribono,
A. Lorusso,
F. Micciulla,
A. Mostacci,
D. Passeri,
A. Perrone,
L. Piersanti,
R. Pompili,
V. Shpakov
, et al. (3 additional authors not shown)
Abstract:
R\&D activity on Cu photocathodes is under way at the SPARC\_LAB test facility to fully characterize each stage of the photocathode "life" and to have a complete overview of the photoemission properties in high brightness photo-injectors. The nano(n)-machining process presented here consists of diamond milling and blowing with dry nitrogen. This procedure reduces the roughness of the cathode surface and prevents surface contamination introduced by other techniques, such as polishing with diamond paste or machining with oil. Both high roughness and surface contamination cause an increase of intrinsic emittance and consequently a reduction of the overall electron beam brightness. To quantify these effects, we have characterized the photocathode surface in terms of roughness measurement, and morphology and chemical composition analysis by means of Scanning Electron Microscopy (SEM), Energy Dispersive Spectroscopy (EDS), and Atomic Force Microscopy (AFM) techniques. The effects of n-machining on the electron beam quality have also been investigated through emittance measurements before and after the surface processing technique. Finally, we present preliminary emittance studies of yttrium thin film on Cu photocathodes.
Submitted 11 January, 2018;
originally announced January 2018.
-
Intrinsic limits on resolutions in muon- and electron-neutrino charged-current events in the KM3NeT/ORCA detector
Authors:
S. Adrián-Martínez,
M. Ageron,
S. Aiello,
A. Albert,
F. Ameli,
E. G. Anassontzis,
M. Andre,
G. Androulakis,
M. Anghinolfi,
G. Anton,
M. Ardid,
T. Avgitas,
G. Barbarino,
E. Barbarito,
B. Baret,
J. Barrios-Martí,
A. Belias,
E. Berbee,
A. van den Berg,
V. Bertin,
S. Beurthey,
V. van Beveren,
N. Beverini,
S. Biagi,
A. Biagioni
, et al. (228 additional authors not shown)
Abstract:
Studying atmospheric neutrino oscillations in the few-GeV range with a multimegaton detector promises to determine the neutrino mass hierarchy. This is the main science goal pursued by the future KM3NeT/ORCA water Cherenkov detector in the Mediterranean Sea. In this paper, the processes that limit the obtainable resolution in both energy and direction in charged-current neutrino events in the ORCA detector are investigated. These processes include the composition of the hadronic fragmentation products, the subsequent particle propagation and the photon-sampling fraction of the detector. GEANT simulations of neutrino interactions in seawater produced by GENIE are used to study the effects in the 1 - 20 GeV range. It is found that fluctuations in the hadronic cascade in conjunction with the variation of the inelasticity y are most detrimental to the resolutions. The effect of limited photon sampling in the detector is of significantly less importance. These results will therefore also be applicable to similar detectors/media, such as those in ice.
Submitted 19 May, 2017; v1 submitted 29 November, 2016;
originally announced December 2016.
-
GPU-based Real-time Triggering in the NA62 Experiment
Authors:
R. Ammendola,
A. Biagioni,
P. Cretaro,
S. Di Lorenzo,
R. Fantechi,
M. Fiorini,
O. Frezza,
G. Lamanna,
F. Lo Cicero,
A. Lonardo,
M. Martinelli,
I. Neri,
P. S. Paolucci,
E. Pastorelli,
R. Piandani,
L. Pontisso,
D. Rossetti,
F. Simula,
M. Sozzi,
P. Vicini
Abstract:
Over the last few years the GPGPU (General-Purpose computing on Graphics Processing Units) paradigm has represented a remarkable development in the world of computing. Computing for High-Energy Physics is no exception: several works have demonstrated the effectiveness of the integration of GPU-based systems in the high level trigger of different experiments. On the other hand, the use of GPUs in low level trigger systems, characterized by stringent real-time constraints such as a tight time budget and high throughput, poses several challenges. In this paper we focus on the low level trigger in the CERN NA62 experiment, investigating the use of real-time computing on GPUs in this synchronous system. Our approach aimed at harvesting the GPU computing power to build, in real time, refined physics-related trigger primitives for the RICH detector, as the knowledge of Cerenkov ring parameters allows stringent conditions for data selection to be built at trigger level. Latencies of all components of the trigger chain have been analyzed, pointing out that networking is the most critical one. To keep the latency of the data transfer task under control, we devised NaNet, an FPGA-based PCIe Network Interface Card (NIC) with GPUDirect capabilities. For the processing task, we developed specific multiple ring trigger algorithms to leverage the parallel architecture of GPUs and increase the processing throughput to keep up with the high event rate. Results obtained during the first months of the 2016 NA62 run are presented and discussed.
Submitted 13 June, 2016;
originally announced June 2016.
-
Letter of Intent for KM3NeT 2.0
Authors:
S. Adrián-Martínez,
M. Ageron,
F. Aharonian,
S. Aiello,
A. Albert,
F. Ameli,
E. Anassontzis,
M. Andre,
G. Androulakis,
M. Anghinolfi,
G. Anton,
M. Ardid,
T. Avgitas,
G. Barbarino,
E. Barbarito,
B. Baret,
J. Barrios-Martí,
B. Belhorma,
A. Belias,
E. Berbee,
A. van den Berg,
V. Bertin,
S. Beurthey,
V. van Beveren,
N. Beverini
, et al. (222 additional authors not shown)
Abstract:
The main objectives of the KM3NeT Collaboration are i) the discovery and subsequent observation of high-energy neutrino sources in the Universe and ii) the determination of the mass hierarchy of neutrinos. These objectives are strongly motivated by two recent important discoveries, namely: 1) The high-energy astrophysical neutrino signal reported by IceCube and 2) the sizable contribution of electron neutrinos to the third neutrino mass eigenstate as reported by Daya Bay, Reno and others. To meet these objectives, the KM3NeT Collaboration plans to build a new Research Infrastructure consisting of a network of deep-sea neutrino telescopes in the Mediterranean Sea. A phased and distributed implementation is pursued which maximises the access to regional funds, the availability of human resources and the synergetic opportunities for the earth and sea sciences community. Three suitable deep-sea sites are identified, namely off-shore Toulon (France), Capo Passero (Italy) and Pylos (Greece). The infrastructure will consist of three so-called building blocks. A building block comprises 115 strings, each string comprises 18 optical modules and each optical module comprises 31 photo-multiplier tubes. Each building block thus constitutes a 3-dimensional array of photo sensors that can be used to detect the Cherenkov light produced by relativistic particles emerging from neutrino interactions. Two building blocks will be configured to fully explore the IceCube signal with different methodology, improved resolution and complementary field of view, including the Galactic plane. One building block will be configured to precisely measure atmospheric neutrino oscillations.
Submitted 26 July, 2016; v1 submitted 27 January, 2016;
originally announced January 2016.
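The sensor counts implied by the Letter of Intent abstract above (115 strings per building block, 18 optical modules per string, 31 photo-multiplier tubes per module, three building blocks) can be tallied with a short sketch. The constant and function names are illustrative only; the numbers are those quoted in the abstract.

```python
# Detector geometry figures quoted in the abstract.
STRINGS_PER_BLOCK = 115
MODULES_PER_STRING = 18
PMTS_PER_MODULE = 31
BUILDING_BLOCKS = 3

def count_sensors(blocks=BUILDING_BLOCKS):
    """Return the number of optical modules and PMTs for a given number of building blocks."""
    modules = blocks * STRINGS_PER_BLOCK * MODULES_PER_STRING
    pmts = modules * PMTS_PER_MODULE
    return {"optical_modules": modules, "pmts": pmts}

per_block = count_sensors(1)
full_infrastructure = count_sensors()
print("one building block:   ", per_block)
print("three building blocks:", full_infrastructure)
```

Under these figures a single building block holds 2070 optical modules and 64170 photo-multiplier tubes, and the full three-block infrastructure holds 6210 modules and 192510 tubes.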
-
Impact of exponential long range and Gaussian short range lateral connectivity on the distributed simulation of neural networks including up to 30 billion synapses
Authors:
Elena Pastorelli,
Pier Stanislao Paolucci,
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Michele Martinelli,
Francesco Simula,
Piero Vicini
Abstract:
Recent experimental neuroscience studies are pointing out the role of long-range intra-areal connectivity that can be modeled by a distance dependent exponential decay of the synaptic probability distribution. This short report provides a preliminary measure of the impact of exponentially decaying lateral connectivity compared to that of shorter-range Gaussian decays on the scaling behaviour and memory occupation of a distributed spiking neural network simulator (DPSNN). Two-dimensional grids of cortical columns composed of point-like spiking neurons have been connected by up to 30 billion synapses using exponential and Gaussian connectivity models. Up to 1024 hardware cores, hosted on a 64-node server platform, executed the MPI processes composing the distributed simulator. The hardware platform was a cluster of IBM NX360 M5 16-core compute nodes, each one containing two Intel Xeon Haswell 8-core E5-2630 v3 processors with a clock of 2.40 GHz, interconnected through an InfiniBand network. This study is conducted in the framework of the CORTICONIC FET project, also in view of the next-to-start activities foreseen as part of the Human Brain Project (HBP), SubProject 3 Cognitive and Systems Neuroscience, WaveScalES work-package.
Submitted 16 December, 2015;
originally announced December 2015.
-
Scaling to 1024 software processes and hardware cores of the distributed simulation of a spiking neural network including up to 20G synapses
Authors:
Elena Pastorelli,
Pier Stanislao Paolucci,
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Michele Martinelli,
Francesco Simula,
Piero Vicini
Abstract:
This short report describes the scaling, up to 1024 software processes and hardware cores, of a distributed simulator of plastic spiking neural networks. A previous report demonstrated good scalability of the simulator up to 128 processes. Herein we extend the speed-up measurements and strong and weak scaling analysis of the simulator to the range between 1 and 1024 software processes and hardware cores. We simulated two-dimensional grids of cortical columns including up to ~20G synapses connecting ~11M neurons. The neural network was distributed over a set of MPI processes and the simulations were run on a server platform composed of up to 64 dual-socket nodes, each socket equipped with Intel Haswell E5-2630 v3 processors (8 cores @ 2.4 GHz clock). All nodes are interconnected through an InfiniBand network. The DPSNN simulator has been developed by INFN in the framework of the EURETILE and CORTICONIC European FET Projects and will be used by the WaveScalES team in the framework of the Human Brain Project (HBP), SubProject 2 - Cognitive and Systems Neuroscience. This report lays the groundwork for a more thorough comparison with the neural simulation tool NEST.
Submitted 30 November, 2015;
originally announced November 2015.
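The speed-up and strong/weak scaling quantities mentioned in the abstract above follow textbook definitions, sketched below. The platform figures (64 nodes, two sockets per node, 8 cores per socket) come from the abstract; the wall-clock timings are hypothetical placeholders chosen only to exercise the functions and are not results from the report.

```python
# Platform figures quoted in the abstract.
NODES = 64
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8
total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET

def speedup(t_one_process, t_n_processes):
    """Speed-up for a fixed problem size: time on 1 process over time on N processes."""
    return t_one_process / t_n_processes

def strong_scaling_efficiency(t_one_process, t_n_processes, n_processes):
    """Fraction of the ideal N-fold speed-up achieved at fixed total problem size."""
    return speedup(t_one_process, t_n_processes) / n_processes

def weak_scaling_efficiency(t_one_process, t_n_processes):
    """Ratio of run times when the problem size grows in proportion to N (ideal value: 1)."""
    return t_one_process / t_n_processes

# Hypothetical timings in seconds, NOT measurements from the report.
hypothetical_t1 = 1000.0
hypothetical_t1024 = 2.0

print("cores:", total_cores)
print("speed-up:", speedup(hypothetical_t1, hypothetical_t1024))
print("strong efficiency:",
      strong_scaling_efficiency(hypothetical_t1, hypothetical_t1024, total_cores))
```

The core count works out to 1024, matching the maximum number of MPI processes quoted in the abstract (one process per hardware core).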
-
The prototype detection unit of the KM3NeT detector
Authors:
KM3NeT Collaboration,
S. Adrián-Martínez,
M. Ageron,
F. Aharonian,
S. Aiello,
A. Albert,
F. Ameli,
E. G. Anassontzis,
G. C. Androulakis,
M. Anghinolfi,
G. Anton,
S. Anvar,
M. Ardid,
T. Avgitas,
K. Balasi,
H. Band,
G. Barbarino,
E. Barbarito,
F. Barbato,
B. Baret,
S. Baron,
J. Barrios,
A. Belias,
E. Berbee,
A. M. van den Berg
, et al. (224 additional authors not shown)
Abstract:
A prototype detection unit of the KM3NeT deep-sea neutrino telescope has been installed at 3500 m depth, 80 km off the Italian coast. KM3NeT in its final configuration will contain several hundred detection units. Each detection unit is a mechanical structure anchored to the sea floor, held vertical by a submerged buoy and supporting optical modules for the detection of Cherenkov light emitted by charged secondary particles emerging from neutrino interactions. This prototype string implements three optical modules with 31 photomultiplier tubes each. These optical modules were developed by the KM3NeT Collaboration to enhance the detection capability of neutrino interactions. The prototype detection unit was operated from its deployment in May 2014 until its decommissioning in July 2015. Reconstruction of the particle trajectories from the data requires nanosecond accuracy in the time calibration. A procedure for relative time calibration of the photomultiplier tubes contained in each optical module is described. This procedure is based on the measured coincidences produced in the sea by the 40K background light and can easily be expanded to a detector with several thousand optical modules. The time offsets between the different optical modules are obtained using LED nanobeacons mounted inside them. A set of data corresponding to 600 hours of livetime was analysed. The results show good agreement with Monte Carlo simulations of the expected optical background and the signal from atmospheric muons. An almost background-free sample of muons was selected by filtering the time correlated signals on all three optical modules. The zenith angle of the selected muons was reconstructed with a precision of about 3°.
Submitted 23 December, 2015; v1 submitted 6 October, 2015;
originally announced October 2015.
-
Long term monitoring of the optical background in the Capo Passero deep-sea site with the NEMO tower prototype
Authors:
S. Adrián-Martínez,
S. Aiello,
F. Ameli,
M. Anghinolfi,
M. Ardid,
G. Barbarino,
E. Barbarito,
F. C. T. Barbato,
N. Beverini,
S. Biagi,
A. Biagioni,
B. Bouhadef,
C. Bozza,
G. Cacopardo,
M. Calamai,
C. Calí,
D. Calvo,
A. Capone,
F. Caruso,
A. Ceres,
T. Chiarusi,
M. Circella,
R. Cocimano,
R. Coniglione,
M. Costa
, et al. (79 additional authors not shown)
Abstract:
The NEMO Phase-2 tower is the first detector which was operated underwater for more than one year at the "record" depth of 3500 m. It was designed and built within the framework of the NEMO (NEutrino Mediterranean Observatory) project. The 380 m high tower was successfully installed in March 2013 80 km offshore Capo Passero (Italy). This is the first prototype operated on the site where the italia…
▽ More
The NEMO Phase-2 tower is the first detector which was operated underwater for more than one year at the "record" depth of 3500 m. It was designed and built within the framework of the NEMO (NEutrino Mediterranean Observatory) project. The 380 m high tower was successfully installed in March 2013 80 km offshore Capo Passero (Italy). This is the first prototype operated on the site where the italian node of the KM3NeT neutrino telescope will be built. The installation and operation of the NEMO Phase-2 tower has proven the functionality of the infrastructure and the operability at 3500 m depth. A more than one year long monitoring of the deep water characteristics of the site has been also provided. In this paper the infrastructure and the tower structure and instrumentation are described. The results of long term optical background measurements are presented. The rates show stable and low baseline values, compatible with the contribution of 40K light emission, with a small percentage of light bursts due to bioluminescence. All these features confirm the stability and good optical properties of the site.
Submitted 28 January, 2016; v1 submitted 17 July, 2015;
originally announced July 2015.
-
Power, Energy and Speed of Embedded and Server Multi-Cores applied to Distributed Simulation of Spiking Neural Networks: ARM in NVIDIA Tegra vs Intel Xeon quad-cores
Authors:
Pier Stanislao Paolucci,
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Michele Martinelli,
Elena Pastorelli,
Francesco Simula,
Piero Vicini
Abstract:
This short note compares the instantaneous power, total energy consumption, execution time and energetic cost per synaptic event of a spiking neural network simulator (DPSNN-STDP) distributed on MPI processes when executed either on an embedded platform (based on a dual socket quad-core ARM platform) or a server platform (INTEL-based quad-core dual socket platform). We also compare the measurements with those reported by leading custom and semi-custom designs: TrueNorth and SpiNNaker. In summary, we observed that: 1- we spent 2.2 microjoules per simulated event on the "embedded platform", approx. 4.4 times lower than what was spent by the "server platform"; 2- the instantaneous power consumption of the "embedded platform" was 14.4 times better than the "server" one; 3- the server platform is a factor of 3.3 faster. The "embedded platform" is made of NVIDIA Jetson TK1 boards, interconnected by Ethernet, each mounting a Tegra K1 chip including a quad-core ARM Cortex-A15 at 2.3 GHz. The "server platform" is based on dual-socket quad-core Intel Xeon CPUs (E5620 at 2.4 GHz). The measurements were obtained with the DPSNN-STDP simulator (Distributed Simulator of Polychronous Spiking Neural Network with synaptic Spike Timing Dependent Plasticity) developed by INFN, which has already proved its efficient scalability and execution speed-up on hundreds of similar "server" cores and MPI processes, applied to neural nets composed of several billions of synapses.
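The three ratios quoted in this abstract can be cross-checked against each other, since energy is power multiplied by time. The following is a minimal Python sketch using only the figures quoted above; the variable names are illustrative, and reading "14.4 times better" as "14.4 times lower power" is an assumption.

```python
# Figures quoted in the abstract, expressed as server relative to embedded.
power_ratio = 14.4  # assumed reading: server power / embedded power
speed_ratio = 3.3   # server is this many times faster than embedded

# Energy = power x time. The server draws more power but runs for a
# shorter time, so its energy per event relative to the embedded
# platform is the power ratio divided by the speed ratio.
energy_ratio = power_ratio / speed_ratio

embedded_uj_per_event = 2.2  # microjoules per simulated synaptic event
server_uj_per_event = embedded_uj_per_event * energy_ratio

print(f"server/embedded energy ratio: {energy_ratio:.2f}")
print(f"implied server energy per event: {server_uj_per_event:.1f} uJ")
```

The computed energy ratio is close to the "approx. 4.4" stated in the abstract, so the three figures are mutually consistent under this reading.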
Submitted 12 May, 2015;
originally announced May 2015.
-
Prospects for $K^+ \to π^+ ν\bar{ ν}$ at CERN in NA62
Authors:
G. Aglieri Rinella,
R. Aliberti,
F. Ambrosino,
B. Angelucci,
A. Antonelli,
G. Anzivino,
R. Arcidiacono,
I. Azhinenko,
S. Balev,
J. Bendotti,
A. Biagioni,
C. Biino,
A. Bizzeti,
T. Blazek,
A. Blik,
B. Bloch-Devaux,
V. Bolotov,
V. Bonaiuto,
M. Bragadireanu,
D. Britton,
G. Britvich,
N. Brook,
F. Bucci,
V. Buescher,
F. Butin, et al. (179 additional authors not shown)
Abstract:
The NA62 experiment will begin taking data in 2015. Its primary purpose is a 10% measurement of the branching ratio of the ultrarare kaon decay $K^+ \to π^+ ν\bar{ ν}$, using the decay in flight of kaons in an unseparated beam with momentum 75 GeV/c. The detector and analysis technique are described here.
Submitted 1 November, 2014;
originally announced November 2014.
-
EURETILE D7.3 - Dynamic DAL benchmark coding, measurements on MPI version of DPSNN-STDP (distributed plastic spiking neural net) and improvements to other DAL codes
Authors:
Pier Stanislao Paolucci,
Iuliana Bacivarov,
Devendra Rai,
Lars Schor,
Lothar Thiele,
Hoeseok Yang,
Elena Pastorelli,
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
The EURETILE project required the selection and coding of a set of dedicated benchmarks. The project is about the software and hardware architecture of future many-tile distributed fault-tolerant systems. We focus on dynamic workloads characterised by heavy numerical processing requirements. The ambition is to identify common techniques that could be applied to both the Embedded Systems and HPC domains. This document is the first public deliverable of Work Package 7: Challenging Tiled Applications.
Submitted 20 August, 2014;
originally announced August 2014.
-
NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features
Authors:
A. Lonardo,
F. Ameli,
R. Ammendola,
A. Biagioni,
O. Frezza,
G. Lamanna,
F. Lo Cicero,
M. Martinelli,
P. S. Paolucci,
E. Pastorelli,
L. Pontisso,
D. Rossetti,
F. Simeone,
F. Simula,
M. Sozzi,
L. Tosoratto,
P. Vicini
Abstract:
While the GPGPU paradigm is widely recognized as an effective approach to high performance computing, its adoption in low-latency, real-time systems is still in its early stages.
Although GPUs typically show deterministic behaviour in terms of latency in executing computational kernels as soon as data is available in their internal memories, assessment of real-time features of a standard GPGPU system needs careful characterization of all subsystems along the data stream path.
The networking subsystem turns out to be the most critical one in terms of the absolute value and fluctuations of its response latency.
Our envisioned solution to this issue is NaNet, an FPGA-based PCIe Network Interface Card (NIC) design featuring a configurable and extensible set of network channels with direct access through GPUDirect to NVIDIA Fermi/Kepler GPU memories.
The NaNet design currently supports both standard channels, GbE (1000BASE-T) and 10GbE (10GBASE-R), and custom ones, the 34 Gbps APElink and the 2.5 Gbps deterministic-latency KM3link, but its modularity allows for a straightforward inclusion of other link technologies.
To avoid host OS intervention on the data stream and remove a possible source of jitter, the design includes a network/transport layer offload module with cycle-accurate, upper-bound latency, supporting UDP, KM3link Time Division Multiplexing and APElink protocols.
After a description of the NaNet architecture and its latency/bandwidth characterization for all supported links, two real-world use cases are presented: the GPU-based low level trigger for the RICH detector in the NA62 experiment at CERN and the on-/off-shore data link for the KM3 underwater neutrino telescope.
Submitted 13 June, 2014;
originally announced June 2014.
-
NaNet: a flexible and configurable low-latency NIC for real-time trigger systems based on GPUs
Authors:
R. Ammendola,
A. Biagioni,
O. Frezza,
G. Lamanna,
A. Lonardo,
F. Lo Cicero,
P. S. Paolucci,
F. Pantaleo,
D. Rossetti,
F. Simula,
M. Sozzi,
L. Tosoratto,
P. Vicini
Abstract:
NaNet is an FPGA-based PCIe X8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network stack protocol offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the NaNet hardware modular architecture. Benchmarks for latency and bandwidth for GbE and APElink channels are presented, followed by a performance analysis on the case study of the GPU-based low level trigger for the RICH detector in the NA62 CERN experiment, using either the NaNet GbE or the APElink channel. Finally, we give an outline of future project activities.
Submitted 9 January, 2014; v1 submitted 15 November, 2013;
originally announced November 2013.
-
Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems
Authors:
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Pier Stanislao Paolucci,
Alessandro Lonardo,
Davide Rossetti,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
Modern Graphics Processing Units (GPUs) are now considered accelerators for general purpose computation. A tight interaction between the GPU and the interconnection network is the strategy to express the full capability-computing potential of a multi-GPU system on large HPC clusters; this is why an efficient and scalable interconnect is a key technology to finally deliver GPUs for scientific HPC. In this paper we show the latest architectural and performance improvements of the APEnet+ network fabric, an FPGA-based PCIe board with 6 fully bidirectional off-board links with 34 Gbps of raw bandwidth per direction, and X8 Gen2 bandwidth towards the host PC. The board implements a Remote Direct Memory Access (RDMA) protocol that leverages the peer-to-peer (P2P) capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain real zero-copy, low-latency GPU-to-GPU transfers. Finally, we report on the development activities for 2013 focusing on the adoption of the latest generation 28 nm FPGAs and the preliminary tests performed on this new platform.
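To put the bandwidth figures in this abstract side by side, the short Python sketch below compares the aggregate raw off-board bandwidth with the nominal host-side bandwidth of a PCIe X8 Gen2 interface. The link count and per-link rate come from the abstract; the PCIe Gen2 figures (5 GT/s per lane, 8b/10b line encoding) are the standard values for that generation. Packet and protocol overheads are ignored, so these are upper bounds rather than measured throughput, and the constant names are illustrative.

```python
# From the abstract: 6 bidirectional off-board links, 34 Gbps raw per direction.
TORUS_LINKS = 6
RAW_GBPS_PER_LINK = 34

# Standard PCIe Gen2 parameters: 5 GT/s per lane with 8b/10b encoding.
PCIE_LANES = 8
PCIE_GEN2_GT_PER_S = 5.0
ENCODING_EFFICIENCY = 8 / 10

# Aggregate raw bandwidth leaving the board, per direction.
torus_raw_gbps = TORUS_LINKS * RAW_GBPS_PER_LINK

# Nominal host-side bandwidth per direction, after line encoding only.
pcie_gbps = PCIE_LANES * PCIE_GEN2_GT_PER_S * ENCODING_EFFICIENCY

print(f"off-board raw bandwidth per direction: {torus_raw_gbps} Gbps")
print(f"PCIe X8 Gen2 bandwidth per direction:  {pcie_gbps:.0f} Gbps")
```

Under these assumptions the host interface is comparable to a single off-board link rather than to the sum of all six, which is what one would expect of a torus adapter that also forwards traffic between neighbouring nodes.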
Submitted 14 November, 2013; v1 submitted 7 November, 2013;
originally announced November 2013.
-
NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems
Authors:
Roberto Ammendola,
Andrea Biagioni,
Riccardo Fantechi,
Ottorino Frezza,
Gianluca Lamanna,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Stanislao Paolucci,
Felice Pantaleo,
Roberto Piandani,
Luca Pontisso,
Davide Rossetti,
Francesco Simula,
Marco Sozzi,
Laura Tosoratto,
Piero Vicini
Abstract:
We implemented the NaNet FPGA-based PCIe Gen2 GbE/APElink NIC, featuring GPUDirect RDMA capabilities and UDP protocol management offloading. NaNet is able to receive a UDP input data stream from its GbE interface and redirect it, without any intermediate buffering or CPU intervention, to the memory of a Fermi/Kepler GPU hosted on the same PCIe bus, provided that the two devices share the same upstream root complex. Synthetic benchmarks for latency and bandwidth are presented. We describe how NaNet can be employed in the prototype of the GPU-based RICH low-level trigger processor of the NA62 CERN experiment, to implement the data link between the TEL62 readout boards and the low level trigger processor. Results for the throughput and latency of the integrated system are presented and discussed.
Submitted 22 November, 2013; v1 submitted 5 November, 2013;
originally announced November 2013.
-
Distributed simulation of polychronous and plastic spiking neural networks: strong and weak scaling of a representative mini-application benchmark executed on a small-scale commodity cluster
Authors:
Pier Stanislao Paolucci,
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Elena Pastorelli,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
We introduce a natively distributed mini-application benchmark representative of plastic spiking neural network simulators. It can be used to measure the performance of existing computing platforms and to drive the development of future parallel/distributed computing systems dedicated to the simulation of plastic spiking networks. The mini-application is designed to generate spiking behaviors and synaptic connectivity that do not change when the number of hardware processing nodes is varied, simplifying the quantitative study of scalability on commodity and custom architectures. Here, we present the strong and weak scaling and the profiling of the computational/communication components of the DPSNN-STDP benchmark (Distributed Simulation of Polychronous Spiking Neural Network with synaptic Spike-Timing Dependent Plasticity). In this first test, we used the benchmark to exercise a small-scale cluster of commodity processors (varying the number of used physical cores from 1 to 128). The cluster was interconnected through a commodity network. Bidimensional grids of columns composed of Izhikevich neurons projected synapses locally and toward first, second and third neighboring columns. The size of the simulated network varied from 6.6 Giga synapses down to 200 K synapses. The code proved to be fast and scalable: 10 wall clock seconds were required to simulate one second of activity and plasticity (per Hertz of average firing rate) of a network composed of 3.2 G synapses running on 128 hardware cores clocked at 2.4 GHz. The mini-application has been designed to be easily interfaced with standard and custom software and hardware communication interfaces. It has been designed from its foundation to be natively distributed and parallel, and should not pose major obstacles against distribution and parallelization on several platforms.
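The headline timing in this abstract can be turned into an approximate event throughput. The Python sketch below does this under one assumption that the abstract does not state explicitly: at a mean firing rate of 1 Hz, each synapse carries one synaptic event per simulated second. The variable names are illustrative.

```python
# Figures quoted in the abstract.
synapses = 3.2e9     # network size (3.2 G synapses)
cores = 128          # hardware cores used
wall_seconds = 10.0  # wall-clock seconds per simulated second, per Hz of mean rate

# Assumption (not from the abstract): at 1 Hz mean firing rate, every
# synapse carries one synaptic event per simulated second.
events_per_simulated_second = synapses * 1.0

# Throughput of the whole cluster and of a single core.
events_per_wall_second = events_per_simulated_second / wall_seconds
events_per_core_per_wall_second = events_per_wall_second / cores

print(f"{events_per_wall_second:.2e} synaptic events per wall-clock second")
print(f"{events_per_core_per_wall_second:.2e} events per core per wall-clock second")
```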
Submitted 14 April, 2014; v1 submitted 31 October, 2013;
originally announced October 2013.
-
GPU peer-to-peer techniques applied to a cluster interconnect
Authors:
Roberto Ammendola,
Massimo Bernaschi,
Andrea Biagioni,
Mauro Bisson,
Massimiliano Fatica,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Enrico Mastrostefano,
Pier Stanislao Paolucci,
Davide Rossetti,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. Besides, the current software implementation, which integrates this feature by minimally extending the RDMA programming model, is discussed, as well as some issues raised while employing it in a higher level API like MPI. Finally, the current limits of the technique are studied by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and how they seem to benefit from the GPU peer-to-peer method.
Submitted 31 July, 2013;
originally announced July 2013.
-
A heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications: Vol. II, 2012 technical report
Authors:
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Werner Geurts,
Gert Goossens,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Stanislao Paolucci,
Davide Rossetti,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
This is the second of a planned collection of four yearly volumes describing the deployment of a heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications. This volume covers several topics, among which: 1- a system for awareness of faults and critical events (named LO|FA|MO) on experimental heterogeneous many-core hardware platforms; 2- the integration and test of the experimental hardware heterogeneous many-core platform QUoNG, based on the APEnet+ custom interconnect; 3- the design of a Software-Programmable Distributed Network Processor architecture (DNP) using ASIP technology; 4- the initial stages of design of a new DNP generation onto a 28nm FPGA. These developments were performed in the framework of the EURETILE European Project under the Grant Agreement no. 247846.
Submitted 4 July, 2013;
originally announced July 2013.
-
'Mutual Watch-dog Networking': Distributed Awareness of Faults and Critical Events in Petascale/Exascale systems
Authors:
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Stanislao Paolucci,
Davide Rossetti,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
Many-tile systems require techniques to be applied to increase component resilience and control the FIT (Failures In Time) rate. When scaling to peta- and exa-scale systems the FIT rate may become unacceptable due to the sheer number of components, requiring more systemic countermeasures. Thus, the ability to be fault aware, i.e. to detect and collect information about fault and critical events, is a necessary feature that large scale distributed architectures must provide in order to apply systemic fault tolerance techniques. In this context, the LO|FA|MO approach is a way to obtain systemic fault awareness, by implementing a mutual watchdog mechanism and guaranteeing fault detection in a no-single-point-of-failure fashion. This document contains specification and implementation details about this approach, in the shape of a technical report.
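The scaling argument in this abstract can be illustrated numerically. FIT counts failures per 10^9 device-hours, and for independent components with constant failure rates the system FIT is the sum of the component FITs. The Python sketch below uses a purely hypothetical per-component FIT of 100 and hypothetical component counts; none of these numbers come from the report, and the function name is illustrative.

```python
HOURS_PER_FIT_UNIT = 1e9  # FIT = failures per 10^9 device-hours


def system_mtbf_hours(component_fit: float, n_components: int) -> float:
    """Mean time between failures of a system of identical components.

    Assumes independent components with constant failure rates, so the
    system FIT is simply component_fit * n_components.
    """
    system_fit = component_fit * n_components
    return HOURS_PER_FIT_UNIT / system_fit


# Hypothetical values, chosen only to show the trend with system size.
small = system_mtbf_hours(component_fit=100, n_components=1_000)
large = system_mtbf_hours(component_fit=100, n_components=1_000_000)

print(f"1,000 components:     MTBF ~ {small:.0f} hours")
print(f"1,000,000 components: MTBF ~ {large:.0f} hours")
```

With the per-component rate held fixed, the mean time between failures shrinks in inverse proportion to the component count, which is why per-component hardening alone stops being sufficient at very large scale.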
Submitted 2 July, 2013; v1 submitted 1 July, 2013;
originally announced July 2013.
-
The Distributed Network Processor: a novel off-chip and on-chip interconnection network architecture
Authors:
Andrea Biagioni,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Stanislao Paolucci,
Mersia Perra,
Davide Rossetti,
Carlo Sidore,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
One of the most demanding challenges for the designers of parallel computing architectures is to deliver an efficient network infrastructure providing low latency, high bandwidth communications while preserving scalability. Besides off-chip communications between processors, recent multi-tile (i.e. multi-core) architectures face the challenge of providing an efficient on-chip interconnection network between processor tiles. In this paper, we present a configurable and scalable architecture, based on our Distributed Network Processor (DNP) IP Library, targeting systems ranging from single MPSoCs to massive HPC platforms. The DNP provides inter-tile services for both on-chip and off-chip communications with a uniform RDMA style API, over a multi-dimensional direct network with a (possibly) hybrid topology.
Submitted 7 March, 2012;
originally announced March 2012.
-
High-speed data transfer with FPGAs and QSFP+ modules
Authors:
R. Ammendola,
A. Biagioni,
G. Chiodi,
O. Frezza,
F. Lo Cicero,
A. Lonardo,
R. Lunadei,
P. S. Paolucci,
D. Rossetti,
A. Salamon,
G. Salina,
F. Simula,
L. Tosoratto,
P. Vicini
Abstract:
We present test results and characterization of a data transmission system based on a latest-generation FPGA and a commercial QSFP+ (Quad Small Form Pluggable +) module. The QSFP+ standard defines a hot-pluggable transceiver available in copper or optical cable assemblies for an aggregated bandwidth of up to 40 Gbps. We implemented a complete testbench based on a commercial development card mounting an Altera Stratix IV FPGA with 24 serial transceivers at 8.5 Gbps, together with a custom mezzanine hosting three QSFP+ modules. We present test results and signal integrity measurements up to an aggregated bandwidth of 12 Gbps.
Submitted 1 March, 2011;
originally announced March 2011.
-
APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters
Authors:
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Stanislao Paolucci,
Davide Rossetti,
Andrea Salamon,
Gaetano Salina,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology plus hardware support for an RDMA programming model and experimental acceleration of GPU networking; this design allows us to build a low latency, high bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective, tens-of-thousands-scalable cluster network architecture. Some test results and characterization of data transmission of a complete testbench, based on a commercial development card mounting an Altera FPGA, are provided.
Submitted 18 February, 2011;
originally announced February 2011.
-
APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters
Authors:
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Paolucci,
Roberto Petronzio,
Davide Rossetti,
Andrea Salamon,
Gaetano Salina,
Francesco Simula,
Nazario Tantalo,
Laura Tosoratto,
Piero Vicini
Abstract:
Many scientific computations need multi-node parallelism to match ever-increasing requirements in both space (memory) and time (speed). The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may potentially result in large overheads due to the complex memory hierarchy. Additionally, top-notch problems may easily employ more than a Petaflops of sustained computing power, requiring thousands of GPUs orchestrated with some parallel programming model. Here we describe APEnet+, the new generation of our interconnect, which scales up to tens of thousands of nodes with linear cost, thus improving the price/performance ratio on large clusters. The project target is the development of the APElink+ host adapter featuring a low latency, high bandwidth direct network, state-of-the-art wire speeds on the links and a PCIe X8 Gen2 host interface. It features hardware support for the RDMA programming model and experimental acceleration of GPU networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing for painless porting of standard applications. Finally, we give an insight into future work and intended developments.
Submitted 1 December, 2010;
originally announced December 2010.