-
Architectural improvements and technological enhancements for the APEnet+ interconnect system
Authors:
R. Ammendola,
A. Biagioni,
O. Frezza,
A. Lonardo,
F. Lo Cicero,
M. Martinelli,
P. S. Paolucci,
E. Pastorelli,
D. Rossetti,
F. Simula,
L. Tosoratto,
P. Vicini
Abstract:
The APEnet+ board is a point-to-point, low-latency network interface card for 3D torus networks. In this paper we describe the latest generation of the APEnet NIC, APEnet v5, integrated in a PCIe Gen3 board based on a state-of-the-art 28 nm Altera Stratix V FPGA. The NIC features a network architecture designed following the Remote DMA paradigm and tailored to tightly bind the computing power of modern GPUs to the communication fabric. For the APEnet v5 board we show characterizing figures, such as achieved bandwidth and BER, obtained by exploiting the new high-performance Altera transceivers and PCIe Gen3 compliance.
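BER figures for high-speed transceivers are often quoted as upper bounds from error-free test runs. One standard rule (the zero-error Poisson bound, not necessarily how the authors derived their figures) says that to claim BER < p at confidence CL, at least -ln(1 - CL)/p bits must be transferred without a single error. A minimal sketch; the 10 Gbps line rate in the example is an assumption for illustration only:

```python
import math

def bits_for_ber_bound(ber_target: float, confidence: float = 0.95) -> float:
    """Bits that must be received error-free to claim BER < ber_target
    at the given confidence level (zero-error Poisson bound)."""
    if not (0.0 < confidence < 1.0) or ber_target <= 0.0:
        raise ValueError("confidence must be in (0, 1) and ber_target > 0")
    return -math.log(1.0 - confidence) / ber_target

def bit_error_test_seconds(ber_target: float, line_rate_bps: float,
                           confidence: float = 0.95) -> float:
    """Minimum error-free run time on a single lane at line_rate_bps."""
    return bits_for_ber_bound(ber_target, confidence) / line_rate_bps

if __name__ == "__main__":
    # Hypothetical lane at 10 Gbps: about 3e12 bits, i.e. roughly 300 s.
    print(f"{bits_for_ber_bound(1e-12):.3e} bits")
    print(f"{bit_error_test_seconds(1e-12, 10e9):.1f} s")
```

Aggregating several lanes shortens the wall-clock time proportionally, since the bound only depends on the total number of bits observed.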
Submitted 4 January, 2022;
originally announced January 2022.
-
GPU-based Real-time Triggering in the NA62 Experiment
Authors:
R. Ammendola,
A. Biagioni,
P. Cretaro,
S. Di Lorenzo,
R. Fantechi,
M. Fiorini,
O. Frezza,
G. Lamanna,
F. Lo Cicero,
A. Lonardo,
M. Martinelli,
I. Neri,
P. S. Paolucci,
E. Pastorelli,
R. Piandani,
L. Pontisso,
D. Rossetti,
F. Simula,
M. Sozzi,
P. Vicini
Abstract:
Over the last few years the GPGPU (General-Purpose computing on Graphics Processing Units) paradigm has represented a remarkable development in the world of computing. Computing for High-Energy Physics is no exception: several works have demonstrated the effectiveness of integrating GPU-based systems in the high level trigger of different experiments. On the other hand, the use of GPUs in low level trigger systems, characterized by stringent real-time constraints such as a tight time budget and high throughput, poses several challenges. In this paper we focus on the low level trigger in the CERN NA62 experiment, investigating the use of real-time computing on GPUs in this synchronous system. Our approach aimed at harvesting the GPU computing power to build refined physics-related trigger primitives for the RICH detector in real time, as knowledge of the Cerenkov ring parameters allows stringent data selection conditions to be built at trigger level. Latencies of all components of the trigger chain have been analyzed, showing that networking is the most critical one. To keep the latency of the data transfer task under control, we devised NaNet, an FPGA-based PCIe Network Interface Card (NIC) with GPUDirect capabilities. For the processing task, we developed specific multiple-ring trigger algorithms to leverage the parallel architecture of GPUs and increase the processing throughput to keep up with the high event rate. Results obtained during the first months of the 2016 NA62 run are presented and discussed.
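The trigger primitive mentioned above is the set of Cerenkov ring parameters (center and radius) from the RICH photodetector hits. As an illustration only, and not the authors' GPU multi-ring algorithm, the sketch below fits a single ring with the algebraic (Kasa) least-squares circle fit, which reduces to a 3x3 linear system: it solves x^2 + y^2 + D x + E y + F = 0 for D, E, F. The function name `fit_ring` is hypothetical; multi-ring events would first need a pattern-recognition step to split hits among rings, which is not shown.

```python
import math

def _solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]  # augmented matrix
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("near-singular system (degenerate hit pattern)")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / a[r][r]
    return x

def fit_ring(hits):
    """Algebraic (Kasa) circle fit of (x, y) hits: returns (xc, yc, radius)."""
    if len(hits) < 3:
        raise ValueError("need at least 3 hits")
    ata = [[0.0] * 3 for _ in range(3)]  # normal matrix A^T A, rows of A = (x, y, 1)
    atb = [0.0] * 3                      # A^T b, with b = -(x^2 + y^2)
    for x, y in hits:
        row = (x, y, 1.0)
        rhs = -(x * x + y * y)
        for i in range(3):
            atb[i] += row[i] * rhs
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    d, e, f = _solve3(ata, atb)
    xc, yc = -d / 2.0, -e / 2.0
    return xc, yc, math.sqrt(xc * xc + yc * yc - f)
```

Because the fit is non-iterative and each event is independent, it maps naturally onto one GPU thread or block per event, which is the kind of parallelism the abstract refers to.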
Submitted 13 June, 2016;
originally announced June 2016.
-
NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features
Authors:
A. Lonardo,
F. Ameli,
R. Ammendola,
A. Biagioni,
O. Frezza,
G. Lamanna,
F. Lo Cicero,
M. Martinelli,
P. S. Paolucci,
E. Pastorelli,
L. Pontisso,
D. Rossetti,
F. Simeone,
F. Simula,
M. Sozzi,
L. Tosoratto,
P. Vicini
Abstract:
While the GPGPU paradigm is widely recognized as an effective approach to high performance computing, its adoption in low-latency, real-time systems is still in its early stages.
Although GPUs typically show deterministic latency when executing computational kernels once data is available in their internal memories, assessing the real-time features of a standard GPGPU system requires careful characterization of all subsystems along the data stream path.
The networking subsystem turns out to be the most critical one in terms of both the absolute value and the fluctuations of its response latency.
Our envisioned solution to this issue is NaNet, an FPGA-based PCIe Network Interface Card (NIC) design featuring a configurable and extensible set of network channels with direct access through GPUDirect to NVIDIA Fermi/Kepler GPU memories.
The NaNet design currently supports both standard channels, GbE (1000BASE-T) and 10GbE (10GBASE-R), and custom ones, the 34 Gbps APElink and the 2.5 Gbps deterministic-latency KM3link, but its modularity allows for straightforward inclusion of other link technologies.
To avoid host OS intervention on the data stream and remove a possible source of jitter, the design includes a network/transport layer offload module with cycle-accurate, upper-bound latency, supporting UDP, KM3link Time Division Multiplexing and APElink protocols.
After describing the NaNet architecture and characterizing its latency and bandwidth for all supported links, we present two real-world use cases: the GPU-based low level trigger for the RICH detector in the NA62 experiment at CERN and the on-/off-shore data link for the KM3 underwater neutrino telescope.
Submitted 13 June, 2014;
originally announced June 2014.
-
NaNet: a flexible and configurable low-latency NIC for real-time trigger systems based on GPUs
Authors:
R. Ammendola,
A. Biagioni,
O. Frezza,
G. Lamanna,
A. Lonardo,
F. Lo Cicero,
P. S. Paolucci,
F. Pantaleo,
D. Rossetti,
F. Simula,
M. Sozzi,
L. Tosoratto,
P. Vicini
Abstract:
NaNet is an FPGA-based PCIe X8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network stack protocol offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the NaNet modular hardware architecture. Latency and bandwidth benchmarks for the GbE and APElink channels are presented, followed by a performance analysis of the case study of the GPU-based low level trigger for the RICH detector in the NA62 CERN experiment, using either the NaNet GbE or the APElink channel. Finally, we outline the project's future activities.
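The host-side ceiling of a PCIe X8 Gen2 card follows from the link parameters: Gen1 and Gen2 signal at 2.5 and 5 GT/s per lane with 8b/10b encoding, Gen3 at 8 GT/s with 128b/130b encoding. The helper below (a hypothetical name, not part of any NaNet software) computes the theoretical per-direction throughput after line coding only; packet-level overhead (TLP headers, DLLPs, flow control) makes real DMA throughput noticeably lower.

```python
# Per-lane signalling rate (GT/s) and line-code efficiency per PCIe generation.
PCIE_GENS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}

def pcie_bytes_per_s(gen: int, lanes: int) -> float:
    """Theoretical per-direction throughput in bytes/s after line coding only
    (ignores TLP/DLLP packet overhead, so real DMA rates are lower)."""
    gt_per_s, efficiency = PCIE_GENS[gen]
    bits_per_s = gt_per_s * 1e9 * lanes * efficiency
    return bits_per_s / 8

if __name__ == "__main__":
    for gen in (2, 3):
        # x8 Gen2: 4.00 GB/s, x8 Gen3: 7.88 GB/s per direction
        print(f"x8 Gen{gen}: {pcie_bytes_per_s(gen, 8) / 1e9:.2f} GB/s per direction")
```

For comparison, a 10GbE link carries at most 1.25 GB/s of line-rate traffic, well within the X8 Gen2 budget.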
Submitted 9 January, 2014; v1 submitted 15 November, 2013;
originally announced November 2013.
-
Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems
Authors:
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Pier Stanislao Paolucci,
Alessandro Lonardo,
Davide Rossetti,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
Modern Graphics Processing Units (GPUs) are now considered accelerators for general purpose computation. A tight interaction between the GPU and the interconnection network is the strategy to unlock the full capability-computing potential of a multi-GPU system on large HPC clusters; that is why an efficient and scalable interconnect is a key technology to finally deliver GPUs for scientific HPC. In this paper we show the latest architectural and performance improvements of the APEnet+ network fabric, an FPGA-based PCIe board with 6 fully bidirectional off-board links with 34 Gbps of raw bandwidth per direction, and X8 Gen2 bandwidth towards the host PC. The board implements a Remote Direct Memory Access (RDMA) protocol that leverages the peer-to-peer (P2P) capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain true zero-copy, low-latency GPU-to-GPU transfers. Finally, we report on the development activities for 2013, focusing on the adoption of the latest-generation 28 nm FPGAs and the preliminary tests performed on this new platform.
Submitted 14 November, 2013; v1 submitted 7 November, 2013;
originally announced November 2013.
-
NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems
Authors:
Roberto Ammendola,
Andrea Biagioni,
Riccardo Fantechi,
Ottorino Frezza,
Gianluca Lamanna,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Stanislao Paolucci,
Felice Pantaleo,
Roberto Piandani,
Luca Pontisso,
Davide Rossetti,
Francesco Simula,
Marco Sozzi,
Laura Tosoratto,
Piero Vicini
Abstract:
We implemented the NaNet FPGA-based PCIe Gen2 GbE/APElink NIC, featuring GPUDirect RDMA capabilities and UDP protocol management offloading. NaNet is able to receive a UDP input data stream from its GbE interface and redirect it, without any intermediate buffering or CPU intervention, to the memory of a Fermi/Kepler GPU hosted on the same PCIe bus, provided that the two devices share the same upstream root complex. Synthetic benchmarks for latency and bandwidth are presented. We describe how NaNet can be employed in the prototype of the GPU-based RICH low-level trigger processor of the NA62 CERN experiment to implement the data link between the TEL62 readout boards and the low level trigger processor. Results for the throughput and latency of the integrated system are presented and discussed.
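To make "UDP protocol management offloading" concrete: before payload can be written straight into GPU memory, something must validate the IPv4 and UDP headers and locate the payload bytes, a job otherwise done by the host kernel. The sketch below performs that step in software, following the IPv4 (RFC 791) and UDP (RFC 768) header layouts; it is an illustration of the task, not the NaNet hardware design. `extract_udp_payload` is a hypothetical name, fragmented datagrams are simply dropped, and checksums are not verified.

```python
import struct
from typing import Optional

IPPROTO_UDP = 17

def extract_udp_payload(ip_packet: bytes, expected_port: int) -> Optional[bytes]:
    """Return the UDP payload of an IPv4 datagram sent to expected_port,
    or None if the packet must be dropped. Checksums are NOT verified."""
    if len(ip_packet) < 20:                      # shorter than a minimal IPv4 header
        return None
    version = ip_packet[0] >> 4
    ihl = (ip_packet[0] & 0x0F) * 4              # IPv4 header length in bytes
    total_len, = struct.unpack_from("!H", ip_packet, 2)
    flags_frag, = struct.unpack_from("!H", ip_packet, 6)
    if version != 4 or ihl < 20 or ip_packet[9] != IPPROTO_UDP:
        return None
    if flags_frag & 0x3FFF:                      # MF flag or fragment offset set: not handled
        return None
    if total_len > len(ip_packet) or total_len < ihl + 8:
        return None
    _src_port, dst_port, udp_len, _csum = struct.unpack_from("!HHHH", ip_packet, ihl)
    if dst_port != expected_port or udp_len < 8 or ihl + udp_len > total_len:
        return None
    return ip_packet[ihl + 8 : ihl + udp_len]    # payload only, headers stripped
```

In a GPUDirect setting the returned slice would instead be DMA-written into a pre-registered GPU buffer, so the per-packet work is a fixed sequence of field checks, which is what makes a bounded-latency hardware implementation feasible.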
Submitted 22 November, 2013; v1 submitted 5 November, 2013;
originally announced November 2013.
-
GPU peer-to-peer techniques applied to a cluster interconnect
Authors:
Roberto Ammendola,
Massimo Bernaschi,
Andrea Biagioni,
Mauro Bisson,
Massimiliano Fatica,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Enrico Mastrostefano,
Pier Stanislao Paolucci,
Davide Rossetti,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current-generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. In addition, we discuss the current software implementation, which integrates this feature by minimally extending the RDMA programming model, as well as some issues raised while employing it in a higher level API like MPI. Finally, the current limits of the technique are studied by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and how they seem to benefit from the GPU peer-to-peer method.
Submitted 31 July, 2013;
originally announced July 2013.
-
A heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications: Vol. II, 2012 technical report
Authors:
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Werner Geurts,
Gert Goossens,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Stanislao Paolucci,
Davide Rossetti,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
This is the second of a planned collection of four yearly volumes describing the deployment of a heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications. This volume covers several topics, including: 1- a system for awareness of faults and critical events (named LO|FA|MO) on experimental heterogeneous many-core hardware platforms; 2- the integration and test of the experimental heterogeneous many-core hardware platform QUoNG, based on the APEnet+ custom interconnect; 3- the design of a Software-Programmable Distributed Network Processor (DNP) architecture using ASIP technology; 4- the initial design stages of a new DNP generation targeting a 28 nm FPGA. These developments were carried out in the framework of the EURETILE European Project under Grant Agreement no. 247846.
Submitted 4 July, 2013;
originally announced July 2013.
-
'Mutual Watch-dog Networking': Distributed Awareness of Faults and Critical Events in Petascale/Exascale systems
Authors:
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Stanislao Paolucci,
Davide Rossetti,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
Many-tile systems require techniques to increase component resilience and control the FIT (Failures In Time) rate. When scaling to peta- and exa-scale systems, the FIT rate may become unacceptable due to the sheer number of components, requiring more systemic countermeasures. Thus, the ability to be fault aware, i.e. to detect and collect information about faults and critical events, is a necessary feature that large scale distributed architectures must provide in order to apply systemic fault tolerance techniques. In this context, the LO|FA|MO approach is a way to obtain systemic fault awareness, by implementing a mutual watchdog mechanism and guaranteeing fault detection in a no-single-point-of-failure fashion. This technical report contains specification and implementation details of this approach.
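The core idea of a mutual watchdog can be shown with a toy simulation: every node emits periodic heartbeats, every node is watched by more than one peer, and a peer whose heartbeat is overdue by more than a timeout is reported by any surviving watcher. This is a simplified sketch under assumed rules (ring topology, integer time steps, hypothetical names `Node`, `simulate`), not the LO|FA|MO specification, which pairs host and network processor and propagates fault reports over the interconnect.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    timeout: int
    alive: bool = True
    last_seen: dict = field(default_factory=dict)  # watched peer -> last heartbeat time

    def receive_heartbeat(self, peer: str, now: int) -> None:
        self.last_seen[peer] = now

    def suspected_faults(self, now: int) -> set:
        """Watched peers whose heartbeat is overdue by more than the timeout."""
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}

def ring_neighbours(names, i):
    n = len(names)
    return names[(i - 1) % n], names[(i + 1) % n]

def simulate(names, timeout, steps, dead_at):
    """dead_at maps a node name to the step at which it stops sending heartbeats.
    Returns the set of faulty nodes reported by the surviving watchers."""
    nodes = {n: Node(n, timeout) for n in names}
    for i, name in enumerate(names):  # each node watches both ring neighbours
        for peer in ring_neighbours(names, i):
            nodes[name].last_seen[peer] = 0
    for now in range(steps):
        for i, name in enumerate(names):
            node = nodes[name]
            node.alive = now < dead_at.get(name, steps)
            if node.alive:  # heartbeat delivered to the two watchers
                for peer in ring_neighbours(names, i):
                    nodes[peer].receive_heartbeat(name, now)
    final = steps - 1
    reported = set()
    for node in nodes.values():
        if node.alive:  # a dead watcher cannot report anything
            reported |= node.suspected_faults(final)
    return reported
```

Since every node has two watchers in this sketch, two adjacent nodes failing together are still both reported, each by its surviving neighbour: for example `simulate(list("abcde"), 2, 10, {"c": 3, "d": 3})` reports both `c` and `d`.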
Submitted 2 July, 2013; v1 submitted 1 July, 2013;
originally announced July 2013.
-
The Distributed Network Processor: a novel off-chip and on-chip interconnection network architecture
Authors:
Andrea Biagioni,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Stanislao Paolucci,
Mersia Perra,
Davide Rossetti,
Carlo Sidore,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
One of the most demanding challenges for the designers of parallel computing architectures is to deliver an efficient network infrastructure providing low latency, high bandwidth communications while preserving scalability. Besides off-chip communications between processors, recent multi-tile (i.e. multi-core) architectures face the challenge of an efficient on-chip interconnection network between the processor's tiles. In this paper, we present a configurable and scalable architecture, based on our Distributed Network Processor (DNP) IP Library, targeting systems ranging from single MPSoCs to massive HPC platforms. The DNP provides inter-tile services for both on-chip and off-chip communications with a uniform RDMA-style API, over a multi-dimensional direct network with a (possibly) hybrid topology.
Submitted 7 March, 2012;
originally announced March 2012.
-
High-speed data transfer with FPGAs and QSFP+ modules
Authors:
R. Ammendola,
A. Biagioni,
G. Chiodi,
O. Frezza,
F. Lo Cicero,
A. Lonardo,
R. Lunadei,
P. S. Paolucci,
D. Rossetti,
A. Salamon,
G. Salina,
F. Simula,
L. Tosoratto,
P. Vicini
Abstract:
We present test results and characterization of a data transmission system based on a latest-generation FPGA and a commercial QSFP+ (Quad Small Form-factor Pluggable Plus) module. The QSFP+ standard defines a hot-pluggable transceiver, available in copper or optical cable assemblies, for an aggregated bandwidth of up to 40 Gbps. We implemented a complete testbench based on a commercial development card mounting an Altera Stratix IV FPGA with 24 serial transceivers at 8.5 Gbps, together with a custom mezzanine hosting three QSFP+ modules. We present test results and signal integrity measurements up to an aggregated bandwidth of 12 Gbps.
Submitted 1 March, 2011;
originally announced March 2011.
-
APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters
Authors:
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Stanislao Paolucci,
Davide Rossetti,
Andrea Salamon,
Gaetano Salina,
Francesco Simula,
Laura Tosoratto,
Piero Vicini
Abstract:
We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology, plus hardware support for an RDMA programming model and experimental acceleration of GPU networking; this design allows us to build a low latency, high bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective cluster network architecture, scalable to tens of thousands of nodes. Some test results and characterization of data transmission on a complete testbench, based on a commercial development card mounting an Altera FPGA, are provided.
Submitted 18 February, 2011;
originally announced February 2011.
-
APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters
Authors:
Roberto Ammendola,
Andrea Biagioni,
Ottorino Frezza,
Francesca Lo Cicero,
Alessandro Lonardo,
Pier Paolucci,
Roberto Petronzio,
Davide Rossetti,
Andrea Salamon,
Gaetano Salina,
Francesco Simula,
Nazario Tantalo,
Laura Tosoratto,
Piero Vicini
Abstract:
Many scientific computations need multi-node parallelism to meet ever-increasing requirements in both space (memory) and time (speed). The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may result in large overheads due to the complex memory hierarchy. Additionally, top-notch problems may easily call for more than a Petaflops of sustained computing power, requiring thousands of GPUs orchestrated with some parallel programming model. Here we describe APEnet+, the new generation of our interconnect, which scales up to tens of thousands of nodes with linear cost, thus improving the price/performance ratio on large clusters. The project target is the development of the APElink+ host adapter featuring a low latency, high bandwidth direct network, state-of-the-art wire speeds on the links and a PCIe X8 Gen2 host interface. It features hardware support for the RDMA programming model and experimental acceleration of GPU networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing for painless porting of standard applications. Finally, we give an insight into future work and intended developments.
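The "linear cost" claim follows from the topology: in a 3D torus every node has exactly six direct neighbours (plus and minus X, Y, Z, with wrap-around), so when every dimension is larger than 2 the network has 3N bidirectional links for N nodes, while the worst-case hop count grows only with the sum of half the dimension sizes. The helpers below are hypothetical names sketching these properties; they do not describe the APEnet+ routing logic.

```python
from itertools import product

def torus_neighbours(coord, dims):
    """The six direct neighbours (+X, -X, +Y, -Y, +Z, -Z) of a node in a 3D torus."""
    out = []
    for axis in range(3):
        for step in (+1, -1):
            c = list(coord)
            c[axis] = (c[axis] + step) % dims[axis]  # wrap-around link
            out.append(tuple(c))
    return out

def torus_hops(a, b, dims):
    """Minimal hop count between two nodes, taking the shorter way round each ring."""
    total = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y)
        total += min(d, n - d)
    return total

def torus_diameter(dims):
    """Worst-case hop count; by symmetry it suffices to start from the origin."""
    nodes = product(*(range(n) for n in dims))
    return max(torus_hops((0, 0, 0), c, dims) for c in nodes)

def torus_link_count(dims):
    """Distinct bidirectional links; equals 3 * nodes when every dim is > 2."""
    links = set()
    for c in product(*(range(n) for n in dims)):
        for nb in torus_neighbours(c, dims):
            links.add(frozenset((c, nb)))
    return len(links)
```

For a 4x4x4 torus this gives 64 nodes, 192 links and a diameter of 6 hops; doubling the node count doubles the link count, which is the linear scaling of cost.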
Submitted 1 December, 2010;
originally announced December 2010.
-
Status of the APENet project
Authors:
R. Ammendola,
R. Petronzio,
D. Rossetti,
A. Salamon,
N. Tantalo,
P. Vicini
Abstract:
We present the current status of APENet, our custom 3-dimensional interconnect architecture for PC cluster environments. We report some micro-benchmarks on our recent large installation, as well as new developments on the software and hardware side. The low level device driver has been reworked following a custom hardware RDMA architecture, and MPICH-VMI, an implementation of the MPI library, has been ported to APENet.
Submitted 26 September, 2005;
originally announced September 2005.
-
APENet: LQCD clusters a la APE
Authors:
R. Ammendola,
M. Guagnelli,
G. Mazza,
F. Palombi,
R. Petronzio,
D. Rossetti,
A. Salamon,
P. Vicini
Abstract:
Developed by the APE group, APENet is a new high speed, low latency, 3-dimensional interconnect architecture optimized for PC clusters running LQCD-like numerical applications. The hardware implementation is based on a single PCI-X 133 MHz network interface card hosting six independent bi-directional channels with a peak bandwidth of 676 MB/s in each direction. We discuss preliminary benchmark results showing promising performance, similar to or better than that found in high-end commercial network systems.
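A back-of-the-envelope comparison of the figures above, assuming the usual 64-bit PCI-X bus width (the abstract gives only the 133 MHz clock), shows how the aggregate torus link bandwidth relates to the host bus, which is a single shared, half-duplex channel:

```latex
\begin{aligned}
B_{\text{PCI-X}} &= 8\,\text{B} \times 133.3\,\text{MHz} \approx 1066\,\text{MB/s} \\
B_{\text{links}} &= 6 \times 2 \times 676\,\text{MB/s} = 8112\,\text{MB/s} \\
B_{\text{links}} / B_{\text{PCI-X}} &\approx 7.6
\end{aligned}
```

Under this assumption, a single host cannot source or sink traffic on all six channels at their peak rate simultaneously; the per-channel figure is the relevant one for nearest-neighbour exchanges.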
Submitted 14 September, 2004;
originally announced September 2004.
-
apeNEXT: A multi-TFlops Computer for Simulations in Lattice Gauge Theory
Authors:
F. Bodin,
Ph. Boucaud,
N. Cabibbo,
F. Di Carlo,
R. De Pietri,
F. Di Renzo,
H. Kaldass,
A. Lonardo,
M. Lukyanov,
S. De Luca,
J. Micheli,
V. Morenas,
O. Pene,
D. Pleiter,
N. Paschedag,
F. Rapuano,
D. Rossetti,
L. Sartori,
F. Schifano,
H. Simma,
R. Tripiccione,
P. Vicini
Abstract:
We present the APE (Array Processor Experiment) project for the development of dedicated parallel computers for numerical simulations in lattice gauge theories. While APEmille is a production machine in today's physics simulations at various sites in Europe, a new machine, apeNEXT, is currently being developed to provide multi-Tflops computing performance. Like previous APE machines, the new supercomputer is largely custom designed and specifically optimized for simulations of Lattice QCD.
Submitted 8 October, 2003; v1 submitted 2 September, 2003;
originally announced September 2003.
-
The apeNEXT project (Status report)
Authors:
F. Bodin,
Ph. Boucaud,
J. Micheli,
O. Pene,
N. Cabibbo,
F. Di Carlo,
A. Lonardo,
S. de Luca,
F. Rapuano,
D. Rossetti,
P. Vicini,
R. De Pietri,
F. Di Renzo,
H. Kaldass,
N. Paschedag,
H. Simma,
V. Morenas,
D. Pleiter,
L. Sartori,
F. Schifano,
R. Tripiccione
Abstract:
We present the current status of the apeNEXT project. The aim of this project is the development of the next generation of APE machines, which will provide multi-teraflop computing power. Like previous machines, apeNEXT is based on a custom designed processor, which is specifically optimized for simulating QCD. We discuss the machine design, report on benchmarks, and give an overview of the status of the software development.
Submitted 4 September, 2003; v1 submitted 13 June, 2003;
originally announced June 2003.
-
Status of the apeNEXT project
Authors:
R. Ammendola,
F. Bodin,
Ph. Boucaud,
N. Cabibbo,
F. Di Carlo,
R. De Pietri,
F. Di Renzo,
W. Errico,
A. Fucci,
M. Guagnelli,
H. Kaldass,
A. Lonardo,
S. de Luca,
J. Micheli,
V. Morenas,
O. Pene,
R. Petronzio,
F. Palombi,
D. Pleiter,
N. Paschedag,
F. Rapuano,
P. De Riso,
D. Rossetti,
A. Salamon,
G. Salina,
et al. (5 additional authors not shown)
Abstract:
We present the current status of the apeNEXT project. The aim of this project is the development of the next generation of APE machines, which will provide multi-teraflop computing power. Like previous machines, apeNEXT is based on a custom designed processor, which is specifically optimized for simulating QCD. We discuss the machine design, report on benchmarks, and give an overview of the status of the software development.
Submitted 8 October, 2003; v1 submitted 15 November, 2002;
originally announced November 2002.
-
The APENEXT project
Authors:
F. Bodin,
P. Boucaud,
N. Cabibbo,
F. Calvayrac,
M. Della Morte,
R. De Pietri,
P. De Riso,
F. Di Carlo,
F. Di Renzo,
W. Errico,
R. Frezzotti,
U. Gensch,
T. Giorgino,
M. Guagnelli,
N. Herve,
H. Kaldass,
A. Lonardo,
M. Lukyanov,
G. Magazzu,
J. Micheli,
V. Morenas,
L. Mori,
F. Palombi,
N. Paschedag,
O. Pene,
et al. (9 additional authors not shown)
Abstract:
APENEXT is a new generation APE processor, optimized for LGT simulations. The project follows the basic ideas of previous APE machines and develops simple and cheap parallel systems with multi T-Flops processing power. This paper describes the main features of this new development.
Submitted 25 October, 2001;
originally announced October 2001.
-
Status of APEmille
Authors:
APE-Collaboration,
A. Bartoloni,
P. Boucaud,
N. Cabibbo,
F. Calvayrac,
M. Della Morte,
R. De Pietri,
P. De Riso,
F. Di Carlo,
F. Di Renzo,
W. Errico,
R. Frezzotti,
T. Giorgino,
J. Heitger,
A. Lonardo,
M. Loukianov,
G. Magazzu,
J. Micheli,
V. Morenas,
N. Paschedag,
O. Pene,
R. Petronzio,
D. Pleiter,
F. Rapuano,
et al. (9 additional authors not shown)
Abstract:
This paper presents the status of the APEmille project, which is essentially completed, as far as machine development and construction is concerned. Several large installations of APEmille are in use for physics production runs leading to many new results presented at this conference. This paper briefly summarizes the APEmille architecture, reviews the status of the installations and presents some performance figures for physics codes.
Submitted 17 October, 2001;
originally announced October 2001.
-
Progress and status of APEmille
Authors:
APE collaboration,
A. Bartoloni,
S. Cabasino,
N. Cabibbo,
M. Cosimi,
P. De Riso,
W. Errico,
S. Giovannetti,
F. Laico,
H. Leich,
A. Lonardo,
G. Magazzu,
A. Michelotti,
E. Panizzi,
P. S. Paolucci,
D. Rossetti,
U. Schwendicke,
H. Simma,
K. H. Sulanke,
M. Torelli,
R. Tripiccione,
P. Vicini
Abstract:
We report on the progress and status of the APEmille project: a SIMD parallel computer with a peak performance in the TeraFlops range which is now in an advanced development phase. We discuss the hardware and software architecture, and present some performance estimates for Lattice Gauge Theory (LGT) applications.
Submitted 1 October, 1997;
originally announced October 1997.
-
Numerical Simulations of the Dynamical Behavior of the SK Model
Authors:
Enzo Marinari,
Giorgio Parisi,
Davide Rossetti
Abstract:
We study the dynamical behavior of the Sherrington-Kirkpatrick model. Thanks to the APE supercomputer we are able to analyze large lattice volumes and to investigate the low $T$ region. We present a determination of the remnant magnetization and of its time decay exponent, and of the energy time decay exponent, and we discuss aging phenomena in the model.
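The kind of measurement described above can be illustrated on a small serial example. The sketch assumes single-spin-flip Metropolis dynamics (the paper's exact update scheme may differ) on the SK Hamiltonian H = -(1/sqrt(N)) sum_{i<j} J_ij s_i s_j with unit-variance Gaussian couplings; with this normalization the spin-glass transition is at T = 1, so T = 0.5 lies in the low-T region. The remnant magnetization is recorded by starting from the fully magnetized state, as if a strong field had just been switched off. All function names and parameter values are hypothetical, and N = 64 is far smaller than the volumes an APE machine can handle.

```python
import math
import random

def sk_couplings(n, rng):
    """Symmetric Gaussian couplings J_ij / sqrt(N) with zero diagonal."""
    j = [[0.0] * n for _ in range(n)]
    scale = 1.0 / math.sqrt(n)
    for a in range(n):
        for b in range(a + 1, n):
            j[a][b] = j[b][a] = rng.gauss(0.0, 1.0) * scale
    return j

def energy(spins, j):
    """H = -sum_{i<j} J_ij s_i s_j (couplings already include 1/sqrt(N))."""
    n = len(spins)
    return -sum(j[a][b] * spins[a] * spins[b] for a in range(n) for b in range(a + 1, n))

def metropolis_sweep(spins, j, temperature, rng):
    """One sweep of single-spin-flip Metropolis updates (temperature > 0)."""
    n = len(spins)
    for i in range(n):
        local_field = sum(j[i][k] * spins[k] for k in range(n))  # j[i][i] == 0
        delta_e = 2.0 * spins[i] * local_field                   # energy change if s_i flips
        if delta_e <= 0.0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i] = -spins[i]

def remnant_magnetization(n=64, temperature=0.5, sweeps=50, seed=1):
    """Start fully magnetized and record m(t) = sum_i s_i / N after each sweep."""
    rng = random.Random(seed)
    j = sk_couplings(n, rng)
    spins = [1] * n
    history = []
    for _ in range(sweeps):
        metropolis_sweep(spins, j, temperature, rng)
        history.append(sum(spins) / n)
    return history
```

Fitting the decay of the returned m(t), averaged over many coupling samples, to a power law is one way to extract a time decay exponent; the energy decay can be tracked the same way with `energy`.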
Submitted 8 December, 1997; v1 submitted 4 August, 1997;
originally announced August 1997.