
Showing 1–22 of 22 results for author: Rossetti, D

  1. arXiv:2201.01088

    physics.comp-ph cs.AR

    Architectural improvements and technological enhancements for the APEnet+ interconnect system

    Authors: R. Ammendola, A. Biagioni, O. Frezza, A. Lonardo, F. Lo Cicero, M. Martinelli, P. S. Paolucci, E. Pastorelli, D. Rossetti, F. Simula, L. Tosoratto, P. Vicini

    Abstract: The APEnet+ board delivers a point-to-point, low-latency, 3D torus network interface card. In this paper we describe the latest generation of APEnet NIC, APEnet v5, integrated in a PCIe Gen3 board based on a state-of-the-art, 28 nm Altera Stratix V FPGA. The NIC features a network architecture designed following the Remote DMA paradigm and tailored to tightly bind the computing power of modern GPU…

    Submitted 4 January, 2022; originally announced January 2022.

    Journal ref: **st February 3, 2015

  2. arXiv:1606.04099

    physics.ins-det hep-ex

    GPU-based Real-time Triggering in the NA62 Experiment

    Authors: R. Ammendola, A. Biagioni, P. Cretaro, S. Di Lorenzo, R. Fantechi, M. Fiorini, O. Frezza, G. Lamanna, F. Lo Cicero, A. Lonardo, M. Martinelli, I. Neri, P. S. Paolucci, E. Pastorelli, R. Piandani, L. Pontisso, D. Rossetti, F. Simula, M. Sozzi, P. Vicini

    Abstract: Over the last few years the GPGPU (General-Purpose computing on Graphics Processing Units) paradigm represented a remarkable development in the world of computing. Computing for High-Energy Physics is no exception: several works have demonstrated the effectiveness of the integration of GPU-based systems in high level trigger of different experiments. On the other hand the use of GPUs in the low le…

    Submitted 13 June, 2016; originally announced June 2016.

  3. arXiv:1406.3568

    physics.ins-det cs.AR

    NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features

    Authors: A. Lonardo, F. Ameli, R. Ammendola, A. Biagioni, O. Frezza, G. Lamanna, F. Lo Cicero, M. Martinelli, P. S. Paolucci, E. Pastorelli, L. Pontisso, D. Rossetti, F. Simeone, F. Simula, M. Sozzi, L. Tosoratto, P. Vicini

    Abstract: While the GPGPU paradigm is widely recognized as an effective approach to high performance computing, its adoption in low-latency, real-time systems is still in its early stages. Although GPUs typically show deterministic behaviour in terms of latency in executing computational kernels as soon as data is available in their internal memories, assessment of real-time features of a standard GPGPU s…

    Submitted 13 June, 2014; originally announced June 2014.

  4. arXiv:1311.4007

    physics.ins-det cs.DC

    NaNet: a flexible and configurable low-latency NIC for real-time trigger systems based on GPUs

    Authors: R. Ammendola, A. Biagioni, O. Frezza, G. Lamanna, A. Lonardo, F. Lo Cicero, P. S. Paolucci, F. Pantaleo, D. Rossetti, F. Simula, M. Sozzi, L. Tosoratto, P. Vicini

    Abstract: NaNet is an FPGA-based PCIe X8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network stack protocol offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the NaNet hardware modular architecture. Benchmarks for latency and bandw…

    Submitted 9 January, 2014; v1 submitted 15 November, 2013; originally announced November 2013.

    Comments: Proceedings for the TWEPP 2013 - Topical Workshop on Electronics for Particle Physics workshop

  5. arXiv:1311.1741

    cs.AR cs.DC physics.comp-ph

    Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems

    Authors: Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Pier Stanislao Paolucci, Alessandro Lonardo, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini

    Abstract: Modern Graphics Processing Units (GPUs) are now considered accelerators for general purpose computation. A tight interaction between the GPU and the interconnection network is the strategy to express the full potential on capability computing of a multi-GPU system on large HPC clusters; that is the reason why an efficient and scalable interconnect is a key technology to finally deliver GPUs for sc…

    Submitted 14 November, 2013; v1 submitted 7 November, 2013; originally announced November 2013.

    Comments: Proceedings for the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP)

  6. NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems

    Authors: Roberto Ammendola, Andrea Biagioni, Riccardo Fantechi, Ottorino Frezza, Gianluca Lamanna, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Felice Pantaleo, Roberto Piandani, Luca Pontisso, Davide Rossetti, Francesco Simula, Marco Sozzi, Laura Tosoratto, Piero Vicini

    Abstract: We implemented the NaNet FPGA-based PCIe Gen2 GbE/APElink NIC, featuring GPUDirect RDMA capabilities and UDP protocol management offloading. NaNet is able to receive a UDP input data stream from its GbE interface and redirect it, without any intermediate buffering or CPU intervention, to the memory of a Fermi/Kepler GPU hosted on the same PCIe bus, provided that the two devices share the same upst…

    Submitted 22 November, 2013; v1 submitted 5 November, 2013; originally announced November 2013.

    Comments: Proceedings for the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP)

  7. arXiv:1307.8276

    physics.comp-ph cs.DC

    GPU peer-to-peer techniques applied to a cluster interconnect

    Authors: Roberto Ammendola, Massimo Bernaschi, Andrea Biagioni, Mauro Bisson, Massimiliano Fatica, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Enrico Mastrostefano, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini

    Abstract: Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the architectural modifications required to implement pee…

    Submitted 31 July, 2013; originally announced July 2013.

    Comments: paper accepted to CASS 2013

  8. arXiv:1307.1270

    cs.DC

    A heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications: Vol. II, 2012 technical report

    Authors: Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Werner Geurts, Gert Goossens, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini

    Abstract: This is the second of a planned collection of four yearly volumes describing the deployment of a heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications. This volume covers several topics, among which: 1- a system for awareness of faults and critical events (named LO|FA|MO) on experimental h…

    Submitted 4 July, 2013; originally announced July 2013.

    Comments: 119 pages

    MSC Class: 68M10; 68M14; 68M15 ACM Class: B.8.1; C.1.4; C.3; C.4; C.5.1

  9. arXiv:1307.0433

    cs.DC cs.NI

    'Mutual Watch-dog Networking': Distributed Awareness of Faults and Critical Events in Petascale/Exascale systems

    Authors: Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini

    Abstract: Many tile systems require techniques to be applied to increase components resilience and control the FIT (Failures In Time) rate. When scaling to peta- exa-scale systems the FIT rate may become unacceptable due to component numerosity, requiring more systemic countermeasures. Thus, the ability to be fault aware, i.e. to detect and collect information about fault and critical events, is a necessary…

    Submitted 2 July, 2013; v1 submitted 1 July, 2013; originally announced July 2013.

    Comments: Technical Report, Preprint

  10. arXiv:1203.1536

    cs.AR cs.NI

    The Distributed Network Processor: a novel off-chip and on-chip interconnection network architecture

    Authors: Andrea Biagioni, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Mersia Perra, Davide Rossetti, Carlo Sidore, Francesco Simula, Laura Tosoratto, Piero Vicini

    Abstract: One of the most demanding challenges for the designers of parallel computing architectures is to deliver an efficient network infrastructure providing low latency, high bandwidth communications while preserving scalability. Besides off-chip communications between processors, recent multi-tile (i.e. multi-core) architectures face the challenge for an efficient on-chip interconnection network betwee…

    Submitted 7 March, 2012; originally announced March 2012.

    Comments: 8 pages, 11 figures, submitted to Hot Interconnect 2009

  11. arXiv:1103.0128

    physics.ins-det physics.comp-ph

    High-speed data transfer with FPGAs and QSFP+ modules

    Authors: R. Ammendola, A. Biagioni, G. Chiodi, O. Frezza, F. Lo Cicero, A. Lonardo, R. Lunadei, P. S. Paolucci, D. Rossetti, A. Salamon, G. Salina, F. Simula, L. Tosoratto, P. Vicini

    Abstract: We present test results and characterization of a data transmission system based on a last generation FPGA and a commercial QSFP+ (Quad Small Form Pluggable +) module. QSFP+ standard defines a hot-pluggable transceiver available in copper or optical cable assemblies for an aggregated bandwidth of up to 40 Gbps. We implemented a complete testbench based on a commercial development card mounting an…

    Submitted 1 March, 2011; originally announced March 2011.

    Comments: 5 pages, 3 figures. Published in JINST (Journal of Instrumentation), proceedings of the Topical Workshop on Electronics for Particle Physics 2010, 20-24 September 2010, Aachen, Germany (R Ammendola et al 2010 JINST 5 C12019)

    Journal ref: JINST 5:C12019,2010

  12. arXiv:1102.3796

    physics.comp-ph cs.AR

    APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

    Authors: Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Andrea Salamon, Gaetano Salina, Francesco Simula, Laura Tosoratto, Piero Vicini

    Abstract: We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology plus hardware support for a RDMA programming model and experimental acceleration of GPU networking; this design allows us to build a low latency, high bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective, tens-of-thousands-scalable c…

    Submitted 18 February, 2011; originally announced February 2011.

    Comments: 6 pages, 7 figures, proceeding of CHEP 2010, Taiwan, October 18-22

  13. arXiv:1012.0253

    hep-lat cs.DC

    APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters

    Authors: Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Paolucci, Roberto Petronzio, Davide Rossetti, Andrea Salamon, Gaetano Salina, Francesco Simula, Nazario Tantalo, Laura Tosoratto, Piero Vicini

    Abstract: Many scientific computations need multi-node parallelism for matching up both space (memory) and time (speed) ever-increasing requirements. The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may potentially result in large overheads due to the complex memory hierarchy. Additionally, top-notch problems may easily employ more than a Petaflops of sustain…

    Submitted 1 December, 2010; originally announced December 2010.

  14. arXiv:hep-lat/0509130

    hep-lat

    Status of the APENet project

    Authors: R. Ammendola, R. Petronzio, D. Rossetti, A. Salamon, N. Tantalo, P. Vicini

    Abstract: We present the current status of APENet, our custom 3-dimensional interconnect architecture for PC clusters environment. We report some micro-benchmarks on our recent large installation as well as new developments on the software and hardware side. The low level device driver has been reworked by following a custom hardware RDMA architecture, and MPICH-VMI, an implementation of the MPI library,…

    Submitted 26 September, 2005; originally announced September 2005.

    Comments: 6 pages, 5 figures, poster presented at Lattice 2005 (Algorithms and Machines), Dublin, July 25-30

    Journal ref: PoS LAT2005 (2005) 100

  15. APENet: LQCD clusters a la APE

    Authors: R. Ammendola, M. Guagnelli, G. Mazza, F. Palombi, R. Petronzio, D. Rossetti, A. Salamon, P. Vicini

    Abstract: Developed by the APE group, APENet is a new high speed, low latency, 3-dimensional interconnect architecture optimized for PC clusters running LQCD-like numerical applications. The hardware implementation is based on a single PCI-X 133MHz network interface card hosting six independent bi-directional channels with a peak bandwidth of 676 MB/s each direction. We discuss preliminary benchmark resul…

    Submitted 14 September, 2004; originally announced September 2004.

    Comments: Lattice2004(machines), 3 pages, 4 figures

  16. arXiv:hep-lat/0309007

    hep-lat

    apeNEXT: A multi-TFlops Computer for Simulations in Lattice Gauge Theory

    Authors: F. Bodin, Ph. Boucaud, N. Cabibbo, F. Di Carlo, R. De Pietri, F. Di Renzo, H. Kaldass, A. Lonardo, M. Lukyanov, S. De Luca, J. Micheli, V. Morenas, O. Pene, D. Pleiter, N. Paschedag, F. Rapuano, D. Rossetti, L. Sartori, F. Schifano, H. Simma, R. Tripiccione, P. Vicini

    Abstract: We present the APE (Array Processor Experiment) project for the development of dedicated parallel computers for numerical simulations in lattice gauge theories. While APEmille is a production machine in today's physics simulations at various sites in Europe, a new machine, apeNEXT, is currently being developed to provide multi-Tflops computing performance. Like previous APE machines, the new sup…

    Submitted 8 October, 2003; v1 submitted 2 September, 2003; originally announced September 2003.

    Comments: Poster at the XXIII Physics in Collisions Conference (PIC03), Zeuthen, Germany, June 2003, 3 pages, Latex. PSN FRAP15. Replaced for adding forgotten author

    Journal ref: ECONF C030626:FRAP15,2003

  17. arXiv:hep-lat/0306018

    hep-lat

    The apeNEXT project (Status report)

    Authors: F. Bodin, Ph. Boucaud, J. Micheli, O. Pene, N. Cabibbo, F. Di Carlo, A. Lonardo, S. de Luca, F. Rapuano, D. Rossetti, P. Vicini, R. De Pietri, F. Di Renzo, H. Kaldass, N. Paschedag, H. Simma, V. Morenas, D. Pleiter, L. Sartori, F. Schifano, R. Tripiccione

    Abstract: We present the current status of the apeNEXT project. The aim of this project is the development of the next generation of APE machines which will provide multi-teraflop computing power. Like previous machines, apeNEXT is based on a custom designed processor, which is specifically optimized for simulating QCD. We discuss the machine design, report on benchmarks, and give an overview on the status of…

    Submitted 4 September, 2003; v1 submitted 13 June, 2003; originally announced June 2003.

    Comments: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 8 pages, LaTeX, 12 eps figures. PSN THIT005

    Journal ref: ECONF C0303241:THIT005,2003

  18. Status of the apeNEXT project

    Authors: R. Ammendola, F. Bodin, Ph. Boucaud, N. Cabibbo, F. Di Carlo, R. De Pietri, F. Di Renzo, W. Errico, A. Fucci, M. Guagnelli, H. Kaldass, A. Lonardo, S. de Luca, J. Micheli, V. Morenas, O. Pene, R. Petronzio, F. Palombi, D. Pleiter, N. Paschedag, F. Rapuano, P. De Riso, D. Rossetti, A. Salamon, G. Salina, et al. (5 additional authors not shown)

    Abstract: We present the current status of the apeNEXT project. The aim of this project is the development of the next generation of APE machines which will provide multi-teraflop computing power. Like previous machines, apeNEXT is based on a custom designed processor, which is specifically optimized for simulating QCD. We discuss the machine design, report on benchmarks, and give an overview on the status of…

    Submitted 8 October, 2003; v1 submitted 15 November, 2002; originally announced November 2002.

    Comments: 3 pages, Lattice2002(machines). Replaced for adding forgotten author

    Journal ref: Nucl.Phys.Proc.Suppl.119:1038-1040,2003

  19. The APENEXT project

    Authors: F. Bodin, P. Boucaud, N. Cabibbo, F. Calvayrac, M. Della Morte, R. De Pietri, P. De Riso, F. Di Carlo, F. Di Renzo, W. Errico, R. Frezzotti, U. Gensch, T. Giorgino, M. Guagnelli, N. Herve, H. Kaldass, A. Lonardo, M. Lukyanov, G. Magazzu, J. Micheli, V. Morenas, L. Mori, F. Palombi, N. Paschedag, O. Pene, et al. (9 additional authors not shown)

    Abstract: APENEXT is a new generation APE processor, optimized for LGT simulations. The project follows the basic ideas of previous APE machines and develops simple and cheap parallel systems with multi T-Flops processing power. This paper describes the main features of this new development.

    Submitted 25 October, 2001; originally announced October 2001.

    Comments: Lattice2001(plenary/machinestatus), 4 pages, 1 eps figure

    Journal ref: Nucl.Phys.Proc.Suppl.106:173-176,2002

  20. Status of APEmille

    Authors: APE-Collaboration: A. Bartoloni, P. Boucaud, N. Cabibbo, F. Calvayrac, M. Della Morte, R. De Pietri, P. De Riso, F. Di Carlo, F. Di Renzo, W. Errico, R. Frezzotti, T. Giorgino, J. Heitger, A. Lonardo, M. Loukianov, G. Magazzu, J. Micheli, V. Morenas, N. Paschedag, O. Pene, R. Petronzio, D. Pleiter, F. Rapuano, et al. (9 additional authors not shown)

    Abstract: This paper presents the status of the APEmille project, which is essentially completed, as far as machine development and construction is concerned. Several large installations of APEmille are in use for physics production runs leading to many new results presented at this conference. This paper briefly summarizes the APEmille architecture, reviews the status of the installations and presents so…

    Submitted 17 October, 2001; originally announced October 2001.

    Comments: Lattice2001(algorithms), 3 pages, 1 eps figure

    Journal ref: Nucl.Phys.Proc.Suppl.106:1043-1045,2002

  21. Progress and status of APEmille

    Authors: APE collaboration, A. Bartoloni, S. Cabasino, N. Cabibbo, M. Cosimi, P. De Riso, W. Errico, S. Giovannetti, F. Laico, H. Leich, A. Lonardo, G. Magazzu, A. Michelotti, E. Panizzi, P. S. Paolucci, D. Rossetti, U. Schwendicke, H. Simma, K. H. Sulanke, M. Torelli, R. Tripiccione, P. Vicini

    Abstract: We report on the progress and status of the APEmille project: a SIMD parallel computer with a peak performance in the TeraFlops range which is now in an advanced development phase. We discuss the hardware and software architecture, and present some performance estimates for Lattice Gauge Theory (LGT) applications.

    Submitted 1 October, 1997; originally announced October 1997.

    Comments: Talk presented at LATTICE97, 3 pages, Latex

    Journal ref: Nucl.Phys.Proc.Suppl. 63 (1998) 991-993

  22. arXiv:cond-mat/9708025

    cond-mat.dis-nn cond-mat.stat-mech

    Numerical Simulations of the Dynamical Behavior of the SK Model

    Authors: Enzo Marinari, Giorgio Parisi, Davide Rossetti

    Abstract: We study the dynamical behavior of the Sherrington Kirkpatrick model. Thanks to the APE supercomputer we are able to analyze large lattice volumes, and to investigate the low $T$ region. We present a determination of the remnant magnetization and of its time decay exponent, of the energy time decay exponent, and we discuss aging phenomena in the model.

    Submitted 8 December, 1997; v1 submitted 4 August, 1997; originally announced August 1997.

    Comments: 11 pages including 8 figures. Revised version with major restructuring