Showing 1–50 of 68 results for author: Wuerthwein, F

  1. Adoption of a token-based authentication model for the CMS Submission Infrastructure

    Authors: Antonio Perez-Calero Yzquierdo, Marco Mascheroni, Edita Kizinevic, Farrukh Aftab Khan, Hyunwoo Kim, Maria Acosta Flechas, Nikos Tsipinakis, Saqib Haleem, Frank Wurthwein

    Abstract: The CMS Submission Infrastructure (SI) is the main computing resource provisioning system for CMS workloads. A number of HTCondor pools are employed to manage this infrastructure, which aggregates geographically distributed resources from the WLCG and other providers. Historically, the model of authentication among the diverse components of this infrastructure has relied on the Grid Security Infra…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS - 2023

  2. Repurposing of the Run 2 CMS High Level Trigger Infrastructure as a Cloud Resource for Offline Computing

    Authors: Marco Mascheroni, Antonio Perez-Calero Yzquierdo, Edita Kizinevic, Farrukh Aftab Khan, Hyunwoo Kim, Maria Acosta Flechas, Nikos Tsipinakis, Saqib Haleem, Daniele Spiga, Christoph Wissing, Frank Wurthwein

    Abstract: The former CMS Run 2 High Level Trigger (HLT) farm is one of the largest contributors to CMS compute resources, providing about 25k job slots for offline computing. This CPU farm was initially employed as an opportunistic resource, exploited during inter-fill periods, in the LHC Run 2. Since then, it has become a nearly transparent extension of the CMS capacity at CERN, being located on-site at th…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS - 2023

  3. arXiv:2402.05244

    cs.DC

    CRIU -- Checkpoint Restore in Userspace for computational simulations and scientific applications

    Authors: Fabio Andrijauskas, Igor Sfiligoi, Diego Davila, Aashay Arora, Jonathan Guiang, Brian Bockelman, Greg Thain, Frank Wurthwein

    Abstract: Creating new materials, discovering new drugs, and simulating systems are essential processes for research and innovation and require substantial computational power. While many applications can be split into many smaller independent tasks, some cannot and may take hours or weeks to run to completion. To better manage those longer-running jobs, it would be desirable to stop them at any arbitrary p…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS - 2023

  4. arXiv:2312.12589

    cs.NI

    400Gbps benchmark of XRootD HTTP-TPC

    Authors: Aashay Arora, Jonathan Guiang, Diego Davila, Frank Würthwein, Justas Balcas, Harvey Newman

    Abstract: Due to the increased demand of network traffic expected during the HL-LHC era, the T2 sites in the USA will be required to have 400Gbps of available bandwidth to their storage solution. With the above in mind we are pursuing a scale test of XRootD software when used to perform Third Party Copy transfers using the HTTP protocol. Our main objective is to understand the possible limitations in the so…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 8 pages, 4 figures, submitted to CHEP'23

  5. arXiv:2308.11733

    cs.DC

    Demand-driven provisioning of Kubernetes-like resources in OSG

    Authors: Igor Sfiligoi, Frank Würthwein, Jeff Dost, Brian Lin, David Schultz

    Abstract: The OSG-operated Open Science Pool is an HTCondor-based virtual cluster that aggregates resources from compute clusters provided by several organizations. Most of the resources are not owned by OSG, so demand-based dynamic provisioning is important for maximizing usage without incurring excessive waste. OSG has long relied on GlideinWMS for most of its resource provisioning needs but is limited to…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: 6 pages, 3 figures, Submitted to Proceedings of CHEP23

  6. arXiv:2308.07999

    physics.comp-ph astro-ph.IM cs.PF

    IceCube experience using XRootD-based Origins with GPU workflows in PNRP

    Authors: David Schultz, Igor Sfiligoi, Benedikt Riedel, Fabio Andrijauskas, Derek Weitzel, Frank Würthwein

    Abstract: The IceCube Neutrino Observatory is a cubic kilometer neutrino telescope located at the geographic South Pole. Understanding detector systematic effects is a continuous process. This requires the Monte Carlo simulation to be updated periodically to quantify potential changes and improvements in science results with more detailed modeling of the systematic effects. IceCube's largest systematic effe…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 7 pages, 3 figures, 1 table, To be published in Proceedings of CHEP23

  7. arXiv:2308.03678

    cs.PF astro-ph.IM

    Evaluation of ARM CPUs for IceCube available through Google Kubernetes Engine

    Authors: Igor Sfiligoi, David Schultz, Benedikt Riedel, Frank Würthwein

    Abstract: The IceCube experiment has substantial simulation needs and is in continuous search for the most cost-effective ways to satisfy them. The most CPU-intensive part relies on CORSIKA, a cosmic ray air shower simulation. Historically, IceCube relied exclusively on x86-based CPUs, like Intel Xeon and AMD EPYC, but recently server-class ARM-based CPUs are also becoming available, both on-prem and in the…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 5 pages, 3 tables, Submitted to proceedings of CHEP23

  8. Effectiveness and predictability of in-network storage cache for scientific workflows

    Authors: Caitlin Sim, Kesheng Wu, Alex Sim, Inder Monga, Chin Guok, Frank Wurthwein, Diego Davila, Harvey Newman, Justas Balcas

    Abstract: Large scientific collaborations often have multiple scientists accessing the same set of files while doing different analyses, which creates repeated accesses to the large amounts of shared data located far away. These data accesses have long latency due to distance and occupy the limited bandwidth available over the wide-area network. To reduce the wide-area network traffic and the data access lat…

    Submitted 20 July, 2023; originally announced July 2023.

  9. arXiv:2306.08106

    hep-ex astro-ph.HE gr-qc

    Applications of Deep Learning to physics workflows

    Authors: Manan Agarwal, Jay Alameda, Jeroen Audenaert, Will Benoit, Damon Beveridge, Meghna Bhattacharya, Chayan Chatterjee, Deep Chatterjee, Andy Chen, Muhammed Saleem Cholayil, Chia-Jui Chou, Sunil Choudhary, Michael Coughlin, Maximilian Dax, Aman Desai, Andrea Di Luca, Javier Mauricio Duarte, Steven Farrell, Yongbin Feng, Pooyan Goodarzi, Ekaterina Govorkova, Matthew Graham, Jonathan Guiang, Alec Gunny, Weichangfeng Guo, et al. (43 additional authors not shown)

    Abstract: Modern large-scale physics experiments create datasets with sizes and streaming rates that can exceed those from industry leaders such as Google Cloud and Netflix. Fully processing these datasets requires both sufficient compute power and efficient workflows. Recent advances in Machine Learning (ML) and Artificial Intelligence (AI) can either improve or replace existing domain-specific algorithms…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: Whitepaper resulting from Accelerating Physics with ML@MIT workshop in Jan/Feb 2023

  10. Defining a canonical unit for accounting purposes

    Authors: Fabio Andrijauskas, Igor Sfiligoi, Frank Würthwein

    Abstract: Compute resource providers often put in place batch compute systems to maximize the utilization of such resources. However, compute nodes in such clusters, both physical and logical, contain several complementary resources, with notable examples being CPUs, GPUs, memory and ephemeral storage. User jobs will typically require more than one such resource, resulting in co-scheduling trade-offs of par…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: 6 pages, 2 figures, To be published in proceedings of PEARC23

    Journal ref: Practice and Experience in Advanced Research Computing (PEARC '23). Association for Computing Machinery, New York, NY, USA, 288-291. (2023)

  11. Testing GitHub projects on custom resources using unprivileged Kubernetes runners

    Authors: Igor Sfiligoi, Daniel McDonald, Rob Knight, Frank Würthwein

    Abstract: GitHub is a popular repository for hosting software projects, both due to ease of use and the seamless integration with its testing environment. Native GitHub Actions make it easy for software developers to validate new commits and have confidence that new code does not introduce major bugs. The freely available test environments are limited to only a few popular setups but can be extended with cu…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: 5 pages, 1 figure, To be published in proceedings of PEARC23

    Journal ref: Practice and Experience in Advanced Research Computing (PEARC '23). Association for Computing Machinery, New York, NY, USA, 332-335. (2023)

  12. Analyzing Transatlantic Network Traffic over Scientific Data Caches

    Authors: Z. Deng, A. Sim, K. Wu, C. Guok, D. Hazen, I. Monga, F. Andrijauskas, F. Wuerthwein, D. Weitzel

    Abstract: Large scientific collaborations often share huge volumes of data around the world. Consequently a significant amount of network bandwidth is needed for data replication and data access. Users in the same region may possibly share resources as well as data, especially when they are working on related topics with similar datasets. In this work, we study the network traffic patterns and resource util…

    Submitted 17 July, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

  13. arXiv:2210.05822

    hep-ex hep-lat hep-ph hep-th

    The Future of High Energy Physics Software and Computing

    Authors: V. Daniel Elvira, Steven Gottlieb, Oliver Gutsche, Benjamin Nachman, S. Bailey, W. Bhimji, P. Boyle, G. Cerati, M. Carrasco Kind, K. Cranmer, G. Davies, V. D. Elvira, R. Gardner, K. Heitmann, M. Hildreth, W. Hopkins, T. Humble, M. Lin, P. Onyisi, J. Qiang, K. Pedro, G. Perdue, A. Roberts, M. Savage, P. Shanahan, et al. (3 additional authors not shown)

    Abstract: Software and Computing (S&C) are essential to all High Energy Physics (HEP) experiments and many theoretical studies. The size and complexity of S&C are now commensurate with that of experimental instruments, playing a critical role in experimental design, data acquisition/instrumental control, reconstruction, and analysis. Furthermore, S&C often plays a leading role in driving the precision of th…

    Submitted 8 November, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Computational Frontier Report Contribution to Snowmass 2021; 41 pages, 1 figure. v2: missing ref and added missing topical group conveners. v3: fixed typos

  14. Managed Network Services for Exascale Data Movement Across Large Global Scientific Collaborations

    Authors: Frank Würthwein, Jonathan Guiang, Aashay Arora, Diego Davila, John Graham, Dima Mishin, Thomas Hutton, Igor Sfiligoi, Harvey Newman, Justas Balcas, Tom Lehman, Xi Yang, Chin Guok

    Abstract: Unique scientific instruments designed and operated by large global collaborations are expected to produce Exabyte-scale data volumes per year by 2030. These collaborations depend on globally distributed storage and compute to turn raw data into science. While all of these infrastructures have batch scheduling capabilities to share compute, Research and Education networks lack those capabilities.…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: Submitted to the proceedings of the XLOOP workshop held in conjunction with Supercomputing 22

  15. arXiv:2209.08868

    physics.comp-ph cs.DC hep-ex hep-lat hep-th

    Snowmass 2021 Computational Frontier CompF4 Topical Group Report: Storage and Processing Resource Access

    Authors: W. Bhimji, D. Carder, E. Dart, J. Duarte, I. Fisk, R. Gardner, C. Guok, B. Jayatilaka, T. Lehman, M. Lin, C. Maltzahn, S. McKee, M. S. Neubauer, O. Rind, O. Shadura, N. V. Tran, P. van Gemmeren, G. Watts, B. A. Weaver, F. Würthwein

    Abstract: Computing plays a significant role in all areas of high energy physics. The Snowmass 2021 CompF4 topical group's scope is facilities R&D, where we consider "facilities" as the computing hardware and software infrastructure inside the data centers plus the networking between data centers, irrespective of who owns them, and what policies are applied for using them. In other words, it includes commer…

    Submitted 29 September, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: Snowmass 2021 Computational Frontier CompF4 topical group report. v2: Expanded introduction. Updated author list. 52 pages, 6 figures

  16. arXiv:2205.09682

    cs.DC physics.plasm-ph

    Comparing single-node and multi-node performance of an important fusion HPC code benchmark

    Authors: Emily A. Belli, Jeff Candy, Igor Sfiligoi, Frank Würthwein

    Abstract: Fusion simulations have traditionally required the use of leadership scale High Performance Computing (HPC) resources in order to produce advances in physics. The impressive improvements in compute and memory capacity of many-GPU compute nodes are now allowing for some problems that once required a multi-node setup to be also solvable on a single node. When possible, the increased interconnect ban…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: 6 pages, 1 table, 1 figure, to be published in proceedings of PEARC22

    Journal ref: PEARC '22: Practice and Experience in Advanced Research Computing (2022) 10 1-4

  17. The anachronism of whole-GPU accounting

    Authors: Igor Sfiligoi, David Schultz, Frank Würthwein, Benedikt Riedel, Dmitry Y. Mishin

    Abstract: NVIDIA has been making steady progress in increasing the compute performance of its GPUs, resulting in order of magnitude compute throughput improvements over the years. With several models of GPUs coexisting in many deployments, the traditional accounting method of treating all GPUs as being equal is not reflecting compute output anymore. Moreover, for applications that require significant CPU-ba…

    Submitted 18 May, 2022; originally announced May 2022.

    Comments: 6 pages, 2 tables, 1 figure, to be published in proceedings of PEARC22

    Journal ref: PEARC '22: Practice and Experience in Advanced Research Computing (2022) 58 1-5

  18. arXiv:2205.05598

    cs.DC cs.NI eess.SY

    Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches

    Authors: Julian Bellavita, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila

    Abstract: The XRootD system is used to transfer, store, and cache large datasets from high-energy physics (HEP). In this study we focus on its capability as a distributed on-demand storage cache. Through exploring a large set of daily log files between 2020 and 2021, we seek to understand the data access patterns that might inform future cache design. Our study begins with a set of summary statistics regardin…

    Submitted 11 May, 2022; originally announced May 2022.

  19. arXiv:2205.05563

    cs.NI cs.DC cs.LG cs.PF

    Access Trends of In-network Cache for Scientific Data

    Authors: Ruize Han, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila, Justas Balcas, Harvey Newman

    Abstract: Scientific collaborations are increasingly relying on large volumes of data for their work and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in the community often selects a different subset of data for their analysis tasks; however, members of a research group often are working on related research topics that require similar data objects.…

    Submitted 11 May, 2022; originally announced May 2022.

  20. Auto-scaling HTCondor pools using Kubernetes compute resources

    Authors: Igor Sfiligoi, Thomas DeFanti, Frank Würthwein

    Abstract: HTCondor has been very successful in managing globally distributed, pleasantly parallel scientific workloads, especially as part of the Open Science Grid. HTCondor system design makes it ideal for integrating compute resources provisioned from anywhere, but it has very limited native support for autonomously provisioning resources managed by other solutions. This work presents a solution that allo…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: 6 pages, 3 figures, to be published in proceedings of PEARC22

    Journal ref: PEARC '22: Practice and Experience in Advanced Research Computing (2022) 57 1-4

  21. arXiv:2203.08280

    cs.NI

    Data Transfer and Network Services management for Domain Science Workflows

    Authors: Tom Lehman, Xi Yang, Chin Guok, Frank Wuerthwein, Igor Sfiligoi, John Graham, Aashay Arora, Dima Mishin, Diego Davila, Jonathan Guiang, Tom Hutton, Harvey Newman, Justas Balcas

    Abstract: This paper describes a vision and work in progress to elevate network resources and data transfer management to the same level as compute and storage in the context of services access, scheduling, life cycle management, and orchestration. While domain science workflows often include active compute resource allocation and management, the data transfers and associated network resource coordination i…

    Submitted 20 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: contribution to Snowmass 2022

  22. Data intensive physics analysis in Azure cloud

    Authors: Igor Sfiligoi, Frank Würthwein, Diego Davila

    Abstract: The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is one of the largest data producers in the scientific world, with standard data products centrally produced, and then used by often competing teams within the collaboration. This work is focused on how a local institution, University of California San Diego (UCSD), partnered with the Open Science Grid (OSG) to use Azure…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: 11 pages, 5 figures, to be published in proceedings of ICOCBI 2021

    Journal ref: Lecture Notes on Data Engineering and Communications Technologies, vol 117. Springer, Singapore. 2022

  23. Expanding IceCube GPU computing into the Clouds

    Authors: Igor Sfiligoi, Shava Smallen, Frank Würthwein, Nicole Wolter, David Schultz, Benedikt Riedel

    Abstract: The IceCube collaboration relies on GPU compute for many of its needs, including ray tracing simulation and machine learning activities. GPUs are however still a relatively scarce commodity in the scientific resource provider community, so we expanded the available resource pool with GPUs provisioned from the commercial Cloud providers. The provisioned resources were fully integrated into the norm…

    Submitted 8 July, 2021; originally announced July 2021.

    Comments: 2 pages, 2 figures, to be published in proceedings of eScience 2021

    Journal ref: 2021 IEEE 17th International Conference on eScience (eScience), 2021, pp. 227-228

  24. HTCondor data movement at 100 Gbps

    Authors: Igor Sfiligoi, Frank Würthwein, Thomas DeFanti, John Graham

    Abstract: HTCondor is a major workload management system used in distributed high throughput computing (dHTC) environments, e.g., the Open Science Grid. One of the distinguishing features of HTCondor is the native support for data movement, allowing it to operate without a shared filesystem. Coupling data handling and compute scheduling is both convenient for users and allows for significant infrastructure…

    Submitted 8 July, 2021; originally announced July 2021.

    Comments: 2 pages, 2 figures, to be published in proceedings of eScience 2021

    Journal ref: 2021 IEEE 17th International Conference on eScience (eScience), 2021, pp. 239-240

  25. Analyzing scientific data sharing patterns for in-network data caching

    Authors: Elizabeth Copps, Huiyi Zhang, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila, Edgar Fajardo

    Abstract: The volume of data moving through a network increases with new scientific experiments and simulations. Network bandwidth requirements also increase proportionally to deliver data within a certain time frame. We observe that a significant portion of the popular dataset is transferred multiple times to different users as well as to the same user for various reasons. In-network data caching for the s…

    Submitted 3 May, 2021; originally announced May 2021.

  26. Managing Cloud networking costs for data-intensive applications by provisioning dedicated network links

    Authors: Igor Sfiligoi, Michael Hare, David Schultz, Frank Würthwein, Benedikt Riedel, Tom Hutton, Steve Barnet, Vladimir Brik

    Abstract: Many scientific high-throughput applications can benefit from the elastic nature of Cloud resources, especially when there is a need to reduce time to completion. Cost considerations are usually a major issue in such endeavors, with networking often a major component; for data-intensive applications, egress networking costs can exceed the compute costs. Dedicated network links provide a way to low…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: 8 pages, 7 figures, 4 tables, to be published in proceedings of PEARC21

  27. Systematic benchmarking of HTTPS third party copy on 100Gbps links using XRootD

    Authors: Edgar Fajardo, Aashay Arora, Diego Davila, Richard Gao, Frank Würthwein, Brian Bockelman

    Abstract: The High Luminosity Large Hadron Collider provides a data challenge. The amount of data recorded from the experiments and transported to hundreds of sites will see a thirty-fold increase in annual data volume. This calls for a systematic approach to contrasting the performance of different Third Party Copy (TPC) transfer protocols. Two contenders, XRootD-HTTPS and GridFTP, are evaluated in their performanc…

    Submitted 22 March, 2021; originally announced March 2021.

    Comments: 7 pages, 8 figures

  28. arXiv:2101.11489

    hep-ex cs.DC

    Parallelizing the Unpacking and Clustering of Detector Data for Reconstruction of Charged Particle Tracks on Multi-core CPUs and Many-core GPUs

    Authors: Giuseppe Cerati, Peter Elmer, Brian Gravelle, Matti Kortelainen, Vyacheslav Krutelyov, Steven Lantz, Mario Masciovecchio, Kevin McDermott, Boyana Norris, Allison Reinsvold Hall, Michael Reid, Daniel Riley, Matevž Tadel, Peter Wittich, Bei Wang, Frank Würthwein, Avraham Yagil

    Abstract: We present results from parallelizing the unpacking and clustering steps of the raw data from the silicon strip modules for reconstruction of charged particle tracks. Throughput is further improved by concurrently processing multiple events using nested OpenMP parallelism on CPU or CUDA streams on GPU. The new implementation along with earlier work in developing a parallelized and vectorized imple…

    Submitted 27 January, 2021; originally announced January 2021.

  29. arXiv:2011.14995

    cs.DC physics.comp-ph

    Adapting LIGO workflows to run in the Open Science Grid

    Authors: Edgar Fajardo, Frank Wuerthwein, Brian Bockelman, Miron Livny, Greg Thain, James Alexander Clark, Peter Couvares, Josh Willis

    Abstract: During the first observation run the LIGO collaboration needed to offload some of its most intense CPU workflows from its dedicated computing sites to opportunistic resources. Open Science Grid enabled LIGO to run PyCBC, RIFT and BayesWave workflows seamlessly in a combination of owned and opportunistic resources. One of the challenges is enabling the workflows to use several heterogeneous…

    Submitted 30 November, 2020; originally announced November 2020.

  30. Creating a content delivery network for general science on the internet backbone using XCaches

    Authors: Edgar Fajardo, Marian Zvada, Derek Weitzel, Mats Rynge, John Hicks, Mat Selmeci, Brian Lin, Pascal Paschos, Brian Bockelman, Igor Sfiligoi, Andrew Hanushevsky, Frank Würthwein

    Abstract: A general problem faced by computing on the grid for opportunistic users is that delivering cycles is simpler than delivering data to those cycles. In this project we show how we integrated XRootD caches placed on the internet backbone to implement a content delivery network for general science workflows. We will show that for some workflows on different science domains like high energy physics, g…

    Submitted 28 September, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

  31. arXiv:2006.00071

    physics.ins-det hep-ex

    Speeding up Particle Track Reconstruction using a Parallel Kalman Filter Algorithm

    Authors: Steven Lantz, Kevin McDermott, Michael Reid, Daniel Riley, Peter Wittich, Sophie Berkman, Giuseppe Cerati, Matti Kortelainen, Allison Reinsvold Hall, Peter Elmer, Bei Wang, Leonardo Giannini, Vyacheslav Krutelyov, Mario Masciovecchio, Matevž Tadel, Frank Würthwein, Avraham Yagil, Brian Gravelle, Boyana Norris

    Abstract: One of the most computationally challenging problems expected for the High-Luminosity Large Hadron Collider (HL-LHC) is determining the trajectory of charged particles during event reconstruction. Algorithms used at the LHC today rely on Kalman filtering, which builds physical trajectories incrementally while incorporating material effects and error estimation. Recognizing the need for faster comp…

    Submitted 10 July, 2020; v1 submitted 29 May, 2020; originally announced June 2020.

  32. The Scalable Systems Laboratory: a Platform for Software Innovation for HEP

    Authors: Robert Gardner, Lincoln Bryant, Mark Neubauer, Frank Wuerthwein, Judith Stephen, Andrew Chien

    Abstract: The Scalable Systems Laboratory (SSL), part of the IRIS-HEP Software Institute, provides Institute participants and HEP software developers generally with a means to transition their R&D from conceptual toys to testbeds to production-scale prototypes. The SSL enables tooling, infrastructure, and services supporting the innovation of novel analysis and data architectures, development of software el…

    Submitted 13 May, 2020; originally announced May 2020.

  33. Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scientific Computing

    Authors: I. Sfiligoi, D. Schultz, B. Riedel, F. Wuerthwein, S. Barnet, V. Brik

    Abstract: Scientific computing needs are growing dramatically with time and are expanding in science domains that were previously not compute intensive. When compute workflows spike well in excess of the capacity of their local compute resource, capacity should be temporarily provisioned from somewhere else to both meet deadlines and to increase scientific output. Public Clouds have become an attractive opt…

    Submitted 18 April, 2020; originally announced April 2020.

    Comments: 5 pages, 7 figures, to be published in proceedings of PEARC'20. arXiv admin note: text overlap with arXiv:2002.06667

  34. Moving the California distributed CMS xcache from bare metal into containers using Kubernetes

    Authors: Edgar Fajardo, Matevz Tadel, Justas Balcas, Alja Tadel, Frank Wuerthwein, Diego Davila, Jonathan Guiang, Igor Sfiligoi

    Abstract: The University of California system has excellent networking between all of its campuses as well as a number of other Universities in CA, including Caltech, most of them being connected at 100 Gbps. UCSD and Caltech have thus joined their disk systems into a single logical xcache system, with worker nodes from both sites accessing data from disks at either site. This setup has been in place for a…

    Submitted 4 March, 2020; originally announced March 2020.

  35. Running a Pre-Exascale, Geographically Distributed, Multi-Cloud Scientific Simulation

    Authors: Igor Sfiligoi, Frank Wuerthwein, Benedikt Riedel, David Schultz

    Abstract: As we approach the Exascale era, it is important to verify that the existing frameworks and tools will still work at that scale. Moreover, public Cloud computing has been emerging as a viable solution for both prototyping and urgent computing. Using the elasticity of the Cloud, we have thus put in place a pre-exascale HTCondor setup for running a scientific simulation in the Cloud, with the chosen…

    Submitted 16 February, 2020; originally announced February 2020.

    Comments: 18 pages, 5 figures, 4 tables, to be published in Proceedings of ISC High Performance 2020

    Journal ref: Lecture Notes in Computer Science, vol 12151, year 2020. Springer

  36. arXiv:2002.06295

    physics.ins-det hep-ex

    Reconstruction of Charged Particle Tracks in Realistic Detector Geometry Using a Vectorized and Parallelized Kalman Filter Algorithm

    Authors: Giuseppe Cerati, Peter Elmer, Brian Gravelle, Matti Kortelainen, Vyacheslav Krutelyov, Steven Lantz, Mario Masciovecchio, Kevin McDermott, Boyana Norris, Allison Reinsvold Hall, Michael Reid, Daniel Riley, Matevž Tadel, Peter Wittich, Bei Wang, Frank Würthwein, Avraham Yagil

    Abstract: One of the most computationally challenging problems expected for the High-Luminosity Large Hadron Collider (HL-LHC) is finding and fitting particle tracks during event reconstruction. Algorithms used at the LHC today rely on Kalman filtering, which builds physical trajectories incrementally while incorporating material effects and error estimation. Recognizing the need for faster computational th…

    Submitted 9 July, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

    Report number: FERMILAB-CONF-20-075-SCD

  37. Characterizing network paths in and out of the clouds

    Authors: Igor Sfiligoi, John Graham, Frank Wuerthwein

    Abstract: Commercial Cloud computing is becoming mainstream, with funding agencies moving beyond prototyping and starting to fund production campaigns, too. An important aspect of any scientific computing production campaign is data movement, both incoming and outgoing. And while the performance and cost of VMs are relatively well understood, the network performance and cost are not. This paper provides a cha…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: 7 pages, 1 figure, 5 tables, to be published in CHEP19 proceedings

    Journal ref: EPJ Web of Conferences 245, 07059 (2020)

  38. arXiv:1906.11744

    physics.ins-det hep-ex physics.comp-ph

    Speeding up Particle Track Reconstruction in the CMS Detector using a Vectorized and Parallelized Kalman Filter Algorithm

    Authors: Giuseppe Cerati, Peter Elmer, Brian Gravelle, Matti Kortelainen, Vyacheslav Krutelyov, Steven Lantz, Mario Masciovecchio, Kevin McDermott, Boyana Norris, Michael Reid, Allison Reinsvold Hall, Daniel Riley, Matevž Tadel, Peter Wittich, Frank Würthwein, Avi Yagil

    Abstract: Building particle tracks is the most computationally intense step of event reconstruction at the LHC. With the increased instantaneous luminosity and associated increase in pileup expected from the High-Luminosity LHC, the computational challenge of track finding and fitting requires novel solutions. The current track reconstruction algorithms used at the LHC are based on Kalman filter methods tha…

    Submitted 6 November, 2019; v1 submitted 27 June, 2019; originally announced June 2019.

    Comments: Submitted to proceedings of the 2019 Connecting the Dots and Workshop on Intelligent Trackers (CTD/WIT 2019); 6 pages, 4 figures

  39. arXiv:1906.02253  [pdf, other]

    physics.ins-det hep-ex physics.comp-ph

    Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures with the CMS Detector

    Authors: Giuseppe Cerati, Peter Elmer, Brian Gravelle, Matti Kortelainen, Vyacheslav Krutelyov, Steven Lantz, Mario Masciovecchio, Kevin McDermott, Boyana Norris, Allison Reinsvold Hall, Daniel Riley, Matevž Tadel, Peter Wittich, Frank Würthwein, Avi Yagil

    Abstract: In the High-Luminosity Large Hadron Collider (HL-LHC), one of the most challenging computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction. The methods currently in use at the LHC are based on the Kalman filter. Such methods have been shown to be robust and to provide good physics performance, both in the trigger and offline. In order to improve…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: Submitted to proceedings of 19th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2019); 6 pages, 5 figures

  40. arXiv:1812.00761  [pdf, ps, other]

    physics.comp-ph

    HEP Software Foundation Community White Paper Working Group -- Data Organization, Management and Access (DOMA)

    Authors: Dario Berzano, Riccardo Maria Bianchi, Ian Bird, Brian Bockelman, Simone Campana, Kaushik De, Dirk Duellmann, Peter Elmer, Robert Gardner, Vincent Garonne, Claudio Grandi, Oliver Gutsche, Andrew Hanushevsky, Burt Holzman, Bodhitha Jayatilaka, Ivo Jimenez, Michel Jouvin, Oliver Keeble, Alexei Klimentov, Valentin Kuznetsov, Eric Lancon, Mario Lassnig, Miron Livny, Carlos Maltzahn, Shawn McKee, et al. (13 additional authors not shown)

    Abstract: Without significant changes to data organization, management, and access (DOMA), HEP experiments will find scientific output limited by how fast data can be accessed and digested by computational resources. In this white paper we discuss challenges in DOMA that HEP experiments, such as the HL-LHC, will face as well as potential ways to address them. A research and development timeline to assess th…

    Submitted 30 November, 2018; originally announced December 2018.

    Comments: arXiv admin note: text overlap with arXiv:1712.06592

    Report number: HSF-CWP-2017-04

  41. Parallelized and Vectorized Tracking Using Kalman Filters with CMS Detector Geometry and Events

    Authors: Giuseppe Cerati, Peter Elmer, Brian Gravelle, Matti Kortelainen, Vyacheslav Krutelyov, Steven Lantz, Matthieu Lefebvre, Mario Masciovecchio, Kevin McDermott, Boyana Norris, Allison Reinsvold Hall, Daniel Riley, Matevz Tadel, Peter Wittich, Frank Wuerthwein, Avi Yagil

    Abstract: The High-Luminosity Large Hadron Collider at CERN will be characterized by greater pileup of events and higher occupancy, making the track reconstruction even more computationally demanding. Existing algorithms at the LHC are based on Kalman filter techniques with proven excellent physics performance under a variety of conditions. Starting in 2014, we have been developing Kalman-filter-based metho…

    Submitted 9 July, 2019; v1 submitted 9 November, 2018; originally announced November 2018.

  42. arXiv:1804.03983  [pdf, other]

    physics.comp-ph hep-ex

    HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation

    Authors: Lothar Bauerdick, Riccardo Maria Bianchi, Brian Bockelman, Nuno Castro, Kyle Cranmer, Peter Elmer, Robert Gardner, Maria Girone, Oliver Gutsche, Benedikt Hegner, José M. Hernández, Bodhitha Jayatilaka, David Lange, Mark S. Neubauer, Daniel S. Katz, Lukasz Kreczko, James Letts, Shawn McKee, Christoph Paus, Kevin Pedro, Jim Pivarski, Martin Ritter, Eduardo Rodrigues, Tai Sakuma, Elizabeth Sexton-Kennedy, et al. (4 additional authors not shown)

    Abstract: At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific po…

    Submitted 9 April, 2018; originally announced April 2018.

    Comments: arXiv admin note: text overlap with arXiv:1712.06592

    Report number: HSF-CWP-2017-05

  43. arXiv:1802.08640  [pdf, ps, other]

    physics.comp-ph

    HEP Community White Paper on Software trigger and event reconstruction: Executive Summary

    Authors: Johannes Albrecht, Kenneth Bloom, Tommaso Boccali, Antonio Boveia, Michel De Cian, Caterina Doglioni, Agnieszka Dziurda, Amir Farbin, Conor Fitzpatrick, Frank Gaede, Simon George, Vladimir Gligorov, Hadrien Grasland, Lucia Grillo, Benedikt Hegner, William Kalderon, Sami Kama, Patrick Koppenburg, Slava Krutelyov, Rob Kutschke, Walter Lampl, David Lange, Ed Moyse, Andrew Norman, Marko Petric, et al. (17 additional authors not shown)

    Abstract: Realizing the physics programs of the planned and upgraded high-energy physics (HEP) experiments over the next 10 years will require the HEP community to address a number of challenges in the area of software and computing. For this reason, the HEP software community has engaged in a planning process over the past two years, with the objective of identifying and prioritizing the research and devel…

    Submitted 23 February, 2018; originally announced February 2018.

    Comments: Editors: Vladimir Gligorov and David Lange

  44. arXiv:1802.08638  [pdf, ps, other]

    physics.comp-ph

    HEP Community White Paper on Software trigger and event reconstruction

    Authors: Johannes Albrecht, Kenneth Bloom, Tommaso Boccali, Antonio Boveia, Michel De Cian, Caterina Doglioni, Agnieszka Dziurda, Amir Farbin, Conor Fitzpatrick, Frank Gaede, Simon George, Vladimir Gligorov, Hadrien Grasland, Lucia Grillo, Benedikt Hegner, William Kalderon, Sami Kama, Patrick Koppenburg, Slava Krutelyov, Rob Kutschke, Walter Lampl, David Lange, Ed Moyse, Andrew Norman, Marko Petric, et al. (17 additional authors not shown)

    Abstract: Realizing the physics programs of the planned and upgraded high-energy physics (HEP) experiments over the next 10 years will require the HEP community to address a number of challenges in the area of software and computing. For this reason, the HEP software community has engaged in a planning process over the past two years, with the objective of identifying and prioritizing the research and devel…

    Submitted 23 February, 2018; originally announced February 2018.

    Comments: Editors Vladimir Vava Gligorov and David Lange

  45. arXiv:1712.06982  [pdf, other]

    physics.comp-ph hep-ex

    A Roadmap for HEP Software and Computing R&D for the 2020s

    Authors: Johannes Albrecht, Antonio Augusto Alves Jr, Guilherme Amadio, Giuseppe Andronico, Nguyen Anh-Ky, Laurent Aphecetche, John Apostolakis, Makoto Asai, Luca Atzori, Marian Babik, Giuseppe Bagliesi, Marilena Bandieramonte, Sunanda Banerjee, Martin Barisits, Lothar A. T. Bauerdick, Stefano Belforte, Douglas Benjamin, Catrin Bernius, Wahid Bhimji, Riccardo Maria Bianchi, Ian Bird, Catherine Biscarat, Jakob Blomer, Kenneth Bloom, Tommaso Boccali, et al. (285 additional authors not shown)

    Abstract: Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for…

    Submitted 19 December, 2018; v1 submitted 18 December, 2017; originally announced December 2017.

    Report number: HSF-CWP-2017-01

    Journal ref: Comput Softw Big Sci (2019) 3, 7

  46. arXiv:1711.06571  [pdf, other]

    physics.comp-ph hep-ex physics.ins-det

    Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures

    Authors: Giuseppe Cerati, Peter Elmer, Slava Krutelyov, Steven Lantz, Matthieu Lefebvre, Mario Masciovecchio, Kevin McDermott, Daniel Riley, Matevž Tadel, Peter Wittich, Frank Würthwein, Avi Yagil

    Abstract: Faced with physical and energy density limitations on clock speed, contemporary microprocessor designers have increasingly turned to on-chip parallelism for performance gains. Algorithms should accordingly be designed with ample amounts of fine-grained parallelism if they are to realize the full performance of the hardware. This requirement can be challenging for algorithms that are naturally expr…

    Submitted 27 March, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

    Comments: Accepted to the Proceedings of the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research; 6 pages, 5 figures. arXiv admin note: text overlap with arXiv:1702.06359

  47. arXiv:1705.06202  [pdf, other]

    cs.DC astro-ph.IM

    Data Access for LIGO on the OSG

    Authors: Derek Weitzel, Brian Bockelman, Duncan A. Brown, Peter Couvares, Frank Würthwein, Edgar Fajardo Hernandez

    Abstract: During 2015 and 2016, the Laser Interferometer Gravitational-Wave Observatory (LIGO) conducted a three-month observing campaign. These observations delivered the first direct detection of gravitational waves from binary black hole mergers. To search for these signals, the LIGO Scientific Collaboration uses the PyCBC search pipeline. To deliver science results in a timely manner, LIGO collaborated…

    Submitted 17 May, 2017; originally announced May 2017.

    Comments: 6 pages, 3 figures, submitted to PEARC17

  48. arXiv:1705.02876  [pdf, other]

    physics.comp-ph hep-ex physics.ins-det

    Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Authors: Giuseppe Cerati, Peter Elmer, Slava Krutelyov, Steven Lantz, Matthieu Lefebvre, Mario Masciovecchio, Kevin McDermott, Daniel Riley, Matevž Tadel, Peter Wittich, Frank Würthwein, Avi Yagil

    Abstract: For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. How…

    Submitted 19 June, 2017; v1 submitted 8 May, 2017; originally announced May 2017.

    Comments: Submitted to proceedings of Connecting The Dots 2017 (CTD2017), Orsay. arXiv admin note: substantial text overlap with arXiv:1605.05508

  49. arXiv:1702.06359  [pdf, other]

    physics.ins-det hep-ex physics.data-an

    Kalman filter tracking on parallel architectures

    Authors: Giuseppe Cerati, Peter Elmer, Slava Krutelyov, Steven Lantz, Matthieu Lefebvre, Kevin McDermott, Daniel Riley, Matevž Tadel, Peter Wittich, Frank Würthwein, Avi Yagil

    Abstract: Limits on power dissipation have pushed CPUs to grow in parallel processing capabilities rather than clock rate, leading to the rise of "manycore" or GPU-like processors. In order to achieve the best performance, applications must be able to take full advantage of vector units across multiple cores, or some analogous arrangement on an accelerator card. Such parallel performance is becoming a criti…

    Submitted 21 November, 2017; v1 submitted 21 February, 2017; originally announced February 2017.

    Comments: Proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016; 8 pages, 9 figures

    Journal ref: G Cerati et al 2017 J. Phys.: Conf. Ser. 898 042051

  50. arXiv:1605.05508  [pdf, other]

    physics.comp-ph hep-ex physics.ins-det

    Kalman Filter Tracking on Parallel Architectures

    Authors: Giuseppe Cerati, Peter Elmer, Slava Krutelyov, Steven Lantz, Matthieu Lefebvre, Kevin McDermott, Daniel Riley, Matevz Tadel, Peter Wittich, Frank Wuerthwein, Avi Yagil

    Abstract: Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. To stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specia…

    Submitted 18 May, 2016; originally announced May 2016.

    Comments: Submitted to proceedings of Connecting The Dots 2016 (CTD2016), Vienna. arXiv admin note: text overlap with arXiv:1601.08245