Showing 1–18 of 18 results for author: Pivarski, J

  1. arXiv:2404.18170

    cs.PL physics.data-an

    Bridging Worlds: Achieving Language Interoperability between Julia and Python in Scientific Computing

    Authors: Ianna Osborne, Jim Pivarski, Jerry Ling

    Abstract: In the realm of scientific computing, both Julia and Python have established themselves as powerful tools. Within the context of High Energy Physics (HEP) data analysis, Python has been traditionally favored, yet there exists a compelling case for migrating legacy software to Julia. This article focuses on language interoperability, specifically exploring how Awkward Array data structures can brid…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 8 pages, 1 figure, ACAT2024 workshop

  2. arXiv:2310.01461

    cs.PL

    Awkward Just-In-Time (JIT) Compilation: A Developer's Experience

    Authors: Ianna Osborne, Jim Pivarski, Ioana Ifrim, Angus Hollands, Henry Schreiner

    Abstract: Awkward Array is a library for performing NumPy-like computations on nested, variable-sized data, enabling array-oriented programming on arbitrary data structures in Python. However, imperative (procedural) solutions can sometimes be easier to write or faster to run. Performant imperative programming requires compilation; JIT-compilation makes it convenient to compile in an interactive Python envi…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 7 pages

  3. arXiv:2306.03675

    hep-ph cs.PL hep-ex physics.comp-ph

    Potential of the Julia programming language for high energy physics computing

    Authors: J. Eschle, T. Gal, M. Giordano, P. Gras, B. Hegner, L. Heinrich, U. Hernandez Acosta, S. Kluth, J. Ling, P. Mato, M. Mikhasenko, A. Moreno Briceño, J. Pivarski, K. Samaras-Tsakiris, O. Schulz, G. A. Stewart, J. Strube, V. Vassilev

    Abstract: Research in high energy physics (HEP) requires huge amounts of computing and storage, putting strong constraints on the code speed and resource usage. To meet these requirements, a compiled high-performance language is typically used; while for physicists, who focus on the application when developing the code, better research productivity pleads for a high-level programming language. A popular app…

    Submitted 6 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 32 pages, 5 figures, 4 tables

    ACM Class: J.2

    Journal ref: Comput Softw Big Sci 7, 10 (2023)

  4. arXiv:2303.02205

    cs.MS hep-ex

    The Awkward World of Python and C++

    Authors: Manasvi Goyal, Ianna Osborne, Jim Pivarski

    Abstract: There are undeniable benefits of binding Python and C++ to take advantage of the best features of both languages. This is especially relevant to the HEP and other scientific communities that have invested heavily in the C++ frameworks and are rapidly moving their data analyses to Python. Version 2 of Awkward Array, a Scikit-HEP Python library, introduces a set of header-only C++ libraries that do…

    Submitted 1 May, 2024; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: 6 pages, 2 figures; submitted to ACAT 2022 proceedings

  5. arXiv:2303.02202

    hep-ex cs.PF

    Using a DSL to read ROOT TTrees faster in Uproot

    Authors: Aryan Roy, Jim Pivarski

    Abstract: Uproot reads ROOT TTrees using pure Python. For numerical and (singly) jagged arrays, this is fast because a whole block of data can be interpreted as an array without modifying the data. For other cases, such as arrays of std::vector<std::vector<float>>, numerical data are interleaved with structure, and the only way to deserialize them is with a sequential algorithm. When written in Python, such…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: 6 pages, 3 figures; submitted to ACAT 2022 proceedings

  6. arXiv:2302.09860

    hep-ex astro-ph.IM cs.CL physics.data-an

    Awkward to RDataFrame and back

    Authors: Ianna Osborne, Jim Pivarski

    Abstract: Awkward Arrays and RDataFrame provide two very different ways of performing calculations at scale. By adding the ability to zero-copy convert between them, users get the best of both. It gives users a better flexibility in mixing different packages and languages in their analysis. In Awkward Array version 2, the ak.to_rdataframe function presents a view of an Awkward Array as an RDataFrame source.…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: 5 pages, 3 figures

  7. arXiv:2202.03911

    hep-ex cs.PL physics.comp-ph physics.data-an

    An array-oriented Python interface for FastJet

    Authors: Aryan Roy, Jim Pivarski, Chad Wells Freer

    Abstract: Analysis on HEP data is an iterative process in which the results of one step often inform the next. In an exploratory analysis, it is common to perform one computation on a collection of events, then view the results (often with histograms) to decide what to try next. Awkward Array is a Scikit-HEP Python package that enables data analysis with array-at-a-time operations to implement cuts as slice…

    Submitted 8 February, 2022; originally announced February 2022.

    Comments: 5 pages, 2 figures, submitted to ACAT 2021 proceedings

    Journal ref: J. Phys.: Conf. Ser. 2438 012011 (2023)

  8. AwkwardForth: accelerating Uproot with an internal DSL

    Authors: Jim Pivarski, Ianna Osborne, Pratyush Das, David Lange, Peter Elmer

    Abstract: File formats for generic data structures, such as ROOT, Avro, and Parquet, pose a problem for deserialization: it must be fast, but its code depends on the type of the data structure, not known at compile-time. Just-in-time compilation can satisfy both constraints, but we propose a more portable solution: specialized virtual machines. AwkwardForth is a Forth-driven virtual machine for deserializin…

    Submitted 24 February, 2021; originally announced February 2021.

    Comments: 11 pages, 2 figures, submitted to the 25th International Conference on Computing in High Energy & Nuclear Physics

  9. Coffea -- Columnar Object Framework For Effective Analysis

    Authors: Nicholas Smith, Lindsey Gray, Matteo Cremonesi, Bo Jayatilaka, Oliver Gutsche, Allison Hall, Kevin Pedro, Maria Acosta, Andrew Melo, Stefano Belforte, Jim Pivarski

    Abstract: The coffea framework provides a new approach to High-Energy Physics analysis, via columnar operations, that improves time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language, the scientific python package ecosystem, and commodity big data technologies. To achieve this suite of improvements across many use cases, coffea takes…

    Submitted 6 August, 2021; v1 submitted 28 August, 2020; originally announced August 2020.

    Comments: As presented at CHEP 2019

    Journal ref: EPJ Web of Conferences 245, 06012 (2020)

  10. Awkward Arrays in Python, C++, and Numba

    Authors: Jim Pivarski, Peter Elmer, David Lange

    Abstract: The Awkward Array library has been an important tool for physics analysis in Python since September 2018. However, some interface and implementation issues have been raised in Awkward Array's first year that argue for a reimplementation in C++ and Numba. We describe those issues, the new architecture, and present some examples of how the new interface will look to users. Of particular importance i…

    Submitted 2 July, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: To be published in CHEP 2019 proceedings, EPJ Web of Conferences; post-review update

  11. Using Big Data Technologies for HEP Analysis

    Authors: Matteo Cremonesi, Claudio Bellini, Bianny Bian, Luca Canali, Vasileios Dimakopoulos, Peter Elmer, Ian Fisk, Maria Girone, Oliver Gutsche, Siew-Yan Hoh, Bo Jayatilaka, Viktor Khristenko, Andrea Luiselli, Andrew Melo, Evangelos Evangelos, Dominick Olivito, Jacopo Pazzini, Jim Pivarski, Alexey Svyatkovskiy, Marco Zanetti

    Abstract: The HEP community is approaching an era where the excellent performances of the particle accelerators in delivering collision at high rate will force the experiments to record a large amount of information. The growing size of the datasets could potentially become a limiting factor in the capability to produce scientific results timely and efficiently. Recently, new technologies and new approaches…

    Submitted 21 January, 2019; originally announced January 2019.

  12. arXiv:1807.02876

    physics.comp-ph cs.LG hep-ex stat.ML

    Machine Learning in High Energy Physics Community White Paper

    Authors: Kim Albertsson, Piero Altoe, Dustin Anderson, John Anderson, Michael Andrews, Juan Pedro Araque Espinosa, Adam Aurisano, Laurent Basara, Adrian Bevan, Wahid Bhimji, Daniele Bonacorsi, Bjorn Burkle, Paolo Calafiura, Mario Campanelli, Louis Capps, Federico Carminati, Stefano Carrazza, Yi-fan Chen, Taylor Childers, Yann Coadou, Elias Coniavitis, Kyle Cranmer, Claire David, Douglas Davis, Andrea De Simone , et al. (103 additional authors not shown)

    Abstract: Machine learning has been applied to several problems in particle physics research, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas for machine learning in particle physics. We d…

    Submitted 16 May, 2019; v1 submitted 8 July, 2018; originally announced July 2018.

    Comments: Editors: Sergei Gleyzer, Paul Seyfert and Steven Schramm

  13. arXiv:1711.02659

    cs.DC

    Optimizing ROOT IO For Analysis

    Authors: Brian Bockelman, Zhe Zhang, Jim Pivarski

    Abstract: The ROOT I/O (RIO) subsystem is foundational to most HEP experiments - it provides a file format, a set of APIs/semantics, and a reference implementation in C++. It is often found at the base of an experiment's framework and is used to serialize the experiment's data; in the case of an LHC experiment, this may be hundreds of petabytes of files! Individual physicists will further use RIO to perform…

    Submitted 7 November, 2017; originally announced November 2017.

    Comments: 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT)

  14. arXiv:1711.01229

    cs.DC

    Toward real-time data query systems in HEP

    Authors: Jim Pivarski, David Lange, Thanat Jatuphattharachat

    Abstract: Exploratory data analysis tools must respond quickly to a user's questions, so that the answer to one question (e.g. a visualized histogram or fit) can influence the next. In some SQL-based query systems used in industry, even very large (petabyte) datasets can be summarized on a human timescale (seconds), employing techniques such as columnar data representation, caching, indexing, and code gener…

    Submitted 8 November, 2017; v1 submitted 3 November, 2017; originally announced November 2017.

    Comments: 6 pages, 2 figures, proceedings for ACAT 2017

  15. arXiv:1711.00375

    cs.DC

    CMS Analysis and Data Reduction with Apache Spark

    Authors: Oliver Gutsche, Luca Canali, Illia Cremer, Matteo Cremonesi, Peter Elmer, Ian Fisk, Maria Girone, Bo Jayatilaka, Jim Kowalkowski, Viktor Khristenko, Evangelos Motesnitsalis, Jim Pivarski, Saba Sehrish, Kacper Surdy, Alexey Svyatkovskiy

    Abstract: Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems for distributed data processing, collectively called "Big Data" technologies have emerged from industry and open source projects to support the a…

    Submitted 31 October, 2017; originally announced November 2017.

    Comments: Proceedings for 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2017). arXiv admin note: text overlap with arXiv:1703.04171

  16. arXiv:1708.08319

    cs.PL cs.DB cs.IR

    Fast Access to Columnar, Hierarchically Nested Data via Code Transformation

    Authors: Jim Pivarski, Peter Elmer, Brian Bockelman, Zhe Zhang

    Abstract: Big Data query systems represent data in a columnar format for fast, selective access, and in some cases (e.g. Apache Drill), perform calculations directly on the columnar data without row materialization, avoiding runtime costs. However, many analysis procedures cannot be easily or efficiently expressed as SQL. In High Energy Physics, the majority of data processing requires nested loops with c…

    Submitted 3 November, 2017; v1 submitted 20 August, 2017; originally announced August 2017.

    Comments: 10 pages, 2 figures, submitted to IEEE Big Data

  17. Big Data in HEP: A comprehensive use case study

    Authors: Oliver Gutsche, Matteo Cremonesi, Peter Elmer, Bo Jayatilaka, Jim Kowalkowski, Jim Pivarski, Saba Sehrish, Cristina Mantilla Suarez, Alexey Svyatkovskiy, Nhan Tran

    Abstract: Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called Big Data technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of dat…

    Submitted 12 March, 2017; originally announced March 2017.

    Comments: Proceedings for 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2016)

  18. The Matsu Wheel: A Cloud-based Framework for Efficient Analysis and Reanalysis of Earth Satellite Imagery

    Authors: Maria T Patterson, Nikolas Anderson, Collin Bennett, Jacob Bruggemann, Robert Grossman, Matthew Handy, Vuong Ly, Dan Mandl, Shane Pederson, Jim Pivarski, Ray Powell, Jonathan Spring, Walt Wells

    Abstract: Project Matsu is a collaboration between the Open Commons Consortium and NASA focused on developing open source technology for the cloud-based processing of Earth satellite imagery. A particular focus is the development of applications for detecting fires and floods to help support natural disaster detection and relief. Project Matsu has developed an open source cloud-based infrastructure to proce…

    Submitted 22 February, 2016; originally announced February 2016.

    Comments: 10 pages, accepted for presentation to IEEE BigDataService 2016