Bridging Worlds: Achieving Language Interoperability between Julia and Python in Scientific Computing

Authors: Ianna Osborne, Jim Pivarski, Jerry Ling

Abstract: In the realm of scientific computing, both Julia and Python have established themselves as powerful tools. Within the context of High Energy Physics (HEP) data analysis, Python has been traditionally favored, yet there exists a compelling case for migrating legacy software to Julia. This article focuses on language interoperability, specifically exploring how Awkward Array data structures can bridge the gap between Julia and Python. The talk offers insights into key considerations such as memory management, data buffer copies, and dependency handling. It delves into the performance enhancements achieved by invoking Julia from Python and vice versa, particularly for intensive array-oriented calculations involving large-scale, though not excessively dimensional, arrays of HEP data. The advantages and challenges inherent in achieving interoperability between Julia and Python in the domain of scientific computing are discussed.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 8 pages, 1 figure, ACAT2024 workshop

arXiv:2404.02100 [pdf, other]

Analysis Facilities White Paper

Authors: D. Ciangottini, A. Forti, L. Heinrich, N. Skidmore, C. Alpigiani, M. Aly, D. Benjamin, B. Bockelman, L. Bryant, J. Catmore, M. D'Alfonso, A. Delgado Peris, C. Doglioni, G. Duckeck, P. Elmer, J. Eschle, M. Feickert, J. Frost, R. Gardner, V. Garonne, M. Giffels, J. Gooding, E. Gramstad, L. Gray, B. Hegner, et al. (41 additional authors not shown)

Abstract: This white paper presents the current status of the R&D for Analysis Facilities (AFs) and attempts to summarize the views on the future direction of these facilities. These views have been collected through the High Energy Physics (HEP) Software Foundation's (HSF) Analysis Facilities forum, established in March 2022, the Analysis Ecosystems II workshop, that took place in May 2022, and the WLCG/HSF pre-CHEP workshop, that took place in May 2023. The paper attempts to cover all the aspects of an analysis facility.

Submitted 15 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2310.01461 [pdf, ps, other]

Awkward Just-In-Time (JIT) Compilation: A Developer's Experience

Authors: Ianna Osborne, Jim Pivarski, Ioana Ifrim, Angus Hollands, Henry Schreiner

Abstract: Awkward Array is a library for performing NumPy-like computations on nested, variable-sized data, enabling array-oriented programming on arbitrary data structures in Python. However, imperative (procedural) solutions can sometimes be easier to write or faster to run. Performant imperative programming requires compilation; JIT-compilation makes it convenient to compile in an interactive Python environment. Various functions in Awkward Arrays JIT-compile a user's code into executable machine code. They use several different techniques, but reuse parts of each others' implementations. We discuss the techniques used to achieve the Awkward Arrays acceleration with JIT-compilation, focusing on RDataFrame, cppyy, and Numba, particularly Numba on GPUs: conversions of Awkward Arrays to and from RDataFrame; standalone cppyy; passing Awkward Arrays to and from Python functions compiled by Numba; passing Awkward Arrays to Python functions compiled for GPUs by Numba; and header-only libraries for populating Awkward Arrays from C++ without any Python dependencies.

Submitted 2 October, 2023; originally announced October 2023.

Comments: 7 pages

arXiv:2306.03675 [pdf, other]

doi 10.1007/s41781-023-00104-x

Potential of the Julia programming language for high energy physics computing

Authors: J. Eschle, T. Gal, M. Giordano, P. Gras, B. Hegner, L. Heinrich, U. Hernandez Acosta, S. Kluth, J. Ling, P. Mato, M. Mikhasenko, A. Moreno Briceño, J. Pivarski, K. Samaras-Tsakiris, O. Schulz, G. A. Stewart, J. Strube, V. Vassilev

Abstract: Research in high energy physics (HEP) requires huge amounts of computing and storage, putting strong constraints on the code speed and resource usage. To meet these requirements, a compiled high-performance language is typically used; while for physicists, who focus on the application when developing the code, better research productivity pleads for a high-level programming language. A popular approach consists of combining Python, used for the high-level interface, and C++, used for the computing intensive part of the code. A more convenient and efficient approach would be to use a language that provides both high-level programming and high-performance. The Julia programming language, developed at MIT especially to allow the use of a single language in research activities, has followed this path. In this paper the applicability of using the Julia language for HEP research is explored, covering the different aspects that are important for HEP code development: runtime performance, handling of large projects, interface with legacy code, distributed computing, training, and ease of programming. The study shows that the HEP community would benefit from a large scale adoption of this programming language. The HEP-specific foundation libraries that would need to be consolidated are identified.

Submitted 6 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 32 pages, 5 figures, 4 tables

ACM Class: J.2

Journal ref: Computing. Comput Softw Big Sci 7, 10 (2023)

arXiv:2303.02205 [pdf, other]

The Awkward World of Python and C++

Authors: Manasvi Goyal, Ianna Osborne, Jim Pivarski

Abstract: There are undeniable benefits of binding Python and C++ to take advantage of the best features of both languages. This is especially relevant to the HEP and other scientific communities that have invested heavily in the C++ frameworks and are rapidly moving their data analyses to Python. Version 2 of Awkward Array, a Scikit-HEP Python library, introduces a set of header-only C++ libraries that do not depend on any application binary interface. Users can directly include these libraries in their compilation instead of linking against platform-specific libraries. This new development makes the integration of Awkward Arrays into other projects easier and more portable, as the implementation is easily separable from the rest of the Awkward Array codebase. The code is minimal; it does not include all of the code needed to use Awkward Arrays in Python, nor does it include references to Python or pybind11. The C++ users can use it to make arrays and then copy them to Python without any specialized data types - only raw buffers, strings, and integers. This C++ code also simplifies the process of just-in-time (JIT) compilation in ROOT. This implementation approach solves some of the drawbacks, like packaging projects where native dependencies can be challenging. In this paper, we demonstrate the technique to integrate C++ and Python using a header-only approach. We also describe the implementation of a new LayoutBuilder and a GrowableBuffer. Furthermore, examples of wrapping the C++ data into Awkward Arrays and exposing Awkward Arrays to C++ without copying them are discussed.

Submitted 1 May, 2024; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: 6 pages, 2 figures; submitted to ACAT 2022 proceedings

arXiv:2303.02202 [pdf, other]

Using a DSL to read ROOT TTrees faster in Uproot

Authors: Aryan Roy, Jim Pivarski

Abstract: Uproot reads ROOT TTrees using pure Python. For numerical and (singly) jagged arrays, this is fast because a whole block of data can be interpreted as an array without modifying the data. For other cases, such as arrays of std::vector<std::vector<float>>, numerical data are interleaved with structure, and the only way to deserialize them is with a sequential algorithm. When written in Python, such algorithms are very slow. We solve this problem by writing the same logic in a language that can be executed quickly. AwkwardForth is a Domain Specific Language (DSL), based on Standard Forth with I/O extensions for making Awkward Arrays, and it can be interpreted as a fast virtual machine without requiring LLVM as a dependency. We generate code as late as possible to take advantage of optimization opportunities. All ROOT types previously implemented with Python have been converted to AwkwardForth. Double and triple-jagged arrays, for example, are 400x faster in AwkwardForth than in Python, with multithreaded scaling up to 1 second/GB because AwkwardForth releases the Python GIL. We also investigate the possibility of JIT-compiling the generated AwkwardForth code using LLVM to increase the performance gains. In this paper, we describe design aspects, performance studies, and future directions in accelerating Uproot with AwkwardForth.

Submitted 3 March, 2023; originally announced March 2023.

Comments: 6 pages, 3 figures; submitted to ACAT 2022 proceedings
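
Illustration: the contrast drawn in the abstract of this entry can be sketched in plain Python. This is not Uproot's or AwkwardForth's implementation. The byte layouts and the helper names `read_flat` and `read_nested` are hypothetical, chosen only to show why a contiguous block of numbers can be reinterpreted in one step, while numbers interleaved with structure have to be walked sequentially.

```python
import struct


def read_flat(buf):
    """Interpret a whole block of big-endian float32 values in one call."""
    n = len(buf) // 4
    return list(struct.unpack(f">{n}f", buf))


def read_nested(buf):
    """Sequentially decode a hypothetical interleaved layout.

    Assumed layout (for illustration only): a uint32 count of inner lists,
    then, for each inner list, a uint32 length followed by that many
    float32 values. Each length must be read before the next values can
    be located, so the loop cannot be replaced by a single reinterpretation.
    """
    pos = 0
    (n_outer,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    result = []
    for _ in range(n_outer):
        (n_inner,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        values = struct.unpack_from(f">{n_inner}f", buf, pos)
        pos += 4 * n_inner
        result.append(list(values))
    return result


# Build small example buffers (values chosen to be exact in float32).
flat_bytes = struct.pack(">4f", 1.0, 2.0, 3.0, 4.0)
nested_bytes = (
    struct.pack(">I", 2)
    + struct.pack(">I", 3) + struct.pack(">3f", 1.0, 2.0, 3.0)
    + struct.pack(">I", 1) + struct.pack(">f", 4.0)
)

flat = read_flat(flat_bytes)
nested = read_nested(nested_bytes)
```

The sequential loop in `read_nested` is the kind of logic that is slow when interpreted by Python, which is the motivation the abstract gives for running it in a faster virtual machine.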

arXiv:2302.09860 [pdf, other]

Awkward to RDataFrame and back

Authors: Ianna Osborne, Jim Pivarski

Abstract: Awkward Arrays and RDataFrame provide two very different ways of performing calculations at scale. By adding the ability to zero-copy convert between them, users get the best of both. It gives users a better flexibility in mixing different packages and languages in their analysis. In Awkward Array version 2, the ak.to_rdataframe function presents a view of an Awkward Array as an RDataFrame source. This view is generated on demand and the data are not copied. The column readers are generated based on the run-time type of the views. The readers are passed to a generated source derived from ROOT::RDF::RDataSource. The ak.from_rdataframe function converts the selected columns as native Awkward Arrays. The details of the implementation exploiting JIT techniques are discussed. The examples of analysis of data stored in Awkward Arrays via a high-level interface of an RDataFrame are presented. A few examples of the column definition, applying user-defined filters written in C++, and plotting or extracting the columnar data as Awkward Arrays are shown. Current limitations and future plans are discussed.

Submitted 20 February, 2023; originally announced February 2023.

Comments: 5 pages, 3 figures

arXiv:2212.04889 [pdf, ps, other]

doi 10.5281/zenodo.7418818

Second Analysis Ecosystem Workshop Report

Authors: Mohamed Aly, Jackson Burzynski, Bryan Cardwell, Daniel C. Craik, Tal van Daalen, Tomas Dado, Ayanabha Das, Antonio Delgado Peris, Caterina Doglioni, Peter Elmer, Engin Eren, Martin B. Eriksen, Jonas Eschle, Giulio Eulisse, Conor Fitzpatrick, José Flix Molina, Alessandra Forti, Ben Galewsky, Sean Gasiorowski, Aman Goel, Loukas Gouskos, Enrico Guiraud, Kanhaiya Gupta, Stephan Hageboeck, Allison Reinsvold Hall, et al. (44 additional authors not shown)

Abstract: The second workshop on the HEP Analysis Ecosystem took place 23-25 May 2022 at IJCLab in Orsay, to look at progress and continuing challenges in scaling up HEP analysis to meet the needs of HL-LHC and DUNE, as well as the very pressing needs of LHC Run 3 analysis. The workshop was themed around six particular topics, which were felt to capture key questions, opportunities and challenges. Each topic arranged a plenary session introduction, often with speakers summarising the state-of-the art and the next steps for analysis. This was then followed by parallel sessions, which were much more discussion focused, and where attendees could grapple with the challenges and propose solutions that could be tried. Where there was significant overlap between topics, a joint discussion between them was arranged. In the weeks following the workshop the session conveners wrote this document, which is a summary of the main discussions, the key points raised and the conclusions and outcomes. The document was circulated amongst the participants for comments before being finalised here.

Submitted 9 December, 2022; originally announced December 2022.

Report number: HSF-DOC-2022-02

arXiv:2202.03911 [pdf, other]

doi 10.1088/1742-6596/2438/1/012011

An array-oriented Python interface for FastJet

Authors: Aryan Roy, Jim Pivarski, Chad Wells Freer

Abstract: Analysis on HEP data is an iterative process in which the results of one step often inform the next. In an exploratory analysis, it is common to perform one computation on a collection of events, then view the results (often with histograms) to decide what to try next. Awkward Array is a Scikit-HEP Python package that enables data analysis with array-at-a-time operations to implement cuts as slices, combinatorics as composable functions, etc. However, most C++ HEP libraries, such as FastJet, have an imperative, one-particle-at-a-time interface, which would be inefficient in Python and goes against the grain of the array-at-a-time logic of scientific Python. Therefore, we developed fastjet, a pip-installable Python package that provides FastJet C++ binaries, the classic (particle-at-a-time) Python interface, and the new array-oriented interface for use with Awkward Array. The new interface streamlines interoperability with scientific Python software beyond HEP, such as machine learning. In one case, adopting this library along with other array-oriented tools accelerated HEP analysis code by a factor of 20. It was designed to be easily integrated with libraries in the Scikit-HEP ecosystem, including Uproot (file I/O), hist (histogramming), Vector (Lorentz vectors), and Coffea (high-level glue). We discuss the design of the fastjet Python library, integrating the classic interface with the array oriented interface and with the Vector library for Lorentz vector operations. The new interface was developed as open source.

Submitted 8 February, 2022; originally announced February 2022.

Comments: 5 pages, 2 figures, submitted to ACAT 2021 proceedings

Journal ref: J. Phys.: Conf. Ser. 2438 012011 (2023)

arXiv:2202.02194 [pdf, other]

HL-LHC Computing Review Stage 2, Common Software Projects: Data Science Tools for Analysis

Authors: Jim Pivarski, Eduardo Rodrigues, Kevin Pedro, Oksana Shadura, Benjamin Krikler, Graeme A. Stewart

Abstract: This paper was prepared by the HEP Software Foundation (HSF) PyHEP Working Group as input to the second phase of the LHCC review of High-Luminosity LHC (HL-LHC) computing, which took place in November, 2021. It describes the adoption of Python and data science tools in HEP, discusses the likelihood of future scenarios, and recommendations for action by the HEP community.

Submitted 4 February, 2022; originally announced February 2022.

Comments: 25 pages, 7 figures; presented at https://indico.cern.ch/event/1058274/ (LHCC Review of HL-LHC Computing)

Report number: FERMILAB-CONF-22-061-SCD

arXiv:2106.15783 [pdf, other]

Learning from the Pandemic: the Future of Meetings in HEP and Beyond

Authors: Mark S. Neubauer, Todd Adams, Jennifer Adelman-McCarthy, Gabriele Benelli, Tulika Bose, David Britton, Pat Burchat, Joel Butler, Timothy A. Cartwright, Tomáš Davídek, Jacques Dumarchez, Peter Elmer, Matthew Feickert, Ben Galewsky, Mandeep Gill, Maciej Gladki, Aman Goel, Jonathan E. Guyer, Bo Jayatilaka, Brendan Kiburg, Benjamin Krikler, David Lange, Claire Lee, Nick Manganelli, Giovanni Marchiori, et al. (14 additional authors not shown)

Abstract: The COVID-19 pandemic has by-and-large prevented in-person meetings since March 2020. While the increasing deployment of effective vaccines around the world is a very positive development, the timeline and pathway to "normality" is uncertain and the "new normal" we will settle into is anyone's guess. Particle physics, like many other scientific fields, has more than a year of experience in holding virtual meetings, workshops, and conferences. A great deal of experimentation and innovation to explore how to execute these meetings effectively has occurred. Therefore, it is an appropriate time to take stock of what we as a community learned from running virtual meetings and discuss possible strategies for the future. Continuing to develop effective strategies for meetings with a virtual component is likely to be important for reducing the carbon footprint of our research activities, while also enabling greater diversity and inclusion for participation. This report summarizes a virtual two-day workshop on Virtual Meetings held May 5-6, 2021 which brought together experts from both inside and outside of high-energy physics to share their experiences and practices with organizing and executing virtual workshops, and to develop possible strategies for future meetings as we begin to emerge from the COVID-19 pandemic. This report outlines some of the practices and tools that have worked well which we hope will serve as a valuable resource for future virtual meeting organizers in all scientific fields.

Submitted 29 June, 2021; originally announced June 2021.

Comments: A report from the "Virtual Meetings" IRIS-HEP Blueprint Workshop: https://indico.cern.ch/event/1026363/

arXiv:2102.13516 [pdf, other]

doi 10.1051/epjconf/202125103002

AwkwardForth: accelerating Uproot with an internal DSL

Authors: Jim Pivarski, Ianna Osborne, Pratyush Das, David Lange, Peter Elmer

Abstract: File formats for generic data structures, such as ROOT, Avro, and Parquet, pose a problem for deserialization: it must be fast, but its code depends on the type of the data structure, not known at compile-time. Just-in-time compilation can satisfy both constraints, but we propose a more portable solution: specialized virtual machines. AwkwardForth is a Forth-driven virtual machine for deserializing data into Awkward Arrays. As a language, it is not intended for humans to write, but it loosens the coupling between Uproot and Awkward Array. AwkwardForth programs for deserializing record-oriented formats (ROOT and Avro) are about as fast as C++ ROOT and 10-80$\times$ faster than fastavro. Columnar formats (simple TTrees, RNTuple, and Parquet) only require specialization to interpret metadata and are therefore faster with precompiled code.

Submitted 24 February, 2021; originally announced February 2021.

Comments: 11 pages, 2 figures, submitted to the 25th International Conference on Computing in High Energy & Nuclear Physics

arXiv:2011.01950 [pdf, ps, other]

Analysis Description Languages for the LHC

Authors: Sezen Sekmen, Philippe Gras, Lindsey Gray, Benjamin Krikler, Jim Pivarski, Harrison B. Prosper, Andrea Rizzi, Gokhan Unel, Gordon Watts

Abstract: An analysis description language is a domain specific language capable of describing the contents of an LHC analysis in a standard and unambiguous way, independent of any computing framework. It is designed for use by anyone with an interest in, and knowledge of, LHC physics, i.e., experimentalists, phenomenologists and other enthusiasts. Adopting analysis description languages would bring numerous benefits for the LHC experimental and phenomenological communities ranging from analysis preservation beyond the lifetimes of experiments or analysis software to facilitating the abstraction, design, visualization, validation, combination, reproduction, interpretation and overall communication of the analysis contents. Here, we introduce the analysis description language concept and summarize the current efforts ongoing to develop such languages and tools to use them in LHC analyses.

Submitted 3 November, 2020; originally announced November 2020.

Comments: Accepted contribution to the proceedings of The 8th Annual Conference on Large Hadron Collider Physics, LHCP2020, 25-30 May, 2020, online

Journal ref: Proceedings of Science, PoS(LHCP2020)065

arXiv:2008.13636 [pdf, ps, other]

doi 10.5281/zenodo.4009114

HL-LHC Computing Review: Common Tools and Community Software

Authors: HEP Software Foundation, Thea Aarrestad, Simone Amoroso, Markus Julian Atkinson, Joshua Bendavid, Tommaso Boccali, Andrea Bocci, Andy Buckley, Matteo Cacciari, Paolo Calafiura, Philippe Canal, Federico Carminati, Taylor Childers, Vitaliano Ciulli, Gloria Corti, Davide Costanzo, Justin Gage Dezoort, Caterina Doglioni, Javier Mauricio Duarte, Agnieszka Dziurda, Peter Elmer, Markus Elsing, V. Daniel Elvira, Giulio Eulisse, et al. (85 additional authors not shown)

Abstract: Common and community software packages, such as ROOT, Geant4 and event generators have been a key part of the LHC's success so far and continued development and optimisation will be critical in the future. The challenges are driven by an ambitious physics programme, notably the LHC accelerator upgrade to high-luminosity, HL-LHC, and the corresponding detector upgrades of ATLAS and CMS. In this document we address the issues for software that is used in multiple experiments (usually even more widely than ATLAS and CMS) and maintained by teams of developers who are either not linked to a particular experiment or who contribute to common software within the context of their experiment activity. We also give space to general considerations for future software and projects that tackle upcoming challenges, no matter who writes it, which is an area where community convergence on best practice is extremely useful.

Submitted 31 August, 2020; originally announced August 2020.

Comments: 40 pages contribution to Snowmass 2021

Report number: HSF-DOC-2020-01

arXiv:2008.12712 [pdf, other]

doi 10.1051/epjconf/202024506012

Coffea -- Columnar Object Framework For Effective Analysis

Authors: Nicholas Smith, Lindsey Gray, Matteo Cremonesi, Bo Jayatilaka, Oliver Gutsche, Allison Hall, Kevin Pedro, Maria Acosta, Andrew Melo, Stefano Belforte, Jim Pivarski

Abstract: The coffea framework provides a new approach to High-Energy Physics analysis, via columnar operations, that improves time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language, the scientific python package ecosystem, and commodity big data technologies. To achieve this suite of improvements across many use cases, coffea takes a factorized approach, separating the analysis implementation and data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages which are wrapped to yield user code whose purpose is quickly intuited. Various data delivery schemes are wrapped into a common front-end which accepts user inputs and code, and returns user defined outputs. We will discuss our experience in implementing analysis of CMS data using the coffea framework along with a discussion of the user experience and future directions.

Submitted 6 August, 2021; v1 submitted 28 August, 2020; originally announced August 2020.

Comments: As presented at CHEP 2019

Journal ref: EPJ Web of Conferences 245, 06012 (2020)

arXiv:2007.03577 [pdf, other]

doi 10.1051/epjconf/202024506028

The Scikit HEP Project -- overview and prospects

Authors: Eduardo Rodrigues, Benjamin Krikler, Chris Burr, Dmitri Smirnov, Hans Dembinski, Henry Schreiner, Jaydeep Nandi, Jim Pivarski, Matthew Feickert, Matthieu Marinangeli, Nick Smith, Pratyush Das

Abstract: Scikit-HEP is a community-driven and community-oriented project with the goal of providing an ecosystem for particle physics data analysis in Python. Scikit-HEP is a toolset of approximately twenty packages and a few "affiliated" packages. It expands the typical Python data analysis tools for particle physicists. Each package focuses on a particular topic, and interacts with other packages in the toolset, where appropriate. Most of the packages are easy to install in many environments; much work has been done this year to provide binary "wheels" on PyPI and conda-forge packages. The Scikit-HEP project has been gaining interest and momentum, by building a user and developer community engaging collaboration across experiments. Some of the packages are being used by other communities, including the astroparticle physics community. An overview of the overall project and toolset will be presented, as well as a vision for development and sustainability.

Submitted 7 July, 2020; originally announced July 2020.

Comments: 6 pages, 3 figures, Proceedings of the 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019), Adelaide, Australia, 4-8 November 2019

arXiv:2001.06307 [pdf, other]

doi 10.1051/epjconf/202024505023

Awkward Arrays in Python, C++, and Numba

Authors: Jim Pivarski, Peter Elmer, David Lange

Abstract: The Awkward Array library has been an important tool for physics analysis in Python since September 2018. However, some interface and implementation issues have been raised in Awkward Array's first year that argue for a reimplementation in C++ and Numba. We describe those issues, the new architecture, and present some examples of how the new interface will look to users. Of particular importance is the separation of kernel functions from data structure management, which allows a C++ implementation and a Numba implementation to share kernel functions, and the algorithm that transforms record-oriented data into columnar Awkward Arrays.

Submitted 2 July, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: To be published in CHEP 2019 proceedings, EPJ Web of Conferences; post-review update

arXiv:1901.07143 [pdf, other]

doi 10.1051/epjconf/201921406030

Using Big Data Technologies for HEP Analysis

Authors: Matteo Cremonesi, Claudio Bellini, Bianny Bian, Luca Canali, Vasileios Dimakopoulos, Peter Elmer, Ian Fisk, Maria Girone, Oliver Gutsche, Siew-Yan Hoh, Bo Jayatilaka, Viktor Khristenko, Andrea Luiselli, Andrew Melo, Evangelos Evangelos, Dominick Olivito, Jacopo Pazzini, Jim Pivarski, Alexey Svyatkovskiy, Marco Zanetti

Abstract: The HEP community is approaching an era where the excellent performance of the particle accelerators in delivering collisions at a high rate will force the experiments to record a large amount of information. The growing size of the datasets could potentially become a limiting factor in the capability to produce scientific results timely and efficiently. Recently, new technologies and new approaches have been developed in industry to answer the need to retrieve information as quickly as possible to analyze PB and EB datasets. Providing the scientists with these modern computing tools will lead to rethinking the principles of data analysis in HEP, making the overall scientific process faster and smoother. In this paper, we are presenting the latest developments and the most recent results on the usage of Apache Spark for HEP analysis. The study aims at evaluating the efficiency of the application of the new tools both quantitatively, by measuring the performances, and qualitatively, focusing on the user experience. The first goal is achieved by developing a data reduction facility: working together with CERN Openlab and Intel, CMS replicates a real physics search using Spark-based technologies, with the ambition of reducing 1 PB of public data in 5 hours, collected by the CMS experiment, to 1 TB of data in a format suitable for physics analysis. The second goal is achieved by implementing multiple physics use-cases in Apache Spark using as input preprocessed datasets derived from official CMS data and simulation. By performing different end-analyses up to the publication plots on different hardware, feasibility, usability and portability are compared to the ones of a traditional ROOT-based workflow.

Submitted 21 January, 2019; originally announced January 2019.

arXiv:1812.00761 [pdf, ps, other]

HEP Software Foundation Community White Paper Working Group -- Data Organization, Management and Access (DOMA)

Authors: Dario Berzano, Riccardo Maria Bianchi, Ian Bird, Brian Bockelman, Simone Campana, Kaushik De, Dirk Duellmann, Peter Elmer, Robert Gardner, Vincent Garonne, Claudio Grandi, Oliver Gutsche, Andrew Hanushevsky, Burt Holzman, Bodhitha Jayatilaka, Ivo Jimenez, Michel Jouvin, Oliver Keeble, Alexei Klimentov, Valentin Kuznetsov, Eric Lancon, Mario Lassnig, Miron Livny, Carlos Maltzahn, Shawn McKee , et al. (13 additional authors not shown)

Abstract: Without significant changes to data organization, management, and access (DOMA), HEP experiments will find scientific output limited by how fast data can be accessed and digested by computational resources. In this white paper we discuss challenges in DOMA that HEP experiments, such as the HL-LHC, will face as well as potential ways to address them. A research and development timeline to assess these changes is also proposed.

Submitted 30 November, 2018; originally announced December 2018.

Comments: arXiv admin note: text overlap with arXiv:1712.06592

Report number: HSF-CWP-2017-04

arXiv:1811.10309 [pdf, other]

HEP Software Foundation Community White Paper Working Group --- Visualization

Authors: Matthew Bellis, Riccardo Maria Bianchi, Sebastien Binet, Ciril Bohak, Benjamin Couturier, Hadrien Grasland, Oliver Gutsche, Sergey Linev, Alex Martyniuk, Thomas McCauley, Edward Moyse, Alja Mrak Tadel, Mark Neubauer, Jeremi Niedziela, Leo Piilonen, Jim Pivarski, Martin Ritter, Tai Sakuma, Matevz Tadel, Barthélémy von Haller, Ilija Vukotic, Ben Waugh

Abstract: In modern High Energy Physics (HEP) experiments visualization of experimental data has a key role in many activities and tasks across the whole data chain: from detector development to monitoring, from event generation to reconstruction of physics objects, from detector simulation to data analysis, and all the way to outreach and education. In this paper, the definition, status, and evolution of data visualization for HEP experiments will be presented. Suggestions for the upgrade of data visualization tools and techniques in current experiments will be outlined, along with guidelines for future experiments. This paper expands on the summary content published in the HSF Roadmap Community White Paper (HSF-CWP-2017-01).

Submitted 26 November, 2018; originally announced November 2018.

Report number: HSF-CWP-2017-15

arXiv:1810.01191 [pdf, other]

HEP Software Foundation Community White Paper Working Group - Data and Software Preservation to Enable Reuse

Authors: M. D. Hildreth, A. Boehnlein, K. Cranmer, S. Dallmeier, R. Gardner, T. Hacker, L. Heinrich, I. Jimenez, M. Kane, D. S. Katz, T. Malik, C. Maltzahn, M. Neubauer, S. Neubert, Jim Pivarski, E. Sexton, J. Shiers, T. Simko, S. Smith, D. South, A. Verbytskyi, G. Watts, J. Wozniak

Abstract: In this chapter of the High Energy Physics Software Foundation Community Whitepaper, we discuss the current state of infrastructure, best practices, and ongoing developments in the area of data and software preservation in high energy physics. A re-framing of the motivation for preservation to enable re-use is presented. A series of research and development goals in software and other cyberinfrastructure that will aid in the enabling of reuse of particle physics analyses and production software are presented and discussed.

Submitted 2 October, 2018; originally announced October 2018.

Report number: HSF-CWP-2017-06

arXiv:1807.02876 [pdf, other]

Machine Learning in High Energy Physics Community White Paper

Authors: Kim Albertsson, Piero Altoe, Dustin Anderson, John Anderson, Michael Andrews, Juan Pedro Araque Espinosa, Adam Aurisano, Laurent Basara, Adrian Bevan, Wahid Bhimji, Daniele Bonacorsi, Bjorn Burkle, Paolo Calafiura, Mario Campanelli, Louis Capps, Federico Carminati, Stefano Carrazza, Yi-fan Chen, Taylor Childers, Yann Coadou, Elias Coniavitis, Kyle Cranmer, Claire David, Douglas Davis, Andrea De Simone , et al. (103 additional authors not shown)

Abstract: Machine learning has been applied to several problems in particle physics research, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas for machine learning in particle physics. We detail a roadmap for their implementation, software and hardware resource requirements, collaborative initiatives with the data science community, academia and industry, and training the particle physics community in data science. The main objective of the document is to connect and motivate these areas of research and development with the physics drivers of the High-Luminosity Large Hadron Collider and future neutrino experiments and identify the resource needs for their implementation. Additionally we identify areas where collaboration with external communities will be of great benefit.

Submitted 16 May, 2019; v1 submitted 8 July, 2018; originally announced July 2018.

Comments: Editors: Sergei Gleyzer, Paul Seyfert and Steven Schramm

arXiv:1804.03983 [pdf, other]

HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation

Authors: Lothar Bauerdick, Riccardo Maria Bianchi, Brian Bockelman, Nuno Castro, Kyle Cranmer, Peter Elmer, Robert Gardner, Maria Girone, Oliver Gutsche, Benedikt Hegner, José M. Hernández, Bodhitha Jayatilaka, David Lange, Mark S. Neubauer, Daniel S. Katz, Lukasz Kreczko, James Letts, Shawn McKee, Christoph Paus, Kevin Pedro, Jim Pivarski, Martin Ritter, Eduardo Rodrigues, Tai Sakuma, Elizabeth Sexton-Kennedy , et al. (4 additional authors not shown)

Abstract: At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.

Submitted 9 April, 2018; originally announced April 2018.

Comments: arXiv admin note: text overlap with arXiv:1712.06592

Report number: HSF-CWP-2017-05

arXiv:1712.06982 [pdf, other]

doi 10.1007/s41781-018-0018-8

A Roadmap for HEP Software and Computing R&D for the 2020s

Authors: Johannes Albrecht, Antonio Augusto Alves Jr, Guilherme Amadio, Giuseppe Andronico, Nguyen Anh-Ky, Laurent Aphecetche, John Apostolakis, Makoto Asai, Luca Atzori, Marian Babik, Giuseppe Bagliesi, Marilena Bandieramonte, Sunanda Banerjee, Martin Barisits, Lothar A. T. Bauerdick, Stefano Belforte, Douglas Benjamin, Catrin Bernius, Wahid Bhimji, Riccardo Maria Bianchi, Ian Bird, Catherine Biscarat, Jakob Blomer, Kenneth Bloom, Tommaso Boccali , et al. (285 additional authors not shown)

Abstract: Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.

Submitted 19 December, 2018; v1 submitted 18 December, 2017; originally announced December 2017.

Report number: HSF-CWP-2017-01

Journal ref: Comput Softw Big Sci (2019) 3, 7

arXiv:1711.02659 [pdf, other]

Optimizing ROOT IO For Analysis

Authors: Brian Bockelman, Zhe Zhang, Jim Pivarski

Abstract: The ROOT I/O (RIO) subsystem is foundational to most HEP experiments - it provides a file format, a set of APIs/semantics, and a reference implementation in C++. It is often found at the base of an experiment's framework and is used to serialize the experiment's data; in the case of an LHC experiment, this may be hundreds of petabytes of files! Individual physicists will further use RIO to perform their end-stage analysis, reading from intermediate files they generate from experiment data. RIO is thus incredibly flexible: it must serve as a file format for archival (optimized for space) and for working data (optimized for read speed). To date, most of the technical work has focused on improving the former use case. We present work designed to help improve RIO for analysis. We analyze the real-world impact of LZ4 to decrease decompression times (and the corresponding cost in disk space). We introduce new APIs that read RIO data in bulk, removing the per-event overhead of a C++ function call. We compare the performance with the existing RIO APIs for simple structure data and show how this can be complementary with efforts to improve the parallelism of the RIO stack.

Submitted 7 November, 2017; originally announced November 2017.

Comments: 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT)

arXiv:1711.01229 [pdf, other]

Toward real-time data query systems in HEP

Authors: Jim Pivarski, David Lange, Thanat Jatuphattharachat

Abstract: Exploratory data analysis tools must respond quickly to a user's questions, so that the answer to one question (e.g. a visualized histogram or fit) can influence the next. In some SQL-based query systems used in industry, even very large (petabyte) datasets can be summarized on a human timescale (seconds), employing techniques such as columnar data representation, caching, indexing, and code generation/JIT-compilation. This article describes progress toward realizing such a system for High Energy Physics (HEP), focusing on the intermediate problems of optimizing data access and calculations for "query sized" payloads, such as a single histogram or group of histograms, rather than large reconstruction or data-skimming jobs. These techniques include direct extraction of ROOT TBranches into Numpy arrays and compilation of Python analysis functions (rather than SQL) to be executed very quickly. We will also discuss the problem of caching and actively delivering jobs to worker nodes that have the necessary input data preloaded in cache. All of these pieces of the larger solution are available as standalone GitHub repositories, and could be used in current analyses.

Submitted 8 November, 2017; v1 submitted 3 November, 2017; originally announced November 2017.

Comments: 6 pages, 2 figures, proceedings for ACAT 2017

arXiv:1711.00375 [pdf, other]

CMS Analysis and Data Reduction with Apache Spark

Authors: Oliver Gutsche, Luca Canali, Illia Cremer, Matteo Cremonesi, Peter Elmer, Ian Fisk, Maria Girone, Bo Jayatilaka, Jim Kowalkowski, Viktor Khristenko, Evangelos Motesnitsalis, Jim Pivarski, Saba Sehrish, Kacper Surdy, Alexey Svyatkovskiy

Abstract: Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems for distributed data processing, collectively called "Big Data" technologies have emerged from industry and open source projects to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and tools, promising a fresh look at analysis of very large datasets that could potentially reduce the time-to-physics with increased interactivity. Moreover these new tools are typically actively developed by large communities, often profiting from industry resources, and under open source licensing. These factors result in a boost for adoption and maturity of the tools and for the communities supporting them, at the same time helping in reducing the cost of ownership for the end-users. In this talk, we are presenting studies of using Apache Spark for end user data analysis. We are studying the HEP analysis workflow separated into two thrusts: the reduction of centrally produced experiment datasets and the end-analysis up to the publication plot. Studying the first thrust, CMS is working together with CERN openlab and Intel on the CMS Big Data Reduction Facility. The goal is to reduce 1 PB of official CMS data to 1 TB of ntuple output for analysis. We are presenting the progress of this 2-year project with first results of scaling up Spark-based HEP analysis. Studying the second thrust, we are presenting studies on using Apache Spark for a CMS Dark Matter physics search, comparing Spark's feasibility, usability and performance to the ROOT-based analysis.

Submitted 31 October, 2017; originally announced November 2017.

Comments: Proceedings for 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2017). arXiv admin note: text overlap with arXiv:1703.04171

arXiv:1708.08319 [pdf, other]

Fast Access to Columnar, Hierarchically Nested Data via Code Transformation

Authors: Jim Pivarski, Peter Elmer, Brian Bockelman, Zhe Zhang

Abstract: Big Data query systems represent data in a columnar format for fast, selective access, and in some cases (e.g. Apache Drill), perform calculations directly on the columnar data without row materialization, avoiding runtime costs. However, many analysis procedures cannot be easily or efficiently expressed as SQL. In High Energy Physics, the majority of data processing requires nested loops with complex dependencies. When faced with tasks like these, the conventional approach is to convert the columnar data back into an object form, usually with a performance price. This paper describes a new technique to transform procedural code so that it operates on hierarchically nested, columnar data natively, without row materialization. It can be viewed as a compiler pass on the typed abstract syntax tree, rewriting references to objects as columnar array lookups. We will also present performance comparisons between transformed code and conventional object-oriented code in a High Energy Physics context.

Submitted 3 November, 2017; v1 submitted 20 August, 2017; originally announced August 2017.

Comments: 10 pages, 2 figures, submitted to IEEE Big Data

arXiv:1703.04171 [pdf, other]

doi 10.1088/1742-6596/898/7/072012

Big Data in HEP: A comprehensive use case study

Authors: Oliver Gutsche, Matteo Cremonesi, Peter Elmer, Bo Jayatilaka, Jim Kowalkowski, Jim Pivarski, Saba Sehrish, Cristina Mantilla Surez, Alexey Svyatkovskiy, Nhan Tran

Abstract: Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called Big Data technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and promise a fresh look at analysis of very large datasets and could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication physics plots. We will discuss advantages and disadvantages of each approach and give an outlook on further studies needed.

Submitted 12 March, 2017; originally announced March 2017.

Comments: Proceedings for 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2016)

arXiv:1602.06888 [pdf, other]

doi 10.1109/BigDataService.2016.39

The Matsu Wheel: A Cloud-based Framework for Efficient Analysis and Reanalysis of Earth Satellite Imagery

Authors: Maria T Patterson, Nikolas Anderson, Collin Bennett, Jacob Bruggemann, Robert Grossman, Matthew Handy, Vuong Ly, Dan Mandl, Shane Pederson, Jim Pivarski, Ray Powell, Jonathan Spring, Walt Wells

Abstract: Project Matsu is a collaboration between the Open Commons Consortium and NASA focused on developing open source technology for the cloud-based processing of Earth satellite imagery. A particular focus is the development of applications for detecting fires and floods to help support natural disaster detection and relief. Project Matsu has developed an open source cloud-based infrastructure to process, analyze, and reanalyze large collections of hyperspectral satellite image data using OpenStack, Hadoop, MapReduce, Storm and related technologies. We describe a framework for efficient analysis of large amounts of data called the Matsu "Wheel." The Matsu Wheel is currently used to process incoming hyperspectral satellite data produced daily by NASA's Earth Observing-1 (EO-1) satellite. The framework is designed to be able to support scanning queries using cloud computing applications, such as Hadoop and Accumulo. A scanning query processes all, or most of the data, in a database or data repository. We also describe our preliminary Wheel analytics, including an anomaly detector for rare spectral signatures or thermal anomalies in hyperspectral data and a land cover classifier that can be used for water and flood detection. Each of these analytics can generate visual reports accessible via the web for the public and interested decision makers. The resultant products of the analytics are also made accessible through an Open Geospatial Consortium (OGC)-compliant Web Map Service (WMS) for further distribution. The Matsu Wheel allows many shared data services to be performed together to efficiently use resources for processing hyperspectral satellite image data and other, e.g., large environmental datasets that may be analyzed for many purposes.

Submitted 22 February, 2016; originally announced February 2016.

Comments: 10 pages, accepted for presentation to IEEE BigDataService 2016

arXiv:1002.1956 [pdf, ps, other]

doi 10.1103/PhysRevD.81.075021

LHC discovery potential of the lightest NMSSM Higgs in the h1 -> a1 a1 -> 4 muons channel

Authors: Alexander Belyaev, Jim Pivarski, Alexei Safonov, Sergey Senkin, Aysen Tatarinov

Abstract: We explore the potential of the Large Hadron Collider to observe the h1 -> a1 a1 -> 4 muons signal from the lightest scalar Higgs boson (h1) decaying into the two lightest pseudoscalar Higgs bosons (a1), followed by their decays into four muons in the Next-to-Minimal Supersymmetric Standard Model (NMSSM). The signature under study applies to the region of the NMSSM parameter space in which m_a1 < 2 m_tau, which has not been studied previously. In such a scenario, the suggested strategy of searching for a four-muon signal with the appropriate background suppression would provide a powerful method to discover the lightest CP-even and CP-odd NMSSM Higgs bosons h1 and a1.

Submitted 3 March, 2010; v1 submitted 9 February, 2010; originally announced February 2010.

Comments: 12 pages, 11 figures; added more discussion of collider constraints

Journal ref: Phys.Rev.D81:075021,2010

arXiv:hep-ex/0604026 [pdf, ps, other]

A High-Precision Measurement of the Di-Electron Widths of the Upsilon(1S), Upsilon(2S), and Upsilon(3S) Mesons at CLEO-III

Authors: J. Pivarski

Abstract: The di-electron width of an Upsilon meson is the decay rate of the Upsilon into an electron-positron pair, expressed in units of energy. We measure the di-electron width of the Upsilon(1S) meson to be 1.354 +- 0.004 +- 0.020 keV (the first uncertainty is statistical and the second is systematic), the di-electron width of the Upsilon(2S) to be 0.619 +- 0.004 +- 0.010 keV and that of the Upsilon(3S) to be 0.446 +- 0.004 +- 0.007 keV. We determine these values with better than 2% precision by integrating the Upsilon production cross-section from electron-positron collisions over their collision energy. Our incident electrons and positrons were accelerated and collided in the Cornell Electron Storage Ring, and the Upsilon decay products were observed by the CLEO-III detector. The di-electron widths probe the wavefunctions of the Strongly-interacting bottom quarks that constitute the three Upsilon mesons, information which is especially interesting to check high-precision Lattice QCD calculations of the nuclear Strong force.

Submitted 17 August, 2007; v1 submitted 12 April, 2006; originally announced April 2006.

Comments: 160 pages, 73 figures, Ph.D. dissertation, also available through http://www.lepp.cornell.edu/public/THESIS/2006/ and http://hdl.handle.net/1813/2672, see hep-ex/0512056; corrected numerical values in abstract

Report number: Cornell University Laboratory of Elementary Particle Physics THESIS 06-1

arXiv:hep-ph/0507214 [pdf, ps, other]

Testing Cosmology at the ILC

Authors: A. Birkedal, K. Matchev, J. Alexander, K. Ecklund, L. Fields, R. C. Gray, D. Hertz, C. D. Jones, J. Pivarski

Abstract: We investigate the capabilities for the LHC and the ILC to perform measurements of new physics parameters relevant for the calculation of the cosmological relic abundance of the lightest neutralino in supersymmetry. Specifically, we delineate the range of values for the cold dark matter relic abundance $\Omega_\chi h^2$, which will be consistent with the expected precision measurements at the LHC, and, subsequently, at the ILC. We illustrate our approach with a toy study of an "updated benchmark" point B'. We then show some preliminary results of a similar analysis along those lines of the LCC2 benchmark point in the focus point region.

Submitted 18 July, 2005; originally announced July 2005.

Comments: 6 pages, 4 figures. Based on talks given by A. Birkedal at Linear Collider workshops in 2004 and 2005

Report number: UFIFT-HEP-05-11

Journal ref: ECONF C050318:0708,2005

arXiv:hep-ex/0507008 [pdf, ps, other]

Measuring Mass and Cross Section Parameters at a Focus Point Region

Authors: R. Gray, J. Alexander, K. M. Ecklund, L. Fields, D. Hertz, C. D. Jones, J. Pivarski, A. Birkedal, K. Matchev

Abstract: The purpose of this study is to determine the experimental uncertainties in measuring mass and cross section parameters of SUSY particles at a 500 GeV Linear Collider. In this study SUSY is a point in the focus point region of mSUGRA parameter space that is compatible with WMAP constraints on dark matter relic density. At this study point the masses of the squarks and sleptons are very heavy, and the only SUSY particles accessible at the Linear Collider would be the three lightest neutralinos, and the two lightest charginos: nino1, nino2, nino3, cino1+, cino2+, where nino1 is the lightest supersymmetric particle (LSP). The charginos or neutralinos may be pair produced, and the subsequent decay cascades to the LSP allow us to measure the SUSY couplings and mass spectrum. We find that by looking for the signature 2 jets plus 2 leptons plus missing energy we can determine the mass of the LSP to within 1 GeV uncertainty and that the mass differences of nino2 and nino3 with the LSP mass can be determined to better than 0.5 GeV.

Submitted 1 July, 2005; originally announced July 2005.

Comments: Invited talk at 2005 International Linear Collider Workshop, Stanford, CA (LCWS05), 6 pages, LaTeX, 2 eps figures

Report number: CLNS 05/1924

Journal ref: ECONF C050318:0711,2005

Showing 1–34 of 34 results for author: Pivarski, J