-
Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs
Authors:
John Tramm,
Paul Romano,
Patrick Shriwise,
Amanda Lund,
Johannes Doerfert,
Patrick Steinbrecher,
Andrew Siegel,
Gavin Ridley
Abstract:
OpenMC is an open source Monte Carlo neutral particle transport application that has recently been ported to GPU using the OpenMP target offloading model. We examine the performance of OpenMC at scale on the Frontier, Polaris, and Aurora supercomputers, demonstrating that performance portability has been achieved by OpenMC across all three major GPU vendors (AMD, NVIDIA, and Intel). OpenMC's GPU performance is compared to both the traditional CPU-based version of OpenMC as well as several other state-of-the-art CPU-based Monte Carlo particle transport applications. We also provide historical context by analyzing OpenMC's performance on several legacy GPU and CPU architectures. This work includes some of the first published results for a scientific simulation application at scale on a supercomputer featuring Intel's Max series "Ponte Vecchio" GPUs. It is also one of the first demonstrations of a large scientific production application using the OpenMP target offloading model to achieve high performance on all three major GPU platforms.
Submitted 18 March, 2024;
originally announced March 2024.
-
C2P-GCN: Cell-to-Patch Graph Convolutional Network for Colorectal Cancer Grading
Authors:
Sudipta Paul,
Bulent Yener,
Amanda W. Lund
Abstract:
Graph-based learning approaches, due to their ability to encode tissue/organ structure information, are increasingly favored for grading colorectal cancer histology images. Recent graph-based techniques involve dividing whole slide images (WSIs) into smaller or medium-sized patches, and then building graphs on each patch for direct use in training. This method, however, fails to capture the tissue structure information present in an entire WSI and relies on training from a significantly large dataset of image patches. In this paper, we propose a novel cell-to-patch graph convolutional network (C2P-GCN), which is a two-stage graph formation-based approach. In the first stage, it forms a patch-level graph based on the cell organization on each patch of a WSI. In the second stage, it forms an image-level graph based on a similarity measure between patches of a WSI considering each patch as a node of a graph. This graph representation is then fed into a multi-layer GCN-based classification network. Our approach, through its dual-phase graph construction, effectively gathers local structural details from individual patches and establishes a meaningful connection among all patches across a WSI. As C2P-GCN integrates the structural data of an entire WSI into a single graph, it allows our model to work with significantly fewer training data compared to the latest models for colorectal cancer. Experimental validation of C2P-GCN on two distinct colorectal cancer datasets demonstrates the effectiveness of our method.
Submitted 13 May, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Generation and Classification of Activity Sequences for Spatiotemporal Modeling of Human Populations
Authors:
Albert M Lund,
Ramkiran Gouripeddi,
Julio C Facelli
Abstract:
Human activity encompasses a series of complex spatiotemporal processes that are difficult to model, but represents an essential component of human exposure assessment. A significant empirical data source like the American Time Use Survey (ATUS) can be leveraged to model human activity, but tractable models require a better stratification of activity data to inform about different, but classifiable groups of individuals that exhibit similar activities and mobility patterns. We have developed a simple unsupervised classification and sequence generation method from existing machine learning algorithms that is capable of generating coherent and stochastic sequences of activity from the data in the ATUS. This classification, when combined with any spatiotemporal exposure profile, allows the development of stochastic models of exposure patterns for groups of individuals exhibiting similar activity behaviors.
Submitted 11 November, 2019;
originally announced November 2019.
-
Quantum Correlations in Nonlocal BosonSampling
Authors:
Farid Shahandeh,
Austin P. Lund,
Timothy C. Ralph
Abstract:
Determination of the quantum nature of correlations between two spatially separated systems plays a crucial role in quantum information science. Of particular interest are the questions of whether and how these correlations enable quantum information protocols to be more powerful. Here, we report on a distributed quantum computation protocol in which the input and output quantum states are considered to be classically correlated in quantum informatics. Nevertheless, we show that the correlations between the outcomes of the measurements on the output state cannot be efficiently simulated using classical algorithms. Crucially, at the same time, local measurement outcomes can be efficiently simulated on classical computers. We show that the only known classicality criterion violated by the input and output states in our protocol is the one used in quantum optics, namely, phase-space nonclassicality. As a result, we argue that the global phase-space nonclassicality inherent within the output state of our protocol represents true quantum correlations.
Submitted 7 February, 2017;
originally announced February 2017.
-
Event-based, 6-DOF Camera Tracking from Photometric Depth Maps
Authors:
Guillermo Gallego,
Jon E. A. Lund,
Elias Mueggler,
Henri Rebecq,
Tobi Delbruck,
Davide Scaramuzza
Abstract:
Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. These cameras do not suffer from motion blur and have a very high dynamic range, which enables them to provide reliable visual information during high-speed motions or in scenes characterized by high dynamic range. These features, along with a very low power consumption, make event cameras an ideal complement to standard cameras for VR/AR and video game applications. With these applications in mind, this paper tackles the problem of accurate, low-latency tracking of an event camera from an existing photometric depth map (i.e., intensity plus depth information) built via classic dense reconstruction pipelines. Our approach tracks the 6-DOF pose of the event camera upon the arrival of each event, thus virtually eliminating latency. We successfully evaluate the method in both indoor and outdoor scenes and show that, because of the technological advantages of the event camera, our pipeline works in scenes characterized by high-speed motion, which are still inaccessible to standard cameras.
Submitted 31 October, 2017; v1 submitted 12 July, 2016;
originally announced July 2016.
-
Applying Seamful Design in Location-based Mobile Museum Applications
Authors:
Tommy Nilsson,
Carl Hogsden,
Charith Perera,
Saeed Aghaee,
David Scruton,
Andreas Lund,
Alan F. Blackwell
Abstract:
The application of mobile computing is currently altering patterns of our behavior to a greater degree than perhaps any other invention. In combination with the introduction of power efficient wireless communication technologies, such as Bluetooth Low Energy (BLE), designers are today increasingly empowered to shape the way we interact with our physical surroundings and thus build entirely new experiences. However, our evaluations of BLE and its abilities to facilitate mobile location-based experiences in public environments revealed a number of potential problems. Most notably, the position and orientation of the user in combination with various environmental factors, such as crowds of people traversing the space, were found to cause major fluctuations of the received BLE signal strength. These issues are rendering a seamless functioning of any location-based application practically impossible. Instead of achieving seamlessness by eliminating these technical issues, we thus choose to advocate the use of a seamful approach, i.e. to reveal and exploit these problems and turn them into a part of the actual experience. In order to demonstrate the viability of this approach, we designed, implemented and evaluated the Ghost Detector - an educational location-based museum game for children. By presenting a qualitative evaluation of this game and by motivating our design decisions, this paper provides insight into some of the challenges and possible solutions connected to the process of developing location-based BLE-enabled experiences for public cultural spaces.
Submitted 8 June, 2016; v1 submitted 18 May, 2016;
originally announced May 2016.
-
Fusion of Array Operations at Runtime
Authors:
Mads R. B. Kristensen,
Simon A. F. Lund,
Troels Blum,
James Avery
Abstract:
We address the problem of fusing array operations based on criteria such as shape compatibility, data reusability, and communication. We formulate the problem as a graph partition problem that is general enough to handle loop fusion, combinator fusion, and other types of subroutines.
Submitted 21 January, 2016; v1 submitted 20 January, 2016;
originally announced January 2016.
-
What can quantum optics say about computational complexity theory?
Authors:
Saleh Rahimi-Keshari,
Austin P. Lund,
Timothy C. Ralph
Abstract:
Considering the problem of sampling from the output photon-counting probability distribution of a linear-optical network for input Gaussian states, we obtain results that are of interest from both quantum theory and the computational complexity theory point of view. We derive a general formula for calculating the output probabilities, and by considering input thermal states, we show that the output probabilities are proportional to permanents of positive-semidefinite Hermitian matrices. It is believed that approximating permanents of complex matrices in general is a #P-hard problem. However, we show that these permanents can be approximated with an algorithm in BPP^NP complexity class, as there exists an efficient classical algorithm for sampling from the output probability distribution. We further consider input squeezed-vacuum states and discuss the complexity of sampling from the probability distribution at the output.
Submitted 12 February, 2015; v1 submitted 16 August, 2014;
originally announced August 2014.
-
cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications
Authors:
Mads Ruben Burgdorff Kristensen,
Simon Andreas Frimann Lund,
Troels Blum,
Brian Vinter
Abstract:
Modern processor architectures, in addition to having still more cores, also require still more consideration to memory-layout in order to run at full capacity. The usefulness of most languages is deprecating as their abstractions, structures or objects are hard to map onto modern processor architectures efficiently.
The work in this paper introduces a new abstract machine framework, cphVB, that enables vector oriented high-level programming languages to map onto a broad range of architectures efficiently. The idea is to close the gap between high-level languages and hardware optimized low-level implementations. By translating high-level vector operations into an intermediate vector bytecode, cphVB enables specialized vector engines to efficiently execute the vector operations.
The primary success parameters are to maintain a complete abstraction from low-level details and to provide efficient code execution across different modern processors. We evaluate the presented design through a setup that targets multi-core CPU architectures. We evaluate the performance of the implementation using Python implementations of well-known algorithms: a Jacobi solver, a kNN search, a shallow water simulation, and a synthetic stencil simulation. All demonstrate good performance.
Submitted 25 March, 2013; v1 submitted 26 October, 2012;
originally announced October 2012.