-
IAMCV Multi-Scenario Vehicle Interaction Dataset
Authors:
Novel Certad,
Enrico del Re,
Helena Korndörfer,
Gregory Schröder,
Walter Morales-Alvarez,
Sebastian Tschernuth,
Delgermaa Gankhuyag,
Luigi del Re,
Cristina Olaverri-Monreal
Abstract:
The acquisition and analysis of high-quality sensor data constitute an essential requirement in shaping the development of fully autonomous driving systems. This process is indispensable for enhancing road safety and ensuring the effectiveness of the technological advancements in the automotive industry. This study introduces the Interaction of Autonomous and Manually-Controlled Vehicles (IAMCV) dataset, a novel and extensive dataset focused on inter-vehicle interactions. The dataset, enriched with a sophisticated array of sensors such as Light Detection and Ranging, cameras, Inertial Measurement Unit/Global Positioning System, and vehicle bus data acquisition, provides a comprehensive representation of real-world driving scenarios that include roundabouts, intersections, country roads, and highways, recorded across diverse locations in Germany. Furthermore, the study shows the versatility of the IAMCV dataset through several proof-of-concept use cases. Firstly, an unsupervised trajectory clustering algorithm illustrates the dataset's capability in categorizing vehicle movements without the need for labeled training data. Secondly, we compare an online camera calibration method with the Robot Operating System-based standard, using images captured in the dataset. Finally, a preliminary test employing the YOLOv8 object-detection model is conducted, augmented by reflections on the transferability of object detection across various LIDAR resolutions. These use cases underscore the practical utility of the collected dataset, emphasizing its potential to advance research and innovation in the area of intelligent vehicles.
Submitted 13 March, 2024;
originally announced March 2024.
-
LB4OMP: A Dynamic Load Balancing Library for Multithreaded Applications
Authors:
Jonas H. Müller Korndörfer,
Ahmed Eleliemy,
Ali Mohammed,
Florina M. Ciorba
Abstract:
Exascale computing systems will exhibit high degrees of hierarchical parallelism, with thousands of computing nodes and hundreds of cores per node.
Efficiently exploiting hierarchical parallelism is challenging due to load imbalance that arises at multiple levels.
OpenMP is the most widely used standard for expressing and exploiting the ever-increasing node-level parallelism.
The scheduling options in OpenMP are insufficient to address the load imbalance that arises during the execution of multithreaded applications.
The limited scheduling options in OpenMP hinder research on novel scheduling techniques which require comparison with others from the literature.
This work introduces LB4OMP, an open-source dynamic load balancing library that implements successful scheduling algorithms from the literature.
LB4OMP is a research infrastructure designed to spur and support present and future scheduling research, for the benefit of multithreaded applications performance.
Through an extensive performance analysis campaign, we assess the effectiveness and demystify the performance of all loop scheduling techniques in the library.
We show that, for numerous application-system pairs, the scheduling techniques in LB4OMP outperform the scheduling options in OpenMP.
Node-level load balancing using LB4OMP leads to reduced cross-node load imbalance and to improved MPI+OpenMP applications performance, which is critical for Exascale computing.
Submitted 9 June, 2021;
originally announced June 2021.
-
Mapping Matters: Application Process Mapping on 3-D Processor Topologies
Authors:
Jonas H. Müller Korndörfer,
Mario Bielert,
Laércio L. Pilla,
Florina M. Ciorba
Abstract:
Applications' performance is influenced by the mapping of processes to computing nodes, the frequency and volume of exchanges among processing elements, the network capacity, and the routing protocol. A poor mapping of application processes degrades performance and wastes resources. Process mapping is frequently ignored as an explicit optimization step since the system typically offers a default mapping, users may lack awareness of their applications' communication behavior, and the opportunities for improving performance through mapping are often unclear. This work studies the impact of application process mapping on several processor topologies. We propose a workflow that renders mapping as an explicit optimization step for parallel applications. We apply the workflow to a set of four applications, twelve mapping algorithms, and three direct network topologies. We assess the mappings' quality in terms of volume, frequency, and distance of exchanges using metrics such as dilation (measured in hop$\cdot$Byte). With a parallel trace-based simulator, we predict the applications' execution on the three topologies using the twelve mappings. We evaluate the impact of process mapping on the applications' simulated performance in terms of execution and communication times and identify the mappings that achieve the highest performance in both cases. To ensure the correctness of the simulations, we compare the pre- and post-simulation results. This work emphasizes the importance of process mapping as an explicit optimization step and offers a solution for parallel applications to exploit the full potential of the allocated resources on a given system.
Submitted 10 March, 2021; v1 submitted 20 May, 2020;
originally announced May 2020.
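To make the hop$\cdot$Byte unit mentioned in the abstract above concrete, the following minimal sketch scores two process mappings on a small 3-D mesh. It is an illustration under stated assumptions, not the paper's workflow: the metric is taken here as the sum of bytes exchanged times hop distance over all communicating pairs, the topology is a mesh without wrap-around links, and the names `mesh_coords`, `hops`, `hop_bytes`, and the traffic values are hypothetical.

```python
from itertools import product


def mesh_coords(dims):
    """Enumerate node coordinates of a 3-D mesh in row-major order."""
    return list(product(*(range(d) for d in dims)))


def hops(a, b):
    """Manhattan distance between two mesh nodes (assumes no wrap-around links)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hop_bytes(traffic, mapping, coords):
    """Sum of bytes * hops over all communicating process pairs.

    traffic: {(src_rank, dst_rank): bytes exchanged}  (hypothetical values)
    mapping: {rank: node index into coords}
    """
    return sum(
        nbytes * hops(coords[mapping[src]], coords[mapping[dst]])
        for (src, dst), nbytes in traffic.items()
    )


coords = mesh_coords((2, 2, 2))

# Four processes exchanging 1000 bytes with their ring neighbors.
traffic = {(0, 1): 1000, (1, 2): 1000, (2, 3): 1000, (3, 0): 1000}

# Rank i on node i, versus a mapping that keeps ring neighbors adjacent.
linear = {0: 0, 1: 1, 2: 2, 3: 3}
compact = {0: 0, 1: 1, 2: 3, 3: 2}

print("linear :", hop_bytes(traffic, linear, coords))
print("compact:", hop_bytes(traffic, compact, coords))
```

With this toy traffic pattern, the mapping that places ring neighbors on adjacent nodes yields the lower score, which is the kind of difference such a metric is meant to expose.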
-
Finding Neighbors in a Forest: A b-tree for Smoothed Particle Hydrodynamics Simulations
Authors:
Aurélien Cavelan,
Rubén M. Cabezón,
Jonas H. M. Korndorfer,
Florina M. Ciorba
Abstract:
Finding the exact close neighbors of each fluid element in mesh-free computational hydrodynamical methods, such as the Smoothed Particle Hydrodynamics (SPH), often becomes a main bottleneck for scaling their performance beyond a few million fluid elements per computing node. Tree structures are particularly suitable for SPH simulation codes, which rely on finding the exact close neighbors of each fluid element (or SPH particle). In this work we present a novel tree structure, named b-tree, which features an adaptive branching factor to reduce the depth of the neighbor search. Depending on the particle spatial distribution, finding neighbors using b-tree has an asymptotic best case complexity of $O(n)$, as opposed to $O(n \log n)$ for other classical tree structures such as octrees and quadtrees. We also present the proposed tree structure as well as the algorithms to build it and to find the exact close neighbors of all particles. We assess the scalability of the proposed tree-based algorithms through an extensive set of performance experiments in a shared-memory system. Results show that b-tree is up to $12\times$ faster for building the tree and up to $1.6\times$ faster for finding the exact neighbors of all particles when compared to its octree form. Moreover, we apply b-tree to an SPH code and show its usefulness over the existing octree implementation, where b-tree is up to $5\times$ faster for finding the exact close neighbors compared to the legacy code.
Submitted 18 May, 2020; v1 submitted 7 October, 2019;
originally announced October 2019.
-
Toward a Standard Interface for User-Defined Scheduling in OpenMP
Authors:
Vivek Kale,
Christian Iwainsky,
Michael Klemm,
Jonas H. Muller Korndorfer,
Florina M. Ciorba
Abstract:
Parallel loops are an important part of OpenMP programs. Efficient scheduling of parallel loops can improve performance of the programs. The current OpenMP specification only offers three options for loop scheduling, which are insufficient in certain instances. Given the large number of other possible scheduling strategies, it is infeasible to standardize each one. A more viable approach is to extend the OpenMP standard to allow for users to define loop scheduling strategies. The approach will enable standard-compliant application-specific scheduling. This work analyzes the principal components required by user-defined scheduling and proposes two competing interfaces as candidates for the OpenMP standard. We conceptually compare the two proposed interfaces with respect to the three host languages of OpenMP, i.e., C, C++, and Fortran. These interfaces serve the OpenMP community as a basis for discussion and prototype implementation for user-defined scheduling.
Submitted 8 July, 2019; v1 submitted 20 June, 2019;
originally announced June 2019.
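As a rough illustration of why the choice of loop scheduling strategy matters, the sketch below computes the chunk sizes that three simple strategies would hand out for a loop of a given length. It is a simplified model written for this listing, assuming the three options referred to in the abstract above are OpenMP's static, dynamic, and guided schedules; real OpenMP runtimes differ in detail, and the function names are hypothetical rather than part of any proposed interface.

```python
import math


def static_chunks(n_iters, n_threads):
    """One near-equal block per thread, fixed before the loop runs."""
    base, extra = divmod(n_iters, n_threads)
    return [base + (1 if t < extra else 0) for t in range(n_threads)]


def dynamic_chunks(n_iters, chunk=1):
    """Fixed-size chunks handed out on demand; the last one may be smaller."""
    full, rest = divmod(n_iters, chunk)
    return [chunk] * full + ([rest] if rest else [])


def guided_chunks(n_iters, n_threads, min_chunk=1):
    """Decreasing chunks: each is roughly remaining / n_threads (simplified model)."""
    chunks, remaining = [], n_iters
    while remaining > 0:
        size = min(remaining, max(min_chunk, math.ceil(remaining / n_threads)))
        chunks.append(size)
        remaining -= size
    return chunks


print("static :", static_chunks(100, 4))
print("dynamic:", dynamic_chunks(100, 30))
print("guided :", guided_chunks(100, 4))
```

A user-defined schedule in this toy model would simply be another function that returns a list of chunk sizes summing to the iteration count, which hints at why a general interface is more tractable than standardizing each strategy individually.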