-
Large-Scale Application of Fault Injection into PyTorch Models -- an Extension to PyTorchFI for Validation Efficiency
Authors:
Ralf Graafe,
Qutub Syed Sha,
Florian Geissler,
Michael Paulitsch
Abstract:
Transient or permanent faults in hardware can render the output of Neural Networks (NN) incorrect without user-specific traces of the error, i.e. silent data errors (SDE). On the other hand, modern NNs also possess an inherent redundancy that can tolerate specific faults. To establish a safety case, it is necessary to distinguish and quantify both types of corruptions. To study the effects of hard…
▽ More
Transient or permanent faults in hardware can render the output of Neural Networks (NN) incorrect without user-specific traces of the error, i.e. silent data errors (SDE). On the other hand, modern NNs also possess an inherent redundancy that can tolerate specific faults. To establish a safety case, it is necessary to distinguish and quantify both types of corruptions. To study the effects of hardware (HW) faults on software (SW) in general and NN models in particular, several fault injection (FI) methods have been established in recent years. Current FI methods focus on the methodology of injecting faults but often fall short of accounting for large-scale FI tests, where many fault locations based on a particular fault model need to be analyzed in a short time. Results need to be concise, repeatable, and comparable. To address these requirements and enable fault injection as the default component in a machine learning development cycle, we introduce a novel fault injection framework called PyTorchALFI (Application Level Fault Injection for PyTorch) based on PyTorchFI. PyTorchALFI provides an efficient way to define randomly generated and reusable sets of faults to inject into PyTorch models, defines complex test scenarios, enhances data sets, and generates test KPIs while tightly coupling fault-free, faulty, and modified NN. In this paper, we provide details about the definition of test scenarios, software architecture, and several examples of how to use the new framework to apply iterative changes in fault location and number, compare different model modifications, and analyze test results.
△ Less
Submitted 30 October, 2023;
originally announced October 2023.
-
Lessons from Robot-Assisted Disaster Response Deployments by the German Rescue Robotics Center Task Force
Authors:
Hartmut Surmann,
Ivana Kruijff-Korbayova,
Kevin Daun,
Marius Schnaubelt,
Oskar von Stryk,
Manuel Patchou,
Stefan Boecker,
Christian Wietfeld,
Jan Quenzel,
Daniel Schleich,
Sven Behnke,
Robert Grafe,
Nils Heidemann,
Dominik Slomma
Abstract:
Earthquakes, fire, and floods often cause structural collapses of buildings. The inspection of damaged buildings poses a high risk for emergency forces or is even impossible, though. We present three recent selected missions of the Robotics Task Force of the German Rescue Robotics Center, where both ground and aerial robots were used to explore destroyed buildings. We describe and reflect the miss…
▽ More
Earthquakes, fire, and floods often cause structural collapses of buildings. The inspection of damaged buildings poses a high risk for emergency forces or is even impossible, though. We present three recent selected missions of the Robotics Task Force of the German Rescue Robotics Center, where both ground and aerial robots were used to explore destroyed buildings. We describe and reflect the missions as well as the lessons learned that have resulted from them. In order to make robots from research laboratories fit for real operations, realistic test environments were set up for outdoor and indoor use and tested in regular exercises by researchers and emergency forces. Based on this experience, the robots and their control software were significantly improved. Furthermore, top teams of researchers and first responders were formed, each with realistic assessments of the operational and practical suitability of robotic systems.
△ Less
Submitted 19 December, 2022;
originally announced December 2022.
-
Hardware faults that matter: Understanding and Estimating the safety impact of hardware faults on object detection DNNs
Authors:
Syed Qutub,
Florian Geissler,
Yang Peng,
Ralf Grafe,
Michael Paulitsch,
Gereon Hinz,
Alois Knoll
Abstract:
Object detection neural network models need to perform reliably in highly dynamic and safety-critical environments like automated driving or robotics. Therefore, it is paramount to verify the robustness of the detection under unexpected hardware faults like soft errors that can impact a systems perception module. Standard metrics based on average precision produce model vulnerability estimates at…
▽ More
Object detection neural network models need to perform reliably in highly dynamic and safety-critical environments like automated driving or robotics. Therefore, it is paramount to verify the robustness of the detection under unexpected hardware faults like soft errors that can impact a systems perception module. Standard metrics based on average precision produce model vulnerability estimates at the object level rather than at an image level. As we show in this paper, this does not provide an intuitive or representative indicator of the safety-related impact of silent data corruption caused by bit flips in the underlying memory but can lead to an over- or underestimation of typical fault-induced hazards. With an eye towards safety-related real-time applications, we propose a new metric IVMOD (Image-wise Vulnerability Metric for Object Detection) to quantify vulnerability based on an incorrect image-wise object detection due to false positive (FPs) or false negative (FNs) objects, combined with a severity analysis. The evaluation of several representative object detection models shows that even a single bit flip can lead to a severe silent data corruption event with potentially critical safety implications, with e.g., up to (much greater than) 100 FPs generated, or up to approx. 90% of true positives (TPs) are lost in an image. Furthermore, with a single stuck-at-1 fault, an entire sequence of images can be affected, causing temporally persistent ghost detections that can be mistaken for actual objects (covering up to approx. 83% of the image). Furthermore, actual objects in the scene are continuously missed (up to approx. 64% of TPs are lost). Our work establishes a detailed understanding of the safety-related vulnerability of such critical workloads against hardware faults.
△ Less
Submitted 7 September, 2022;
originally announced September 2022.
-
Deployment of Aerial Robots during the Flood Disaster in Erftstadt / Blessem in July 2021
Authors:
Hartmut Surmann,
Dominik Slomma,
Robert Grafe,
Stefan Grobelny
Abstract:
Climate change is leading to more and more extreme weather events such as heavy rainfall and flooding. This technical report deals with the question of how rescue commanders can be better and faster provided with current information during flood disasters using Unmanned Aerial Vehicles (UAVs), i.e. during the flood in July 2021 in Central Europe, more specifically in Erftstadt / Blessem. The UAVs…
▽ More
Climate change is leading to more and more extreme weather events such as heavy rainfall and flooding. This technical report deals with the question of how rescue commanders can be better and faster provided with current information during flood disasters using Unmanned Aerial Vehicles (UAVs), i.e. during the flood in July 2021 in Central Europe, more specifically in Erftstadt / Blessem. The UAVs were used for live observation and regular inspections of the flood edge on the one hand, and on the other hand for the systematic data acquisition in order to calculate 3D models using Structure from Motion and MultiView Stereo. The 3D models embedded in a GIS application serve as a planning basis for the systematic exploration and decision support for the deployment of additional smaller UAVs but also rescue forces. The systematic data acquisition of the UAVs by means of autonomous meander flights provides high-resolution images which are computed to a georeferenced 3D model of the surrounding area within 15 minutes in a specially equipped robotic command vehicle (RobLW). From the comparison of high-resolution elevation profiles extracted from the 3D model on successive days, changes in the water level become visible. This information enables the emergency management to plan further inspections of the buildings and to search for missing persons on site.
△ Less
Submitted 7 September, 2022;
originally announced September 2022.
-
Deployment of Aerial Robots after a major fire of an industrial hall with hazardous substances, a report
Authors:
Hartmut Surmann,
Dominik Slomma,
Stefan Grobelny,
Robert Grafe
Abstract:
This technical report is about the mission and the experience gained during the reconnaissance of an industrial hall with hazardous substances after a major fire in Berlin. During this operation, only UAVs and cameras were used to obtain information about the site and the building. First, a geo-referenced 3D model of the building was created in order to plan the entry into the hall. Subsequently,…
▽ More
This technical report is about the mission and the experience gained during the reconnaissance of an industrial hall with hazardous substances after a major fire in Berlin. During this operation, only UAVs and cameras were used to obtain information about the site and the building. First, a geo-referenced 3D model of the building was created in order to plan the entry into the hall. Subsequently, the UAVs were used to fly in the heavily damaged interior and take pictures from inside of the hall. A 360° camera mounted under the UAV was used to collect images of the surrounding area especially from sections that were difficult to fly into. Since the collected data set contained similar images as well as blurred images, it was cleaned from non-optimal images using visual SLAM, bundle adjustment and blur detection so that a 3D model and overviews could be calculated. It was shown that the emergency services were not able to extract the necessary information from the 3D model. Therefore, an interactive panorama viewer with links to other 360° images was implemented where the links to the other images depends on the semi dense point cloud and located camera positions of the visual SLAM algorithm so that the emergency forces could view the surroundings.
△ Less
Submitted 29 November, 2021;
originally announced November 2021.
-
Towards a Safety Case for Hardware Fault Tolerance in Convolutional Neural Networks Using Activation Range Supervision
Authors:
Florian Geissler,
Syed Qutub,
Sayanta Roychowdhury,
Ali Asgari,
Yang Peng,
Akash Dhamasia,
Ralf Graefe,
Karthik Pattabiraman,
Michael Paulitsch
Abstract:
Convolutional neural networks (CNNs) have become an established part of numerous safety-critical computer vision applications, including human robot interactions and automated driving. Real-world implementations will need to guarantee their robustness against hardware soft errors corrupting the underlying platform memory. Based on the previously observed efficacy of activation clip** techniques,…
▽ More
Convolutional neural networks (CNNs) have become an established part of numerous safety-critical computer vision applications, including human robot interactions and automated driving. Real-world implementations will need to guarantee their robustness against hardware soft errors corrupting the underlying platform memory. Based on the previously observed efficacy of activation clip** techniques, we build a prototypical safety case for classifier CNNs by demonstrating that range supervision represents a highly reliable fault detector and mitigator with respect to relevant bit flips, adopting an eight-exponent floating point data representation. We further explore novel, non-uniform range restriction methods that effectively suppress the probability of silent data corruptions and uncorrectable errors. As a safety-relevant end-to-end use case, we showcase the benefit of our approach in a vehicle classification scenario, using ResNet-50 and the traffic camera data set MIOVision. The quantitative evidence provided in this work can be leveraged to inspire further and possibly more complex CNN safety arguments.
△ Less
Submitted 16 August, 2021;
originally announced August 2021.
-
Optimized sensor placement for dependable roadside infrastructures
Authors:
Florian Geissler,
Ralf Graefe
Abstract:
We present a multi-stage optimization method for efficient sensor deployment in traffic surveillance scenarios. Based on a genetic optimization scheme, our algorithm places an optimal number of roadside sensors to obtain full road coverage in the presence of obstacles and dynamic occlusions. The efficiency of the procedure is demonstrated for selected, realistic road sections. Our analysis helps t…
▽ More
We present a multi-stage optimization method for efficient sensor deployment in traffic surveillance scenarios. Based on a genetic optimization scheme, our algorithm places an optimal number of roadside sensors to obtain full road coverage in the presence of obstacles and dynamic occlusions. The efficiency of the procedure is demonstrated for selected, realistic road sections. Our analysis helps to leverage the economic feasibility of distributed infrastructure sensor networks with high perception quality.
△ Less
Submitted 10 October, 2019;
originally announced October 2019.