-
Identification of Fine-grained Systematic Errors via Controlled Scene Generation
Authors:
Valentyn Boreiko,
Matthias Hein,
Jan Hendrik Metzen
Abstract:
Many safety-critical applications, especially in autonomous driving, require reliable object detectors. Such applications benefit greatly from methods that search for and identify potential failures and systematic errors before the detectors are deployed. Systematic errors are characterized by combinations of attributes such as object location, scale, orientation, and color, as well as the composition of their respective backgrounds. To identify them, one must go beyond real images from a test set, because such images do not cover very rare but possible combinations of attributes. To overcome this limitation, we propose a pipeline for generating realistic synthetic scenes with fine-grained control, allowing the creation of complex scenes with multiple objects. Our approach, BEV2EGO, allows realistic generation of the complete scene with road-contingent control that maps 2D bird's-eye view (BEV) scene configurations to a first-person view (EGO). In addition, we propose a benchmark for controlled scene generation to select the most appropriate generative outpainting model for BEV2EGO. We further use it to perform a systematic analysis of multiple state-of-the-art object detection models and discover differences between them.
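The mapping from a 2D bird's-eye view (BEV) layout to the first-person (EGO) view has a simple geometric core: under a pinhole camera above flat ground, an object's BEV position fixes where it touches the road in the image and how large it appears. The sketch below illustrates only that geometry, assuming a flat road and an optical axis parallel to it. It is not the BEV2EGO pipeline (which relies on generative outpainting), and the class name, camera height, focal length, and principal point are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    """Hypothetical forward-facing camera; all parameters are illustrative."""

    focal_px: float  # focal length in pixels
    cx: float        # principal point, horizontal (pixels)
    cy: float        # principal point, vertical (pixels); also the horizon row here
    height_m: float  # camera height above a flat road (meters)

    def bev_to_ego(self, x_forward: float, y_left: float) -> tuple[float, float]:
        """Map a ground point in BEV coordinates (meters, ego at the origin,
        x pointing forward, y pointing left) to image pixel coordinates (u, v)."""
        if x_forward <= 0:
            raise ValueError("point is not in front of the camera")
        # Camera frame: z forward, x right, y down; the road plane lies at y = +height_m.
        u = self.cx + self.focal_px * (-y_left) / x_forward
        v = self.cy + self.focal_px * self.height_m / x_forward
        return u, v

    def apparent_height_px(self, x_forward: float, object_height_m: float) -> float:
        """Approximate pixel height of an upright object standing at distance x_forward."""
        return self.focal_px * object_height_m / x_forward


cam = PinholeCamera(focal_px=1000.0, cx=960.0, cy=540.0, height_m=1.5)
print(cam.bev_to_ego(10.0, 0.0))  # prints (960.0, 690.0)
print(cam.bev_to_ego(20.0, 2.0))  # prints (860.0, 615.0)
```

Doubling the forward distance halves both the offset from the horizon row `cy` and the apparent size, which shows why location and scale are coupled once a scene is specified in BEV.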
Submitted 10 April, 2024;
originally announced April 2024.
-
Generating Realistic Counterfactuals for Retinal Fundus and OCT Images using Diffusion Models
Authors:
Indu Ilanchezian,
Valentyn Boreiko,
Laura Kühlewein,
Ziwei Huang,
Murat Seçkin Ayhan,
Matthias Hein,
Lisa Koch,
Philipp Berens
Abstract:
Counterfactual reasoning is often used in clinical settings to explain decisions or weigh alternatives. Therefore, for imaging-based specialties such as ophthalmology, it would be beneficial to be able to create counterfactual images, illustrating answers to questions like "If the subject had had diabetic retinopathy, how would the fundus image have looked?". Here, we demonstrate that using a diffusion model in combination with an adversarially robust classifier trained on retinal disease classification tasks enables the generation of highly realistic counterfactuals of retinal fundus images and optical coherence tomography (OCT) B-scans. The key to the realism of counterfactuals is that these classifiers encode salient features indicative of each disease class and can steer the diffusion model to depict disease signs or remove disease-related lesions in a realistic way. In a user study, domain experts found the counterfactuals generated using our method significantly more realistic than counterfactuals generated with a previous method, and even indistinguishable from real images.
Submitted 4 December, 2023; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Identifying Systematic Errors in Object Detectors with the SCROD Pipeline
Authors:
Valentyn Boreiko,
Matthias Hein,
Jan Hendrik Metzen
Abstract:
The identification and removal of systematic errors in object detectors can be a prerequisite for their deployment in safety-critical applications like automated driving and robotics. Such systematic errors can, for instance, occur under very specific object poses (location, scale, orientation), object colors/textures, and backgrounds. Real images alone are unlikely to cover all relevant combinations. We overcome this limitation by generating synthetic images with fine-grained control. While generating synthetic images with physical simulators and hand-designed 3D assets allows fine-grained control over the generated images, this approach is resource-intensive and has limited scalability. In contrast, using generative models is more scalable but less reliable in terms of fine-grained control. In this paper, we propose a novel framework that combines the strengths of both approaches. Our meticulously designed pipeline, along with custom models, enables us to generate street scenes with fine-grained control in a fully automated and scalable manner. Moreover, our framework introduces an evaluation setting that can serve as a benchmark for similar pipelines. This evaluation setting will contribute to advancing the field and promoting standardized testing procedures.
Submitted 23 September, 2023;
originally announced September 2023.
-
Identification of Systematic Errors of Image Classifiers on Rare Subgroups
Authors:
Jan Hendrik Metzen,
Robin Hutmacher,
N. Grace Hua,
Valentyn Boreiko,
Dan Zhang
Abstract:
Despite excellent average-case performance of many image classifiers, their performance can substantially deteriorate on semantically coherent subgroups of the data that were under-represented in the training data. These systematic errors can affect fairness for demographic minority groups as well as robustness and safety under domain shift. A major challenge is to identify such subgroups with subpar performance when the subgroups are not annotated and their occurrence is very rare. We leverage recent advances in text-to-image models and search in the space of textual descriptions of subgroups ("prompts") for subgroups where the target model has low performance on the prompt-conditioned synthesized data. To tackle the exponentially growing number of subgroups, we employ combinatorial testing. We denote this procedure as PromptAttack, as it can be interpreted as an adversarial attack in a prompt space. We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy. Building on this, we apply PromptAttack to ImageNet classifiers and identify novel systematic errors on rare subgroups.
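The abstract names combinatorial testing as the way to cope with the exponentially growing number of subgroups. A minimal sketch of 2-way (pairwise) combinatorial testing follows: instead of enumerating every attribute combination, it greedily picks rows until every pair of attribute values co-occurs at least once, and each row is turned into a text prompt. The attribute names, values, and prompt template are hypothetical, and the greedy covering-array construction is a generic technique, not necessarily the one used for PromptAttack.

```python
from itertools import combinations, product


def pairwise_cover(factors):
    """Greedily build rows so that every pair of values from two different
    attributes appears together in at least one row (a 2-way covering array)."""
    names = list(factors)
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in factors[a]
        for vb in factors[b]
    }
    # Exhaustive candidate pool; fine for small attribute spaces like this one.
    candidates = [dict(zip(names, vals)) for vals in product(*(factors[n] for n in names))]

    def pairs_of(row):
        return {((a, row[a]), (b, row[b])) for a, b in combinations(names, 2)}

    rows = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda row: len(pairs_of(row) & uncovered))
        rows.append(best)
        uncovered -= pairs_of(best)
    return rows


# Hypothetical subgroup attributes and prompt template (not taken from the paper).
factors = {
    "color": ["red", "white", "black"],
    "setting": ["snow", "beach", "city street"],
    "lighting": ["daylight", "night", "fog"],
    "view": ["front", "side", "rear"],
}
rows = pairwise_cover(factors)
prompts = [
    f"a photo of a {r['color']} car, {r['view']} view, {r['setting']}, {r['lighting']}"
    for r in rows
]
print(len(prompts), "prompts instead of", 3 ** 4)
```

With four attributes of three values each, exhaustive testing needs 81 prompts, while any pairwise cover needs at least 9 (one per value pair of two attributes); a greedy construction usually lands somewhat above that lower bound.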
Submitted 12 April, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
Spurious Features Everywhere -- Large-Scale Detection of Harmful Spurious Features in ImageNet
Authors:
Yannic Neuhaus,
Maximilian Augustin,
Valentyn Boreiko,
Matthias Hein
Abstract:
Benchmark performance of deep learning classifiers alone is not a reliable predictor for the performance of a deployed model. In particular, if the image classifier has picked up spurious features in the training data, its predictions can fail in unexpected ways. In this paper, we develop a framework that allows us to systematically identify spurious features in large datasets like ImageNet. It is based on our neural PCA components and their visualization. Previous work on spurious features often operates in toy settings or requires costly pixel-wise annotations. In contrast, we work with ImageNet and validate our results by showing that the presence of the harmful spurious feature of a class alone is sufficient to trigger the prediction of that class. We introduce the novel dataset "Spurious ImageNet", which allows one to measure the reliance of any ImageNet classifier on harmful spurious features. Moreover, we introduce SpuFix, a simple mitigation method that reduces the dependence of any ImageNet classifier on previously identified harmful spurious features without requiring additional labels or retraining of the model. We provide code and data at https://github.com/YanNeu/spurious_imagenet .
Submitted 22 August, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
Diffusion Visual Counterfactual Explanations
Authors:
Maximilian Augustin,
Valentyn Boreiko,
Francesco Croce,
Matthias Hein
Abstract:
Visual Counterfactual Explanations (VCEs) are an important tool to understand the decisions of an image classifier. They are 'small' but 'realistic' semantic changes of the image that change the classifier's decision. Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts, or are limited to image classification problems with few classes. In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers via a diffusion process. Two modifications to the diffusion process are key for our DVCEs: first, an adaptive parameterization, whose hyperparameters generalize across images and models, together with distance regularization and a late start of the diffusion process, allows us to generate images with minimal semantic changes to the original ones but a different classification. Second, our cone regularization via an adversarially robust model ensures that the diffusion process does not converge to trivial non-semantic changes, but instead produces realistic images of the target class to which the classifier assigns high confidence.
Submitted 21 October, 2022;
originally announced October 2022.
-
Sparse Visual Counterfactual Explanations in Image Space
Authors:
Valentyn Boreiko,
Maximilian Augustin,
Francesco Croce,
Philipp Berens,
Matthias Hein
Abstract:
Visual counterfactual explanations (VCEs) in image space are an important tool to understand decisions of image classifiers, as they show which changes of the image would change the classifier's decision. Their generation in image space is challenging and requires robust models due to the problem of adversarial examples. Existing techniques to generate VCEs in image space suffer from spurious changes in the background. Our novel perturbation model for VCEs, together with its efficient optimization via our novel Auto-Frank-Wolfe scheme, yields sparse VCEs which lead to subtle changes specific to the target class. Moreover, we show that VCEs can be used to detect undesired behavior of ImageNet classifiers due to spurious features in the ImageNet dataset.
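The Auto-Frank-Wolfe scheme itself is not specified in the abstract. For orientation, the sketch below shows a vanilla Frank-Wolfe iteration over an L1 ball with the standard step size 2/(k+2). It illustrates why Frank-Wolfe methods over such constraint sets tend to produce sparse iterates: each linear minimization step returns a vertex that changes a single coordinate. The L1 ball, the quadratic objective, and the function names are stand-ins chosen for illustration; the paper's perturbation model and its adaptive step-size scheme are not reproduced here.

```python
import math


def frank_wolfe_l1(grad, x0, radius, steps):
    """Vanilla Frank-Wolfe for minimizing a smooth function over the L1 ball
    {x : sum(|x_i|) <= radius}, using the classic step size 2 / (k + 2)."""
    x = list(x0)
    for k in range(steps):
        g = grad(x)
        i = max(range(len(g)), key=lambda j: abs(g[j]))
        if g[i] == 0.0:  # zero gradient: stationary point, stop early
            break
        # Linear minimization oracle: the best vertex of the L1 ball is a signed,
        # scaled basis vector, so each step moves toward a single coordinate axis.
        s = [0.0] * len(x)
        s[i] = -radius * math.copysign(1.0, g[i])
        gamma = 2.0 / (k + 2.0)
        # Convex combination keeps the iterate inside the ball.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


def make_quadratic_grad(c):
    """Gradient of the illustrative objective f(x) = ||x - c||^2."""
    return lambda x: [2.0 * (xi - ci) for xi, ci in zip(x, c)]


# Target outside the unit L1 ball: the solution is the sparse vertex [1, 0, 0].
x_sparse = frank_wolfe_l1(make_quadratic_grad([2.0, 0.5, 0.0]), [0.0, 0.0, 0.0], 1.0, 100)
print(x_sparse)
```

Here the iterate stays (numerically) at the vertex `[1.0, 0.0, 0.0]`, so only one coordinate is ever changed; when the target lies inside the ball, the iterates instead converge to it at the usual O(1/k) rate in objective value.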
Submitted 29 September, 2022; v1 submitted 16 May, 2022;
originally announced May 2022.
-
Experimental study of $μ$--atomic and $μ$--molecular processes in pure helium and deuterium--helium mixtures
Authors:
V. M. Bystritsky,
V. F. Boreiko,
W. Czapliński,
M. Filipowicz,
V. V. Gerasimov,
O. Huot,
P. E. Knowles,
F. Mulhauser,
V. N. Pavlov,
N. P. Popov,
L. A. Schaller,
H. Schneuwly,
V. G. Sandukovsky,
V. A. Stolupin,
V. P. Volnykh,
J. Woźniak
Abstract:
We present experimental results on $μ$--atomic and $μ$--molecular processes induced by negative muons in pure helium and helium--deuterium mixtures. The experiment was performed at the Paul Scherrer Institute (Switzerland). We measured the relative intensities of the muonic x-ray $K$-series transitions in $(μ^{3,4}\mathrm{He})^*$ atoms in pure helium as well as in helium--deuterium mixtures. The ratio of the muon stopping powers of helium and deuterium atoms and the radiative decay probability of the $d μ^3 \mathrm{He}$ molecule for two different helium densities in the $\mathrm{D}_2 + {}^3\mathrm{He}$ mixture were also determined. Finally, the probability $\mathrm{q}_{1s}^{\mathrm{He}}$ that a $dμ$ atom formed in an excited state reaches the ground state was measured and compared with theoretical calculations using a simple cascade model.
Submitted 17 December, 2003;
originally announced December 2003.
-
Muon capture by 3He nuclei followed by proton and deuteron production
Authors:
V. M. Bystritsky,
V. F. Boreiko,
M. Filipowicz,
V. V. Gerasimov,
O. Huot,
P. E. Knowles,
F. Mulhauser,
V. N. Pavlov,
L. A. Schaller,
H. Schneuwly,
V. G. Sandukovsky,
V. A. Stolupin,
V. P. Volnykh,
J. Wozniak
Abstract:
The paper describes an experiment aimed at studying muon capture by ${}^{3}\mathrm{He}$ nuclei in pure ${}^{3}\mathrm{He}$ and $\mathrm{D}_2 + {}^{3}\mathrm{He}$ mixtures at various densities. Energy distributions of protons and deuterons produced via $μ^-+{}^{3}\mathrm{He}\to p+n+n + ν_{μ}$ and $μ^-+{}^{3} \mathrm{He} \to d+n + ν_μ$ are measured for the energy intervals $10 - 49$ MeV and $13 - 31$ MeV, respectively. The muon capture rates $λ_\mathrm{cap}^p (ΔE_p)$ and $λ_\mathrm{cap}^d (ΔE_d)$ are obtained using two different analysis methods. The least-squares method gives $λ_\mathrm{cap}^p = (36.7\pm 1.2)\,\mathrm{s}^{-1}$ and $λ_\mathrm{cap}^d = (21.3 \pm 1.6)\,\mathrm{s}^{-1}$; the method based on Bayes' theorem gives $λ_\mathrm{cap}^p = (36.8 \pm 0.8)\,\mathrm{s}^{-1}$ and $λ_\mathrm{cap}^d = (21.9 \pm 0.6)\,\mathrm{s}^{-1}$. The experimental differential capture rates $dλ_\mathrm{cap}^p (E_p) / dE_p$ and $dλ_\mathrm{cap}^d (E_d) / dE_d$ are compared with theoretical calculations performed in the plane-wave impulse approximation (PWIA) with the realistic Bonn B NN potential. Extrapolation to the full energy range yields total proton and deuteron capture rates in good agreement with previous results.
Submitted 17 December, 2003; v1 submitted 27 July, 2003;
originally announced July 2003.
-
Search for NN-decoupled dibaryons using the process $pp \to γγX$ below the pion production threshold
Authors:
A. S. Khrykin,
V. F. Boreiko,
Yu. G. Budyashov,
S. B. Gerasimov,
N. V. Khomutov,
Yu. G. Sobolev,
V. P. Zorin
Abstract:
The energy spectrum of high-energy $γ$-rays ($E_γ\geq 10$ MeV) from the process $pp \to γγX$ emitted at $90^\circ$ in the laboratory frame has been measured at an energy below the pion production threshold, namely, at 216 MeV. The resulting photon energy spectrum, extracted from $γ$-$γ$ coincidence events, consists of a narrow peak at a photon energy of about 24 MeV and a relatively broad peak in the energy range of (50 - 70) MeV. The statistical significances of the narrow and broad peaks are 5.3$σ$ and 3.5$σ$, respectively. This behavior of the photon energy spectrum is interpreted as a signature of the exotic dibaryon resonance $d^\star_1$ with a mass of about 1956 MeV, which is assumed to be formed in the radiative process $pp \to γd^\star_1$ followed by its electromagnetic decay via the $d^\star_1 \to pp γ$ mode. The experimental spectrum is compared with spectra obtained by means of Monte Carlo simulations.
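As a rough consistency check of the narrow peak, two-body kinematics for $pp \to γ d^\star_1$ links the beam energy, the resonance mass, and the photon energy observed at $90^\circ$. The sketch below assumes that 216 MeV is the kinetic energy of a proton beam on a proton at rest and uses a rounded proton mass; it also converts the quoted significances to one-sided Gaussian tail probabilities, which is an assumed convention. The function names are hypothetical, and this is an illustration, not the authors' analysis.

```python
import math

M_P = 938.272  # proton mass in MeV (rounded value; an assumption of this sketch)


def radiative_photon_energy_lab(t_beam_mev, m_x_mev, theta_lab_deg):
    """Photon energy in the lab for the two-body process p + p -> gamma + X,
    with a proton beam of kinetic energy t_beam_mev on a proton at rest."""
    e_beam = t_beam_mev + M_P
    s = 2.0 * M_P * (M_P + e_beam)                     # squared invariant mass of the pp system
    sqrt_s = math.sqrt(s)
    e_gamma_cm = (s - m_x_mev ** 2) / (2.0 * sqrt_s)  # photon energy in the CM frame
    e_total = e_beam + M_P
    p_beam = math.sqrt(e_beam ** 2 - M_P ** 2)
    beta = p_beam / e_total                            # velocity of the CM frame in the lab
    gamma = e_total / sqrt_s
    # Boost back to the lab: E_cm = gamma * E_lab * (1 - beta * cos(theta_lab)).
    return e_gamma_cm / (gamma * (1.0 - beta * math.cos(math.radians(theta_lab_deg))))


def one_sided_p_value(n_sigma):
    """Tail probability of a standard normal distribution beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


print(round(radiative_photon_energy_lab(216.0, 1956.0, 90.0), 1))  # prints 24.1
```

Under these assumptions, a resonance of about 1956 MeV gives a formation photon of roughly 24 MeV at $90^\circ$, in line with the position of the narrow peak; the 5.3$σ$ and 3.5$σ$ significances correspond to one-sided tail probabilities of roughly $6 \times 10^{-8}$ and $2 \times 10^{-4}$.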
Submitted 19 June, 2001; v1 submitted 26 December, 2000;
originally announced December 2000.