-
Red-Teaming Segment Anything Model
Authors:
Krzysztof Jankowski,
Bartlomiej Sobieski,
Mateusz Kwiatkowski,
Jakub Szulc,
Michal Janik,
Hubert Baniecki,
Przemyslaw Biecek
Abstract:
Foundation models have emerged as pivotal tools, tackling many complex tasks through pre-training on vast datasets and subsequent fine-tuning for specific applications. The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis that tests the Segment Anything Model against challenging tasks: (1) We analyze the impact of style transfer on segmentation masks, demonstrating that applying adverse weather conditions and raindrops to dashboard images of city roads significantly distorts generated masks. (2) We focus on assessing whether the model can be used for attacks on privacy, such as recognizing celebrities' faces, and show that the model possesses some undesired knowledge in this task. (3) Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. We not only show the effectiveness of popular white-box attacks and resistance to black-box attacks but also introduce a novel approach - Focused Iterative Gradient Attack (FIGA) that combines white-box approaches to construct an efficient attack resulting in a smaller number of modified pixels. All of our testing methods and analyses indicate a need for enhanced safety measures in foundation models for image segmentation.
Submitted 2 April, 2024;
originally announced April 2024.
-
Decomposer: Semi-supervised Learning of Image Restoration and Image Decomposition
Authors:
Boris Meinardus,
Mariusz Trzeciakiewicz,
Tim Herzig,
Monika Kwiatkowski,
Simon Matern,
Olaf Hellwich
Abstract:
We present Decomposer, a semi-supervised reconstruction model that decomposes distorted image sequences into their fundamental building blocks - the original image and the applied augmentations, i.e., shadow, light, and occlusions. To solve this problem, we use the SIDAR dataset that provides a large number of distorted image sequences: each sequence contains images with shadows, lighting, and occlusions applied to an undistorted version. Each distortion changes the original signal in different ways, e.g., additive or multiplicative noise. We propose a transformer-based model to explicitly learn this decomposition. The sequential model uses 3D Swin-Transformers for spatio-temporal encoding and 3D U-Nets as prediction heads for individual parts of the decomposition. We demonstrate that by separately pre-training our model on weakly supervised pseudo labels, we can steer our model to optimize for our ambiguous problem definition and learn to differentiate between the different image distortions.
Submitted 28 November, 2023;
originally announced November 2023.
-
DIAR: Deep Image Alignment and Reconstruction using Swin Transformers
Authors:
Monika Kwiatkowski,
Simon Matern,
Olaf Hellwich
Abstract:
When taking images of some occluded content, one is often faced with the problem that every individual image frame contains unwanted artifacts, but a collection of images contains all relevant information if properly aligned and aggregated. In this paper, we attempt to build a deep learning pipeline that simultaneously aligns a sequence of distorted images and reconstructs them. We create a dataset that contains images with image distortions, such as lighting, specularities, shadows, and occlusion. We create perspective distortions with corresponding ground-truth homographies as labels. We use our dataset to train Swin transformer models to analyze sequential image data. The attention maps enable the model to detect relevant image content and differentiate it from outliers and artifacts. We further explore using neural feature maps as alternatives to classical key point detectors. The feature maps of trained convolutional layers provide dense image descriptors that can be used to find point correspondences between images. We utilize this to compute coarse image alignments and explore its limitations.
Submitted 17 October, 2023;
originally announced October 2023.
-
SIDAR: Synthetic Image Dataset for Alignment & Restoration
Authors:
Monika Kwiatkowski,
Simon Matern,
Olaf Hellwich
Abstract:
Image alignment and image restoration are classical computer vision tasks. However, there is still a lack of datasets that provide enough data to train and evaluate end-to-end deep learning models. Obtaining ground-truth data for image alignment requires sophisticated structure-from-motion methods or optical flow systems that often do not provide enough data variance, i.e., typically providing a high number of image correspondences, while only introducing few changes of scenery within the underlying image sequences. Alternative approaches utilize random perspective distortions on existing image data. However, this only provides trivial distortions, lacking the complexity and variance of real-world scenarios. Instead, our proposed data augmentation helps to overcome the issue of data scarcity by using 3D rendering: images are added as textures onto a plane, then varying lighting conditions, shadows, and occlusions are added to the scene. The scene is rendered from multiple viewpoints, generating perspective distortions more consistent with real-world scenarios, with homographies closely resembling those of camera projections rather than randomized homographies. For each scene, we provide a sequence of distorted images with corresponding occlusion masks, homographies, and ground-truth labels. The resulting dataset can serve as a training and evaluation set for a multitude of tasks involving image alignment and artifact removal, such as deep homography estimation, dense image matching, 2D bundle adjustment, inpainting, shadow removal, denoising, content retrieval, and background subtraction. Our data generation pipeline is customizable and can be applied to any existing dataset, serving as a data augmentation to further improve the feature learning of any existing method.
Submitted 19 May, 2023;
originally announced May 2023.
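The homography labels mentioned in the SIDAR abstract can be read as 3x3 matrices that map pixel coordinates from one view to another. The following is a minimal sketch of that mapping in plain Python, not code from the dataset's pipeline; the function name `apply_homography` and the example matrices are hypothetical and chosen only for illustration.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (given as nested lists).

    The point is lifted to homogeneous coordinates (x, y, 1), multiplied
    by H, and divided by the resulting third coordinate.
    """
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (xh / w, yh / w)


# Hypothetical example matrices (not taken from SIDAR).
translation = [[1.0, 0.0, 2.0],
               [0.0, 1.0, 3.0],
               [0.0, 0.0, 1.0]]
perspective = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.5, 0.0, 1.0]]

shifted = apply_homography(translation, (1.0, 1.0))   # (3.0, 4.0)
warped = apply_homography(perspective, (2.0, 4.0))    # (1.0, 2.0)
```

A non-zero entry in the bottom row is what makes the mapping a perspective distortion rather than an affine one: the divisor then depends on the point's position.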
-
Content-Based Landmark Retrieval Combining Global and Local Features using Siamese Neural Networks
Authors:
Tianyi Hu,
Monika Kwiatkowski,
Simon Matern,
Olaf Hellwich
Abstract:
In this work, we present a method for landmark retrieval that utilizes global and local features. A Siamese network is used for global feature extraction and metric learning, which gives an initial ranking of the landmark search. We utilize the extracted feature maps from the Siamese architecture as local descriptors; the search results are then further refined using cosine similarity between local descriptors. We conduct a deeper analysis of the Google Landmark Dataset, which is used for evaluation, and augment the dataset to handle various intra-class variances. Furthermore, we conduct several experiments to compare the effects of transfer learning and metric learning, as well as experiments using other local descriptors. We show that a re-ranking using local features can improve the search results. We believe that the proposed local feature extraction using cosine similarity is a simple approach that can be extended to many other retrieval tasks.
Submitted 3 August, 2022;
originally announced August 2022.
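The two-stage idea in the landmark retrieval abstract (an initial global ranking, then refinement by cosine similarity between local descriptors) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how local similarities are aggregated, so `local_score` uses a hypothetical rule (the mean, over query descriptors, of the best-matching candidate descriptor), and the names `rerank`, `local_score`, and `top_k` are invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def local_score(query_desc, cand_desc):
    """Assumed aggregation: mean over query descriptors of the best candidate match."""
    best = [max(cosine(q, c) for c in cand_desc) for q in query_desc]
    return sum(best) / len(best)


def rerank(query_desc, initial_ranking, local_descs, top_k=10):
    """Re-order the first top_k ids of the initial ranking by local-descriptor score.

    Ids beyond top_k keep their original (global-feature) order.
    """
    head = initial_ranking[:top_k]
    tail = initial_ranking[top_k:]
    head = sorted(head,
                  key=lambda cid: local_score(query_desc, local_descs[cid]),
                  reverse=True)
    return head + tail


# Toy data (hypothetical): 2-D descriptors for a query and three candidates.
query = [[1.0, 0.0], [0.0, 1.0]]
local_descs = {
    "a": [[1.0, 1.0]],
    "b": [[1.0, 0.0], [0.0, 1.0]],
    "c": [[-1.0, 0.0]],
}
result = rerank(query, ["a", "b", "c"], local_descs, top_k=2)
# result == ["b", "a", "c"]
```

Restricting the re-ranking to the top few candidates keeps the more expensive local comparison off the bulk of the database, which is the usual motivation for pairing a global ranking with a local refinement step.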
-
Image-based Detection of Surface Defects in Concrete during Construction
Authors:
Dominik Kuhnke,
Monika Kwiatkowski,
Olaf Hellwich
Abstract:
Defects increase the cost and duration of construction projects as they require significant inspection and documentation efforts. Automating defect detection could significantly reduce these efforts. This work focuses on detecting honeycombs, a substantial defect in concrete structures that may affect structural integrity. We compared honeycomb images scraped from the web with images obtained from real construction inspections. We found that web images do not capture the complete variance found in real-case scenarios and that there is still a lack of data in this domain. Our dataset is therefore freely available for further research. A Mask R-CNN and EfficientNet-B0 were trained for honeycomb detection. The Mask R-CNN model allows detecting honeycombs based on instance segmentation, whereas the EfficientNet-B0 model allows a patch-based classification. Our experiments demonstrate that both approaches are suitable for solving and automating honeycomb detection. In the future, this solution can be incorporated into defect documentation systems.
Submitted 6 December, 2022; v1 submitted 3 August, 2022;
originally announced August 2022.
-
On the Grassmann Graph of Linear Codes
Authors:
Ilaria Cardinali,
Luca Giuzzi,
Mariusz Kwiatkowski
Abstract:
Let $Γ(n,k)$ be the Grassmann graph formed by the $k$-dimensional subspaces of a vector space of dimension $n$ over a field $\mathbb F$ and, for $t\in \mathbb{N}\setminus \{0\}$, let $Δ_t(n,k)$ be the subgraph of $Γ(n,k)$ formed by the set of linear $[n,k]$-codes having minimum dual distance at least $t+1$. We show that if $|{\mathbb F}|\geq{n\choose t}$ then $Δ_t(n,k)$ is connected and it is isometrically embedded in $Γ(n,k)$. This generalizes some results of [M. Kwiatkowski, M. Pankov, "On the distance between linear codes", Finite Fields Appl. 39 (2016), 251--263] and [M. Kwiatkowski, M. Pankov, A. Pasini, "The graphs of projective codes", Finite Fields Appl. 54 (2018), 15--29].
Submitted 2 July, 2021; v1 submitted 9 May, 2020;
originally announced May 2020.