Search | arXiv e-print repository

arXiv:2404.02067 [pdf, other]

Red-Teaming Segment Anything Model

Authors: Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski, Jakub Szulc, Michal Janik, Hubert Baniecki, Przemyslaw Biecek

Abstract: Foundation models have emerged as pivotal tools, tackling many complex tasks through pre-training on vast datasets and subsequent fine-tuning for specific applications. The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis that tests the Segment Anything Model against challenging tasks: (1) We analyze the impact of style transfer on segmentation masks, demonstrating that applying adverse weather conditions and raindrops to dashboard images of city roads significantly distorts generated masks. (2) We focus on assessing whether the model can be used for attacks on privacy, such as recognizing celebrities' faces, and show that the model possesses some undesired knowledge in this task. (3) Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. We not only show the effectiveness of popular white-box attacks and resistance to black-box attacks but also introduce a novel approach - Focused Iterative Gradient Attack (FIGA) that combines white-box approaches to construct an efficient attack resulting in a smaller number of modified pixels. All of our testing methods and analyses indicate a need for enhanced safety measures in foundation models for image segmentation.

Submitted 2 April, 2024; originally announced April 2024.

Comments: CVPR 2024 - The 4th Workshop of Adversarial Machine Learning on Computer Vision: Robustness of Foundation Models

arXiv:2311.16829 [pdf, other]

Decomposer: Semi-supervised Learning of Image Restoration and Image Decomposition

Authors: Boris Meinardus, Mariusz Trzeciakiewicz, Tim Herzig, Monika Kwiatkowski, Simon Matern, Olaf Hellwich

Abstract: We present Decomposer, a semi-supervised reconstruction model that decomposes distorted image sequences into their fundamental building blocks - the original image and the applied augmentations, i.e., shadow, light, and occlusions. To solve this problem, we use the SIDAR dataset that provides a large number of distorted image sequences: each sequence contains images with shadows, lighting, and occlusions applied to an undistorted version. Each distortion changes the original signal in different ways, e.g., additive or multiplicative noise. We propose a transformer-based model to explicitly learn this decomposition. The sequential model uses 3D Swin-Transformers for spatio-temporal encoding and 3D U-Nets as prediction heads for individual parts of the decomposition. We demonstrate that by separately pre-training our model on weakly supervised pseudo labels, we can steer our model to optimize for our ambiguous problem definition and learn to differentiate between the different image distortions.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2310.11605 [pdf, other]

doi 10.1007/978-3-031-45725-8_12

DIAR: Deep Image Alignment and Reconstruction using Swin Transformers

Authors: Monika Kwiatkowski, Simon Matern, Olaf Hellwich

Abstract: When taking images of some occluded content, one is often faced with the problem that every individual image frame contains unwanted artifacts, but a collection of images contains all relevant information if properly aligned and aggregated. In this paper, we attempt to build a deep learning pipeline that simultaneously aligns a sequence of distorted images and reconstructs them. We create a dataset that contains images with image distortions, such as lighting, specularities, shadows, and occlusion. We create perspective distortions with corresponding ground-truth homographies as labels. We use our dataset to train Swin transformer models to analyze sequential image data. The attention maps enable the model to detect relevant image content and differentiate it from outliers and artifacts. We further explore using neural feature maps as alternatives to classical key point detectors. The feature maps of trained convolutional layers provide dense image descriptors that can be used to find point correspondences between images. We utilize this to compute coarse image alignments and explore its limitations.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2305.12036 [pdf, other]

doi 10.5220/0012391400003660

SIDAR: Synthetic Image Dataset for Alignment & Restoration

Authors: Monika Kwiatkowski, Simon Matern, Olaf Hellwich

Abstract: Image alignment and image restoration are classical computer vision tasks. However, there is still a lack of datasets that provide enough data to train and evaluate end-to-end deep learning models. Obtaining ground-truth data for image alignment requires sophisticated structure-from-motion methods or optical flow systems that often do not provide enough data variance, i.e., typically providing a high number of image correspondences, while only introducing few changes of scenery within the underlying image sequences. Alternative approaches utilize random perspective distortions on existing image data. However, this only provides trivial distortions, lacking the complexity and variance of real-world scenarios. Instead, our proposed data augmentation helps to overcome the issue of data scarcity by using 3D rendering: images are added as textures onto a plane, then varying lighting conditions, shadows, and occlusions are added to the scene. The scene is rendered from multiple viewpoints, generating perspective distortions more consistent with real-world scenarios, with homographies closely resembling those of camera projections rather than randomized homographies. For each scene, we provide a sequence of distorted images with corresponding occlusion masks, homographies, and ground-truth labels.
The resulting dataset can serve as a training and evaluation set for a multitude of tasks involving image alignment and artifact removal, such as deep homography estimation, dense image matching, 2D bundle adjustment, inpainting, shadow removal, denoising, content retrieval, and background subtraction. Our data generation pipeline is customizable and can be applied to any existing dataset, serving as a data augmentation to further improve the feature learning of any existing method.

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2208.04201 [pdf, other]

Content-Based Landmark Retrieval Combining Global and Local Features using Siamese Neural Networks

Authors: Tianyi Hu, Monika Kwiatkowski, Simon Matern, Olaf Hellwich

Abstract: In this work, we present a method for landmark retrieval that utilizes global and local features. A Siamese network is used for global feature extraction and metric learning, which gives an initial ranking of the landmark search. We utilize the extracted feature maps from the Siamese architecture as local descriptors; the search results are then further refined using a cosine similarity between local descriptors. We conduct a deeper analysis of the Google Landmark Dataset, which is used for evaluation, and augment the dataset to handle various intra-class variances. Furthermore, we conduct several experiments to compare the effects of transfer learning and metric learning, as well as experiments using other local descriptors. We show that a re-ranking using local features can improve the search results. We believe that the proposed local feature extraction using cosine similarity is a simple approach that can be extended to many other retrieval tasks.

Submitted 3 August, 2022; originally announced August 2022.

arXiv:2208.02313 [pdf, other]

doi 10.1007/978-3-658-42796-2_13

Image-based Detection of Surface Defects in Concrete during Construction

Authors: Dominik Kuhnke, Monika Kwiatkowski, Olaf Hellwich

Abstract: Defects increase the cost and duration of construction projects as they require significant inspection and documentation efforts. Automating defect detection could significantly reduce these efforts. This work focuses on detecting honeycombs, a substantial defect in concrete structures that may affect structural integrity. We compared honeycomb images scraped from the web with images obtained from real construction inspections. We found that web images do not capture the complete variance found in real-case scenarios and that there is still a lack of data in this domain. Our dataset is therefore freely available for further research. A Mask R-CNN and EfficientNet-B0 were trained for honeycomb detection. The Mask R-CNN model allows detecting honeycombs based on instance segmentation, whereas the EfficientNet-B0 model allows a patch-based classification. Our experiments demonstrate that both approaches are suitable for solving and automating honeycomb detection. In the future, this solution can be incorporated into defect documentation systems.

Submitted 6 December, 2022; v1 submitted 3 August, 2022; originally announced August 2022.

arXiv:2005.04402 [pdf, ps, other]

doi 10.1016/j.ffa.2021.101895

On the Grassmann Graph of Linear Codes

Authors: Ilaria Cardinali, Luca Giuzzi, Mariusz Kwiatkowski

Abstract: Let $Γ(n,k)$ be the Grassmann graph formed by the $k$-dimensional subspaces of a vector space of dimension $n$ over a field $\mathbb F$ and, for $t\in \mathbb{N}\setminus \{0\}$, let $Δ_t(n,k)$ be the subgraph of $Γ(n,k)$ formed by the set of linear $[n,k]$-codes having minimum dual distance at least $t+1$. We show that if $|{\mathbb F}|\geq{n\choose t}$ then $Δ_t(n,k)$ is connected and it is isometrically embedded in $Γ(n,k)$. This generalizes some results of [M. Kwiatkowski, M. Pankov, "On the distance between linear codes", Finite Fields Appl. 39 (2016), 251--263] and [M. Kwiatkowski, M. Pankov, A. Pasini, "The graphs of projective codes" Finite Fields Appl. 54 (2018), 15--29].

Submitted 2 July, 2021; v1 submitted 9 May, 2020; originally announced May 2020.

Comments: 13 pages/final version

MSC Class: 51E22; 94B27

Journal ref: Finite Fields Appl. 75 (2021) 101895

Showing 1–7 of 7 results for author: Kwiatkowski, M