Showing 1–26 of 26 results for author: Khoreva, A

  1. arXiv:2407.03482

    cs.CV cs.AI cs.LG

    Domain-Aware Fine-Tuning of Foundation Models

    Authors: Ugur Ali Kaplan, Margret Keuper, Anna Khoreva, Dan Zhang, Yumeng Li

    Abstract: Foundation models (FMs) have revolutionized computer vision, enabling effective learning across different domains. However, their performance under domain shift is yet underexplored. This paper investigates the zero-shot domain adaptation potential of FMs by comparing different backbone architectures and introducing novel domain-aware components that leverage domain related textual embeddings. We…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted at ICML 2024 Workshop on Foundation Models in the Wild

  2. arXiv:2407.01790

    cs.CV cs.AI cs.LG

    Label-free Neural Semantic Image Synthesis

    Authors: Jiayi Wang, Kevin Alexander Laube, Yumeng Li, Jan Hendrik Metzen, Shin-I Cheng, Julio Borges, Anna Khoreva

    Abstract: Recent work has shown great progress in integrating spatial conditioning to control large, pre-trained text-to-image diffusion models. Despite these advances, existing methods describe the spatial image content using hand-crafted conditioning inputs, which are either semantically ambiguous (e.g., edges) or require expensive manual annotations (e.g., semantic segmentation). To address these limitat…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2405.20271

    cs.LG cs.CL cs.CV

    ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections

    Authors: Massimo Bini, Karsten Roth, Zeynep Akata, Anna Khoreva

    Abstract: Parameter-efficient finetuning (PEFT) has become ubiquitous to adapt foundation models to downstream task requirements while retaining their generalization ability. However, the amount of additionally introduced parameters and compute for successful adaptation and hyperparameter searches can explode quickly, especially when deployed at scale to serve numerous individual requests. To ensure effecti…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024. Code available at https://github.com/mwbini/ether

  4. arXiv:2403.13501

    cs.CV cs.AI cs.LG cs.MM

    VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

    Authors: Yumeng Li, William Beluch, Margret Keuper, Dan Zhang, Anna Khoreva

    Abstract: Despite tremendous progress in the field of text-to-video (T2V) synthesis, open-sourced T2V diffusion models struggle to generate longer videos with dynamically varying and evolving content. They tend to synthesize quasi-static videos, ignoring the necessary visual change-over-time implied in the text prompt. At the same time, scaling these models to enable longer, more dynamic video synthesis oft…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Project page: https://yumengli007.github.io/VSTAR

  5. arXiv:2401.08815

    cs.CV cs.AI cs.LG

    Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive

    Authors: Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva

    Abstract: Despite the recent advances in large-scale diffusion models, little progress has been made on the layout-to-image (L2I) synthesis task. Current L2I models either suffer from poor editability via text or weak alignment between the generated image and the input layout. This limits their usability in practice. To mitigate this, we propose to integrate adversarial supervision into the conventional tra…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at ICLR 2024. Project page: https://yumengli007.github.io/ALDM/ and code: https://github.com/boschresearch/ALDM

  6. arXiv:2307.10864

    cs.CV cs.AI cs.CL cs.LG

    Divide & Bind Your Attention for Improved Generative Semantic Nursing

    Authors: Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva

    Abstract: Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attent…

    Submitted 9 October, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted at BMVC 2023 as Oral. Code: https://github.com/boschresearch/Divide-and-Bind and project page: https://sites.google.com/view/divide-and-bind

  7. arXiv:2307.00648

    cs.CV cs.AI cs.LG

    Intra- & Extra-Source Exemplar-Based Style Synthesis for Improved Domain Generalization

    Authors: Yumeng Li, Dan Zhang, Margret Keuper, Anna Khoreva

    Abstract: The generalization with respect to domain shifts, as they frequently appear in applications such as autonomous driving, is one of the remaining big challenges for deep learning models. Therefore, we propose an exemplar-based style synthesis pipeline to improve domain generalization in semantic segmentation. Our method is based on a novel masked noise encoder for StyleGAN2 inversion. The model lear…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: An extended version of the accepted WACV paper arXiv:2210.10175

  8. arXiv:2212.01455

    cs.CV

    Discovering Class-Specific GAN Controls for Semantic Image Synthesis

    Authors: Edgar Schönfeld, Julio Borges, Vadim Sushko, Bernt Schiele, Anna Khoreva

    Abstract: Prior work has extensively studied the latent space structure of GANs for unconditional image synthesis, enabling global editing of generated images by the unsupervised discovery of interpretable latent directions. However, the discovery of latent directions for conditional GANs for semantic image synthesis (SIS) has remained unexplored. In this work, we specifically focus on addressing this gap.…

    Submitted 2 December, 2022; originally announced December 2022.

  9. arXiv:2210.10175

    cs.CV cs.AI cs.LG

    Intra-Source Style Augmentation for Improved Domain Generalization

    Authors: Yumeng Li, Dan Zhang, Margret Keuper, Anna Khoreva

    Abstract: The generalization with respect to domain shifts, as they frequently appear in applications such as autonomous driving, is one of the remaining big challenges for deep learning models. Therefore, we propose an intra-source style augmentation (ISSA) method to improve domain generalization in semantic segmentation. Our method is based on a novel masked noise encoder for StyleGAN2 inversion. The mode…

    Submitted 29 May, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted at WACV 2023. Code is available at https://github.com/boschresearch/ISSA

  10. arXiv:2209.07547

    cs.CV cs.LG

    One-Shot Synthesis of Images and Segmentation Masks

    Authors: Vadim Sushko, Dan Zhang, Juergen Gall, Anna Khoreva

    Abstract: Joint synthesis of images and segmentation masks with generative adversarial networks (GANs) is promising to reduce the effort needed for collecting image data with pixel-wise annotations. However, to learn high-fidelity image-mask synthesis, existing GAN approaches first need a pre-training phase requiring large amounts of image data, which limits their utilization in restricted image domains. In…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: Accepted as a conference paper at IEEE Winter Conference on Applications of Computer Vision (WACV) 2023

  11. arXiv:2105.05847

    cs.CV cs.LG

    Learning to Generate Novel Scene Compositions from Single Images and Videos

    Authors: Vadim Sushko, Juergen Gall, Anna Khoreva

    Abstract: Training GANs in low-data regimes remains a challenge, as overfitting often leads to memorization or training divergence. In this work, we introduce One-Shot GAN that can learn to generate samples from a training set as little as one image or one video. We propose a two-branch discriminator, with content and layout branches designed to judge the internal content separately from the scene layout re…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: The AI for Content Creation (AICC) workshop at CVPR 2021. The full 8-page version of this submission is available at arXiv:2103.13389

  12. arXiv:2103.13389

    cs.CV cs.LG

    Generating Novel Scene Compositions from Single Images and Videos

    Authors: Vadim Sushko, Dan Zhang, Juergen Gall, Anna Khoreva

    Abstract: Given a large dataset for training, generative adversarial networks (GANs) can achieve remarkable performance for the image synthesis task. However, training GANs in extremely low data regimes remains a challenge, as overfitting often occurs, leading to memorization or training divergence. In this work, we introduce SIV-GAN, an unconditional generative model that can generate new scene composition…

    Submitted 13 December, 2023; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted for publication in Computer Vision and Image Understanding: https://www.sciencedirect.com/science/article/pii/S1077314223002680. Code repository: https://github.com/boschresearch/one-shot-synthesis

  13. arXiv:2012.04781

    cs.CV cs.LG eess.IV

    You Only Need Adversarial Supervision for Semantic Image Synthesis

    Authors: Vadim Sushko, Edgar Schönfeld, Dan Zhang, Juergen Gall, Bernt Schiele, Anna Khoreva

    Abstract: Despite their recent successes, GAN models for semantic image synthesis still suffer from poor image quality when trained with only adversarial supervision. Historically, additionally employing the VGG-based perceptual loss has helped to overcome this issue, significantly improving the synthesis quality, but at the same time limiting the progress of GAN models for semantic image synthesis. In this…

    Submitted 19 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Published at ICLR 2021 (Main Conference). Code repository: https://github.com/boschresearch/OASIS

  14. arXiv:2011.12636

    cs.CV cs.CG cs.LG

    Improving Augmentation and Evaluation Schemes for Semantic Image Synthesis

    Authors: Prateek Katiyar, Anna Khoreva

    Abstract: Despite data augmentation being a de facto technique for boosting the performance of deep neural networks, little attention has been paid to developing augmentation strategies for generative adversarial networks (GANs). To this end, we introduce a novel augmentation scheme designed specifically for GAN-based semantic image synthesis models. We propose to randomly warp object shapes in the semantic…

    Submitted 30 January, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

  15. arXiv:2002.12655

    cs.CV cs.LG eess.IV

    A U-Net Based Discriminator for Generative Adversarial Networks

    Authors: Edgar Schönfeld, Bernt Schiele, Anna Khoreva

    Abstract: Among the major remaining challenges for generative adversarial networks (GANs) is the capacity to synthesize globally and locally coherent images with object shapes and textures indistinguishable from real images. To target this issue we propose an alternative U-Net based discriminator architecture, borrowing the insights from the segmentation literature. The proposed U-Net based architecture all…

    Submitted 19 March, 2021; v1 submitted 28 February, 2020; originally announced February 2020.

    Comments: CVPR 2020 (Main Conference). Code repository: https://github.com/boschresearch/unetgan

  16. arXiv:1907.13054

    cs.CV cs.AI cs.LG

    Grid Saliency for Context Explanations of Semantic Segmentation

    Authors: Lukas Hoyer, Mauricio Munoz, Prateek Katiyar, Anna Khoreva, Volker Fischer

    Abstract: Recently, there has been a growing interest in developing saliency methods that provide visual explanations of network predictions. Still, the usability of existing methods is limited to image classification models. To overcome this limitation, we extend the existing approaches to generate grid saliencies, which provide spatially coherent visual explanations for (pixel-level) dense prediction netw…

    Submitted 7 November, 2019; v1 submitted 30 July, 2019; originally announced July 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

  17. arXiv:1903.08960

    cs.CV cs.AI cs.LG cs.RO

    Short-Term Prediction and Multi-Camera Fusion on Semantic Grids

    Authors: Lukas Hoyer, Patrick Kesper, Anna Khoreva, Volker Fischer

    Abstract: An environment representation (ER) is a substantial part of every autonomous system. It introduces a common interface between perception and other system components, such as decision making, and allows downstream algorithms to deal with abstracted data without knowledge of the used sensor. In this work, we propose and evaluate a novel architecture that generates an egocentric, grid-based, predicti…

    Submitted 26 July, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

  18. arXiv:1901.10422

    cs.CV

    Progressive Augmentation of GANs

    Authors: Dan Zhang, Anna Khoreva

    Abstract: Training of Generative Adversarial Networks (GANs) is notoriously fragile, requiring to maintain a careful balance between the generator and the discriminator in order to perform well. To mitigate this issue we introduce a new regularization technique - progressive augmentation of GANs (PA-GAN). The key idea is to gradually increase the task difficulty of the discriminator by progressively augment…

    Submitted 28 October, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: Accepted at NeurIPS'19

  19. arXiv:1804.07909

    cs.CV

    Learning to Refine Human Pose Estimation

    Authors: Mihai Fieraru, Anna Khoreva, Leonid Pishchulin, Bernt Schiele

    Abstract: Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist a lot of difficult cases where even the state-of-the-art models fail to correctly localize all body joints. This motivates the need for an additional refi…

    Submitted 21 April, 2018; originally announced April 2018.

    Comments: To appear in CVPRW (2018). Workshop: Visual Understanding of Humans in Crowd Scene and the 2nd Look Into Person Challenge (VUHCS-LIP)

  20. arXiv:1803.08006

    cs.CV

    Video Object Segmentation with Language Referring Expressions

    Authors: Anna Khoreva, Anna Rohrbach, Bernt Schiele

    Abstract: Most state-of-the-art semi-supervised video object segmentation methods rely on a pixel-accurate mask of a target object provided for the first frame of a video. However, obtaining a detailed segmentation mask is expensive and time-consuming. In this work we explore an alternative way of identifying a target object, namely by employing language referring expressions. Besides being a more practical…

    Submitted 5 February, 2019; v1 submitted 21 March, 2018; originally announced March 2018.

    Comments: ACCV 2018: 14th Asian Conference on Computer Vision

  21. arXiv:1703.09554

    cs.CV

    Lucid Data Dreaming for Video Object Segmentation

    Authors: Anna Khoreva, Rodrigo Benenson, Eddy Ilg, Thomas Brox, Bernt Schiele

    Abstract: Convolutional networks reach top quality in pixel-level video object segmentation but require a large amount of training data (1k~100k) to deliver such results. We propose a new training strategy which achieves state-of-the-art results across three evaluation datasets while using 20x~1000x less annotated data than competing methods. Our approach is suitable for both single and multiple object segm…

    Submitted 13 March, 2019; v1 submitted 28 March, 2017; originally announced March 2017.

    Comments: Accepted in International Journal of Computer Vision (IJCV)

  22. arXiv:1701.08261

    cs.CV

    Exploiting saliency for object segmentation from image level labels

    Authors: Seong Joon Oh, Rodrigo Benenson, Anna Khoreva, Zeynep Akata, Mario Fritz, Bernt Schiele

    Abstract: There have been remarkable improvements in the semantic labelling task in the recent years. However, the state of the art methods rely on large-scale pixel-level annotations. This paper studies the problem of training a pixel-wise semantic labeller network from image-level annotations of the present object classes. Recently, it has been shown that high quality seeds indicating discriminative objec…

    Submitted 14 July, 2017; v1 submitted 28 January, 2017; originally announced January 2017.

    Comments: CVPR 2017

  23. arXiv:1612.02646

    cs.CV

    Learning Video Object Segmentation from Static Images

    Authors: Anna Khoreva, Federico Perazzi, Rodrigo Benenson, Bernt Schiele, Alexander Sorkine-Hornung

    Abstract: Inspired by recent advances of deep learning in instance segmentation and object tracking, we introduce video object segmentation problem as a concept of guided instance segmentation. Our model proceeds on a per-frame basis, guided by the output of the previous frame towards the object of interest in the next frame. We demonstrate that highly accurate object segmentation in videos can be enabled b…

    Submitted 8 December, 2016; originally announced December 2016.

    Comments: Submitted to CVPR 2017

  24. arXiv:1605.03718

    cs.CV

    Improved Image Boundaries for Better Video Segmentation

    Authors: Anna Khoreva, Rodrigo Benenson, Fabio Galasso, Matthias Hein, Bernt Schiele

    Abstract: Graph-based video segmentation methods rely on superpixels as starting point. While most previous work has focused on the construction of the graph edges and weights as well as solving the graph partitioning problem, this paper focuses on better superpixels for video segmentation. We demonstrate by a comparative analysis that superpixels extracted from boundaries perform best, and show that bounda…

    Submitted 23 November, 2016; v1 submitted 12 May, 2016; originally announced May 2016.

  25. arXiv:1603.07485

    cs.CV

    Simple Does It: Weakly Supervised Instance and Semantic Segmentation

    Authors: Anna Khoreva, Rodrigo Benenson, Jan Hosang, Matthias Hein, Bernt Schiele

    Abstract: Semantic labelling and instance segmentation are two tasks that require particularly costly annotations. Starting from weak supervision in the form of bounding box detection annotations, we propose a new approach that does not require modification of the segmentation training procedure. We show that when carefully designing the input labels from given bounding boxes, even a single round of trainin…

    Submitted 23 November, 2016; v1 submitted 24 March, 2016; originally announced March 2016.

  26. arXiv:1511.07803

    cs.CV

    Weakly Supervised Object Boundaries

    Authors: Anna Khoreva, Rodrigo Benenson, Mohamed Omran, Matthias Hein, Bernt Schiele

    Abstract: State-of-the-art learning based boundary detection methods require extensive training data. Since labelling object boundaries is one of the most expensive types of annotations, there is a need to relax the requirement to carefully annotate images to make both the training more affordable and to extend the amount of training data. In this paper we propose a technique to generate weakly supervised a…

    Submitted 24 November, 2015; originally announced November 2015.