Showing 1–24 of 24 results for author: Raue, F

  1. arXiv:2404.17670

    eess.IV cs.AI cs.CV cs.ET cs.LG

    Federated Learning for Blind Image Super-Resolution

    Authors: Brian B. Moser, Ahmed Anwar, Federico Raue, Stanislav Frolov, Andreas Dengel

    Abstract: Traditional blind image SR methods need to model real-world degradations precisely. Consequently, current research struggles with this dilemma by assuming idealized degradations, which leads to limited applicability to actual user data. Moreover, the ideal scenario - training models on data from the targeted user base - presents significant privacy concerns. To address both challenges, we propose…

    Submitted 26 April, 2024; originally announced April 2024.

  2. arXiv:2403.17083

    eess.IV cs.AI cs.CV cs.GR cs.LG

    A Study in Dataset Pruning for Image Super-Resolution

    Authors: Brian B. Moser, Federico Raue, Andreas Dengel

    Abstract: In image Super-Resolution (SR), relying on large datasets for training is a double-edged sword. While offering rich training material, they also demand substantial computational and storage resources. In this work, we analyze dataset pruning to solve these challenges. We introduce a novel approach that reduces a dataset to a core-set of training samples, selected based on their loss values as dete…

    Submitted 8 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  3. arXiv:2403.03881

    cs.CV cs.AI cs.LG

    Latent Dataset Distillation with Diffusion Models

    Authors: Brian B. Moser, Federico Raue, Sebastian Palacio, Stanislav Frolov, Andreas Dengel

    Abstract: The efficacy of machine learning has traditionally relied on the availability of increasingly larger datasets. However, large datasets pose storage challenges and contain non-influential samples, which could be ignored during training without impacting the final accuracy of the model. In response to these limitations, the concept of distilling the information on a dataset into a condensed set of (…

    Submitted 24 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  4. arXiv:2401.00736

    cs.CV cs.AI cs.LG cs.MM

    Diffusion Models, Image Super-Resolution And Everything: A Survey

    Authors: Brian B. Moser, Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Sebastian Palacio, Andreas Dengel

    Abstract: Diffusion Models (DMs) have disrupted the image Super-Resolution (SR) field and further closed the gap between image quality and human perceptual preferences. They are easy to train and can produce very high-quality samples that exceed the realism of those produced by previous generative methods. Despite their promising results, they also come with new challenges that need further research: high c…

    Submitted 23 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  5. arXiv:2308.07977

    cs.CV cs.AI cs.LG

    Dynamic Attention-Guided Diffusion for Image Super-Resolution

    Authors: Brian B. Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel

    Abstract: Diffusion models in image Super-Resolution (SR) treat all image regions with uniform intensity, which risks compromising the overall image quality. To address this, we introduce "You Only Diffuse Areas" (YODA), a dynamic attention-guided diffusion method for image SR. YODA selectively focuses on spatial regions using attention maps derived from the low-resolution image and the current time step in…

    Submitted 7 March, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Brian B. Moser and Stanislav Frolov contributed equally

  6. arXiv:2307.04593

    eess.IV cs.AI cs.CV cs.LG

    DWA: Differential Wavelet Amplifier for Image Super-Resolution

    Authors: Brian B. Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel

    Abstract: This work introduces Differential Wavelet Amplifier (DWA), a drop-in module for wavelet-based image Super-Resolution (SR). DWA invigorates an approach recently receiving less attention, namely Discrete Wavelet Transformation (DWT). DWT enables an efficient image representation for SR and reduces the spatial area of its input by a factor of 4, the overall model size, and computation cost, framing i…

    Submitted 10 July, 2023; originally announced July 2023.

  7. DartsReNet: Exploring new RNN cells in ReNet architectures

    Authors: Brian Moser, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: We present new Recurrent Neural Network (RNN) cells for image classification using a Neural Architecture Search (NAS) approach called DARTS. We are interested in the ReNet architecture, which is a RNN based approach presented as an alternative for convolutional and pooling steps. ReNet can be defined using any standard RNN cells, such as LSTM and GRU. One limitation is that standard RNN cells were…

    Submitted 11 April, 2023; originally announced April 2023.

  8. arXiv:2304.01994

    cs.CV cs.AI cs.LG eess.IV

    Waving Goodbye to Low-Res: A Diffusion-Wavelet Approach for Image Super-Resolution

    Authors: Brian Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel

    Abstract: This paper presents a novel Diffusion-Wavelet (DiWa) approach for Single-Image Super-Resolution (SISR). It leverages the strengths of Denoising Diffusion Probabilistic Models (DDPMs) and Discrete Wavelet Transformation (DWT). By enabling DDPMs to operate in the DWT domain, our DDPM models effectively hallucinate high-frequency information for super-resolved images on the wavelet spectrum, resultin…

    Submitted 5 April, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  9. arXiv:2209.13131

    cs.CV cs.LG eess.IV

    Hitchhiker's Guide to Super-Resolution: Introduction and Recent Advances

    Authors: Brian Moser, Federico Raue, Stanislav Frolov, Jörn Hees, Sebastian Palacio, Andreas Dengel

    Abstract: With the advent of Deep Learning (DL), Super-Resolution (SR) has also become a thriving research area. However, despite promising results, the field still faces challenges that require further research e.g., allowing flexible upsampling, more effective loss functions, and better evaluation metrics. We review the domain of SR in light of recent advances, and examine state-of-the-art models such as…

    Submitted 14 February, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

  10. Less is More: Proxy Datasets in NAS approaches

    Authors: Brian Moser, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: Neural Architecture Search (NAS) defines the design of Neural Networks as a search problem. Unfortunately, NAS is computationally intensive because of various possibilities depending on the number of elements in the design and the possible connections between them. In this work, we extensively analyze the role of the dataset size based on several sampling approaches for reducing the dataset size (…

    Submitted 14 March, 2022; originally announced March 2022.

    Journal ref: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

  11. arXiv:2108.09696

    cs.CV

    Spatial Transformer Networks for Curriculum Learning

    Authors: Fatemeh Azimi, Jean-Francois Jacques Nicolas Nies, Sebastian Palacio, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: Curriculum learning is a bio-inspired training technique that is widely adopted to machine learning for improved optimization and better training of neural networks regarding the convergence rate or obtained accuracy. The main concept in curriculum learning is to start the training with simpler tasks and gradually increase the level of difficulty. Therefore, a natural question is how to determine…

    Submitted 22 August, 2021; originally announced August 2021.

  12. arXiv:2106.14295

    cs.LG

    A Reinforcement Learning Approach for Sequential Spatial Transformer Networks

    Authors: Fatemeh Azimi, Federico Raue, Joern Hees, Andreas Dengel

    Abstract: Spatial Transformer Networks (STN) can generate geometric transformations which modify input images to improve the classifier's performance. In this work, we combine the idea of STN with Reinforcement Learning (RL). To this end, we break the affine transformation down into a sequence of simple and discrete transformations. We formulate the task as a Markovian Decision Process (MDP) and use RL to s…

    Submitted 27 June, 2021; originally announced June 2021.

  13. arXiv:2106.13043

    cs.SD cs.CV eess.AS

    AudioCLIP: Extending CLIP to Image, Text and Audio

    Authors: Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: In the past, the rapidly evolving field of sound classification greatly benefited from the application of methods from other domains. Today, we observe the trend to fuse domain-specific tasks and approaches together, which provides the community with new outstanding models. In this work, we present an extension of the CLIP model that handles audio in addition to text and images. Our proposed mod…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: submitted to GCPR 2021

  14. arXiv:2105.10189

    cs.CV

    Combining Transformer Generators with Convolutional Discriminators

    Authors: Ricard Durall, Stanislav Frolov, Jörn Hees, Federico Raue, Franz-Josef Pfreundt, Andreas Dengel, Janis Keuper

    Abstract: Transformer models have recently attracted much interest from computer vision researchers and have since been successfully employed for several problems traditionally addressed with convolutional neural networks. At the same time, image synthesis using generative adversarial networks (GANs) has drastically improved over the last few years. The recently proposed TransGAN is the first GAN using only…

    Submitted 10 July, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

  15. arXiv:2104.11587

    cs.SD eess.AS

    ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

    Authors: Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: Environmental Sound Classification (ESC) is a rapidly evolving field that recently demonstrated the advantages of application of visual domain techniques to the audio-related tasks. Previous studies indicate that the domain-specific modification of cross-domain approaches show a promise in pushing the whole area of ESC forward. In this paper, we present a new time-frequency transformation layer…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: submitted IJCNN 2021

  16. arXiv:2103.13722

    cs.CV

    AttrLostGAN: Attribute Controlled Image Synthesis from Reconfigurable Layout and Style

    Authors: Stanislav Frolov, Avneesh Sharma, Jörn Hees, Tushar Karayil, Federico Raue, Andreas Dengel

    Abstract: Conditional image synthesis from layout has recently attracted much interest. Previous approaches condition the generator on object locations as well as class labels but lack fine-grained control over the diverse appearance aspects of individual objects. Gaining control over the image generation process is fundamental to build practical applications with a user-friendly interface. In this paper, w…

    Submitted 26 August, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: Accepted to GCPR 2021. Link to code: https://github.com/stanifrolov/AttrLostGAN

  17. Adversarial Text-to-Image Synthesis: A Review

    Authors: Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: With the advent of generative adversarial networks, synthesizing images from textual descriptions has recently become an active research area. It is a flexible and intuitive way for conditional image generation with significant progress in the last years regarding visual realism, diversity, and semantic alignment. However, the field still faces several challenges that require further research effo…

    Submitted 6 October, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

    Comments: Published at Neural Networks Journal, available at https://www.sciencedirect.com/science/article/pii/S0893608021002823

    Journal ref: Neural Networks, 2021

  18. arXiv:2010.05069

    cs.CV

    Hybrid-S2S: Video Object Segmentation with Recurrent Networks and Correspondence Matching

    Authors: Fatemeh Azimi, Stanislav Frolov, Federico Raue, Joern Hees, Andreas Dengel

    Abstract: One-shot Video Object Segmentation (VOS) is the task of pixel-wise tracking an object of interest within a video sequence, where the segmentation mask of the first frame is given at inference time. In recent years, Recurrent Neural Networks (RNNs) have been widely used for VOS tasks, but they often suffer from limitations such as drift and error propagation. In this work, we study an RNN-based arc…

    Submitted 7 November, 2020; v1 submitted 10 October, 2020; originally announced October 2020.

  19. arXiv:2004.12170

    cs.CV

    Revisiting Sequence-to-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory

    Authors: Fatemeh Azimi, Benjamin Bischke, Sebastian Palacio, Federico Raue, Joern Hees, Andreas Dengel

    Abstract: Video Object Segmentation (VOS) is an active research area of the visual domain. One of its fundamental sub-tasks is semi-supervised / one-shot learning: given only the segmentation mask for the first frame, the task is to provide pixel-accurate masks for the object over the rest of the sequence. Despite much progress in the last years, we noticed that many of the existing approaches lose objects…

    Submitted 25 April, 2020; originally announced April 2020.

  20. arXiv:2004.07301

    cs.CV cs.LG cs.SD eess.AS

    ESResNet: Environmental Sound Classification Based on Visual Domain Models

    Authors: Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: Environmental Sound Classification (ESC) is an active research area in the audio domain and has seen a lot of progress in the past years. However, many of the existing approaches achieve high accuracy by relying on domain-specific features and architectures, making it harder to benefit from advances in other fields (e.g., the image domain). Additionally, some of the past successes have been attrib…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: 8 pages, 4 figures; submitted to ICPR 2020

  21. arXiv:2003.11844

    cs.CV

    P $\approx$ NP, at least in Visual Question Answering

    Authors: Shailza Jolly, Sebastian Palacio, Joachim Folz, Federico Raue, Joern Hees, Andreas Dengel

    Abstract: In recent years, progress in the Visual Question Answering (VQA) field has largely been driven by public challenges and large datasets. One of the most widely-used of these is the VQA 2.0 dataset, consisting of polar ("yes/no") and non-polar questions. Looking at the question distribution over all answers, we find that the answers "yes" and "no" account for 38 % of the questions, while the remaini…

    Submitted 27 March, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

  22. arXiv:1901.02322

    cs.LG cs.AI stat.ML

    Fusion Strategies for Learning User Embeddings with Neural Networks

    Authors: Philipp Blandfort, Tushar Karayil, Federico Raue, Jörn Hees, Andreas Dengel

    Abstract: Growing amounts of online user data motivate the need for automated processing techniques. In case of user ratings, one interesting option is to use neural networks for learning to predict ratings given an item and a user. While training for prediction, such an approach at the same time learns to map each user to a vector, a so-called user embedding. Such embeddings can for example be valuable for…

    Submitted 8 January, 2019; originally announced January 2019.

    Comments: submitted to IJCNN 2019

  23. arXiv:1803.08337

    cs.CV cs.LG

    What do Deep Networks Like to See?

    Authors: Sebastian Palacio, Joachim Folz, Jörn Hees, Federico Raue, Damian Borth, Andreas Dengel

    Abstract: We propose a novel way to measure and understand convolutional neural networks by quantifying the amount of input signal they let in. To do this, an autoencoder (AE) was fine-tuned on gradients from a pre-trained classifier with fixed parameters. We compared the reconstructed samples from AEs that were fine-tuned on a set of image classifiers (AlexNet, VGG16, ResNet-50, and Inception v3) and found…

    Submitted 22 March, 2018; originally announced March 2018.

  24. arXiv:1511.04401

    cs.CV cs.CL cs.LG cs.NE

    Symbol Grounding Association in Multimodal Sequences with Missing Elements

    Authors: Federico Raue, Andreas Dengel, Thomas M. Breuel, Marcus Liwicki

    Abstract: In this paper, we extend a symbolic association framework for being able to handle missing elements in multimodal sequences. The general scope of the work is the symbolic associations of object-word mappings as it happens in language development in infants. In other words, two different representations of the same abstract concepts can associate in both directions. This scenario has been long inte…

    Submitted 7 December, 2017; v1 submitted 13 November, 2015; originally announced November 2015.

    Comments: Under review on Journal of Artificial Intelligence Research (JAIR) -- Special Track on Deep Learning, Knowledge Representation, and Reasoning