-
Grid Partitioned Attention: Efficient Transformer Approximation with Inductive Bias for High Resolution Detail Generation
Authors:
Nikolay Jetchev,
Gökhan Yildirim,
Christian Bracher,
Roland Vollgraf
Abstract:
Attention is a general reasoning mechanism that can flexibly deal with image information, but its memory requirements have so far made it impractical for high resolution image generation. We present Grid Partitioned Attention (GPA), a new approximate attention algorithm that leverages a sparse inductive bias for higher computational and memory efficiency in image domains: queries attend only to few keys, and spatially close queries attend to close keys due to correlations. Our paper introduces the new attention layer and analyzes its complexity and how the trade-off between memory usage and model power can be tuned by the hyper-parameters. We show how such attention enables novel deep learning architectures with copying modules that are especially useful for conditional image generation tasks like pose morphing. Our contributions are (i) the algorithm and code of the novel GPA layer, (ii) a novel deep attention-copying architecture, and (iii) new state-of-the-art experimental results in human pose morphing generation benchmarks.
Submitted 8 July, 2021;
originally announced July 2021.
-
Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting
Authors:
Kashif Rasul,
Calvin Seward,
Ingmar Schuster,
Roland Vollgraf
Abstract:
In this work, we propose \texttt{TimeGrad}, an autoregressive model for multivariate probabilistic time series forecasting which samples from the data distribution at each time step by estimating its gradient. To this end, we use diffusion probabilistic models, a class of latent variable models closely connected to score matching and energy-based methods. Our model learns gradients by optimizing a variational bound on the data likelihood and at inference time converts white noise into a sample of the distribution of interest through a Markov chain using Langevin sampling. We demonstrate experimentally that the proposed autoregressive denoising diffusion model is the new state-of-the-art multivariate probabilistic forecasting method on real-world data sets with thousands of correlated dimensions. We hope that this method is a useful tool for practitioners and lays the foundation for future research in this area.
Submitted 2 February, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
CRISP: A Probabilistic Model for Individual-Level COVID-19 Infection Risk Estimation Based on Contact Data
Authors:
Ralf Herbrich,
Rajeev Rastogi,
Roland Vollgraf
Abstract:
We present CRISP (COVID-19 Risk Score Prediction), a probabilistic graphical model for COVID-19 infection spread through a population based on the SEIR model where we assume access to (1) mutual contacts between pairs of individuals across time across various channels (e.g., Bluetooth contact traces), as well as (2) test outcomes at given times for infection, exposure and immunity tests. Our micro-level model keeps track of the infection state for each individual at every point in time, ranging from susceptible, exposed, infectious to recovered. We develop both a Monte Carlo EM as well as a message passing algorithm to infer contact-channel specific infection transmission probabilities. Our Monte Carlo algorithm uses Gibbs sampling to draw samples of the latent infection status of each individual over the entire time period of analysis, given the latent infection status of all contacts and test outcome data. Experimental results with simulated data demonstrate our CRISP model can be parametrized by the reproduction factor $R_0$ and exhibits population-level infectiousness and recovery time series similar to those of the classical SEIR model. However, due to the individual contact data, this model allows fine grained control and inference for a wide range of COVID-19 mitigation and suppression policy measures. Moreover, the block-Gibbs sampling algorithm is able to support efficient testing in a test-trace-isolate approach to contain COVID-19 infection spread. To the best of our knowledge, this is the first model with efficient inference for COVID-19 infection spread based on individual-level contact data; most epidemic models are macro-level models that reason over entire populations. The implementation of CRISP is available in Python and C++ at https://github.com/zalandoresearch/CRISP.
Submitted 30 June, 2022; v1 submitted 9 June, 2020;
originally announced June 2020.
-
Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows
Authors:
Kashif Rasul,
Abdul-Saboor Sheikh,
Ingmar Schuster,
Urs Bergmann,
Roland Vollgraf
Abstract:
Time series forecasting is often fundamental to scientific and engineering problems and enables decision making. With ever increasing data set sizes, a trivial solution to scale up predictions is to assume independence between interacting time series. However, modeling statistical dependencies can improve accuracy and enable analysis of interaction effects. Deep learning methods are well suited for this problem, but multivariate models often assume a simple parametric distribution and do not scale to high dimensions. In this work we model the multivariate temporal dynamics of time series via an autoregressive deep learning model, where the data distribution is represented by a conditioned normalizing flow. This combination retains the power of autoregressive models, such as good performance in extrapolation into the future, with the flexibility of flows as a general purpose high-dimensional distribution model, while remaining computationally tractable. We show that it improves over the state-of-the-art for standard metrics on many real-world data sets with several thousand interacting time-series.
Submitted 14 January, 2021; v1 submitted 14 February, 2020;
originally announced February 2020.
-
Set Flow: A Permutation Invariant Normalizing Flow
Authors:
Kashif Rasul,
Ingmar Schuster,
Roland Vollgraf,
Urs Bergmann
Abstract:
We present a generative model that is defined on finite sets of exchangeable, potentially high dimensional, data. As the architecture is an extension of RealNVPs, it inherits all their favorable properties, such as being invertible and allowing for exact log-likelihood evaluation. We show that this architecture is able to learn finite non-i.i.d. set data distributions, learn statistical dependencies between entities of the set, and train and sample with variable set sizes in a computationally efficient manner. Experiments on 3D point clouds show state-of-the-art likelihoods.
Submitted 6 September, 2019;
originally announced September 2019.
-
Generating High-Resolution Fashion Model Images Wearing Custom Outfits
Authors:
Gökhan Yildirim,
Nikolay Jetchev,
Roland Vollgraf,
Urs Bergmann
Abstract:
Visualizing an outfit is an essential part of shopping for clothes. Due to the combinatorial aspect of combining fashion articles, the available images are limited to a pre-determined set of outfits. In this paper, we broaden these visualizations by generating high-resolution images of fashion models wearing a custom outfit under an input body pose. We show that our approach can not only transfer the style and the pose of one generated outfit to another, but also create realistic images of human bodies and garments.
Submitted 23 August, 2019;
originally announced August 2019.
-
A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce
Authors:
Abdul-Saboor Sheikh,
Romain Guigoures,
Evgenii Koriagin,
Yuen King Ho,
Reza Shirvany,
Roland Vollgraf,
Urs Bergmann
Abstract:
Personalized size and fit recommendations bear crucial significance for any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred due to size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems from extreme sparsity of customer-article orders. To alleviate this problem, we propose a deep learning based content-collaborative methodology for personalized size and fit recommendation. Our proposed method can ingest arbitrary customer and article data and can model multiple individuals or intents behind a single account. The method optimizes a global set of parameters to learn population-level abstractions of size and fit relevant information from observed customer-article interactions. It further employs customer and article specific embedding variables to learn their properties. Together with learned entity embeddings, the method maps additional customer and article attributes into a latent space to derive personalized recommendations. Application of our method to two publicly available datasets demonstrates an improvement over the state-of-the-art published results. On two proprietary datasets, one containing fit feedback from fashion experts and the other involving customer purchases, we further outperform comparable methodologies, including a recent Bayesian approach for size recommendation.
Submitted 23 July, 2019;
originally announced July 2019.
-
Learning Set-equivariant Functions with SWARM Mappings
Authors:
Roland Vollgraf
Abstract:
In this work we propose a new neural network architecture that efficiently implements and learns general purpose set-equivariant functions. Such a function f maps a set of entities x = {x1, ..., xn} from one domain to a set of the same cardinality y = f(x) = {y1, ..., yn} in another domain regardless of the ordering of the entities. The architecture is based on a gated recurrent network which is iteratively applied to all entities individually and at the same time syncs with the progression of the whole population. In reminiscence of this pattern, which can be frequently observed in nature, we call our approach SWARM mapping. Set-equivariant and generally permutation invariant functions are important building blocks for many state-of-the-art machine learning approaches, even in applications where permutation invariance is not of primary interest, as seen in the recent success of attention-based transformer models (Vaswani et al. 2017). Accordingly, we demonstrate the power and usefulness of SWARM mappings in different applications. We compare the performance of our approach with another recently proposed set-equivariant function, the Set Transformer (Lee et al. 2018), and we demonstrate that models based solely on SWARM layers give state-of-the-art results.
Submitted 20 September, 2019; v1 submitted 22 June, 2019;
originally announced June 2019.
-
A Bandit Framework for Optimal Selection of Reinforcement Learning Agents
Authors:
Andreas Merentitis,
Kashif Rasul,
Roland Vollgraf,
Abdul-Saboor Sheikh,
Urs Bergmann
Abstract:
Deep Reinforcement Learning has been shown to be very successful in complex games, e.g. Atari or Go. These games have clearly defined rules, and hence allow simulation. In many practical applications, however, interactions with the environment are costly and a good simulator of the environment is not available. Further, as environments differ by application, the optimal inductive bias (architecture, hyperparameters, etc.) of a reinforcement agent depends on the application. In this work, we propose a multi-arm bandit framework that selects from a set of different reinforcement learning agents to choose the one with the best inductive bias. To alleviate the problem of sparse rewards, the reinforcement learning agents are augmented with surrogate rewards. This helps the bandit framework to select the best agents early, since these rewards are smoother and less sparse than the environment reward. The bandit has the double objective of maximizing the reward while the agents are learning and selecting the best agent after a finite number of learning steps. Our experimental results on standard environments show that the proposed framework is able to consistently select the optimal agent after a finite number of steps, while collecting more cumulative reward compared to selecting a sub-optimal architecture or uniformly alternating between different agents.
Submitted 10 February, 2019;
originally announced February 2019.
-
Studio2Shop: from studio photo shoots to fashion articles
Authors:
Julia Lasserre,
Katharina Rasch,
Roland Vollgraf
Abstract:
Fashion is an increasingly important topic in computer vision, in particular the so-called street-to-shop task of matching street images with shop images containing similar fashion items. Solving this problem promises new means of making fashion searchable and helping shoppers find the articles they are looking for. This paper focuses on finding pieces of clothing worn by a person in full-body or half-body images with neutral backgrounds. Such images are ubiquitous on the web and in fashion blogs, and are typically studio photos; we refer to this setting as studio-to-shop. Recent advances in computational fashion include the development of domain-specific numerical representations. Our model Studio2Shop builds on top of such representations and uses a deep convolutional network trained to match a query image to the numerical feature vectors of all the articles annotated in this image. Top-$k$ retrieval evaluation on test query images shows that the correct items are most often found within a range that is sufficiently small for building realistic visual search engines for the studio-to-shop setting.
Submitted 2 July, 2018;
originally announced July 2018.
-
Syntax-Aware Language Modeling with Recurrent Neural Networks
Authors:
Duncan Blythe,
Alan Akbik,
Roland Vollgraf
Abstract:
Neural language models (LMs) are typically trained using only lexical features, such as surface forms of words. In this paper, we argue this deprives the LM of crucial syntactic signals that can be detected at high confidence using existing parsers. We present a simple but highly effective approach for training neural LMs using both lexical and syntactic information, and a novel approach for applying such LMs to unparsed text using sequential Monte Carlo sampling. In experiments on a range of corpora and corpus sizes, we show our approach consistently outperforms standard lexical LMs in character-level language modeling; on the other hand, for word-level models the models are on a par with standard language models. These results indicate potential for expanding LMs beyond lexical surface features to higher-level NLP features for character-level models.
Submitted 2 March, 2018;
originally announced March 2018.
-
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Authors:
Han Xiao,
Kashif Rasul,
Roland Vollgraf
Abstract:
We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits. The dataset is freely available at https://github.com/zalandoresearch/fashion-mnist
Submitted 15 September, 2017; v1 submitted 25 August, 2017;
originally announced August 2017.
-
An LSTM-Based Dynamic Customer Model for Fashion Recommendation
Authors:
Sebastian Heinz,
Christian Bracher,
Roland Vollgraf
Abstract:
Online fashion sales present a challenging use case for personalized recommendation: Stores offer a huge variety of items in multiple sizes. Small stocks, high return rates, seasonality, and changing trends cause continuous turnover of articles for sale on all time scales. Customers tend to shop rarely, but often buy multiple items at once. We report on backtest experiments with sales data of 100k frequent shoppers at Zalando, Europe's leading online fashion platform. To model changing customer and store environments, our recommendation method employs a pair of neural networks: To overcome the cold start problem, a feedforward network generates article embeddings in "fashion space," which serve as input to a recurrent neural network that predicts a style vector in this space for each client, based on their past purchase sequence. We compare our results with a static collaborative filtering approach, and a popularity ranking baseline.
Submitted 24 August, 2017;
originally announced August 2017.
-
Learning Texture Manifolds with the Periodic Spatial GAN
Authors:
Urs Bergmann,
Nikolay Jetchev,
Roland Vollgraf
Abstract:
This paper introduces a novel approach to texture synthesis based on generative adversarial networks (GAN) (Goodfellow et al., 2014). We extend the structure of the input noise distribution by constructing tensors with different types of dimensions. We call this technique Periodic Spatial GAN (PSGAN). The PSGAN has several novel abilities which surpass the current state of the art in texture synthesis. First, we can learn multiple textures from datasets of one or more complex large images. Second, we show that the image generation with PSGANs has properties of a texture manifold: we can smoothly interpolate between samples in the structured noise space and generate novel samples, which lie perceptually between the textures of the original dataset. In addition, we can also accurately learn periodic textures. We conduct multiple experiments which show that PSGANs can flexibly handle diverse texture and image data sources. Our method is highly scalable and it can generate output images of arbitrarily large size.
Submitted 8 September, 2017; v1 submitted 18 May, 2017;
originally announced May 2017.
-
Texture Synthesis with Spatial Generative Adversarial Networks
Authors:
Nikolay Jetchev,
Urs Bergmann,
Roland Vollgraf
Abstract:
Generative adversarial networks (GANs) are a recent approach to train generative models of data, which have been shown to work particularly well on image data. In the current paper we introduce a new model for texture synthesis based on GAN learning. By extending the input noise distribution space from a single vector to a whole spatial tensor, we create an architecture with properties well suited to the task of texture synthesis, which we call spatial GAN (SGAN). To our knowledge, this is the first successful completely data-driven texture synthesis method based on GANs.
Our method has the following features which make it a state of the art algorithm for texture synthesis: high image quality of the generated textures, very high scalability w.r.t. the output texture size, fast real-time forward generation, the ability to fuse multiple diverse source images in complex textures. To illustrate these capabilities we present multiple experiments with different classes of texture images and use cases. We also discuss some limitations of our method with respect to the types of texture images it can synthesize, and compare it to other neural techniques for texture generation.
Submitted 8 September, 2017; v1 submitted 24 November, 2016;
originally announced November 2016.
-
Fashion DNA: Merging Content and Sales Data for Recommendation and Article Mapping
Authors:
Christian Bracher,
Sebastian Heinz,
Roland Vollgraf
Abstract:
We present a method to determine Fashion DNA, coordinate vectors locating fashion items in an abstract space. Our approach is based on a deep neural network architecture that ingests curated article information such as tags and images, and is trained to predict sales for a large set of frequent customers. In the process, a dual space of customer style preferences naturally arises. Interpretation of the metric of these spaces is straightforward: The product of Fashion DNA and customer style vectors yields the forecast purchase likelihood for the customer-item pair, while the angle between Fashion DNA vectors is a measure of item similarity. Importantly, our models are able to generate unbiased purchase probabilities for fashion items based solely on article information, even in absence of sales data, thus circumventing the "cold-start problem" of collaborative recommendation approaches. Likewise, it generalizes easily and reliably to customers outside the training set. We experiment with Fashion DNA models based on visual and/or tag item data, evaluate their recommendation power, and discuss the resulting article similarities.
Submitted 8 September, 2016;
originally announced September 2016.