-
Continuous-time Particle Filtering for Latent Stochastic Differential Equations
Authors:
Ruizhi Deng,
Greg Mori,
Andreas M. Lehrmann
Abstract:
Particle filtering is a standard Monte Carlo approach for a wide range of sequential inference tasks. The key component of a particle filter is a set of particles with importance weights that serve as a proxy for the true posterior distribution of some stochastic process. In this work, we propose continuous latent particle filters, an approach that extends particle filtering to the continuous-time domain. We demonstrate how continuous latent particle filters can be used as a generic plug-in replacement for inference techniques relying on a learned variational posterior. Our experiments with different model families based on latent neural stochastic differential equations demonstrate superior performance of continuous-time particle filtering in inference tasks like likelihood estimation and sequential prediction for a variety of stochastic processes.
Submitted 31 August, 2022;
originally announced September 2022.
-
Continuous Latent Process Flows
Authors:
Ruizhi Deng,
Marcus A. Brubaker,
Greg Mori,
Andreas M. Lehrmann
Abstract:
Partial observations of continuous time-series dynamics at arbitrary time stamps exist in many disciplines. Fitting this type of data using statistical models with continuous dynamics is not only promising at an intuitive level but also has practical benefits, including the ability to generate continuous trajectories and to perform inference on previously unseen time stamps. Despite exciting progress in this area, the existing models still face challenges in terms of their representational power and the quality of their variational approximations. We tackle these challenges with continuous latent process flows (CLPF), a principled architecture decoding continuous latent processes into continuous observable processes using a time-dependent normalizing flow driven by a stochastic differential equation. To optimize our model using maximum likelihood, we propose a novel piecewise construction of a variational posterior process and derive the corresponding variational lower bound using trajectory re-weighting. Our ablation studies demonstrate the effectiveness of our contributions in various inference tasks on irregular time grids. Comparisons to state-of-the-art baselines show our model's favourable performance on both synthetic and real-world time-series data.
Submitted 27 October, 2021; v1 submitted 29 June, 2021;
originally announced June 2021.
-
Unsupervised Video Decomposition using Spatio-temporal Iterative Inference
Authors:
Polina Zablotskaia,
Edoardo A. Dominici,
Leonid Sigal,
Andreas M. Lehrmann
Abstract:
Unsupervised multi-object scene decomposition is a fast-emerging problem in representation learning. Despite significant progress in static scenes, such models are unable to leverage important dynamic cues present in video. We propose a novel spatio-temporal iterative inference framework that is powerful enough to jointly model complex multi-object representations and explicit temporal dependencies between latent variables across frames. This is achieved by leveraging 2D-LSTM, temporally conditioned inference and generation within the iterative amortized inference for posterior refinement. Our method improves the overall quality of decompositions, encodes information about the objects' dynamics, and can be used to predict trajectories of each object separately. Additionally, we show that our model has a high accuracy even without color information. We demonstrate the decomposition, segmentation, and prediction capabilities of our model and show that it outperforms the state-of-the-art on several benchmark datasets, one of which was curated for this work and will be made publicly available.
Submitted 25 June, 2020;
originally announced June 2020.
-
Learning Physics-guided Face Relighting under Directional Light
Authors:
Thomas Nestmeyer,
Jean-François Lalonde,
Iain Matthews,
Andreas M. Lehrmann
Abstract:
Relighting is an essential step in realistically transferring objects from a captured image into another environment. For example, authentic telepresence in Augmented Reality requires faces to be displayed and relit consistent with the observer's scene lighting. We investigate end-to-end deep learning architectures that both de-light and relight an image of a human face. Our model decomposes the input image into intrinsic components according to a diffuse physics-based image formation model. We enable non-diffuse effects including cast shadows and specular highlights by predicting a residual correction to the diffuse render. To train and evaluate our model, we collected a portrait database of 21 subjects with various expressions and poses. Each sample is captured in a controlled light stage setup with 32 individual light sources. Our method creates precise and believable relighting results and generalizes to complex illumination conditions and challenging poses, including when the subject is not looking straight at the camera.
Submitted 19 April, 2020; v1 submitted 7 June, 2019;
originally announced June 2019.
-
Traversing the Continuous Spectrum of Image Retrieval with Deep Dynamic Models
Authors:
Ziad Al-Halah,
Andreas M. Lehrmann,
Leonid Sigal
Abstract:
We introduce the first work to tackle the image retrieval problem as a continuous operation. While the approaches proposed in the literature can be roughly categorized into two main groups, category-based and instance-based retrieval, we show in this work that the retrieval task is much richer and more complex. Image similarity goes beyond this discrete vantage point and spans a continuous spectrum among the classical operating points of category and instance similarity. However, current retrieval models are static and incapable of exploring this rich structure of the retrieval space since they are trained and evaluated with a single operating point as a target objective. Hence, we introduce a novel retrieval model that, for a given query, is capable of producing a dynamic embedding that can target an arbitrary point along the continuous retrieval spectrum. Our model disentangles the visual signal of a query image into its basic components of categorical and attribute information. Furthermore, using a continuous control parameter, our model learns to reconstruct a dynamic embedding of the query by mixing these components in different proportions to target a specific point along the retrieval simplex. We demonstrate our idea in a comprehensive evaluation of the proposed model and highlight the advantages of our approach over a set of well-established discrete retrieval models.
Submitted 31 March, 2019; v1 submitted 1 December, 2018;
originally announced December 2018.