-
Sequential Disentanglement by Extracting Static Information From A Single Sequence Element
Authors:
Nimrod Berman,
Ilan Naiman,
Idan Arbiv,
Gal Fadlon,
Omri Azencot
Abstract:
One of the fundamental representation learning tasks is unsupervised sequential disentanglement, where latent codes of inputs are decomposed to a single static factor and a sequence of dynamic factors. To extract this latent information, existing methods condition the static and dynamic codes on the entire input sequence. Unfortunately, these models often suffer from information leakage, i.e., the…
▽ More
One of the fundamental representation learning tasks is unsupervised sequential disentanglement, where latent codes of inputs are decomposed to a single static factor and a sequence of dynamic factors. To extract this latent information, existing methods condition the static and dynamic codes on the entire input sequence. Unfortunately, these models often suffer from information leakage, i.e., the dynamic vectors encode both static and dynamic information, or vice versa, leading to a non-disentangled representation. Attempts to alleviate this problem via reducing the dynamic dimension and auxiliary loss terms gain only partial success. Instead, we propose a novel and simple architecture that mitigates information leakage by offering a simple and effective subtraction inductive bias while conditioning on a single sample. Remarkably, the resulting variational framework is simpler in terms of required loss terms, hyperparameters, and data augmentation. We evaluate our method on multiple data-modality benchmarks including general time series, video, and audio, and we show beyond state-of-the-art results on generation and prediction tasks in comparison to several strong baselines.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
First-Order Manifold Data Augmentation for Regression Learning
Authors:
Ilya Kaufman,
Omri Azencot
Abstract:
Data augmentation (DA) methods tailored to specific domains generate synthetic samples by applying transformations that are appropriate for the characteristics of the underlying data domain, such as rotations on images and time war** on time series data. In contrast, domain-independent approaches, e.g. mixup, are applicable to various data modalities, and as such they are general and versatile.…
▽ More
Data augmentation (DA) methods tailored to specific domains generate synthetic samples by applying transformations that are appropriate for the characteristics of the underlying data domain, such as rotations on images and time war** on time series data. In contrast, domain-independent approaches, e.g. mixup, are applicable to various data modalities, and as such they are general and versatile. While regularizing classification tasks via DA is a well-explored research topic, the effect of DA on regression problems received less attention. To bridge this gap, we study the problem of domain-independent augmentation for regression, and we introduce FOMA: a new data-driven domain-independent data augmentation method. Essentially, our approach samples new examples from the tangent planes of the train distribution. Augmenting data in this way aligns with the network tendency towards capturing the dominant features of its input signals. We evaluate FOMA on in-distribution generalization and out-of-distribution robustness benchmarks, and we show that it improves the generalization of several neural architectures. We also find that strong baselines based on mixup are less effective in comparison to our approach. Our code is publicly available athttps://github.com/azencot-group/FOMA.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
Data Augmentation Policy Search for Long-Term Forecasting
Authors:
Liran Nochumsohn,
Omri Azencot
Abstract:
Data augmentation serves as a popular regularization technique to combat overfitting challenges in neural networks. While automatic augmentation has demonstrated success in image classification tasks, its application to time-series problems, particularly in long-term forecasting, has received comparatively less attention. To address this gap, we introduce a time-series automatic augmentation appro…
▽ More
Data augmentation serves as a popular regularization technique to combat overfitting challenges in neural networks. While automatic augmentation has demonstrated success in image classification tasks, its application to time-series problems, particularly in long-term forecasting, has received comparatively less attention. To address this gap, we introduce a time-series automatic augmentation approach named TSAA, which is both efficient and easy to implement. The solution involves tackling the associated bilevel optimization problem through a two-step process: initially training a non-augmented model for a limited number of epochs, followed by an iterative split procedure. During this iterative process, we alternate between identifying a robust augmentation policy through Bayesian optimization and refining the model while discarding suboptimal runs. Extensive evaluations on challenging univariate and multivariate forecasting benchmark problems demonstrate that TSAA consistently outperforms several robust baselines, suggesting its potential integration into prediction pipelines.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Generative Modeling of Graphs via Joint Diffusion of Node and Edge Attributes
Authors:
Nimrod Berman,
Eitan Kosman,
Dotan Di Castro,
Omri Azencot
Abstract:
Graph generation is integral to various engineering and scientific disciplines. Nevertheless, existing methodologies tend to overlook the generation of edge attributes. However, we identify critical applications where edge attributes are essential, making prior methods potentially unsuitable in such contexts. Moreover, while trivial adaptations are available, empirical investigations reveal their…
▽ More
Graph generation is integral to various engineering and scientific disciplines. Nevertheless, existing methodologies tend to overlook the generation of edge attributes. However, we identify critical applications where edge attributes are essential, making prior methods potentially unsuitable in such contexts. Moreover, while trivial adaptations are available, empirical investigations reveal their limited efficacy as they do not properly model the interplay among graph components. To address this, we propose a joint score-based model of nodes and edges for graph generation that considers all graph components. Our approach offers two key novelties: (i) node and edge attributes are combined in an attention module that generates samples based on the two ingredients; and (ii) node, edge and adjacency information are mutually dependent during the graph diffusion process. We evaluate our method on challenging benchmarks involving real-world and synthetic datasets in which edge features are crucial. Additionally, we introduce a new synthetic dataset that incorporates edge values. Furthermore, we propose a novel application that greatly benefits from the method due to its nature: the generation of traffic scenes represented as graphs. Our method outperforms other graph generation methods, demonstrating a significant advantage in edge-related measures.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs
Authors:
Ilan Naiman,
N. Benjamin Erichson,
Pu Ren,
Michael W. Mahoney,
Omri Azencot
Abstract:
Generating realistic time series data is important for many engineering and scientific applications. Existing work tackles this problem using generative adversarial networks (GANs). However, GANs are unstable during training, and they can suffer from mode collapse. While variational autoencoders (VAEs) are known to be more robust to the these issues, they are (surprisingly) less considered for tim…
▽ More
Generating realistic time series data is important for many engineering and scientific applications. Existing work tackles this problem using generative adversarial networks (GANs). However, GANs are unstable during training, and they can suffer from mode collapse. While variational autoencoders (VAEs) are known to be more robust to the these issues, they are (surprisingly) less considered for time series generation. In this work, we introduce Koopman VAE (KoVAE), a new generative framework that is based on a novel design for the model prior, and that can be optimized for either regular and irregular training data. Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map. Our approach enhances generative modeling with two desired features: (i) incorporating domain knowledge can be achieved by leveraging spectral tools that prescribe constraints on the eigenvalues of the linear map; and (ii) studying the qualitative behavior and stability of the system can be performed using tools from dynamical systems theory. Our results show that KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks. Whether trained on regular or irregular data, KoVAE generates time series that improve both discriminative and predictive metrics. We also present visual evidence suggesting that KoVAE learns probability density functions that better approximate the empirical ground truth distribution.
△ Less
Submitted 13 May, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Data Representations' Study of Latent Image Manifolds
Authors:
Ilya Kaufman,
Omri Azencot
Abstract:
Deep neural networks have been demonstrated to achieve phenomenal success in many domains, and yet their inner mechanisms are not well understood. In this paper, we investigate the curvature of image manifolds, i.e., the manifold deviation from being flat in its principal directions. We find that state-of-the-art trained convolutional neural networks for image classification have a characteristic…
▽ More
Deep neural networks have been demonstrated to achieve phenomenal success in many domains, and yet their inner mechanisms are not well understood. In this paper, we investigate the curvature of image manifolds, i.e., the manifold deviation from being flat in its principal directions. We find that state-of-the-art trained convolutional neural networks for image classification have a characteristic curvature profile along layers: an initial steep increase, followed by a long phase of a plateau, and followed by another increase. In contrast, this behavior does not appear in untrained networks in which the curvature flattens. We also show that the curvature gap between the last two layers has a strong correlation with the generalization capability of the network. Moreover, we find that the intrinsic dimension of latent codes is not necessarily indicative of curvature. Finally, we observe that common regularization methods such as mixup yield flatter representations when compared to other methods. Our experiments show consistent results over a variety of deep learning architectures and multiple data sets. Our code is publicly available at https://github.com/azencot-group/CRLM
△ Less
Submitted 16 November, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation
Authors:
Ilan Naiman,
Nimrod Berman,
Omri Azencot
Abstract:
Unsupervised disentanglement is a long-standing challenge in representation learning. Recently, self-supervised techniques achieved impressive results in the sequential setting, where data is time-dependent. However, the latter methods employ modality-based data augmentations and random sampling or solve auxiliary tasks. In this work, we propose to avoid that by generating, sampling, and comparing…
▽ More
Unsupervised disentanglement is a long-standing challenge in representation learning. Recently, self-supervised techniques achieved impressive results in the sequential setting, where data is time-dependent. However, the latter methods employ modality-based data augmentations and random sampling or solve auxiliary tasks. In this work, we propose to avoid that by generating, sampling, and comparing empirical distributions from the underlying variational model. Unlike existing work, we introduce a self-supervised sequential disentanglement framework based on contrastive estimation with no external signals, while using common batch sizes and samples from the latent space itself. In practice, we propose a unified, efficient, and easy-to-code sampling strategy for semantically similar and dissimilar views of the data. We evaluate our approach on video, audio, and time series benchmarks. Our method presents state-of-the-art results in comparison to existing techniques. The code is available at https://github.com/azencot-group/SPYL.
△ Less
Submitted 25 May, 2023;
originally announced May 2023.
-
Multifactor Sequential Disentanglement via Structured Koopman Autoencoders
Authors:
Nimrod Berman,
Ilan Naiman,
Omri Azencot
Abstract:
Disentangling complex data to its latent factors of variation is a fundamental task in representation learning. Existing work on sequential disentanglement mostly provides two factor representations, i.e., it separates the data to time-varying and time-invariant factors. In contrast, we consider multifactor disentanglement in which multiple (more than two) semantic disentangled components are gene…
▽ More
Disentangling complex data to its latent factors of variation is a fundamental task in representation learning. Existing work on sequential disentanglement mostly provides two factor representations, i.e., it separates the data to time-varying and time-invariant factors. In contrast, we consider multifactor disentanglement in which multiple (more than two) semantic disentangled components are generated. Key to our approach is a strong inductive bias where we assume that the underlying dynamics can be represented linearly in the latent space. Under this assumption, it becomes natural to exploit the recently introduced Koopman autoencoder models. However, disentangled representations are not guaranteed in Koopman approaches, and thus we propose a novel spectral loss term which leads to structured Koopman matrices and disentanglement. Overall, we propose a simple and easy to code new deep model that is fully unsupervised and it supports multifactor disentanglement. We showcase new disentangling abilities such as swap** of individual static factors between characters, and an incremental swap of disentangled factors from the source to the target. Moreover, we evaluate our method extensively on two factor standard benchmark tasks where we significantly improve over competing unsupervised approaches, and we perform competitively in comparison to weakly- and self-supervised state-of-the-art approaches. The code is available at https://github.com/azencot-group/SKD.
△ Less
Submitted 30 March, 2023;
originally announced March 2023.
-
Eigenvalue initialisation and regularisation for Koopman autoencoders
Authors:
Jack W. Miller,
Charles O'Neill,
Navid C. Constantinou,
Omri Azencot
Abstract:
Regularising the parameter matrices of neural networks is ubiquitous in training deep models. Typical regularisation approaches suggest initialising weights using small random values, and to penalise weights to promote sparsity. However, these widely used techniques may be less effective in certain scenarios. Here, we study the Koopman autoencoder model which includes an encoder, a Koopman operato…
▽ More
Regularising the parameter matrices of neural networks is ubiquitous in training deep models. Typical regularisation approaches suggest initialising weights using small random values, and to penalise weights to promote sparsity. However, these widely used techniques may be less effective in certain scenarios. Here, we study the Koopman autoencoder model which includes an encoder, a Koopman operator layer, and a decoder. These models have been designed and dedicated to tackle physics-related problems with interpretable dynamics and an ability to incorporate physics-related constraints. However, the majority of existing work employs standard regularisation practices. In our work, we take a step toward augmenting Koopman autoencoders with initialisation and penalty schemes tailored for physics-related settings. Specifically, we propose the "eigeninit" initialisation scheme that samples initial Koopman operators from specific eigenvalue distributions. In addition, we suggest the "eigenloss" penalty scheme that penalises the eigenvalues of the Koopman operator during training. We demonstrate the utility of these schemes on two synthetic data sets: a driven pendulum and flow past a cylinder; and two real-world problems: ocean surface temperatures and cyclone wind fields. We find on these datasets that eigenloss and eigeninit improves the convergence rate by up to a factor of 5, and that they reduce the cumulative long-term prediction error by up to a factor of 3. Such a finding points to the utility of incorporating similar schemes as an inductive bias in other physics-related deep learning approaches.
△ Less
Submitted 25 December, 2022; v1 submitted 22 December, 2022;
originally announced December 2022.
-
A Differential Geometry Perspective on Orthogonal Recurrent Models
Authors:
Omri Azencot,
N. Benjamin Erichson,
Mirela Ben-Chen,
Michael W. Mahoney
Abstract:
Recently, orthogonal recurrent neural networks (RNNs) have emerged as state-of-the-art models for learning long-term dependencies. This class of models mitigates the exploding and vanishing gradients problem by design. In this work, we employ tools and insights from differential geometry to offer a novel perspective on orthogonal RNNs. We show that orthogonal RNNs may be viewed as optimizing in th…
▽ More
Recently, orthogonal recurrent neural networks (RNNs) have emerged as state-of-the-art models for learning long-term dependencies. This class of models mitigates the exploding and vanishing gradients problem by design. In this work, we employ tools and insights from differential geometry to offer a novel perspective on orthogonal RNNs. We show that orthogonal RNNs may be viewed as optimizing in the space of divergence-free vector fields. Specifically, based on a well-known result in differential geometry that relates vector fields and linear operators, we prove that every divergence-free vector field is related to a skew-symmetric matrix. Motivated by this observation, we study a new recurrent model, which spans the entire space of vector fields. Our method parameterizes vector fields via the directional derivatives of scalar functions. This requires the construction of latent inner product, gradient, and divergence operators. In comparison to state-of-the-art orthogonal RNNs, our approach achieves comparable or better results on a variety of benchmark tasks.
△ Less
Submitted 18 February, 2021;
originally announced February 2021.
-
An Operator Theoretic Approach for Analyzing Sequence Neural Networks
Authors:
Ilan Naiman,
Omri Azencot
Abstract:
Analyzing the inner mechanisms of deep neural networks is a fundamental task in machine learning. Existing work provides limited analysis or it depends on local theories, such as fixed-point analysis. In contrast, we propose to analyze trained neural networks using an operator theoretic approach which is rooted in Koopman theory, the Koopman Analysis of Neural Networks (KANN). Key to our method is…
▽ More
Analyzing the inner mechanisms of deep neural networks is a fundamental task in machine learning. Existing work provides limited analysis or it depends on local theories, such as fixed-point analysis. In contrast, we propose to analyze trained neural networks using an operator theoretic approach which is rooted in Koopman theory, the Koopman Analysis of Neural Networks (KANN). Key to our method is the Koopman operator, which is a linear object that globally represents the dominant behavior of the network dynamics. The linearity of the Koopman operator facilitates analysis via its eigenvectors and eigenvalues. Our method reveals that the latter eigendecomposition holds semantic information related to the neural network inner workings. For instance, the eigenvectors highlight positive and negative n-grams in the sentiments analysis task; similarly, the eigenvectors capture the salient features of healthy heart beat signals in the ECG classification problem.
△ Less
Submitted 22 February, 2023; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Modes of Homogeneous Gradient Flows
Authors:
Ido Cohen,
Omri Azencot,
Pavel Lifshitz,
Guy Gilboa
Abstract:
Finding latent structures in data is drawing increasing attention in diverse fields such as image and signal processing, fluid dynamics, and machine learning. In this work we examine the problem of finding the main modes of gradient flows. Gradient descent is a fundamental process in optimization where its stochastic version is prominent in training of neural networks. Here our aim is to establish…
▽ More
Finding latent structures in data is drawing increasing attention in diverse fields such as image and signal processing, fluid dynamics, and machine learning. In this work we examine the problem of finding the main modes of gradient flows. Gradient descent is a fundamental process in optimization where its stochastic version is prominent in training of neural networks. Here our aim is to establish a consistent theory for gradient flows $ψ_t = P(ψ)$, where $P$ is a nonlinear homogeneous operator. Our proposed framework stems from analytic solutions of homogeneous flows, previously formalized by Cohen-Gilboa, where the initial condition $ψ_0$ admits the nonlinear eigenvalue problem $P(ψ_0)=λψ_0 $. We first present an analytic solution for \ac{DMD} in such cases. We show an inherent flaw of \ac{DMD}, which is unable to recover the essential dynamics of the flow. It is evident that \ac{DMD} is best suited for homogeneous flows of degree one. We propose an adaptive time sampling scheme and show its dynamics are analogue to homogeneous flows of degree one with a fixed step size. Moreover, we adapt \ac{DMD} to yield a real spectrum, using symmetric matrices. Our analytic solution of the proposed scheme recovers the dynamics perfectly and yields zero error. We then proceed to show that in the general case the orthogonal modes $\{ φ_i \}$ are approximately nonlinear eigenfunctions $P(φ_i) \approxλ_i φ_i $. We formulate Orthogonal Nonlinear Spectral decomposition (\emph{OrthoNS}), which recovers the essential latent structures of the gradient descent process. Definitions for spectrum and filtering are given, and a Parseval-type identity is shown.
△ Less
Submitted 28 December, 2020; v1 submitted 3 July, 2020;
originally announced July 2020.
-
Lipschitz Recurrent Neural Networks
Authors:
N. Benjamin Erichson,
Omri Azencot,
Alejandro Queiruga,
Liam Hodgkinson,
Michael W. Mahoney
Abstract:
Viewing recurrent neural networks (RNNs) as continuous-time dynamical systems, we propose a recurrent unit that describes the hidden state's evolution with two parts: a well-understood linear component plus a Lipschitz nonlinearity. This particular functional form facilitates stability analysis of the long-term behavior of the recurrent unit using tools from nonlinear systems theory. In turn, this…
▽ More
Viewing recurrent neural networks (RNNs) as continuous-time dynamical systems, we propose a recurrent unit that describes the hidden state's evolution with two parts: a well-understood linear component plus a Lipschitz nonlinearity. This particular functional form facilitates stability analysis of the long-term behavior of the recurrent unit using tools from nonlinear systems theory. In turn, this enables architectural design decisions before experimentation. Sufficient conditions for global stability of the recurrent unit are obtained, motivating a novel scheme for constructing hidden-to-hidden matrices. Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks, including computer vision, language modeling and speech prediction tasks. Finally, through Hessian-based analysis we demonstrate that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.
△ Less
Submitted 23 April, 2021; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Forecasting Sequential Data using Consistent Koopman Autoencoders
Authors:
Omri Azencot,
N. Benjamin Erichson,
Vanessa Lin,
Michael W. Mahoney
Abstract:
Recurrent neural networks are widely used on time series data, yet such models often ignore the underlying physical structures in such sequences. A new class of physics-based methods related to Koopman theory has been introduced, offering an alternative for processing nonlinear dynamical systems. In this work, we propose a novel Consistent Koopman Autoencoder model which, unlike the majority of ex…
▽ More
Recurrent neural networks are widely used on time series data, yet such models often ignore the underlying physical structures in such sequences. A new class of physics-based methods related to Koopman theory has been introduced, offering an alternative for processing nonlinear dynamical systems. In this work, we propose a novel Consistent Koopman Autoencoder model which, unlike the majority of existing work, leverages the forward and backward dynamics. Key to our approach is a new analysis which explores the interplay between consistent dynamics and their associated Koopman operators. Our network is directly related to the derived analysis, and its computational requirements are comparable to other baselines. We evaluate our method on a wide range of high-dimensional and short-term dependent problems, and it achieves accurate estimates for significant prediction horizons, while also being robust to noise.
△ Less
Submitted 30 June, 2020; v1 submitted 4 March, 2020;
originally announced March 2020.
-
Shape Analysis via Functional Map Construction and Bases Pursuit
Authors:
Omri Azencot,
Rongjie Lai
Abstract:
We propose a method to simultaneously compute scalar basis functions with an associated functional map for a given pair of triangle meshes. Unlike previous techniques that put emphasis on smoothness with respect to the Laplace--Beltrami operator and thus favor low-frequency eigenfunctions, we aim for a spectrum that allows for better feature matching. This change of perspective introduces many deg…
▽ More
We propose a method to simultaneously compute scalar basis functions with an associated functional map for a given pair of triangle meshes. Unlike previous techniques that put emphasis on smoothness with respect to the Laplace--Beltrami operator and thus favor low-frequency eigenfunctions, we aim for a spectrum that allows for better feature matching. This change of perspective introduces many degrees of freedom into the problem which we exploit to improve the accuracy of our computed correspondences. To effectively search in this high dimensional space of solutions, we incorporate into our minimization state-of-the-art regularizers. We solve the resulting highly non-linear and non-convex problem using an iterative scheme via the Alternating Direction Method of Multipliers. At each step, our optimization involves simple to solve linear or Sylvester-type equations. In practice, our method performs well in terms of convergence, and we additionally show that it is similar to a provably convergent problem. We show the advantages of our approach by extensively testing it on multiple datasets in a few applications including shape matching, consistent quadrangulation and scalar function transfer.
△ Less
Submitted 29 September, 2019;
originally announced September 2019.