-
Identifying latent state transition in non-linear dynamical systems
Authors:
Çağlar Hızlı,
Çağatay Yıldız,
Matthias Bethge,
ST John,
Pekka Marttinen
Abstract:
This work aims to improve generalization and interpretability of dynamical systems by recovering the underlying lower-dimensional latent states and their time evolutions. Previous work on disentangled representation learning within the realm of dynamical systems focused on the latent states, possibly with linear transition approximations. As such, they cannot identify nonlinear transition dynamics, and hence fail to reliably predict complex future behavior. Inspired by the advances in nonlinear ICA, we propose a state-space modeling framework in which we can identify not just the latent states but also the unknown transition function that maps the past states to the present. We introduce a practical algorithm based on variational auto-encoders and empirically demonstrate in realistic synthetic settings that we can (i) recover latent state dynamics with high accuracy, (ii) correspondingly achieve high future prediction accuracy, and (iii) adapt fast to new environments.
Submitted 6 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Investigating Continual Pretraining in Large Language Models: Insights and Implications
Authors:
Çağatay Yıldız,
Nishaanth Kanna Ravichandran,
Prishruit Punia,
Matthias Bethge,
Beyza Ermis
Abstract:
This paper studies the evolving domain of Continual Learning (CL) in large language models (LLMs), with a focus on developing strategies for efficient and sustainable training. Our primary emphasis is on continual domain-adaptive pretraining, a process designed to equip LLMs with the ability to integrate new information from various domains while retaining previously learned knowledge and enhancing cross-domain knowledge transfer without relying on domain-specific identification. Unlike previous studies, which mostly concentrate on a limited selection of tasks or domains and primarily aim to address the issue of forgetting, our research evaluates the adaptability and capabilities of LLMs to changing data landscapes in practical scenarios. To this end, we introduce a new benchmark designed to measure the adaptability of LLMs to these evolving data environments, offering a comprehensive framework for evaluation. We examine the impact of model size on learning efficacy and forgetting, as well as how the progression and similarity of emerging domains affect the knowledge transfer within these models. Our findings uncover several key insights: (i) when the sequence of domains shows semantic similarity, continual pretraining enables LLMs to better specialize in the current domain compared to stand-alone fine-tuning, (ii) training across a diverse range of domains enhances both backward and forward knowledge transfer, and (iii) smaller models are particularly sensitive to continual pretraining, showing the most significant rates of both forgetting and learning. We posit that our research marks a shift towards establishing a more realistic benchmark for investigating CL in LLMs, and has the potential to play a key role in guiding the direction of future research in the field.
Submitted 27 February, 2024;
originally announced February 2024.
-
Infinite dSprites for Disentangled Continual Learning: Separating Memory Edits from Generalization
Authors:
Sebastian Dziadzio,
Çağatay Yıldız,
Gido M. van de Ven,
Tomasz Trzciński,
Tinne Tuytelaars,
Matthias Bethge
Abstract:
The ability of machine learning systems to learn continually is hindered by catastrophic forgetting, the tendency of neural networks to overwrite existing knowledge when learning a new task. Continual learning methods alleviate this problem through regularization, parameter isolation, or rehearsal, but they are typically evaluated on benchmarks comprising only a handful of tasks. In contrast, humans are able to learn continually in dynamic, open-world environments, effortlessly achieving one-shot memorization of unfamiliar objects and reliably recognizing them under various transformations. To make progress towards closing this gap, we introduce Infinite dSprites, a parsimonious tool for creating continual classification and disentanglement benchmarks of arbitrary length and with full control over generative factors. We show that over a sufficiently long time horizon, the performance of all major types of continual learning methods deteriorates on this simple benchmark. Thus, Infinite dSprites highlights an important aspect of continual learning that has not received enough attention so far: given a finite modelling capacity and an arbitrarily long learning horizon, efficient learning requires memorizing class-specific information and accumulating knowledge about general mechanisms. In a simple setting with direct supervision on the generative factors, we show how learning class-agnostic transformations offers a way to circumvent catastrophic forgetting and improve classification accuracy over time. Our approach sets the stage for continual learning over hundreds of tasks with explicit control over memorization and forgetting, emphasizing open-set classification and one-shot generalization.
Submitted 29 February, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
-
Modulated Neural ODEs
Authors:
Ilze Amanda Auzina,
Çağatay Yıldız,
Sara Magliacane,
Matthias Bethge,
Efstratios Gavves
Abstract:
Neural ordinary differential equations (NODEs) have been proven useful for learning non-linear dynamics of arbitrary trajectories. However, current NODE methods capture variations across trajectories only via the initial state value or by auto-regressive encoder updates. In this work, we introduce Modulated Neural ODEs (MoNODEs), a novel framework that sets apart dynamics states from underlying static factors of variation and improves the existing NODE methods. In particular, we introduce $\textit{time-invariant modulator variables}$ that are learned from the data. We incorporate our proposed framework into four existing NODE variants. We test MoNODE on oscillating systems, videos and human walking trajectories, where each trajectory has trajectory-specific modulation. Our framework consistently improves the existing model ability to generalize to new dynamic parameterizations and to perform far-horizon forecasting. In addition, we verify that the proposed modulator variables are informative of the true unknown factors of variation as measured by $R^2$ scores.
Submitted 13 November, 2023; v1 submitted 26 February, 2023;
originally announced February 2023.
-
Latent Neural ODEs with Sparse Bayesian Multiple Shooting
Authors:
Valerii Iakovlev,
Cagatay Yildiz,
Markus Heinonen,
Harri Lähdesmäki
Abstract:
Training dynamic models, such as neural ODEs, on long trajectories is a hard problem that requires using various tricks, such as trajectory splitting, to make model training work in practice. These methods are often heuristics with poor theoretical justifications, and require iterative manual tuning. We propose a principled multiple shooting technique for neural ODEs that splits the trajectories into manageable short segments, which are optimised in parallel, while ensuring probabilistic control on continuity over consecutive segments. We derive variational inference for our shooting-based latent neural ODE models and propose amortized encodings of irregularly sampled trajectories with a transformer-based recognition network with temporal attention and relative positional encoding. We demonstrate efficient and stable training, and state-of-the-art performance on multiple large-scale benchmark datasets.
Submitted 8 February, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
VIDI: A Video Dataset of Incidents
Authors:
Duygu Sesver,
Alp Eren Gençoğlu,
Çağrı Emre Yıldız,
Zehra Günindi,
Faeze Habibi,
Ziya Ata Yazıcı,
Hazım Kemal Ekenel
Abstract:
Automatic detection of natural disasters and incidents has become more important as a tool for fast response. There have been many studies to detect incidents using still images and text. However, the number of approaches that exploit temporal information is rather limited. One of the main reasons for this is that a diverse video dataset with various incident types does not exist. To address this need, in this paper we present a video dataset, Video Dataset of Incidents, VIDI, that contains 4,534 video clips corresponding to 43 incident categories. Each incident class has around 100 videos with a duration of ten seconds on average. To increase diversity, the videos have been searched in several languages. To assess the performance of the recent state-of-the-art approaches, Vision Transformer and TimeSformer, as well as to explore the contribution of video-based information for incident classification, we performed benchmark experiments on the VIDI and Incidents Dataset. We have shown that the recent methods improve the incident classification accuracy. We have found that employing video data is very beneficial for the task. By using the video data, the top-1 accuracy is increased to 76.56% from 67.37%, which was obtained using a single frame. VIDI will be made publicly available. Additional materials can be found at the following link: https://github.com/vididataset/VIDI.
Submitted 26 May, 2022;
originally announced May 2022.
-
Learning Interacting Dynamical Systems with Latent Gaussian Process ODEs
Authors:
Çağatay Yıldız,
Melih Kandemir,
Barbara Rakitsch
Abstract:
We study time uncertainty-aware modeling of continuous-time dynamics of interacting objects. We introduce a new model that decomposes independent dynamics of single objects accurately from their interactions. By employing latent Gaussian process ordinary differential equations, our model infers both independent dynamics and their interactions with reliable uncertainty estimates. In our formulation, each object is represented as a graph node and interactions are modeled by accumulating the messages coming from neighboring objects. We show that efficient inference of such a complex network of variables is possible with modern variational sparse Gaussian process inference techniques. We empirically demonstrate that our model improves the reliability of long-term predictions over neural network based alternatives and it successfully handles missing dynamic or static information. Furthermore, we observe that only our model can successfully encapsulate independent dynamics and interaction information in distinct functions and show the benefit from this disentanglement in extrapolation scenarios.
Submitted 12 October, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Variational multiple shooting for Bayesian ODEs with Gaussian processes
Authors:
Pashupati Hegde,
Çağatay Yıldız,
Harri Lähdesmäki,
Samuel Kaski,
Markus Heinonen
Abstract:
Recent machine learning advances have proposed black-box estimation of unknown continuous-time system dynamics directly from data. However, earlier works are based on approximative ODE solutions or point estimates. We propose a novel Bayesian nonparametric model that uses Gaussian processes to infer posteriors of unknown ODE systems directly from data. We derive sparse variational inference with decoupled functional sampling to represent vector field posteriors. We also introduce a probabilistic shooting augmentation to enable efficient inference from arbitrarily long trajectories. The method demonstrates the benefit of computing vector field posteriors, with predictive uncertainty scores outperforming alternative methods on multiple ODE learning tasks.
Submitted 17 July, 2022; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Continuous-Time Model-Based Reinforcement Learning
Authors:
Çağatay Yıldız,
Markus Heinonen,
Harri Lähdesmäki
Abstract:
Model-based reinforcement learning (MBRL) approaches rely on discrete-time state transition models whereas physical systems and the vast majority of control tasks operate in continuous-time. To avoid time-discretization approximation of the underlying process, we propose a continuous-time MBRL framework based on a novel actor-critic method. Our approach also infers the unknown state evolution differentials with Bayesian neural ordinary differential equations (ODE) to account for epistemic uncertainty. We implement and test our method on a new ODE-RL suite that explicitly solves continuous-time control systems. Our experiments illustrate that the model is robust against irregular and noisy data, is sample-efficient, and can solve control problems which pose challenges to discrete-time MBRL methods.
Submitted 11 June, 2021; v1 submitted 9 February, 2021;
originally announced February 2021.
-
ODE$^2$VAE: Deep generative second order ODEs with Bayesian neural networks
Authors:
Çağatay Yıldız,
Markus Heinonen,
Harri Lähdesmäki
Abstract:
We present Ordinary Differential Equation Variational Auto-Encoder (ODE$^2$VAE), a latent second order ODE model for high-dimensional sequential data. Leveraging the advances in deep generative models, ODE$^2$VAE can simultaneously learn the embedding of high dimensional trajectories and infer arbitrarily complex continuous-time latent dynamics. Our model explicitly decomposes the latent space into momentum and position components and solves a second order ODE system, which is in contrast to recurrent neural network (RNN) based time series models and recently proposed black-box ODE techniques. In order to account for uncertainty, we propose probabilistic latent ODE dynamics parameterized by deep Bayesian neural networks. We demonstrate our approach on motion capture, image rotation and bouncing balls datasets. We achieve state-of-the-art performance in long term motion prediction and imputation tasks.
Submitted 24 October, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
-
Learning Stochastic Differential Equations With Gaussian Processes Without Gradient Matching
Authors:
Cagatay Yildiz,
Markus Heinonen,
Jukka Intosalmi,
Henrik Mannerström,
Harri Lähdesmäki
Abstract:
We introduce a novel paradigm for learning non-parametric drift and diffusion functions for stochastic differential equation (SDE). The proposed model learns to simulate path distributions that match observations with non-uniform time increments and arbitrary sparseness, which is in contrast with gradient matching that does not optimize simulated responses. We formulate sensitivity equations for learning and demonstrate that our general stochastic distribution optimisation leads to robust and efficient learning of SDE systems.
Submitted 31 July, 2018; v1 submitted 16 July, 2018;
originally announced July 2018.
-
Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization
Authors:
Umut Şimşekli,
Çağatay Yıldız,
Thanh Huy Nguyen,
Gaël Richard,
A. Taylan Cemgil
Abstract:
Recent studies have illustrated that stochastic gradient Markov Chain Monte Carlo techniques have a strong potential in non-convex optimization, where local and global convergence guarantees can be shown under certain conditions. By building up on this recent theory, in this study, we develop an asynchronous-parallel stochastic L-BFGS algorithm for non-convex optimization. The proposed algorithm is suitable for both distributed and shared-memory settings. We provide formal theoretical analysis and show that the proposed method achieves an ergodic convergence rate of ${\cal O}(1/\sqrt{N})$ ($N$ being the total number of iterations) and it can achieve a linear speedup under certain conditions. We perform several experiments on both synthetic and real datasets. The results support our theory and show that the proposed algorithm provides a significant speedup over the recently proposed synchronous distributed L-BFGS algorithm.
Submitted 7 June, 2018;
originally announced June 2018.