Showing 1–24 of 24 results for author: Titsias, M K

  1. arXiv:2406.17699

    math.ST cs.LG

    Can independent Metropolis beat crude Monte Carlo?

    Authors: Siran Liu, Petros Dellaportas, Michalis K. Titsias

    Abstract: Assume that we would like to estimate the expected value of a function $F$ with respect to a density $π$. We prove that if $π$ is close enough under KL divergence to another density $q$, an independent Metropolis sampler estimator that obtains samples from $π$ with proposal density $q$, enriched with a variance reduction computational strategy based on control variates, achieves smaller asymptotic…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 37 pages, 3 figures

  2. arXiv:2406.04329

    cs.LG stat.ML

    Simplified and Generalized Masked Diffusion for Discrete Data

    Authors: Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias

    Abstract: Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnecessarily complex model formulations and unclear relationships between different perspectives, leading to suboptimal parameterization, training objectives, and ad hoc adjustments to counteract these is…

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2403.01518

    cs.CL cs.LG

    Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models

    Authors: Amal Rannen-Triki, Jorg Bornschein, Razvan Pascanu, Marcus Hutter, Andras György, Alexandre Galashov, Yee Whye Teh, Michalis K. Titsias

    Abstract: We consider the problem of online fine tuning the parameters of a language model at test time, also known as dynamic evaluation. While it is generally known that this approach improves the overall predictive performance, especially when considering distributional shift between training and evaluation data, we here emphasize the perspective that online adaptation turns parameters into temporally ch…

    Submitted 3 March, 2024; originally announced March 2024.

  4. arXiv:2306.08448

    cs.LG cs.AI

    Kalman Filter for Online Classification of Non-Stationary Data

    Authors: Michalis K. Titsias, Alexandre Galashov, Amal Rannen-Triki, Razvan Pascanu, Yee Whye Teh, Jorg Bornschein

    Abstract: In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps. Important challenges in OCL are concerned with automatic adaptation to the particular non-stationary structure of the data, and with quantification of predictive uncertainty. Motivated by these challenges we introduce a probabilistic Bayesian online learning model…

    Submitted 14 June, 2023; originally announced June 2023.

  5. arXiv:2305.14442

    stat.ML cs.LG stat.CO

    Optimal Preconditioning and Fisher Adaptive Langevin Sampling

    Authors: Michalis K. Titsias

    Abstract: We define an optimal preconditioning for the Langevin diffusion by analytically optimizing the expected squared jumped distance. This yields as the optimal preconditioning an inverse Fisher information covariance matrix, where the covariance matrix is computed as the outer product of log target gradients averaged under the target. We apply this result to the Metropolis adjusted Langevin algorithm…

    Submitted 28 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 21 pages, 15 figures

  6. arXiv:2202.09848

    cs.LG

    Personalized Federated Learning with Exact Stochastic Gradient Descent

    Authors: Sotirios Nikoloutsopoulos, Iordanis Koutsopoulos, Michalis K. Titsias

    Abstract: In Federated Learning (FL), datasets across clients tend to be heterogeneous or personalized, and this poses challenges to the convergence of standard FL schemes that do not account for personalization. To address this, we present a new approach for personalized FL that achieves exact stochastic gradient descent (SGD) minimization. We start from the FedPer (Arivazhagan et al., 2019) neural network…

    Submitted 20 February, 2022; originally announced February 2022.

  7. arXiv:2202.09497

    stat.ML cs.LG

    Gradient Estimation with Discrete Stein Operators

    Authors: Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis K. Titsias, Lester Mackey

    Abstract: Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems. However, when the distribution is discrete, most common gradient estimators suffer from excessive variance. To improve the quality of gradient estimation, we introduce a variance reduction technique based on Stein oper…

    Submitted 14 April, 2024; v1 submitted 18 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022. Source code: https://github.com/thjashin/rodeo

  8. arXiv:2111.05300

    stat.ML cs.LG

    Double Control Variates for Gradient Estimation in Discrete Latent Variable Models

    Authors: Michalis K. Titsias, Jiaxin Shi

    Abstract: Stochastic gradient-based optimisation for discrete latent variable models is challenging due to the high variance of gradients. We introduce a variance reduction technique for score function estimators that makes use of double control variates. These control variates act on top of a main control variate, and try to further reduce the variance of the overall estimator. We develop a double control…

    Submitted 4 June, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: AISTATS 2022. Source code: https://github.com/thjashin/double-cv

  9. arXiv:2010.03053

    cs.LG cs.AI stat.CO stat.ME

    Sequential Changepoint Detection in Neural Networks with Checkpoints

    Authors: Michalis K. Titsias, Jakub Sygnowski, Yutian Chen

    Abstract: We introduce a framework for online changepoint detection and simultaneous model learning which is applicable to highly parametrized models, such as deep neural networks. It is based on detecting changepoints across time by sequentially performing generalized likelihood ratio tests that require only evaluations of simple prediction score functions. This procedure makes use of checkpoints, consisti…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: 17 pages, 7 figures

  10. arXiv:2010.01845

    cs.LG stat.ML

    Unbiased Gradient Estimation for Variational Auto-Encoders using Coupled Markov Chains

    Authors: Francisco J. R. Ruiz, Michalis K. Titsias, Taylan Cemgil, Arnaud Doucet

    Abstract: The variational auto-encoder (VAE) is a deep latent variable model that has two neural networks in an autoencoder-like architecture; one of them parameterizes the model's likelihood. Fitting its parameters via maximum likelihood (ML) is challenging since the computation of the marginal likelihood involves an intractable integral over the latent space; thus the VAE is trained instead by maximizing…

    Submitted 2 June, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Journal ref: Conference on Uncertainty in Artificial Intelligence (UAI, 2021)

  11. arXiv:2009.03228

    cs.LG cs.AI stat.ML

    Information Theoretic Meta Learning with Gaussian Processes

    Authors: Michalis K. Titsias, Francisco J. R. Ruiz, Sotirios Nikoloutsopoulos, Alexandre Galashov

    Abstract: We formulate meta learning using information theoretic concepts; namely, mutual information and the information bottleneck. The idea is to learn a stochastic representation or encoding of the task description, given by a training set, that is highly informative about predicting the validation set. By making use of variational approximations to the mutual information, we derive a general and tracta…

    Submitted 5 July, 2021; v1 submitted 7 September, 2020; originally announced September 2020.

    Comments: 15 pages, 2 figures

  12. arXiv:1911.01373

    stat.ML cs.LG stat.CO

    Gradient-based Adaptive Markov Chain Monte Carlo

    Authors: Michalis K. Titsias, Petros Dellaportas

    Abstract: We introduce a gradient-based learning method to automatically adapt Markov chain Monte Carlo (MCMC) proposal distributions to intractable targets. We define a maximum entropy regularised objective function, referred to as generalised speed measure, which can be robustly optimised over the parameters of the proposal distribution by applying stochastic gradient optimisation. An advantage of our met…

    Submitted 6 January, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: 17 pages, 7 Figures, NeurIPS 2019

  13. arXiv:1910.10596

    stat.ML cs.LG

    Sparse Orthogonal Variational Inference for Gaussian Processes

    Authors: Jiaxin Shi, Michalis K. Titsias, Andriy Mnih

    Abstract: We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points, which can lead to more scalable algorithms than previous methods. It is based on decomposing a Gaussian process as a sum of two independent processes: one spanned by a finite basis of inducing points and the other capturing the remaining variation. We show that this formulation reco…

    Submitted 24 February, 2024; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: AISTATS 2020

  14. arXiv:1910.04302

    stat.ML cs.LG stat.ME

    Prescribed Generative Adversarial Networks

    Authors: Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei, Michalis K. Titsias

    Abstract: Generative adversarial networks (GANs) are a powerful approach to unsupervised learning. They have achieved state-of-the-art performance in the image domain. However, GANs are limited in two ways. They often learn distributions with low support---a phenomenon known as mode collapse---and they do not guarantee the existence of a probability density, which makes evaluating generalization using predi…

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: Code for this paper can be found at https://github.com/adjidieng/PresGANs

  15. arXiv:1905.04062

    stat.ML cs.LG

    A Contrastive Divergence for Combining Variational Inference and MCMC

    Authors: Francisco J. R. Ruiz, Michalis K. Titsias

    Abstract: We develop a method to combine Markov chain Monte Carlo (MCMC) and variational inference (VI), leveraging the advantages of both inference approaches. Specifically, we improve the variational distribution by running a few MCMC steps. To make inference tractable, we introduce the variational contrastive divergence (VCD), a new divergence that replaces the standard Kullback-Leibler (KL) divergence u…

    Submitted 28 May, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

    Comments: International Conference on Machine Learning (ICML 2019). 12 pages, 3 figures

  16. arXiv:1901.11356

    stat.ML cs.LG

    Functional Regularisation for Continual Learning with Gaussian Processes

    Authors: Michalis K. Titsias, Jonathan Schwarz, Alexander G. de G. Matthews, Razvan Pascanu, Yee Whye Teh

    Abstract: We introduce a framework for Continual Learning (CL) based on Bayesian inference over the function space rather than the parameters of a deep neural network. This method, referred to as functional regularisation for Continual Learning, avoids forgetting a previous task by constructing and memorising an approximate posterior belief over the underlying task-specific function. To achieve this we rely…

    Submitted 11 February, 2020; v1 submitted 31 January, 2019; originally announced January 2019.

    Comments: 17 pages, 7 figures

  17. arXiv:1810.00468

    cs.LG stat.ML

    Bayesian Transfer Reinforcement Learning with Prior Knowledge Rules

    Authors: Michalis K. Titsias, Sotirios Nikoloutsopoulos

    Abstract: We propose a probabilistic framework to directly insert prior knowledge in reinforcement learning (RL) algorithms by defining the behaviour policy as a Bayesian posterior distribution. Such a posterior combines task specific information with prior knowledge, thus allowing to achieve transfer learning across tasks. The resulting method is flexible and it can be easily incorporated to any standard o…

    Submitted 30 September, 2018; originally announced October 2018.

    Comments: 11 pages, 2 figures

  18. arXiv:1808.02078

    stat.ML cs.LG

    Unbiased Implicit Variational Inference

    Authors: Michalis K. Titsias, Francisco J. R. Ruiz

    Abstract: We develop unbiased implicit variational inference (UIVI), a method that expands the applicability of variational inference by defining an expressive variational family. UIVI considers an implicit variational distribution obtained in a hierarchical manner using a simple reparameterizable distribution whose variational parameters are defined by arbitrarily flexible deep neural networks. Unlike prev…

    Submitted 6 February, 2019; v1 submitted 6 August, 2018; originally announced August 2018.

    Comments: 9 pages, 3 figures

    Journal ref: Artificial Intelligence and Statistics (AISTATS 2019)

  19. arXiv:1807.02537

    stat.ML cs.LG

    Fully Scalable Gaussian Processes using Subspace Inducing Inputs

    Authors: Aristeidis Panos, Petros Dellaportas, Michalis K. Titsias

    Abstract: We introduce fully scalable Gaussian processes, an implementation scheme that tackles the problem of treating a high number of training instances together with high dimensional input data. Our key idea is a representation trick over the inducing variables called subspace inducing inputs. This is combined with certain matrix-preconditioning based parametrizations of the variational distributions th…

    Submitted 12 July, 2018; v1 submitted 6 July, 2018; originally announced July 2018.

  20. arXiv:1802.04220

    stat.ML cs.LG

    Augment and Reduce: Stochastic Inference for Large Categorical Distributions

    Authors: Francisco J. R. Ruiz, Michalis K. Titsias, Adji B. Dieng, David M. Blei

    Abstract: Categorical distributions are ubiquitous in machine learning, e.g., in classification, language models, and recommendation systems. However, when the number of possible outcomes is very large, using categorical distributions becomes computationally expensive, as the complexity scales linearly with the number of outcomes. To address this problem, we propose augment and reduce (A&R), a method to all…

    Submitted 7 June, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: 11 pages, 2 figures

    Journal ref: Francisco J. R. Ruiz, Michalis K. Titsias, Adji B. Dieng, and David M. Blei. Augment and Reduce: Stochastic Inference for Large Categorical Distributions. International Conference on Machine Learning. Stockholm (Sweden), July 2018

  21. arXiv:1702.06166

    stat.ML cs.LG math.NA q-bio.GN q-bio.QM stat.ME

    Bayesian Boolean Matrix Factorisation

    Authors: Tammo Rukat, Chris C. Holmes, Michalis K. Titsias, Christopher Yau

    Abstract: Boolean matrix factorisation aims to decompose a binary data matrix into an approximate Boolean product of two low rank, binary matrices: one containing meaningful patterns, the other quantifying how the observations can be expressed as a combination of these patterns. We introduce the OrMachine, a probabilistic generative model for Boolean matrix factorisation and derive a Metropolised Gibbs samp…

    Submitted 25 February, 2017; v1 submitted 20 February, 2017; originally announced February 2017.

  22. arXiv:1409.2287

    stat.ML cs.AI cs.CV cs.LG

    Variational Inference for Uncertainty on the Inputs of Gaussian Process Models

    Authors: Andreas C. Damianou, Michalis K. Titsias, Neil D. Lawrence

    Abstract: The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximized over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a…

    Submitted 8 September, 2014; originally announced September 2014.

    Comments: 51 pages (of which 10 is Appendix), 19 figures

    MSC Class: 60G15 (Primary); 58E30

    ACM Class: G.3; G.1.2; I.2.6; I.5.4

  23. arXiv:1311.1189

    stat.ME cs.LG stat.ML

    Statistical Inference in Hidden Markov Models using $k$-segment Constraints

    Authors: Michalis K. Titsias, Christopher Yau, Christopher C. Holmes

    Abstract: Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward-backward (F-B) algorithm. In this article, we expand the…

    Submitted 5 November, 2013; originally announced November 2013.

    Comments: 37 pages

  24. arXiv:1107.4985

    stat.ML cs.AI cs.CV math.PR

    Variational Gaussian Process Dynamical Systems

    Authors: Andreas C. Damianou, Michalis K. Titsias, Neil D. Lawrence

    Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variatio…

    Submitted 25 July, 2011; originally announced July 2011.

    Comments: 16 pages, 19 figures

    MSC Class: 60G15 (Primary); 62-09; 58E30

    ACM Class: G.3; G.1.2; I.2.6; I.5.4