
Showing 1–47 of 47 results for author: Hyvärinen, A

Searching in archive stat.
  1. arXiv:2311.16849  [pdf, other]

    stat.ML cs.LG

    Identifiable Feature Learning for Spatial Data with Nonlinear ICA

    Authors: Hermanni Hälvä, Jonathan So, Richard E. Turner, Aapo Hyvärinen

    Abstract: Recently, nonlinear ICA has surfaced as a popular alternative to the many heuristic models used in deep representation learning and disentanglement. An advantage of nonlinear ICA is that a sophisticated identifiability theory has been developed; in particular, it has been proven that the original components can be recovered under sufficiently strong latent dependencies. Despite this general theory…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Work under review

  2. arXiv:2310.15709  [pdf, other]

    stat.ML cs.LG

    Causal Representation Learning Made Identifiable by Grouping of Observational Variables

    Authors: Hiroshi Morioka, Aapo Hyvärinen

    Abstract: A topic of great current interest is Causal Representation Learning (CRL), whose goal is to learn a causal model for hidden features in a data-driven manner. Unfortunately, CRL is severely ill-posed since it is a combination of the two notoriously ill-posed problems of representation learning and causal discovery. Yet, finding practical identifiability conditions that guarantee a unique solution i…

    Submitted 7 June, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

  3. arXiv:2310.03902  [pdf, other]

    stat.ML cs.LG

    Provable benefits of annealing for estimating normalizing constants: Importance Sampling, Noise-Contrastive Estimation, and beyond

    Authors: Omar Chehab, Aapo Hyvarinen, Andrej Risteski

    Abstract: Recent research has developed several Monte Carlo methods for estimating the normalization constant (partition function) based on the idea of annealing. This means sampling successively from a path of distributions that interpolate between a tractable "proposal" distribution and the unnormalized "target" distribution. Prominent estimators in this family include annealed importance sampling and ann…

    Submitted 9 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

  4. arXiv:2303.16535  [pdf, other]

    cs.LG stat.ML

    Nonlinear Independent Component Analysis for Principled Disentanglement in Unsupervised Deep Learning

    Authors: Aapo Hyvarinen, Ilyes Khemakhem, Hiroshi Morioka

    Abstract: A central problem in unsupervised deep learning is how to find useful representations of high-dimensional data, sometimes called "disentanglement". Most approaches are heuristic and lack a proper theoretical foundation. In linear representation learning, independent component analysis (ICA) has been successful in many applications areas, and it is principled, i.e., based on a well-defined probabil…

    Submitted 5 September, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Revised version, to appear in Patterns

  5. arXiv:2302.02672  [pdf, other]

    stat.ML cs.LG

    Identifiability of latent-variable and structural-equation models: from linear to nonlinear

    Authors: Aapo Hyvärinen, Ilyes Khemakhem, Ricardo Monti

    Abstract: An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e. some parameters cannot be uniquely estimated. In factor (component) analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to pro…

    Submitted 3 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Revised final version of invited review to be published at Annals of the Institute of Statistical Mathematics

  6. arXiv:2301.09696  [pdf, other]

    stat.ML cs.LG

    Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation

    Authors: Omar Chehab, Alexandre Gramfort, Aapo Hyvarinen

    Abstract: Self-supervised learning is an increasingly popular approach to unsupervised learning, achieving state-of-the-art results. A prevalent approach consists in contrasting data points and noise points within a classification task: this requires a good noise distribution which is notoriously hard to specify. While a comprehensive theory is missing, it is widely assumed that the optimal noise distributi…

    Submitted 23 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: text overlap with arXiv:2203.01110

  7. arXiv:2203.01110  [pdf, other]

    stat.ML cs.LG

    The Optimal Noise in Noise-Contrastive Learning Is Not What You Think

    Authors: Omar Chehab, Alexandre Gramfort, Aapo Hyvarinen

    Abstract: Learning a parametric model of a data distribution is a well-known statistical problem that has seen renewed interest as it is brought to scale in deep learning. Framing the problem as a self-supervised task, where data samples are discriminated from noise samples, is at the core of state-of-the-art methods, beginning with Noise-Contrastive Estimation (NCE). Yet, such contrastive learning requires…

    Submitted 26 July, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

  8. arXiv:2111.15431  [pdf, other]

    cs.LG stat.ML

    Binary Independent Component Analysis: A Non-stationarity-based Approach

    Authors: Antti Hyttinen, Vitória Barin-Pacela, Aapo Hyvärinen

    Abstract: We consider independent component analysis of binary data. While fundamental in practice, this case has been much less developed than ICA for continuous data. We start by assuming a linear mixing model in a continuous-valued latent space, followed by a binary observation model. Importantly, we assume that the sources are non-stationary; this is necessary since any non-Gaussianity would essentially…

    Submitted 2 August, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

    Comments: This is an updated version (including a slight name change) which was published at UAI2022

  9. arXiv:2106.09620  [pdf, other]

    stat.ML cs.LG

    Disentangling Identifiable Features from Noisy Data with Structured Nonlinear ICA

    Authors: Hermanni Hälvä, Sylvain Le Corff, Luc Lehéricy, Jonathan So, Yongjie Zhu, Elisabeth Gassiat, Aapo Hyvarinen

    Abstract: We introduce a new general identifiable framework for principled disentanglement referred to as Structured Nonlinear Independent Component Analysis (SNICA). Our contribution is to extend the identifiability theory of deep generative models for a very broad class of structured models. While previous works have shown identifiability for specific classes of time-series models, our theorems extend thi…

    Submitted 27 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted for publication at NeurIPS 2021

  10. arXiv:2102.10964  [pdf, other]

    stat.ML cs.LG

    Adaptive Multi-View ICA: Estimation of noise levels for optimal inference

    Authors: Hugo Richard, Pierre Ablin, Aapo Hyvärinen, Alexandre Gramfort, Bertrand Thirion

    Abstract: We consider a multi-view learning problem known as group independent component analysis (group ICA), where the goal is to recover shared independent sources from many views. The statistical modeling of this problem requires to take noise into account. When the model includes additive noise on the observations, the likelihood is intractable. By contrast, we propose Adaptive multiView ICA (AVICA), a…

    Submitted 22 February, 2021; originally announced February 2021.

  11. arXiv:2011.02268  [pdf, other]

    stat.ML cs.LG

    Causal Autoregressive Flows

    Authors: Ilyes Khemakhem, Ricardo Pio Monti, Robert Leech, Aapo Hyvärinen

    Abstract: Two apparently unrelated fields -- normalizing flows and causality -- have recently received considerable attention in the machine learning community. In this work, we highlight an intrinsic correspondence between a simple family of autoregressive normalizing flows and identifiable causal models. We exploit the fact that autoregressive flow architectures define an ordering over variables, analogou…

    Submitted 24 February, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Published at AISTATS2021. Code available at https://github.com/piomonti/carefl

  12. arXiv:2007.16104  [pdf, other]

    stat.ML cs.LG eess.SP q-bio.NC q-bio.QM

    Uncovering the structure of clinical EEG signals with self-supervised learning

    Authors: Hubert Banville, Omar Chehab, Aapo Hyvärinen, Denis-Alexander Engemann, Alexandre Gramfort

    Abstract: Objective. Supervised learning paradigms are often limited by the amount of labeled data that is available. This phenomenon is particularly problematic in clinically-relevant data, such as electroencephalography (EEG), where labeling can be costly in terms of specialized expertise and human processing time. Consequently, deep learning architectures designed to learn on EEG data have yielded relati…

    Submitted 31 July, 2020; originally announced July 2020.

    Comments: 32 pages, 9 figures

  13. arXiv:2007.09390  [pdf, other]

    stat.ML cs.LG

    Autoregressive flow-based causal discovery and inference

    Authors: Ricardo Pio Monti, Ilyes Khemakhem, Aapo Hyvarinen

    Abstract: We posit that autoregressive flow models are well-suited to performing a range of causal inference tasks - ranging from causal discovery to making interventional and counterfactual predictions. In particular, we exploit the fact that autoregressive architectures define an ordering over variables, analogous to a causal ordering, in order to propose a single flow architecture to perform all three af…

    Submitted 26 July, 2020; v1 submitted 18 July, 2020; originally announced July 2020.

    Comments: 6 pages, 3 figures. Accepted at the 2nd ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models

  14. arXiv:2006.15090  [pdf, other]

    stat.ML cs.LG

    Relative gradient optimization of the Jacobian term in unsupervised deep learning

    Authors: Luigi Gresele, Giancarlo Fissore, Adrián Javaloy, Bernhard Schölkopf, Aapo Hyvärinen

    Abstract: Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning. A popular approach for solving it is mapping the observations into a representation space with a simple joint distribution, which can typically be written as a product of its marginals -- thus drawing a connection with the field of nonlinear independent component analysis. Deep densi…

    Submitted 26 October, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

  15. arXiv:2006.12107  [pdf, other]

    stat.ML cs.LG

    Hidden Markov Nonlinear ICA: Unsupervised Learning from Nonstationary Time Series

    Authors: Hermanni Hälvä, Aapo Hyvärinen

    Abstract: Recent advances in nonlinear Independent Component Analysis (ICA) provide a principled framework for unsupervised feature learning and disentanglement. The central idea in such works is that the latent components are assumed to be independent conditional on some observed auxiliary variables, such as the time-segment index. This requires manual segmentation of data into non-stationary segments whic…

    Submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted for publication at UAI 2020

  16. arXiv:2006.10944  [pdf, other]

    stat.ML cs.LG

    Independent Innovation Analysis for Nonlinear Vector Autoregressive Process

    Authors: Hiroshi Morioka, Hermanni Hälvä, Aapo Hyvärinen

    Abstract: The nonlinear vector autoregressive (NVAR) model provides an appealing framework to analyze multivariate time series obtained from a nonlinear dynamical system. However, the innovation (or error), which plays a key role by driving the dynamics, is almost always assumed to be additive. Additivity greatly limits the generality of the model, hindering analysis of general NVAR processes which have non…

    Submitted 25 February, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

  17. arXiv:2006.06635  [pdf, other]

    stat.ML cs.LG

    Modeling Shared Responses in Neuroimaging Studies through MultiView ICA

    Authors: Hugo Richard, Luigi Gresele, Aapo Hyvärinen, Bertrand Thirion, Alexandre Gramfort, Pierre Ablin

    Abstract: Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. However, the aggregation of data coming from multiple subjects is challenging, since it requires accounting for large variability in anatomy, functional topography and stimulus response across individuals. Data modeling is especially hard for ecologically relevant condit…

    Submitted 24 December, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS 2020

  18. arXiv:2002.11537  [pdf, other]

    stat.ML cs.LG

    ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA

    Authors: Ilyes Khemakhem, Ricardo Pio Monti, Diederik P. Kingma, Aapo Hyvärinen

    Abstract: We consider the identifiability theory of probabilistic models and establish sufficient conditions under which the representations learned by a very broad family of conditional energy-based models are unique in function space, up to a simple transformation. In our model family, the energy function is the dot-product between two feature extractors, one for the dependent variable, and one for the co…

    Submitted 26 October, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: Accepted for publication at NeurIPS 2020

  19. arXiv:1911.05419  [pdf, other]

    cs.LG eess.SP stat.ML

    Self-supervised representation learning from electroencephalography signals

    Authors: Hubert Banville, Isabela Albuquerque, Aapo Hyvärinen, Graeme Moffat, Denis-Alexander Engemann, Alexandre Gramfort

    Abstract: The supervised learning paradigm is limited by the cost - and sometimes the impracticality - of data collection and labeling in multiple domains. Self-supervised learning, a paradigm which exploits the structure of unlabeled data to create learning problems that can be solved with standard supervised approaches, has shown great promise as a pretraining or feature learning approach in fields like c…

    Submitted 13 November, 2019; originally announced November 2019.

  20. arXiv:1911.00265  [pdf, other]

    cs.LG stat.ML

    Robust contrastive learning and nonlinear ICA in the presence of outliers

    Authors: Hiroaki Sasaki, Takashi Takenouchi, Ricardo Monti, Aapo Hyvärinen

    Abstract: Nonlinear independent component analysis (ICA) is a general framework for unsupervised representation learning, and aimed at recovering the latent variables in data. Recent practical methods perform nonlinear ICA by solving a series of classification problems based on logistic regression. However, it is well-known that logistic regression is vulnerable to outliers, and thus the performance can be…

    Submitted 1 November, 2019; originally announced November 2019.

  21. Interpretable brain age prediction using linear latent variable models of functional connectivity

    Authors: Ricardo Pio Monti, Alex Gibberd, Sandipan Roy, Matt Nunes, Romy Lorenz, Robert Leech, Takeshi Ogawa, Motoaki Kawanabe, Aapo Hyvarinen

    Abstract: Neuroimaging-driven prediction of brain age, defined as the predicted biological age of a subject using only brain imaging data, is an exciting avenue of research. In this work we seek to build models of brain age based on functional connectivity while prioritizing model interpretability and understanding. This way, the models serve to both provide accurate estimates of brain age as well as allow…

    Submitted 5 August, 2019; originally announced August 2019.

    Comments: 21 pages, 11 figures

  22. arXiv:1907.09588  [pdf, other]

    stat.ML cs.LG

    Direction Matters: On Influence-Preserving Graph Summarization and Max-cut Principle for Directed Graphs

    Authors: Wenkai Xu, Gang Niu, Aapo Hyvärinen, Masashi Sugiyama

    Abstract: Summarizing large-scaled directed graphs into small-scale representations is a useful but less studied problem setting. Conventional clustering approaches, which based on "Min-Cut"-style criteria, compress both the vertices and edges of the graph into the communities, that lead to a loss of directed edge information. On the other hand, compressing the vertices while preserving the directed edge in…

    Submitted 22 July, 2019; originally announced July 2019.

  23. arXiv:1907.04809  [pdf, other]

    stat.ML cs.LG

    Variational Autoencoders and Nonlinear ICA: A Unifying Framework

    Authors: Ilyes Khemakhem, Diederik P. Kingma, Ricardo Pio Monti, Aapo Hyvärinen

    Abstract: The framework of variational autoencoders allows us to efficiently learn deep latent-variable models, such that the model's marginal distribution over observed variables fits the data. Often, we're interested in going a step further, and want to approximate the true joint distribution over observed and latent variables, including the true prior and posterior distributions over latent variables. Th…

    Submitted 21 December, 2020; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: Accepted for publication at AISTATS 2020. This is a slightly updated version of the published manuscript; see Corrigendum at the end of the paper

    Journal ref: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, pages 2207-2217, year 2020

  24. arXiv:1905.05976  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Information criteria for non-normalized models

    Authors: Takeru Matsuda, Masatoshi Uehara, Aapo Hyvarinen

    Abstract: Many statistical models are given in the form of non-normalized densities with an intractable normalization constant. Since maximum likelihood estimation is computationally intensive for these models, several estimation methods have been developed which do not require explicit computation of the normalization constant, such as noise contrastive estimation (NCE) and score matching. However, model s…

    Submitted 27 July, 2021; v1 submitted 15 May, 2019; originally announced May 2019.

    Journal ref: Journal of Machine Learning Research, 22(158):1--33, 2021

  25. arXiv:1904.09096  [pdf, other]

    stat.ML cs.LG

    Causal Discovery with General Non-Linear Relationships Using Non-Linear ICA

    Authors: Ricardo Pio Monti, Kun Zhang, Aapo Hyvarinen

    Abstract: We consider the problem of inferring causal relationships between two or more passively observed variables. While the problem of such causal discovery has been extensively studied especially in the bivariate setting, the majority of current methods assume a linear causal relationship, and the few methods which consider non-linear dependencies usually make the assumption of additive noise. Here, we…

    Submitted 19 April, 2019; originally announced April 2019.

  26. arXiv:1903.02334  [pdf, other]

    stat.ML cs.LG

    Neural Empirical Bayes

    Authors: Saeed Saremi, Aapo Hyvarinen

    Abstract: We unify $\textit{kernel density estimation}$ and $\textit{empirical Bayes}$ and address a set of problems in unsupervised learning with a geometric interpretation of those methods, rooted in the $\textit{concentration of measure}$ phenomenon. Kernel density is viewed symbolically as $X\rightharpoonup Y$ where the random variable $X$ is smoothed to $Y= X+N(0,σ^2 I_d)$, and empirical Bayes is the m…

    Submitted 21 April, 2020; v1 submitted 6 March, 2019; originally announced March 2019.

    Comments: 23 pages, 10 figures

    Journal ref: Journal of Machine Learning Research 20(181), 1-23, 2019

  27. arXiv:1806.01754  [pdf, ps, other]

    stat.ML cs.LG

    Neural-Kernelized Conditional Density Estimation

    Authors: Hiroaki Sasaki, Aapo Hyvärinen

    Abstract: Conditional density estimation is a general framework for solving various problems in machine learning. Among existing methods, non-parametric and/or kernel-based methods are often difficult to use on large datasets, while methods based on neural networks usually make restrictive parametric assumptions on the probability densities. Here, we propose a novel method for estimating the conditional den…

    Submitted 5 June, 2018; originally announced June 2018.

  28. arXiv:1805.09567  [pdf, other]

    stat.ML cs.LG

    A Unified Probabilistic Model for Learning Latent Factors and Their Connectivities from High-Dimensional Data

    Authors: Ricardo Pio Monti, Aapo Hyvärinen

    Abstract: Connectivity estimation is challenging in the context of high-dimensional data. A useful preprocessing step is to group variables into clusters, however, it is not always clear how to do so from the perspective of connectivity estimation. Another practical challenge is that we may have data from multiple related classes (e.g., multiple subjects or conditions) and wish to incorporate constraints on…

    Submitted 24 May, 2018; originally announced May 2018.

    Comments: 13 pages, 6 figures. To appear in UAI 2018

  29. arXiv:1805.08651  [pdf, other]

    stat.ML cs.LG

    Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning

    Authors: Aapo Hyvarinen, Hiroaki Sasaki, Richard E. Turner

    Abstract: Nonlinear ICA is a fundamental problem for unsupervised representation learning, emphasizing the capacity to recover the underlying latent variables generating the data (i.e., identifiability). Recently, the very first identifiability proofs for nonlinear ICA have been proposed, leveraging the temporal structure of the independent components. Here, we propose a general framework for nonlinear ICA,…

    Submitted 4 February, 2019; v1 submitted 22 May, 2018; originally announced May 2018.

    Comments: Camera-ready version of article accepted for AISTATS2019

  30. arXiv:1805.08306  [pdf, other]

    stat.ML cs.LG

    Deep Energy Estimator Networks

    Authors: Saeed Saremi, Arash Mehrjou, Bernhard Schölkopf, Aapo Hyvärinen

    Abstract: Density estimation is a fundamental problem in statistical learning. This problem is especially challenging for complex high-dimensional data due to the curse of dimensionality. A promising solution to this problem is given here in an inference-free hierarchical framework that is built on score matching. We revisit the Bayesian interpretation of the score function and the Parzen score matching, an…

    Submitted 21 May, 2018; originally announced May 2018.

  31. arXiv:1805.07516  [pdf, other]

    stat.ML cs.LG

    Estimation of Non-Normalized Mixture Models and Clustering Using Deep Representation

    Authors: Takeru Matsuda, Aapo Hyvarinen

    Abstract: We develop a general method for estimating a finite mixture of non-normalized models. Here, a non-normalized model is defined to be a parametric distribution with an intractable normalization constant. Existing methods for estimating non-normalized models without computing the normalization constant are not applicable to mixture models because they contain more than one intractable normalization c…

    Submitted 19 May, 2018; originally announced May 2018.

    Journal ref: Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:2555-2563, 2019

  32. arXiv:1707.01711  [pdf, other]

    stat.ML

    Mode-Seeking Clustering and Density Ridge Estimation via Direct Estimation of Density-Derivative-Ratios

    Authors: Hiroaki Sasaki, Takafumi Kanamori, Aapo Hyvärinen, Gang Niu, Masashi Sugiyama

    Abstract: Modes and ridges of the probability density function behind observed data are useful geometric features. Mode-seeking clustering assigns cluster labels by associating data samples with the nearest modes, and estimation of density ridges enables us to find lower-dimensional structures hidden in data. A key technical challenge both in mode-seeking clustering and density ridge estimation is accurate…

    Submitted 30 March, 2018; v1 submitted 6 July, 2017; originally announced July 2017.

  33. arXiv:1605.06336  [pdf, other]

    stat.ML cs.LG

    Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA

    Authors: Aapo Hyvarinen, Hiroshi Morioka

    Abstract: Nonlinear independent component analysis (ICA) provides an appealing framework for unsupervised feature learning, but the models proposed so far are not identifiable. Here, we first propose a new intuitive principle of unsupervised deep learning from time series which uses the nonstationary structure of the data. Our learning principle, time-contrastive learning (TCL), finds a representation which…

    Submitted 20 May, 2016; originally announced May 2016.

  34. arXiv:1506.05666  [pdf, other]

    stat.ML

    Simultaneous Estimation of Non-Gaussian Components and their Correlation Structure

    Authors: Hiroaki Sasaki, Michael U. Gutmann, Hayaru Shouno, Aapo Hyvärinen

    Abstract: The statistical dependencies which independent component analysis (ICA) cannot remove often provide rich information beyond the linear independent components. It would thus be very useful to estimate the dependency structure from data. While such models have been proposed, they usually concentrated on higher-order correlations such as energy (square) correlations. Yet, linear correlations are a mo…

    Submitted 27 July, 2017; v1 submitted 18 June, 2015; originally announced June 2015.

  35. arXiv:1408.2038  [pdf]

    cs.LG stat.ML

    A direct method for estimating a causal ordering in a linear non-Gaussian acyclic model

    Authors: Shohei Shimizu, Aapo Hyvarinen, Yoshinobu Kawahara

    Abstract: Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the datagenerating process of variables. Recently, it was shown that use of non-Gaussianity identifies a causal ordering of variables in a linear acyclic model without using any prior knowledge on the…

    Submitted 9 August, 2014; originally announced August 2014.

    Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

    Report number: UAI-P-2009-PG-506-513

  36. arXiv:1404.5028  [pdf, ps, other]

    stat.ML

    Clustering via Mode Seeking by Direct Estimation of the Gradient of a Log-Density

    Authors: Hiroaki Sasaki, Aapo Hyvärinen, Masashi Sugiyama

    Abstract: Mean shift clustering finds the modes of the data probability density by identifying the zero points of the density gradient. Since it does not require to fix the number of clusters in advance, the mean shift has been a popular clustering algorithm in various application fields. A typical implementation of the mean shift is to first estimate the density by kernel density estimation and then comput…

    Submitted 20 April, 2014; originally announced April 2014.

  37. arXiv:1312.3516  [pdf, ps, other]

    math.ST stat.ME stat.ML

    Density Estimation in Infinite Dimensional Exponential Families

    Authors: Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Aapo Hyvärinen, Revant Kumar

    Abstract: In this paper, we consider an infinite dimensional exponential family, $\mathcal{P}$ of probability densities, which are parametrized by functions in a reproducing kernel Hilbert space, $H$ and show it to be quite rich in the sense that a broad class of densities on $\mathbb{R}^d$ can be approximated arbitrarily well in Kullback-Leibler (KL) divergence by elements in $\mathcal{P}$. The main goal o…

    Submitted 26 May, 2017; v1 submitted 12 December, 2013; originally announced December 2013.

    Comments: 58 pages, 8 figures; Fixed some errors and typos

  38. arXiv:1307.2307  [pdf, ps, other]

    stat.ML cs.LG

    Bridging Information Criteria and Parameter Shrinkage for Model Selection

    Authors: Kun Zhang, Heng Peng, Laiwan Chan, Aapo Hyvarinen

    Abstract: Model selection based on classical information criteria, such as BIC, is generally computationally demanding, but its properties are well studied. On the other hand, model selection based on parameter shrinkage by $\ell_1$-type penalties is computationally efficient. In this paper we make an attempt to combine their strengths, and propose a simple approach that penalizes the likelihood with data-d…

    Submitted 8 July, 2013; originally announced July 2013.

    Comments: 16 pages, 3 figures

  39. arXiv:1303.7410  [pdf, ps, other]

    stat.ML

    ParceLiNGAM: A causal ordering method robust against latent confounders

    Authors: Tatsuya Tashiro, Shohei Shimizu, Aapo Hyvarinen, Takashi Washio

    Abstract: We consider learning a causal ordering of variables in a linear non-Gaussian acyclic model called LiNGAM. Several existing methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. But, the estimation results could be distorted if some assumptions actually are violated. In this paper, we propose a new algorithm for learning causal order…

    Submitted 28 July, 2013; v1 submitted 29 March, 2013; originally announced March 2013.

    Comments: A revised version of this was accepted in Neural Computation. 18 pages and 5 figures. arXiv admin note: substantial text overlap with arXiv:1204.1795

  40. arXiv:1207.1413  [pdf]

    cs.LG cs.MS stat.ML

    Discovery of non-gaussian linear causal models using ICA

    Authors: Shohei Shimizu, Aapo Hyvarinen, Yutaka Kano, Patrik O. Hoyer

    Abstract: In recent years, several methods have been proposed for the discovery of causal structure from non-experimental data (Spirtes et al. 2000; Pearl 2000). Such methods make various assumptions on the data generating process to facilitate its identification from purely observational data. Continuing this line of research, we show how to discover the complete causal structure of continuous-valued data,…

    Submitted 4 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

    Report number: UAI-P-2005-PG-525-533

  41. arXiv:1206.3260  [pdf]

    stat.ML cs.AI cs.LG

    Causal discovery of linear acyclic models with arbitrary distributions

    Authors: Patrik O. Hoyer, Aapo Hyvarinen, Richard Scheines, Peter L. Spirtes, Joseph Ramsey, Gustavo Lacerda, Shohei Shimizu

    Abstract: An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; P…

    Submitted 13 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

    Report number: UAI-P-2008-PG-282-289

  42. arXiv:1205.2599  [pdf]

    stat.ML cs.LG

    On the Identifiability of the Post-Nonlinear Causal Model

    Authors: Kun Zhang, Aapo Hyvarinen

    Abstract: By taking into account the nonlinear effect of the cause, the inner noise effect, and the measurement distortion effect in the observed variables, the post-nonlinear (PNL) causal model has demonstrated its excellent performance in distinguishing the cause from effect. However, its identifiability has not been properly addressed, and how to apply it in the case of more than two variables is also a…

    Submitted 9 May, 2012; originally announced May 2012.

    Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

    Report number: UAI-P-2009-PG-647-655

  43. arXiv:1204.1795  [pdf, ps, other]

    stat.ML

    Estimation of causal orders in a linear non-Gaussian acyclic model: a method robust against latent confounders

    Authors: Tatsuya Tashiro, Shohei Shimizu, Aapo Hyvarinen, Takashi Washio

    Abstract: We consider to learn a causal ordering of variables in a linear non-Gaussian acyclic model called LiNGAM. Several existing methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. But, the estimation results could be distorted if some assumptions actually are violated. In this paper, we propose a new algorithm for learning causal order…

    Submitted 9 April, 2012; originally announced April 2012.

    Comments: 8 pages, 2 figures

  44. arXiv:1203.3533  [pdf]

    cs.LG stat.ML

    Source Separation and Higher-Order Causal Analysis of MEG and EEG

    Authors: Kun Zhang, Aapo Hyvarinen

    Abstract: Separation of the sources and analysis of their connectivity have been an important topic in EEG/MEG analysis. To solve this problem in an automatic manner, we propose a two-layer model, in which the sources are conditionally uncorrelated from each other, but not independent; the dependence is caused by the causality in their time-varying variances (envelopes). The model is identified in two steps…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Report number: UAI-P-2010-PG-709-716

  45. arXiv:1203.3506  [pdf]

    cs.LG stat.ML

    A Family of Computationally Efficient and Simple Estimators for Unnormalized Statistical Models

    Authors: Miika Pihlaja, Michael Gutmann, Aapo Hyvarinen

    Abstract: We introduce a new family of estimators for unnormalized statistical models. Our family of estimators is parameterized by two nonlinear functions and uses a single sample from an auxiliary distribution, generalizing Maximum Likelihood Monte Carlo estimation of Geyer and Thompson (1992). The family is such that we can estimate the partition function like any other parameter in the model. The estima…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Report number: UAI-P-2010-PG-442-449

  46. arXiv:1101.2489  [pdf, ps, other]

    stat.ML

    DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model

    Authors: Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyvarinen, Yoshinobu Kawahara, Takashi Washio, Patrik O. Hoyer, Kenneth Bollen

    Abstract: Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the data-generating process of variables. Recently, it was shown that use of non-Gaussianity identifies the full structure of a linear acyclic model, i.e., a causal ordering of variables and their conn…

    Submitted 7 April, 2011; v1 submitted 12 January, 2011; originally announced January 2011.

    Comments: A revised version of this was accepted in Journal of Machine Learning Research

  47. Finding Exogenous Variables in Data with Many More Variables than Observations

    Authors: Shohei Shimizu, Takashi Washio, Aapo Hyvarinen, Seiya Imoto

    Abstract: Many statistical methods have been proposed to estimate causal models in classical situations with fewer variables than observations (p<n, p: the number of variables and n: the number of observations). However, modern datasets including gene expression data need high-dimensional causal modeling in challenging situations with orders of magnitude more variables than observations (p>>n). In this pape…

    Submitted 7 April, 2011; v1 submitted 5 April, 2009; originally announced April 2009.

    Comments: A revised version of this was published in Proc. ICANN2010

    Journal ref: ARTIFICIAL NEURAL NETWORKS - ICANN 2010. Lecture Notes in Computer Science, 2010, Volume 6352/2010, 67-76