-
Information-seeking polynomial NARX model-predictive control through expected free energy minimization
Authors:
Wouter M. Kouw
Abstract:
We propose an adaptive model-predictive controller that balances driving the system to a goal state and seeking system observations that are informative with respect to the parameters of a nonlinear autoregressive exogenous model. The controller's objective function is derived from an expected free energy functional and contains information-theoretic terms expressing uncertainty over model paramet…
▽ More
We propose an adaptive model-predictive controller that balances driving the system to a goal state and seeking system observations that are informative with respect to the parameters of a nonlinear autoregressive exogenous model. The controller's objective function is derived from an expected free energy functional and contains information-theoretic terms expressing uncertainty over model parameters and output predictions. Experiments illustrate how parameter uncertainty affects the control objective and evaluate the proposed controller for a pendulum swing-up task.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Online system identification in a Duffing oscillator by free energy minimisation
Authors:
Wouter M Kouw
Abstract:
Online system identification is the estimation of parameters of a dynamical system, such as mass or friction coefficients, for each measurement of the input and output signals. Here, the nonlinear stochastic differential equation of a Duffing oscillator is cast to a generative model and dynamical parameters are inferred using variational message passing on a factor graph of the model. The approach…
▽ More
Online system identification is the estimation of parameters of a dynamical system, such as mass or friction coefficients, for each measurement of the input and output signals. Here, the nonlinear stochastic differential equation of a Duffing oscillator is cast to a generative model and dynamical parameters are inferred using variational message passing on a factor graph of the model. The approach is validated with an experiment on data from an electronic implementation of a Duffing oscillator. The proposed inference procedure performs as well as offline prediction error minimisation in a state-of-the-art nonlinear model.
△ Less
Submitted 2 September, 2020;
originally announced September 2020.
-
The Data Representativeness Criterion: Predicting the Performance of Supervised Classification Based on Data Set Similarity
Authors:
Evelien Schat,
Rens van de Schoot,
Wouter M. Kouw,
Duco Veen,
Adriënne M. Mendrik
Abstract:
In a broad range of fields it may be desirable to reuse a supervised classification algorithm and apply it to a new data set. However, generalization of such an algorithm and thus achieving a similar classification performance is only possible when the training data used to build the algorithm is similar to new unseen data one wishes to apply it to. It is often unknown in advance how an algorithm…
▽ More
In a broad range of fields it may be desirable to reuse a supervised classification algorithm and apply it to a new data set. However, generalization of such an algorithm and thus achieving a similar classification performance is only possible when the training data used to build the algorithm is similar to new unseen data one wishes to apply it to. It is often unknown in advance how an algorithm will perform on new unseen data, being a crucial reason for not deploying an algorithm at all. Therefore, tools are needed to measure the similarity of data sets. In this paper, we propose the Data Representativeness Criterion (DRC) to determine how representative a training data set is of a new unseen data set. We present a proof of principle, to see whether the DRC can quantify the similarity of data sets and whether the DRC relates to the performance of a supervised classification algorithm. We compared a number of magnetic resonance imaging (MRI) data sets, ranging from subtle to severe difference is acquisition parameters. Results indicate that, based on the similarity of data sets, the DRC is able to give an indication as to when the performance of a supervised classifier decreases. The strictness of the DRC can be set by the user, depending on what one considers to be an acceptable underperformance.
△ Less
Submitted 27 February, 2020;
originally announced February 2020.
-
A cross-center smoothness prior for variational Bayesian brain tissue segmentation
Authors:
Wouter M. Kouw,
Silas N. Ørting,
Jens Petersen,
Kim S. Pedersen,
Marleen de Bruijne
Abstract:
Suppose one is faced with the challenge of tissue segmentation in MR images, without annotators at their center to provide labeled training data. One option is to go to another medical center for a trained classifier. Sadly, tissue classifiers do not generalize well across centers due to voxel intensity shifts caused by center-specific acquisition protocols. However, certain aspects of segmentatio…
▽ More
Suppose one is faced with the challenge of tissue segmentation in MR images, without annotators at their center to provide labeled training data. One option is to go to another medical center for a trained classifier. Sadly, tissue classifiers do not generalize well across centers due to voxel intensity shifts caused by center-specific acquisition protocols. However, certain aspects of segmentations, such as spatial smoothness, remain relatively consistent and can be learned separately. Here we present a smoothness prior that is fit to segmentations produced at another medical center. This informative prior is presented to an unsupervised Bayesian model. The model clusters the voxel intensities, such that it produces segmentations that are similarly smooth to those of the other medical center. In addition, the unsupervised Bayesian model is extended to a semi-supervised variant, which needs no visual interpretation of clusters into tissues.
△ Less
Submitted 11 March, 2019;
originally announced March 2019.
-
A review of domain adaptation without target labels
Authors:
Wouter M. Kouw,
Marco Loog
Abstract:
Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: how can a classifier learn from a source domain and generalize to a target domain? We present a categorization of approaches, divided into, what we refer to as, sample-based, feature-based and inference-based methods. Sample-based methods focus on weighting individual obs…
▽ More
Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: how can a classifier learn from a source domain and generalize to a target domain? We present a categorization of approaches, divided into, what we refer to as, sample-based, feature-based and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods revolve around on map**, projecting and representing features such that a source classifier performs well on the target domain and inference-based methods incorporate adaptation into the parameter estimation procedure, for instance through constraints on the optimization procedure. Additionally, we review a number of conditions that allow for formulating bounds on the cross-domain generalization error. Our categorization highlights recurring ideas and raises questions important to further research.
△ Less
Submitted 24 July, 2019; v1 submitted 16 January, 2019;
originally announced January 2019.
-
An introduction to domain adaptation and transfer learning
Authors:
Wouter M. Kouw,
Marco Loog
Abstract:
In machine learning, if the training data is an unbiased sample of an underlying distribution, then the learned classification function will make accurate predictions for new samples. However, if the training data is not an unbiased sample, then there will be differences between how the training data is distributed and how the test data is distributed. Standard classifiers cannot cope with changes…
▽ More
In machine learning, if the training data is an unbiased sample of an underlying distribution, then the learned classification function will make accurate predictions for new samples. However, if the training data is not an unbiased sample, then there will be differences between how the training data is distributed and how the test data is distributed. Standard classifiers cannot cope with changes in data distributions between training and test phases, and will not perform well. Domain adaptation and transfer learning are sub-fields within machine learning that are concerned with accounting for these types of changes. Here, we present an introduction to these fields, guided by the question: when and how can a classifier generalize from a source to a target domain? We will start with a brief introduction into risk minimization, and how transfer learning and domain adaptation expand upon this framework. Following that, we discuss three special cases of data set shift, namely prior, covariate and concept shift. For more complex domain shifts, there are a wide variety of approaches. These are categorized into: importance-weighting, subspace map**, domain-invariant spaces, feature augmentation, minimax estimators and robust algorithms. A number of points will arise, which we will discuss in the last section. We conclude with the remark that many open questions will have to be addressed before transfer learners and domain-adaptive classifiers become practical.
△ Less
Submitted 14 January, 2019; v1 submitted 31 December, 2018;
originally announced December 2018.
-
Learning an MR acquisition-invariant representation using Siamese neural networks
Authors:
Wouter M. Kouw,
Marco Loog,
Wilbert Bartels,
Adriënne M. Mendrik
Abstract:
Generalization of voxelwise classifiers is hampered by differences between MRI-scanners, e.g. different acquisition protocols and field strengths. To address this limitation, we propose a Siamese neural network (MRAI-NET) that extracts acquisition-invariant feature vectors. These can consequently be used by task-specific methods, such as voxelwise classifiers for tissue segmentation. MRAI-NET is t…
▽ More
Generalization of voxelwise classifiers is hampered by differences between MRI-scanners, e.g. different acquisition protocols and field strengths. To address this limitation, we propose a Siamese neural network (MRAI-NET) that extracts acquisition-invariant feature vectors. These can consequently be used by task-specific methods, such as voxelwise classifiers for tissue segmentation. MRAI-NET is tested on both simulated and real patient data. Experiments show that MRAI-NET outperforms voxelwise classifiers trained on the source or target scanner data when a small number of labeled samples is available.
△ Less
Submitted 17 October, 2018;
originally announced October 2018.
-
Target Robust Discriminant Analysis
Authors:
Wouter M. Kouw,
Marco Loog
Abstract:
In practice, the data distribution at test time often differs, to a smaller or larger extent, from that of the original training data. Consequentially, the so-called source classifier, trained on the available labelled data, deteriorates on the test, or target, data. Domain adaptive classifiers aim to combat this problem, but typically assume some particular form of domain shift. Most are not robu…
▽ More
In practice, the data distribution at test time often differs, to a smaller or larger extent, from that of the original training data. Consequentially, the so-called source classifier, trained on the available labelled data, deteriorates on the test, or target, data. Domain adaptive classifiers aim to combat this problem, but typically assume some particular form of domain shift. Most are not robust to violations of domain shift assumptions and may even perform worse than their non-adaptive counterparts. We construct robust parameter estimators for discriminant analysis that guarantee performance improvements of the adaptive classifier over the non-adaptive source classifier.
△ Less
Submitted 8 February, 2021; v1 submitted 21 June, 2018;
originally announced June 2018.
-
Effects of sampling skewness of the importance-weighted risk estimator on model selection
Authors:
Wouter M. Kouw,
Marco Loog
Abstract:
Importance-weighting is a popular and well-researched technique for dealing with sample selection bias and covariate shift. It has desirable characteristics such as unbiasedness, consistency and low computational complexity. However, weighting can have a detrimental effect on an estimator as well. In this work, we empirically show that the sampling distribution of an importance-weighted estimator…
▽ More
Importance-weighting is a popular and well-researched technique for dealing with sample selection bias and covariate shift. It has desirable characteristics such as unbiasedness, consistency and low computational complexity. However, weighting can have a detrimental effect on an estimator as well. In this work, we empirically show that the sampling distribution of an importance-weighted estimator can be skewed. For sample selection bias settings, and for small sample sizes, the importance-weighted risk estimator produces overestimates for datasets in the body of the sampling distribution, i.e. the majority of cases, and large underestimates for data sets in the tail of the sampling distribution. These over- and underestimates of the risk lead to suboptimal regularization parameters when used for importance-weighted validation.
△ Less
Submitted 19 April, 2018;
originally announced April 2018.
-
Robust importance-weighted cross-validation under sample selection bias
Authors:
Wouter M. Kouw,
Jesse H. Krijthe,
Marco Loog
Abstract:
Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk. However, the importance-weighted risk estimator produces sub-optimal hyperparameter estimates in problem settings where large weights arise with high probability. We study its sampling variance as a function of the training data distribution and introduce a control variate to increas…
▽ More
Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk. However, the importance-weighted risk estimator produces sub-optimal hyperparameter estimates in problem settings where large weights arise with high probability. We study its sampling variance as a function of the training data distribution and introduce a control variate to increase its robustness to problematically large weights.
△ Less
Submitted 27 August, 2019; v1 submitted 17 October, 2017;
originally announced October 2017.
-
MR Acquisition-Invariant Representation Learning
Authors:
Wouter M. Kouw,
Marco Loog,
Lambertus W. Bartels,
Adriënne M. Mendrik
Abstract:
Voxelwise classification approaches are popular and effective methods for tissue quantification in brain magnetic resonance imaging (MRI) scans. However, generalization of these approaches is hampered by large differences between sets of MRI scans such as differences in field strength, vendor or acquisition protocols. Due to this acquisition related variation, classifiers trained on data from a sp…
▽ More
Voxelwise classification approaches are popular and effective methods for tissue quantification in brain magnetic resonance imaging (MRI) scans. However, generalization of these approaches is hampered by large differences between sets of MRI scans such as differences in field strength, vendor or acquisition protocols. Due to this acquisition related variation, classifiers trained on data from a specific scanner fail or under-perform when applied to data that was acquired differently. In order to address this lack of generalization, we propose a Siamese neural network (MRAI-net) to learn a representation that minimizes the between-scanner variation, while maintaining the contrast between brain tissues necessary for brain tissue quantification. The proposed MRAI-net was evaluated on both simulated and real MRI data. After learning the MR acquisition invariant representation, any supervised classification model that uses feature vectors can be applied. In this paper, we provide a proof of principle, which shows that a linear classifier applied on the MRAI representation is able to outperform supervised convolutional neural network classifiers for tissue classification when little target training data is available.
△ Less
Submitted 19 April, 2018; v1 submitted 22 September, 2017;
originally announced September 2017.
-
Target contrastive pessimistic risk for robust domain adaptation
Authors:
Wouter M. Kouw,
Marco Loog
Abstract:
In domain adaptation, classifiers with information from a source domain adapt to generalize to a target domain. However, an adaptive classifier can perform worse than a non-adaptive classifier due to invalid assumptions, increased sensitivity to estimation errors or model misspecification. Our goal is to develop a domain-adaptive classifier that is robust in the sense that it does not rely on rest…
▽ More
In domain adaptation, classifiers with information from a source domain adapt to generalize to a target domain. However, an adaptive classifier can perform worse than a non-adaptive classifier due to invalid assumptions, increased sensitivity to estimation errors or model misspecification. Our goal is to develop a domain-adaptive classifier that is robust in the sense that it does not rely on restrictive assumptions on how the source and target domains relate to each other and that it does not perform worse than the non-adaptive classifier. We formulate a conservative parameter estimator that only deviates from the source classifier when a lower risk is guaranteed for all possible labellings of the given target samples. We derive the classical least-squares and discriminant analysis cases and show that these perform on par with state-of-the-art domain adaptive classifiers in sample selection bias settings, while outperforming them in more general domain adaptation settings.
△ Less
Submitted 25 June, 2017;
originally announced June 2017.
-
On Regularization Parameter Estimation under Covariate Shift
Authors:
Wouter M. Kouw,
Marco Loog
Abstract:
This paper identifies a problem with the usual procedure for L2-regularization parameter estimation in a domain adaptation setting. In such a setting, there are differences between the distributions generating the training data (source domain) and the test data (target domain). The usual cross-validation procedure requires validation data, which can not be obtained from the unlabeled target data.…
▽ More
This paper identifies a problem with the usual procedure for L2-regularization parameter estimation in a domain adaptation setting. In such a setting, there are differences between the distributions generating the training data (source domain) and the test data (target domain). The usual cross-validation procedure requires validation data, which can not be obtained from the unlabeled target data. The problem is that if one decides to use source validation data, the regularization parameter is underestimated. One possible solution is to scale the source validation data through importance weighting, but we show that this correction is not sufficient. We conclude the paper with an empirical analysis of the effect of several importance weight estimators on the estimation of the regularization parameter.
△ Less
Submitted 31 July, 2016;
originally announced August 2016.
-
Feature-Level Domain Adaptation
Authors:
Wouter M. Kouw,
Jesse H. Krijthe,
Marco Loog,
Laurens J. P. van der Maaten
Abstract:
Domain adaptation is the supervised learning setting in which the training and test data are sampled from different distributions: training data is sampled from a source domain, whilst test data is sampled from a target domain. This paper proposes and studies an approach, called feature-level domain adaptation (FLDA), that models the dependence between the two domains by means of a feature-level t…
▽ More
Domain adaptation is the supervised learning setting in which the training and test data are sampled from different distributions: training data is sampled from a source domain, whilst test data is sampled from a target domain. This paper proposes and studies an approach, called feature-level domain adaptation (FLDA), that models the dependence between the two domains by means of a feature-level transfer model that is trained to describe the transfer from source to target domain. Subsequently, we train a domain-adapted classifier by minimizing the expected loss under the resulting transfer model. For linear classifiers and a large family of loss functions and transfer models, this expected loss can be computed or approximated analytically, and minimized efficiently. Our empirical evaluation of FLDA focuses on problems comprising binary and count data in which the transfer can be naturally modeled via a dropout distribution, which allows the classifier to adapt to differences in the marginal probability of features in the source and the target domain. Our experiments on several real-world problems show that FLDA performs on par with state-of-the-art domain-adaptation techniques.
△ Less
Submitted 7 June, 2016; v1 submitted 15 December, 2015;
originally announced December 2015.