-
Length of Stay prediction for Hospital Management using Domain Adaptation
Authors:
Lyse Naomi Wamba Momo,
Nyalleng Moorosi,
Elaine O. Nsoesie,
Frank Rademakers,
Bart De Moor
Abstract:
Inpatient length of stay (LoS) is an important managerial metric which if known in advance can be used to efficiently plan admissions, allocate resources and improve care. Using historical patient data and machine learning techniques, LoS prediction models can be developed. Ethically, these models can not be used for patient discharge in lieu of unit heads but are of utmost necessity for hospital…
▽ More
Inpatient length of stay (LoS) is an important managerial metric which if known in advance can be used to efficiently plan admissions, allocate resources and improve care. Using historical patient data and machine learning techniques, LoS prediction models can be developed. Ethically, these models can not be used for patient discharge in lieu of unit heads but are of utmost necessity for hospital management systems in charge of effective hospital planning. Therefore, the design of the prediction system should be adapted to work in a true hospital setting. In this study, we predict early hospital LoS at the granular level of admission units by applying domain adaptation to leverage information learned from a potential source domain. Time-varying data from 110,079 and 60,492 patient stays to 8 and 9 intensive care units were respectively extracted from eICU-CRD and MIMIC-IV. These were fed into a Long-Short Term Memory and a Fully connected network to train a source domain model, the weights of which were transferred either partially or fully to initiate training in target domains. Shapley Additive exPlanations (SHAP) algorithms were used to study the effect of weight transfer on model explanability. Compared to the benchmark, the proposed weight transfer model showed statistically significant gains in prediction accuracy (between 1% and 5%) as well as computation time (up to 2hrs) for some target domains. The proposed method thus provides an adapted clinical decision support system for hospital management that can ease processes of data access via ethical committee, computation infrastructures and time.
△ Less
Submitted 29 June, 2023;
originally announced June 2023.
-
Duality in Multi-View Restricted Kernel Machines
Authors:
Sonny Achten,
Arun Pandey,
Hannes De Meulemeester,
Bart De Moor,
Johan A. K. Suykens
Abstract:
We propose a unifying setting that combines existing restricted kernel machine methods into a single primal-dual multi-view framework for kernel principal component analysis in both supervised and unsupervised settings. We derive the primal and dual representations of the framework and relate different training and inference algorithms from a theoretical perspective. We show how to achieve full eq…
▽ More
We propose a unifying setting that combines existing restricted kernel machine methods into a single primal-dual multi-view framework for kernel principal component analysis in both supervised and unsupervised settings. We derive the primal and dual representations of the framework and relate different training and inference algorithms from a theoretical perspective. We show how to achieve full equivalence in primal and dual formulations by rescaling primal variables. Finally, we experimentally validate the equivalence and provide insight into the relationships between different methods on a number of time series data sets by recursively forecasting unseen test data and visualizing the learned features.
△ Less
Submitted 6 July, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Client Recruitment for Federated Learning in ICU Length of Stay Prediction
Authors:
Vincent Scheltjens,
Lyse Naomi Wamba Momo,
Wouter Verbeke,
Bart De Moor
Abstract:
Machine and deep learning methods for medical and healthcare applications have shown significant progress and performance improvement in recent years. These methods require vast amounts of training data which are available in the medical sector, albeit decentralized. Medical institutions generate vast amounts of data for which sharing and centralizing remains a challenge as the result of data and…
▽ More
Machine and deep learning methods for medical and healthcare applications have shown significant progress and performance improvement in recent years. These methods require vast amounts of training data which are available in the medical sector, albeit decentralized. Medical institutions generate vast amounts of data for which sharing and centralizing remains a challenge as the result of data and privacy regulations. The federated learning technique is well-suited to tackle these challenges. However, federated learning comes with a new set of open problems related to communication overhead, efficient parameter aggregation, client selection strategies and more. In this work, we address the step prior to the initiation of a federated network for model training, client recruitment. By intelligently recruiting clients, communication overhead and overall cost of training can be reduced without sacrificing predictive performance. Client recruitment aims at pre-excluding potential clients from partaking in the federation based on a set of criteria indicative of their eventual contributions to the federation. In this work, we propose a client recruitment approach using only the output distribution and sample size at the client site. We show how a subset of clients can be recruited without sacrificing model performance whilst, at the same time, significantly improving computation time. By applying the recruitment approach to the training of federated models for accurate patient Length of Stay prediction using data from 189 Intensive Care Units, we show how the models trained in federations made up from recruited clients significantly outperform federated models trained with the standard procedure in terms of predictive power and training time.
△ Less
Submitted 28 April, 2023;
originally announced April 2023.
-
Multi-view Kernel PCA for Time series Forecasting
Authors:
Arun Pandey,
Hannes De Meulemeester,
Bart De Moor,
Johan A. K. Suykens
Abstract:
In this paper, we propose a kernel principal component analysis model for multi-variate time series forecasting, where the training and prediction schemes are derived from the multi-view formulation of Restricted Kernel Machines. The training problem is simply an eigenvalue decomposition of the summation of two kernel matrices corresponding to the views of the input and output data. When a linear…
▽ More
In this paper, we propose a kernel principal component analysis model for multi-variate time series forecasting, where the training and prediction schemes are derived from the multi-view formulation of Restricted Kernel Machines. The training problem is simply an eigenvalue decomposition of the summation of two kernel matrices corresponding to the views of the input and output data. When a linear kernel is used for the output view, it is shown that the forecasting equation takes the form of kernel ridge regression. When that kernel is non-linear, a pre-image problem has to be solved to forecast a point in the input space. We evaluate the model on several standard time series datasets, perform ablation studies, benchmark with closely related models and discuss its results.
△ Less
Submitted 23 January, 2023;
originally announced January 2023.
-
Leverage Score Sampling for Complete Mode Coverage in Generative Adversarial Networks
Authors:
Joachim Schreurs,
Hannes De Meulemeester,
Michaël Fanuel,
Bart De Moor,
Johan A. K. Suykens
Abstract:
Commonly, machine learning models minimize an empirical expectation. As a result, the trained models typically perform well for the majority of the data but the performance may deteriorate in less dense regions of the dataset. This issue also arises in generative modeling. A generative model may overlook underrepresented modes that are less frequent in the empirical data distribution. This problem…
▽ More
Commonly, machine learning models minimize an empirical expectation. As a result, the trained models typically perform well for the majority of the data but the performance may deteriorate in less dense regions of the dataset. This issue also arises in generative modeling. A generative model may overlook underrepresented modes that are less frequent in the empirical data distribution. This problem is known as complete mode coverage. We propose a sampling procedure based on ridge leverage scores which significantly improves mode coverage when compared to standard methods and can easily be combined with any GAN. Ridge leverage scores are computed by using an explicit feature map, associated with the next-to-last layer of a GAN discriminator or of a pre-trained network, or by using an implicit feature map corresponding to a Gaussian kernel. Multiple evaluations against recent approaches of complete mode coverage show a clear improvement when using the proposed sampling strategy.
△ Less
Submitted 21 July, 2021; v1 submitted 6 April, 2021;
originally announced April 2021.
-
The Bures Metric for Generative Adversarial Networks
Authors:
Hannes De Meulemeester,
Joachim Schreurs,
Michaël Fanuel,
Bart De Moor,
Johan A. K. Suykens
Abstract:
Generative Adversarial Networks (GANs) are performant generative methods yielding high-quality samples. However, under certain circumstances, the training of GANs can lead to mode collapse or mode drop**, i.e. the generative models not being able to sample from the entire probability distribution. To address this problem, we use the last layer of the discriminator as a feature map to study the d…
▽ More
Generative Adversarial Networks (GANs) are performant generative methods yielding high-quality samples. However, under certain circumstances, the training of GANs can lead to mode collapse or mode drop**, i.e. the generative models not being able to sample from the entire probability distribution. To address this problem, we use the last layer of the discriminator as a feature map to study the distribution of the real and the fake data. During training, we propose to match the real batch diversity to the fake batch diversity by using the Bures distance between covariance matrices in feature space. The computation of the Bures distance can be conveniently done in either feature space or kernel space in terms of the covariance and kernel matrix respectively. We observe that diversity matching reduces mode collapse substantially and has a positive effect on the sample quality. On the practical side, a very simple training procedure, that does not require additional hyperparameter tuning, is proposed and assessed on several datasets.
△ Less
Submitted 27 April, 2021; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Applicability and interpretation of the deterministic weighted cepstral distance
Authors:
Oliver Lauwers,
Bart De Moor
Abstract:
Quantifying similarity between data objects is an important part of modern data science. Deciding what similarity measure to use is very application dependent. In this paper, we combine insights from systems theory and machine learning, and investigate the weighted cepstral distance, which was previously defined for signals coming from ARMA models. We provide an extension of this distance to inver…
▽ More
Quantifying similarity between data objects is an important part of modern data science. Deciding what similarity measure to use is very application dependent. In this paper, we combine insights from systems theory and machine learning, and investigate the weighted cepstral distance, which was previously defined for signals coming from ARMA models. We provide an extension of this distance to invertible deterministic linear time invariant single input single output models, and assess its applicability. We show that it can always be interpreted in terms of the poles and zeros of the underlying model, and that, in the case of stable, minimum-phase, or unstable, maximum-phase models, a geometrical interpretation in terms of subspace angles can be given. We then devise a method to assess stability and phase-type of the generating models, using only input/output signal information. In this way, we prove a connection between the extended weighted cepstral distance and a weighted cepstral model norm. In this way, we provide a purely data-driven way to assess different underlying dynamics of input/output signal pairs, without the need for any system identification step. This can be useful in machine learning tasks such as time series clustering. An iPython tutorial is published complementary to this paper, containing implementations of the various methods and algorithms presented here, as well as some numerical illustrations of the equivalences proven here.
△ Less
Submitted 8 March, 2018;
originally announced March 2018.
-
Building Classifiers to Predict the Start of Glucose-Lowering Pharmacotherapy Using Belgian Health Expenditure Data
Authors:
Marc Claesen,
Frank De Smet,
Pieter Gillard,
Chantal Mathieu,
Bart De Moor
Abstract:
Early diagnosis is important for type 2 diabetes (T2D) to improve patient prognosis, prevent complications and reduce long-term treatment costs. We present a novel risk profiling approach based exclusively on health expenditure data that is available to Belgian mutual health insurers. We used expenditure data related to drug purchases and medical provisions to construct models that predict whether…
▽ More
Early diagnosis is important for type 2 diabetes (T2D) to improve patient prognosis, prevent complications and reduce long-term treatment costs. We present a novel risk profiling approach based exclusively on health expenditure data that is available to Belgian mutual health insurers. We used expenditure data related to drug purchases and medical provisions to construct models that predict whether a patient will start glucose-lowering pharmacotherapy in the coming years, based on that patient's recent medical expenditure history. The design and implementation of the modeling strategy are discussed in detail and several learning methods are benchmarked for our application. Our best performing model obtains between 74.9% and 76.8% area under the ROC curve, which is comparable to state-of-the-art risk prediction approaches for T2D based on questionnaires. In contrast to other methods, our approach can be implemented on a population-wide scale at virtually no extra operational cost. Possibly, our approach can be further improved by additional information about some risk factors of T2D that is unavailable in health expenditure data.
△ Less
Submitted 28 April, 2015;
originally announced April 2015.
-
Assessing binary classifiers using only positive and unlabeled data
Authors:
Marc Claesen,
Jesse Davis,
Frank De Smet,
Bart De Moor
Abstract:
Assessing the performance of a learned model is a crucial part of machine learning. However, in some domains only positive and unlabeled examples are available, which prohibits the use of most standard evaluation metrics. We propose an approach to estimate any metric based on contingency tables, including ROC and PR curves, using only positive and unlabeled data. Estimating these performance metri…
▽ More
Assessing the performance of a learned model is a crucial part of machine learning. However, in some domains only positive and unlabeled examples are available, which prohibits the use of most standard evaluation metrics. We propose an approach to estimate any metric based on contingency tables, including ROC and PR curves, using only positive and unlabeled data. Estimating these performance metrics is essentially reduced to estimating the fraction of (latent) positives in the unlabeled set, assuming known positives are a random sample of all positives. We provide theoretical bounds on the quality of our estimates, illustrate the importance of estimating the fraction of positives in the unlabeled set and demonstrate empirically that we are able to reliably estimate ROC and PR curves on real data.
△ Less
Submitted 30 December, 2015; v1 submitted 26 April, 2015;
originally announced April 2015.
-
Hyperparameter Search in Machine Learning
Authors:
Marc Claesen,
Bart De Moor
Abstract:
We introduce the hyperparameter search problem in the field of machine learning and discuss its main challenges from an optimization perspective. Machine learning methods attempt to build models that capture some element of interest based on given data. Most common learning algorithms feature a set of hyperparameters that must be determined before training commences. The choice of hyperparameters…
▽ More
We introduce the hyperparameter search problem in the field of machine learning and discuss its main challenges from an optimization perspective. Machine learning methods attempt to build models that capture some element of interest based on given data. Most common learning algorithms feature a set of hyperparameters that must be determined before training commences. The choice of hyperparameters can significantly affect the resulting model's performance, but determining good values can be complex; hence a disciplined, theoretically sound search strategy is essential.
△ Less
Submitted 6 April, 2015; v1 submitted 7 February, 2015;
originally announced February 2015.
-
Easy Hyperparameter Search Using Optunity
Authors:
Marc Claesen,
Jaak Simm,
Dusan Popovic,
Yves Moreau,
Bart De Moor
Abstract:
Optunity is a free software package dedicated to hyperparameter optimization. It contains various types of solvers, ranging from undirected methods to direct search, particle swarm and evolutionary optimization. The design focuses on ease of use, flexibility, code clarity and interoperability with existing software in all machine learning environments. Optunity is written in Python and contains in…
▽ More
Optunity is a free software package dedicated to hyperparameter optimization. It contains various types of solvers, ranging from undirected methods to direct search, particle swarm and evolutionary optimization. The design focuses on ease of use, flexibility, code clarity and interoperability with existing software in all machine learning environments. Optunity is written in Python and contains interfaces to environments such as R and MATLAB. Optunity uses a BSD license and is freely available online at http://www.optunity.net.
△ Less
Submitted 2 December, 2014;
originally announced December 2014.
-
EnsembleSVM: A Library for Ensemble Learning Using Support Vector Machines
Authors:
Marc Claesen,
Frank De Smet,
Johan Suykens,
Bart De Moor
Abstract:
EnsembleSVM is a free software package containing efficient routines to perform ensemble learning with support vector machine (SVM) base models. It currently offers ensemble methods based on binary SVM models. Our implementation avoids duplicate storage and evaluation of support vectors which are shared between constituent models. Experimental results show that using ensemble approaches can drasti…
▽ More
EnsembleSVM is a free software package containing efficient routines to perform ensemble learning with support vector machine (SVM) base models. It currently offers ensemble methods based on binary SVM models. Our implementation avoids duplicate storage and evaluation of support vectors which are shared between constituent models. Experimental results show that using ensemble approaches can drastically reduce training complexity while maintaining high predictive accuracy. The EnsembleSVM software package is freely available online at http://esat.kuleuven.be/stadius/ensemblesvm.
△ Less
Submitted 4 March, 2014;
originally announced March 2014.
-
Fast Prediction with SVM Models Containing RBF Kernels
Authors:
Marc Claesen,
Frank De Smet,
Johan A. K. Suykens,
Bart De Moor
Abstract:
We present an approximation scheme for support vector machine models that use an RBF kernel. A second-order Maclaurin series approximation is used for exponentials of inner products between support vectors and test instances. The approximation is applicable to all kernel methods featuring sums of kernel evaluations and makes no assumptions regarding data normalization. The prediction speed of appr…
▽ More
We present an approximation scheme for support vector machine models that use an RBF kernel. A second-order Maclaurin series approximation is used for exponentials of inner products between support vectors and test instances. The approximation is applicable to all kernel methods featuring sums of kernel evaluations and makes no assumptions regarding data normalization. The prediction speed of approximated models no longer relates to the amount of support vectors but is quadratic in terms of the number of input dimensions. If the number of input dimensions is small compared to the amount of support vectors, the approximated model is significantly faster in prediction and has a smaller memory footprint. An optimized C++ implementation was made to assess the gain in prediction speed in a set of practical tests. We additionally provide a method to verify the approximation accuracy, prior to training models or during run-time, to ensure the loss in accuracy remains acceptable and within known bounds.
△ Less
Submitted 3 October, 2014; v1 submitted 4 March, 2014;
originally announced March 2014.
-
A Robust Ensemble Approach to Learn From Positive and Unlabeled Data Using SVM Base Models
Authors:
Marc Claesen,
Frank De Smet,
Johan A. K. Suykens,
Bart De Moor
Abstract:
We present a novel approach to learn binary classifiers when only positive and unlabeled instances are available (PU learning). This problem is routinely cast as a supervised task with label noise in the negative set. We use an ensemble of SVM models trained on bootstrap resamples of the training data for increased robustness against label noise. The approach can be considered in a bagging framewo…
▽ More
We present a novel approach to learn binary classifiers when only positive and unlabeled instances are available (PU learning). This problem is routinely cast as a supervised task with label noise in the negative set. We use an ensemble of SVM models trained on bootstrap resamples of the training data for increased robustness against label noise. The approach can be considered in a bagging framework which provides an intuitive explanation for its mechanics in a semi-supervised setting. We compared our method to state-of-the-art approaches in simulations using multiple public benchmark data sets. The included benchmark comprises three settings with increasing label noise: (i) fully supervised, (ii) PU learning and (iii) PU learning with false positives. Our approach shows a marginal improvement over existing methods in the second setting and a significant improvement in the third.
△ Less
Submitted 21 October, 2014; v1 submitted 13 February, 2014;
originally announced February 2014.
-
Geometric Analogue of Holographic Reduced Representation
Authors:
Diederik Aerts,
Marek Czachor,
Bart De Moor
Abstract:
Holographic reduced representations (HRR) are based on superpositions of convolution-bound $n$-tuples, but the $n$-tuples cannot be regarded as vectors since the formalism is basis dependent. This is why HRR cannot be associated with geometric structures. Replacing convolutions by geometric products one arrives at reduced representations analogous to HRR but interpretable in terms of geometry. V…
▽ More
Holographic reduced representations (HRR) are based on superpositions of convolution-bound $n$-tuples, but the $n$-tuples cannot be regarded as vectors since the formalism is basis dependent. This is why HRR cannot be associated with geometric structures. Replacing convolutions by geometric products one arrives at reduced representations analogous to HRR but interpretable in terms of geometry. Variable bindings occurring in both HRR and its geometric analogue mathematically correspond to two different representations of $Z_2\times...\times Z_2$ (the additive group of binary $n$-tuples with addition modulo 2). As opposed to standard HRR, variable binding performed by means of geometric product allows for computing exact inverses of all nonzero vectors, a procedure even simpler than approximate inverses employed in HRR. The formal structure of the new reduced representation is analogous to cartoon computation, a geometric analogue of quantum computation.
△ Less
Submitted 17 October, 2007; v1 submitted 15 October, 2007;
originally announced October 2007.
-
Support and Quantile Tubes
Authors:
Kristiaan Pelckmans,
Jos De Brabanter,
Johan A. K. Suykens,
Bart De Moor
Abstract:
This correspondence studies an estimator of the conditional support of a distribution underlying a set of i.i.d. observations. The relation with mutual information is shown via an extension of Fano's theorem in combination with a generalization bound based on a compression argument. Extensions to estimating the conditional quantile interval, and statistical guarantees on the minimal convex hull…
▽ More
This correspondence studies an estimator of the conditional support of a distribution underlying a set of i.i.d. observations. The relation with mutual information is shown via an extension of Fano's theorem in combination with a generalization bound based on a compression argument. Extensions to estimating the conditional quantile interval, and statistical guarantees on the minimal convex hull are given.
△ Less
Submitted 12 March, 2007;
originally announced March 2007.
-
On Geometric Algebra representation of Binary Spatter Codes
Authors:
Diederik Aerts,
Marek Czachor,
Bart De Moor
Abstract:
Kanerva's Binary Spatter Codes are reformulated in terms of geometric algebra. The key ingredient of the construction is the representation of XOR binding in terms of geometric product.
Kanerva's Binary Spatter Codes are reformulated in terms of geometric algebra. The key ingredient of the construction is the representation of XOR binding in terms of geometric product.
△ Less
Submitted 23 October, 2006; v1 submitted 12 October, 2006;
originally announced October 2006.
-
Componentwise Least Squares Support Vector Machines
Authors:
Kristiaan Pelckmans,
Ivan Goethals,
Jos De Brabanter,
Johan A. K. Suykens,
Bart De Moor
Abstract:
This chapter describes componentwise Least Squares Support Vector Machines (LS-SVMs) for the estimation of additive models consisting of a sum of nonlinear components. The primal-dual derivations characterizing LS-SVMs for the estimation of the additive model result in a single set of linear equations with size growing in the number of data-points. The derivation is elaborated for the classifica…
▽ More
This chapter describes componentwise Least Squares Support Vector Machines (LS-SVMs) for the estimation of additive models consisting of a sum of nonlinear components. The primal-dual derivations characterizing LS-SVMs for the estimation of the additive model result in a single set of linear equations with size growing in the number of data-points. The derivation is elaborated for the classification as well as the regression case. Furthermore, different techniques are proposed to discover structure in the data by looking for sparse components in the model based on dedicated regularization schemes on the one hand and fusion of the componentwise LS-SVMs training with a validation criterion on the other hand. (keywords: LS-SVMs, additive models, regularization, structure detection)
△ Less
Submitted 19 April, 2005;
originally announced April 2005.
-
Intelligence and Cooperative Search by Coupled Local Minimizers
Authors:
J. A. K. Suykens,
J. Vandewalle,
B. De Moor
Abstract:
We show how coupling of local optimization processes can lead to better solutions than multi-start local optimization consisting of independent runs. This is achieved by minimizing the average energy cost of the ensemble, subject to synchronization constraints between the state vectors of the individual local minimizers. From an augmented Lagrangian which incorporates the synchronization constra…
▽ More
We show how coupling of local optimization processes can lead to better solutions than multi-start local optimization consisting of independent runs. This is achieved by minimizing the average energy cost of the ensemble, subject to synchronization constraints between the state vectors of the individual local minimizers. From an augmented Lagrangian which incorporates the synchronization constraints both as soft and hard constraints, a network is derived wherein the local minimizers interact and exchange information through the synchronization constraints. From the viewpoint of neural networks, the array can be considered as a Lagrange programming network for continuous optimization and as a cellular neural network (CNN). The penalty weights associated with the soft state synchronization constraints follow from the solution to a linear program. This expresses that the energy cost of the ensemble should maximally decrease. In this way successful local minimizers can implicitly impose their state to the others through a mechanism of master-slave dynamics resulting into a cooperative search mechanism. Improved information spreading within the ensemble is obtained by applying the concept of small-world networks. This work suggests, in an interdisciplinary context, the importance of information exchange and state synchronization within ensembles, towards issues as evolution, collective behaviour, optimality and intelligence.
△ Less
Submitted 30 October, 2002;
originally announced October 2002.