-
SMURFF: a High-Performance Framework for Matrix Factorization
Authors:
Tom Vander Aa,
Imen Chakroun,
Thomas J. Ashby,
Jaak Simm,
Adam Arany,
Yves Moreau,
Thanh Le Van,
José Felipe Golib Dzib,
Jörg Wegner,
Vladimir Chupakhin,
Hugo Ceulemans,
Roel Wuyts,
Wilfried Verachtert
Abstract:
Bayesian Matrix Factorization (BMF) is a powerful technique for recommender systems because it produces good results and is relatively robust against overfitting. Yet BMF is more computationally intensive and thus more challenging to implement for large datasets. In this work we present SMURFF, a high-performance, feature-rich framework to compose and construct different Bayesian matrix-factorization methods. The framework has been successfully used for large-scale runs of compound-activity prediction. SMURFF is available as open source and can be used both on a supercomputer and on a desktop or laptop machine. Documentation and several examples are provided as Jupyter notebooks using SMURFF's high-level Python API.
Submitted 29 July, 2019; v1 submitted 4 April, 2019;
originally announced April 2019.
-
Highly Scalable Tensor Factorization for Prediction of Drug-Protein Interaction Type
Authors:
Adam Arany,
Jaak Simm,
Pooya Zakeri,
Tom Haber,
Jörg K. Wegner,
Vladimir Chupakhin,
Hugo Ceulemans,
Yves Moreau
Abstract:
Understanding the type of inhibitory interaction plays an important role in drug design. Researchers are therefore interested in knowing whether a drug interacts competitively or non-competitively with particular protein targets.
Method: to analyze the interaction types we propose the factorization method Macau, which allows us to combine different measurement types into a single tensor together with proteins and compounds. The compounds are characterized by high-dimensional 2D ECFP fingerprints. The novelty of the proposed method is that, using a specially designed noise-injection MCMC sampler, it can incorporate high-dimensional side information, i.e., millions of unique 2D ECFP compound features, even for large-scale datasets of millions of compounds. Without the side information, in this case, the tensor factorization would be practically futile.
Results: using public IC50 and Ki data from ChEMBL we trained a model from which we can identify the latent subspace separating the two measurement types (IC50 and Ki). The results suggest the proposed method can detect the competitive inhibitory activity between compounds and proteins.
Submitted 1 December, 2015;
originally announced December 2015.
-
Macau: Scalable Bayesian Multi-relational Factorization with Side Information using MCMC
Authors:
Jaak Simm,
Adam Arany,
Pooya Zakeri,
Tom Haber,
Jörg K. Wegner,
Vladimir Chupakhin,
Hugo Ceulemans,
Yves Moreau
Abstract:
We propose Macau, a powerful and flexible Bayesian factorization method for heterogeneous data. Our model can factorize any set of entities and relations that can be represented by a relational model, including tensors and also multiple relations for each entity. Macau can also incorporate side information, specifically entity and relation features, which are crucial for predicting sparsely observed relations. Macau scales to millions of entity instances, hundreds of millions of observations, and sparse entity features with millions of dimensions. To achieve this scale-up, we designed a sampling procedure for entity and relation features that relies primarily on noise injection in linear regressions. We show the performance and advanced features of Macau in a set of experiments, including a challenging drug-protein activity prediction task.
Submitted 17 December, 2015; v1 submitted 15 September, 2015;
originally announced September 2015.