-
Distributed Bayesian Piecewise Sparse Linear Models
Authors:
Masato Asahara,
Ryohei Fujimaki
Abstract:
The importance of interpretability of machine learning models has been increasing due to emerging enterprise predictive analytics, threat of data privacy, accountability of artificial intelligence in society, and so on. Piecewise linear models have been actively studied to achieve both accuracy and interpretability. They often produce competitive accuracy against state-of-the-art non-linear method…
▽ More
The importance of interpretability of machine learning models has been increasing due to emerging enterprise predictive analytics, threat of data privacy, accountability of artificial intelligence in society, and so on. Piecewise linear models have been actively studied to achieve both accuracy and interpretability. They often produce competitive accuracy against state-of-the-art non-linear methods. In addition, their representations (i.e., rule-based segmentation plus sparse linear formula) are often preferred by domain experts. A disadvantage of such models, however, is high computational cost for simultaneous determinations of the number of "pieces" and cardinality of each linear predictor, which has restricted their applicability to middle-scale data sets. This paper proposes a distributed factorized asymptotic Bayesian (FAB) inference of learning piece-wise sparse linear models on distributed memory architectures. The distributed FAB inference solves the simultaneous model selection issue without communicating $O(N)$ data where N is the number of training samples and achieves linear scale-out against the number of CPU cores. Experimental results demonstrate that the distributed FAB inference achieves high prediction accuracy and performance scalability with both synthetic and benchmark data.
△ Less
Submitted 7 November, 2017;
originally announced November 2017.
-
On The Projection Operator to A Three-view Cardinality Constrained Set
Authors:
Haichuan Yang,
Shupeng Gui,
Chuyang Ke,
Daniel Stefankovic,
Ryohei Fujimaki,
Ji Liu
Abstract:
The cardinality constraint is an intrinsic way to restrict the solution structure in many domains, for example, sparse learning, feature selection, and compressed sensing. To solve a cardinality constrained problem, the key challenge is to solve the projection onto the cardinality constraint set, which is NP-hard in general when there exist multiple overlapped cardinality constraints. In this pape…
▽ More
The cardinality constraint is an intrinsic way to restrict the solution structure in many domains, for example, sparse learning, feature selection, and compressed sensing. To solve a cardinality constrained problem, the key challenge is to solve the projection onto the cardinality constraint set, which is NP-hard in general when there exist multiple overlapped cardinality constraints. In this paper, we consider the scenario where the overlapped cardinality constraints satisfy a Three-view Cardinality Structure (TVCS), which reflects the natural restriction in many applications, such as identification of gene regulatory networks and task-worker assignment problem. We cast the projection into a linear programming, and show that for TVCS, the vertex solution of this linear programming is the solution for the original projection problem. We further prove that such solution can be found with the complexity proportional to the number of variables and constraints. We finally use synthetic experiments and two interesting applications in bioinformatics and crowdsourcing to validate the proposed TVCS model and method.
△ Less
Submitted 14 June, 2017; v1 submitted 21 March, 2017;
originally announced March 2017.
-
Optimization Beyond Prediction: Prescriptive Price Optimization
Authors:
Shinji Ito,
Ryohei Fujimaki
Abstract:
This paper addresses a novel data science problem, prescriptive price optimization, which derives the optimal price strategy to maximize future profit/revenue on the basis of massive predictive formulas produced by machine learning. The prescriptive price optimization first builds sales forecast formulas of multiple products, on the basis of historical data, which reveal complex relationships betw…
▽ More
This paper addresses a novel data science problem, prescriptive price optimization, which derives the optimal price strategy to maximize future profit/revenue on the basis of massive predictive formulas produced by machine learning. The prescriptive price optimization first builds sales forecast formulas of multiple products, on the basis of historical data, which reveal complex relationships between sales and prices, such as price elasticity of demand and cannibalization. Then, it constructs a mathematical optimization problem on the basis of those predictive formulas. We present that the optimization problem can be formulated as an instance of binary quadratic programming (BQP). Although BQP problems are NP-hard in general and computationally intractable, we propose a fast approximation algorithm using a semi-definite programming (SDP) relaxation, which is closely related to the Goemans-Williamson's Max-Cut approximation. Our experiments on simulation and real retail datasets show that our prescriptive price optimization simultaneously derives the optimal prices of tens/hundreds products with practical computational time, that potentially improve 8.2% of gross profit of those products.
△ Less
Submitted 24 May, 2016; v1 submitted 17 May, 2016;
originally announced May 2016.
-
Rebuilding Factorized Information Criterion: Asymptotically Accurate Marginal Likelihood
Authors:
Kohei Hayashi,
Shin-ichi Maeda,
Ryohei Fujimaki
Abstract:
Factorized information criterion (FIC) is a recently developed approximation technique for the marginal log-likelihood, which provides an automatic model selection framework for a few latent variable models (LVMs) with tractable inference algorithms. This paper reconsiders FIC and fills theoretical gaps of previous FIC studies. First, we reveal the core idea of FIC that allows generalization for a…
▽ More
Factorized information criterion (FIC) is a recently developed approximation technique for the marginal log-likelihood, which provides an automatic model selection framework for a few latent variable models (LVMs) with tractable inference algorithms. This paper reconsiders FIC and fills theoretical gaps of previous FIC studies. First, we reveal the core idea of FIC that allows generalization for a broader class of LVMs, including continuous LVMs, in contrast to previous FICs, which are applicable only to binary LVMs. Second, we investigate the model selection mechanism of the generalized FIC. Our analysis provides a formal justification of FIC as a model selection criterion for LVMs and also a systematic procedure for pruning redundant latent variables that have been removed heuristically in previous studies. Third, we provide an interpretation of FIC as a variational free energy and uncover a few previously-unknown their relationships. A demonstrative study on Bayesian principal component analysis is provided and numerical experiments support our theoretical results.
△ Less
Submitted 22 April, 2015;
originally announced April 2015.
-
Partition-wise Linear Models
Authors:
Hidekazu Oiwa,
Ryohei Fujimaki
Abstract:
Region-specific linear models are widely used in practical applications because of their non-linear but highly interpretable model representations. One of the key challenges in their use is non-convexity in simultaneous optimization of regions and region-specific models. This paper proposes novel convex region-specific linear models, which we refer to as partition-wise linear models. Our key ideas…
▽ More
Region-specific linear models are widely used in practical applications because of their non-linear but highly interpretable model representations. One of the key challenges in their use is non-convexity in simultaneous optimization of regions and region-specific models. This paper proposes novel convex region-specific linear models, which we refer to as partition-wise linear models. Our key ideas are 1) assigning linear models not to regions but to partitions (region-specifiers) and representing region-specific linear models by linear combinations of partition-specific models, and 2) optimizing regions via partition selection from a large number of given partition candidates by means of convex structured regularizations. In addition to providing initialization-free globally-optimal solutions, our convex formulation makes it possible to derive a generalization bound and to use such advanced optimization techniques as proximal methods and decomposition of the proximal maps for sparsity-inducing regularizations. Experimental results demonstrate that our partition-wise linear models perform better than or are at least competitive with state-of-the-art region-specific or locally linear models.
△ Less
Submitted 31 October, 2014;
originally announced October 2014.
-
Factorized Asymptotic Bayesian Hidden Markov Models
Authors:
Ryohei Fujimaki,
Kohei Hayashi
Abstract:
This paper addresses the issue of model selection for hidden Markov models (HMMs). We generalize factorized asymptotic Bayesian inference (FAB), which has been recently developed for model selection on independent hidden variables (i.e., mixture models), for time-dependent hidden variables. As with FAB in mixture models, FAB for HMMs is derived as an iterative lower bound maximization algorithm of…
▽ More
This paper addresses the issue of model selection for hidden Markov models (HMMs). We generalize factorized asymptotic Bayesian inference (FAB), which has been recently developed for model selection on independent hidden variables (i.e., mixture models), for time-dependent hidden variables. As with FAB in mixture models, FAB for HMMs is derived as an iterative lower bound maximization algorithm of a factorized information criterion (FIC). It inherits, from FAB for mixture models, several desirable properties for learning HMMs, such as asymptotic consistency of FIC with marginal log-likelihood, a shrinkage effect for hidden state selection, monotonic increase of the lower FIC bound through the iterative optimization. Further, it does not have a tunable hyper-parameter, and thus its model selection process can be fully automated. Experimental results shows that FAB outperforms states-of-the-art variational Bayesian HMM and non-parametric Bayesian HMM in terms of model selection accuracy and computational efficiency.
△ Less
Submitted 18 June, 2012;
originally announced June 2012.