-
Learning Attributed Graphlets: Predictive Graph Mining by Graphlets with Trainable Attribute
Authors:
Tajima Shinji,
Ren Sugihara,
Ryota Kitahara,
Masayuki Karasuyama
Abstract:
The graph classification problem has been widely studied; however, achieving an interpretable model with high predictive performance remains a challenging issue. This paper proposes an interpretable classification algorithm for attributed graph data, called LAGRA (Learning Attributed GRAphlets). LAGRA learns importance weights for small attributed subgraphs, called attributed graphlets (AGs), while simultaneously optimizing their attribute vectors. This enables us to obtain a combination of subgraph structures and their attribute vectors that strongly contribute to discriminating different classes. A significant characteristic of LAGRA is that all the subgraph structures in the training dataset can be considered as candidate structures of AGs. This approach can explore all the potentially important subgraphs exhaustively, but a naive implementation can obviously require a large amount of computation. To mitigate this issue, we propose an efficient pruning strategy that combines proximal gradient descent with a graph mining tree search. Our pruning strategy ensures that the quality of the solution is maintained compared with the result without pruning. We empirically demonstrate that LAGRA has superior or comparable prediction performance to standard existing algorithms, including graph neural networks, while using only a small number of AGs in an interpretable manner.
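The LAGRA abstract names proximal gradient descent as one ingredient of its pruning strategy. As a minimal sketch, assuming a sparsity-inducing L1 penalty on the AG importance weights (the abstract does not state which penalty is used), one proximal step is a plain gradient step followed by soft-thresholding. The function names are hypothetical, and LAGRA's actual pruning bound over the mining tree is not reproduced here.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of threshold * |w| (assumed L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0  # values inside [-threshold, threshold] are set exactly to zero


def proximal_gradient_step(weights, gradients, step_size, lam):
    """One proximal gradient step for (smooth loss) + lam * ||w||_1.

    Hypothetical helper: gradient descent on the smooth part,
    then soft-thresholding for the L1 part.
    """
    return [
        soft_threshold(w - step_size * g, step_size * lam)
        for w, g in zip(weights, gradients)
    ]


print(proximal_gradient_step([0.5, 0.05, -0.3], [0.1, 0.0, -0.1], 1.0, 0.1))
```

A weight whose shifted value `w - step_size * g` lands inside `[-step_size * lam, step_size * lam]` becomes exactly zero. Bounding that quantity for a whole branch of the subgraph mining tree is, in principle, what allows a search to skip candidate subgraphs without changing the result.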
Submitted 10 February, 2024;
originally announced February 2024.
-
Multi-Objective Bayesian Optimization with Active Preference Learning
Authors:
Ryota Ozaki,
Kazuki Ishikawa,
Youhei Kanzaki,
Shinya Suzuki,
Shion Takeno,
Ichiro Takeuchi,
Masayuki Karasuyama
Abstract:
Many real-world black-box optimization problems require optimizing multiple criteria simultaneously. However, in a multi-objective optimization (MOO) problem, identifying the whole Pareto front incurs a prohibitive search cost, while in many practical scenarios the decision maker (DM) needs only a specific solution among the set of Pareto-optimal solutions. We propose a Bayesian optimization (BO) approach to identifying the most preferred solution in MOO with expensive objective functions, in which a Bayesian preference model of the DM is adaptively estimated in an interactive manner based on two types of supervision called the pairwise preference and the improvement request. To explore the most preferred solution, we define an acquisition function that incorporates the uncertainty in both the objective functions and the DM preference. Further, to minimize the interaction cost with the DM, we also propose an active learning strategy for the preference estimation. We empirically demonstrate the effectiveness of our proposed method through benchmark function optimization and hyperparameter optimization problems for machine learning models.
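The DM in this setting picks one solution from the Pareto-optimal set. The following sketch shows only what that set is for a finite list of observed objective vectors, assuming every objective is maximized. It does not implement the Bayesian preference model or the acquisition function from the paper, and the function names are illustrative.

```python
def dominates(a, b):
    """True if a Pareto-dominates b, assuming every objective is maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points, i.e. the empirical Pareto front."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


observed = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
print(pareto_front(observed))  # (2, 2) and (1, 1) are dominated
```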
Submitted 22 November, 2023;
originally announced November 2023.
-
Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds
Authors:
Shion Takeno,
Yu Inatsu,
Masayuki Karasuyama,
Ichiro Takeuchi
Abstract:
Among various acquisition functions (AFs) in Bayesian optimization (BO), Gaussian process upper confidence bound (GP-UCB) and Thompson sampling (TS) are well-known options with established theoretical properties regarding Bayesian cumulative regret (BCR). Recently, it has been shown that a randomized variant of GP-UCB achieves a tighter BCR bound than GP-UCB, which we call the tighter BCR bound for brevity. Inspired by this study, this paper first shows that TS achieves the tighter BCR bound. On the other hand, in practice, GP-UCB and TS often suffer from manual hyperparameter tuning and over-exploration issues, respectively. Therefore, we analyze yet another AF called the probability of improvement from the maximum of a sample path (PIMS). We show that PIMS achieves the tighter BCR bound and, unlike GP-UCB, avoids hyperparameter tuning. Furthermore, we conduct a wide range of experiments focusing on the effectiveness of PIMS in mitigating the practical issues of GP-UCB and TS.
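Reading PIMS literally from its name, it is a probability of improvement whose threshold is the maximum of one posterior sample path. The sketch below assumes that reading. It also assumes that the posterior means and standard deviations on a finite candidate set are already available, and that `sampled_max` was obtained from a GP posterior sample path (drawing that sample is not shown). Note that no confidence parameter needs to be tuned.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pims_scores(means, stds, sampled_max):
    """Probability that f(x) exceeds the maximum of one posterior sample path.

    Assumption: sampled_max comes from a GP posterior sample path (not drawn here).
    """
    return [normal_cdf((m - sampled_max) / s) for m, s in zip(means, stds)]


def pims_select(means, stds, sampled_max):
    """Index of the candidate with the largest PIMS score."""
    scores = pims_scores(means, stds, sampled_max)
    return max(range(len(scores)), key=scores.__getitem__)


print(pims_scores([0.0, 1.0], [1.0, 1.0], sampled_max=1.0))
```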
Submitted 4 June, 2024; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Towards Practical Preferential Bayesian Optimization with Skew Gaussian Processes
Authors:
Shion Takeno,
Masahiro Nomura,
Masayuki Karasuyama
Abstract:
We study preferential Bayesian optimization (BO) where reliable feedback is limited to pairwise comparisons called duels. An important challenge in preferential BO, which uses the preferential Gaussian process (GP) model to represent a flexible preference structure, is that the posterior distribution is a computationally intractable skew GP. The most widely used approach for preferential BO is Gaussian approximation, which ignores the skewness of the true posterior. Alternatively, Markov chain Monte Carlo (MCMC) based preferential BO has also been proposed. In this work, we first verify the accuracy of Gaussian approximation, which reveals the critical problem that the predictive probability of duels can be inaccurate. This observation motivates us to improve the MCMC-based estimation for skew GP, for which we show the practical efficiency of Gibbs sampling and derive a low-variance MC estimator. However, the computational time of MCMC can still be a bottleneck in practice. Towards building a more practical preferential BO, we develop a new method that achieves both high computational efficiency and low sample complexity, and then demonstrate its effectiveness through extensive numerical experiments.
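Preferential GP models commonly use a probit likelihood for each duel: the winner's latent value plus Gaussian noise exceeds the loser's. Multiplying a GP prior by such probit terms makes the posterior skewed rather than Gaussian. The sketch assumes that standard probit form with noise standard deviation `noise_std`. The helper names and the duel encoding `(winner_index, loser_index)` are hypothetical, and neither the paper's Gibbs sampler nor its estimator is shown.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def duel_probability(f_winner: float, f_loser: float, noise_std: float) -> float:
    """P(winner beats loser | latent f), assuming i.i.d. Gaussian noise on each side."""
    return normal_cdf((f_winner - f_loser) / (math.sqrt(2.0) * noise_std))


def duel_log_likelihood(f_values, duels, noise_std):
    """Sum of log duel probabilities; duels are (winner_index, loser_index) pairs."""
    return sum(
        math.log(duel_probability(f_values[w], f_values[l], noise_std))
        for w, l in duels
    )


print(duel_log_likelihood([0.0, 1.0, 2.0], [(2, 0), (1, 0)], noise_std=1.0))
```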
Submitted 11 June, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Randomized Gaussian Process Upper Confidence Bound with Tighter Bayesian Regret Bounds
Authors:
Shion Takeno,
Yu Inatsu,
Masayuki Karasuyama
Abstract:
Gaussian process upper confidence bound (GP-UCB) is a theoretically promising approach for black-box optimization; however, the confidence parameter $β$ is considerably large in the theorem and is chosen heuristically in practice. To address this, randomized GP-UCB (RGP-UCB) uses a randomized confidence parameter, which follows the Gamma distribution, to mitigate the impact of manually specifying $β$. This study first generalizes the regret analysis of RGP-UCB to a wider class of distributions, including the Gamma distribution. Furthermore, we propose improved RGP-UCB (IRGP-UCB) based on a two-parameter exponential distribution, which achieves tighter Bayesian regret bounds. IRGP-UCB does not require the confidence parameter to increase with the number of iterations, which avoids over-exploration in the later iterations. Finally, we demonstrate the effectiveness of IRGP-UCB through extensive experiments.
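A two-parameter exponential distribution is an exponential distribution shifted by a location parameter. The sketch draws $β$ = shift + Exp(rate) once per iteration and then scores candidates with the usual UCB form $μ(x) + \sqrt{β}\,σ(x)$. The `shift` and `rate` values below are placeholders. The paper derives specific settings that are not reproduced here, and in this sketch they stay fixed across iterations, in line with the abstract's remark that the parameter need not grow.

```python
import math
import random


def sample_beta(shift: float, rate: float, rng: random.Random) -> float:
    """Draw beta from a shifted (two-parameter) exponential distribution."""
    return shift + rng.expovariate(rate)


def irgp_ucb_select(means, stds, shift, rate, rng):
    """Pick the candidate maximizing mu + sqrt(beta) * sigma with a freshly sampled beta.

    Assumption: shift and rate are placeholders, not the paper's derived values.
    """
    beta = sample_beta(shift, rate, rng)
    scores = [m + math.sqrt(beta) * s for m, s in zip(means, stds)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, beta


rng = random.Random(0)
print(irgp_ucb_select([0.0, 0.5], [1.0, 0.1], shift=2.0, rate=0.5, rng=rng))
```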
Submitted 11 June, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Bayesian Optimization for Distributionally Robust Chance-constrained Problem
Authors:
Yu Inatsu,
Shion Takeno,
Masayuki Karasuyama,
Ichiro Takeuchi
Abstract:
In black-box function optimization, we need to consider not only controllable design variables but also uncontrollable stochastic environment variables. In such cases, it is necessary to solve the optimization problem by taking into account the uncertainty of the environmental variables. The chance-constrained (CC) problem, i.e., the problem of maximizing the expected value under a certain level of constraint satisfaction probability, is one of the practically important problems in the presence of environmental variables. In this study, we consider the distributionally robust CC (DRCC) problem and propose a novel DRCC Bayesian optimization method for the case where the distribution of the environmental variables cannot be precisely specified. We show that the proposed method can find an arbitrarily accurate solution with high probability in a finite number of trials, and we confirm the usefulness of the proposed method through numerical experiments.
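The distributionally robust chance constraint requires the constraint to hold with probability at least $α$ under every plausible distribution of the environment variables, not just one. As a simplified sketch, this code assumes three things. First, the environment variable takes finitely many values. Second, the constraint values $g(x, w_m)$ are known for a fixed design $x$; in the paper they are unknown black-box values. Third, the ambiguity set is an explicit finite list of candidate distributions. Under these assumptions, feasibility is checked against the worst case.

```python
def satisfaction_probability(constraint_values, probabilities, threshold=0.0):
    """P(g(x, w) >= threshold) for a discrete distribution over environment values."""
    return sum(p for g, p in zip(constraint_values, probabilities) if g >= threshold)


def distributionally_robust_feasible(constraint_values, candidate_distributions,
                                     alpha, threshold=0.0):
    """Worst-case satisfaction probability over a (hypothetical) finite ambiguity set.

    Returns (is_feasible, worst_case_probability).
    """
    worst = min(
        satisfaction_probability(constraint_values, dist, threshold)
        for dist in candidate_distributions
    )
    return worst >= alpha, worst


g_values = [1.0, -0.5, 2.0]  # g(x, w_m) for three environment values (illustrative)
ambiguity_set = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2]]
print(distributionally_robust_feasible(g_values, ambiguity_set, alpha=0.6))
```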
Submitted 2 February, 2022; v1 submitted 31 January, 2022;
originally announced January 2022.
-
Bayesian Optimization for Cascade-type Multi-stage Processes
Authors:
Shunya Kusakawa,
Shion Takeno,
Yu Inatsu,
Kentaro Kutsukake,
Shogo Iwazaki,
Takashi Nakano,
Toru Ujihara,
Masayuki Karasuyama,
Ichiro Takeuchi
Abstract:
Complex processes in science and engineering are often formulated as multistage decision-making problems. In this paper, we consider a type of multistage decision-making process called a cascade process. A cascade process is a multistage process in which the output of one stage is used as an input for the subsequent stage. When the cost of each stage is expensive, it is difficult to search exhaustively for the optimal controllable parameters of each stage. To address this problem, we formulate the optimization of the cascade process as an extension of the Bayesian optimization framework and propose two types of acquisition functions based on credible intervals and expected improvement. We investigate the theoretical properties of the proposed acquisition functions and demonstrate their effectiveness through numerical experiments. In addition, we consider an extension called the suspension setting, in which we are allowed to suspend the cascade process in the middle of the multistage decision-making process, a situation that often arises in practical problems. We apply the proposed method to a test problem involving a solar cell simulator, which was the motivation for this study.
Submitted 7 March, 2023; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Sequential- and Parallel- Constrained Max-value Entropy Search via Information Lower Bound
Authors:
Shion Takeno,
Tomoyuki Tamura,
Kazuki Shitara,
Masayuki Karasuyama
Abstract:
Max-value entropy search (MES) is one of the state-of-the-art approaches in Bayesian optimization (BO). In this paper, we propose a novel variant of MES for constrained problems, called Constrained MES via Information lower BOund (CMES-IBO), that is based on a Monte Carlo (MC) estimator of a lower bound of the mutual information (MI). Unlike existing studies, our MI is defined so that uncertainty with respect to feasibility can be incorporated. We derive a lower bound of the MI that guarantees non-negativity, whereas a constrained counterpart of conventional MES can be negative. We further provide a theoretical analysis that assures the low variability of our estimator, which has never been investigated for any existing information-theoretic BO. Moreover, using the conditional MI, we extend CMES-IBO to the parallel setting while maintaining the desirable properties. We demonstrate the effectiveness of CMES-IBO on several benchmark functions and real-world problems.
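For reference, here is the conventional, unconstrained, noise-free MES acquisition that CMES-IBO builds on. With sampled optimal values $y^*_k$ and $γ_k = (y^*_k - μ(x))/σ(x)$, it averages $γ_k φ(γ_k)/(2Φ(γ_k)) - \log Φ(γ_k)$ over the samples. This is a sketch of standard MES, not of the CMES-IBO lower bound. The max-value samples are assumed to be given, and the small clamp on $Φ$ is only a numerical guard added here.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mes_acquisition(mean: float, std: float, max_samples) -> float:
    """Conventional (unconstrained, noise-free) MES at one point.

    Each term is the entropy reduction from truncating N(mean, std^2) above at y*,
    which is non-negative. The sampled y* values are assumed to be provided.
    """
    total = 0.0
    for y_star in max_samples:
        gamma = (y_star - mean) / std
        cdf = max(normal_cdf(gamma), 1e-300)  # guard against log(0) for very negative gamma
        total += gamma * normal_pdf(gamma) / (2.0 * cdf) - math.log(cdf)
    return total / len(max_samples)


print(mes_acquisition(0.0, 1.0, [0.5, 1.0]))
```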
Submitted 2 February, 2022; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Cost-effective search for lower-error region in material parameter space using multifidelity Gaussian process modeling
Authors:
Shion Takeno,
Yuhki Tsukada,
Hitoshi Fukuoka,
Toshiyuki Koyama,
Motoki Shiga,
Masayuki Karasuyama
Abstract:
Information regarding precipitate shapes is critical for estimating material parameters. Hence, we considered estimating a region of material parameter space in which a computational model produces precipitates having shapes similar to those observed in the experimental images. This region, called the lower-error region (LER), reflects intrinsic information of the material contained in the precipitate shapes. However, the computational cost of LER estimation can be high because accurate computation of the model is required many times to explore the parameters well. To overcome this difficulty, we used Gaussian-process-based multifidelity modeling, in which training data can be sampled from multiple computations with different accuracy levels (fidelities). Lower-fidelity samples may have lower accuracy, but their computational cost is lower than that of higher-fidelity samples. Our proposed sampling procedure iteratively determines the most cost-effective pair of a point and a fidelity level for enhancing the accuracy of LER estimation. We demonstrated the efficiency of our method through estimation of the interface energy and lattice mismatch between MgZn2 and α-Mg phases in an Mg-based alloy. The results showed that the sampling cost required to obtain accurate LER estimation could be drastically reduced.
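The selection step described above ranks (point, fidelity) pairs by cost-effectiveness. A minimal sketch, assuming that cost-effectiveness is the expected gain in LER-estimation accuracy divided by the sampling cost, could look like this. The `expected_gain` values would come from the multifidelity GP; the paper's actual gain criterion is not reproduced, and the numbers below are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    point: tuple          # material parameters, e.g. (interface energy, lattice mismatch)
    fidelity: int         # hypothetical fidelity index (higher = more accurate, costlier)
    expected_gain: float  # assumed: estimated accuracy improvement of the LER
    cost: float           # computational cost of one evaluation at this fidelity


def most_cost_effective(candidates):
    """Pick the (point, fidelity) pair with the largest gain per unit cost."""
    return max(candidates, key=lambda c: c.expected_gain / c.cost)


pool = [
    Candidate((0.1, 0.2), fidelity=2, expected_gain=1.0, cost=10.0),
    Candidate((0.1, 0.2), fidelity=1, expected_gain=0.4, cost=2.0),
    Candidate((0.3, 0.1), fidelity=0, expected_gain=0.05, cost=1.0),
]
print(most_cost_effective(pool))
```

Dividing by cost lets a cheap, low-fidelity run win whenever its information per unit cost is higher, which is the intuition behind mixing fidelities.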
Submitted 15 March, 2020;
originally announced March 2020.
-
Computational Design of Stable and Highly Ion-conductive Materials using Multi-objective Bayesian Optimization: Case Studies on Diffusion of Oxygen and Lithium
Authors:
Masayuki Karasuyama,
Hiroki Kasugai,
Tomoyuki Tamura,
Kazuki Shitara
Abstract:
Ion-conducting solid electrolytes are widely used for a variety of purposes. Therefore, designing highly ion-conductive materials is in strong demand. Because of advancements in computers and enhancements of computational codes, theoretical simulations have become effective tools for investigating the performance of ion-conductive materials. However, an exhaustive search conducted by theoretical computations can be prohibitively expensive. Further, for practical applications, both dynamic conductivity and static stability must be satisfied at the same time. Therefore, we propose a computational framework that simultaneously optimizes dynamic conductivity and static stability; this is achieved by combining theoretical calculations with Bayesian multi-objective optimization based on the Pareto hyper-volume criterion. Our framework iteratively selects the candidate material that maximizes the expected increase in the Pareto hyper-volume criterion, a standard optimality criterion of multi-objective optimization. Through two case studies on oxygen and lithium diffusion, we show that ion-conductive materials with high dynamic conductivity and static stability can be efficiently identified by our framework.
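To make the hyper-volume criterion concrete, the sketch computes the two-objective Pareto hyper-volume with a sweep. It assumes both objectives are maximized and measures volume relative to a reference point `ref` that every point should dominate. It then computes the increase obtained by adding one candidate objective vector. The framework in the abstract uses the *expected* increase under a Bayesian model. Averaging `hypervolume_improvement` over posterior samples of the candidate's objectives would be one way to approximate that expectation; this is an assumption, not the paper's procedure.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by `ref` (both objectives maximized)."""
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    pts.sort(key=lambda p: p[0], reverse=True)  # sweep from largest first objective
    volume, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:
            # horizontal strip between best_y and y, of width x - ref[0]
            volume += (x - ref[0]) * (y - best_y)
            best_y = y
    return volume


def hypervolume_improvement(points, candidate, ref):
    """Increase in hyper-volume if `candidate` is added to `points`."""
    return hypervolume_2d(list(points) + [candidate], ref) - hypervolume_2d(points, ref)


front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
print(hypervolume_2d(front, (0.0, 0.0)))
print(hypervolume_improvement(front, (2.5, 2.5), (0.0, 0.0)))
```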
Submitted 19 March, 2020;
originally announced March 2020.
-
Distance Metric Learning for Graph Structured Data
Authors:
Tomoki Yoshida,
Ichiro Takeuchi,
Masayuki Karasuyama
Abstract:
Graphs are versatile tools for representing structured data. As a result, a variety of machine learning methods have been studied for graph data analysis. Although many such learning methods depend on the measurement of differences between input graphs, defining an appropriate distance metric for graphs remains a controversial issue. Hence, we propose a supervised distance metric learning method for the graph classification problem. Our method, named interpretable graph metric learning (IGML), learns discriminative metrics in a subgraph-based feature space, which has a strong graph representation capability. By introducing a sparsity-inducing penalty on the weight of each subgraph, IGML can identify a small number of important subgraphs that can provide insight into the given classification task. Because our formulation has a large number of optimization variables, an efficient algorithm that uses pruning techniques based on safe screening and working set selection methods is also proposed. An important property of IGML is that solution optimality is guaranteed because the problem is formulated as a convex problem and our pruning strategies only discard unnecessary subgraphs. Furthermore, we show that IGML is also applicable to other structured data such as itemset and sequence data, and that it can incorporate vertex-label similarity by using a transportation-based subgraph feature. We empirically evaluate the computational efficiency and classification performance of IGML on several benchmark datasets and provide some illustrative examples of how IGML identifies important subgraphs from a given graph dataset.
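A subgraph-based metric with one weight per subgraph can be sketched as a weighted squared distance between subgraph-feature vectors. This code assumes that form: non-negative, per-subgraph weights $m_k$ multiplying squared differences in subgraph counts. It illustrates why a sparse weight vector makes the metric interpretable, since only subgraphs with non-zero weight influence the distance. IGML's exact feature definition may differ, and the subgraph identifiers are hypothetical strings.

```python
def weighted_subgraph_distance(features_a, features_b, weights):
    """Sum_k m_k * (phi_k(a) - phi_k(b))^2 over subgraphs k with m_k > 0.

    features_*: dict mapping a (hypothetical) subgraph id to its count in the graph.
    weights: dict mapping subgraph id to a non-negative weight m_k.
    """
    total = 0.0
    for key, m in weights.items():
        if m <= 0.0:
            continue  # zero-weight subgraphs are effectively pruned from the metric
        diff = features_a.get(key, 0) - features_b.get(key, 0)
        total += m * diff * diff
    return total


graph_a = {"C-C": 2, "C-O": 1}
graph_b = {"C-C": 1, "C=O": 1}
weights = {"C-C": 0.5, "C-O": 2.0, "C=O": 0.0}
print(weighted_subgraph_distance(graph_a, graph_b, weights))  # 0.5*1 + 2.0*1
```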
Submitted 17 June, 2021; v1 submitted 3 February, 2020;
originally announced February 2020.
-
Active-learning-based efficient prediction of ab-initio atomic energy: a case study on a Fe random grain boundary model with millions of atoms
Authors:
Tomoyuki Tamura,
Masayuki Karasuyama
Abstract:
We have developed a method that can analyze large random grain boundary (GB) models with the accuracy of density functional theory (DFT) calculations using active learning. It is assumed that the atomic energy is represented by a linear regression on the atomic structural descriptor. The atomic energy is obtained through DFT calculations using a small cell extracted from a huge GB model, called the replica DFT atomic energy. The uncertainty reduction (UR) approach in active learning is used to efficiently collect the training data for the atomic energy. In this approach, the atomic energy is not required to search for candidate points; therefore, sequential DFT calculations are not required. This approach is suitable for massively parallel computers that can execute a large number of jobs simultaneously. In this study, we demonstrate the prediction of the atomic energy of a Fe random GB model containing one million atoms using the UR approach and show that the prediction error decreases more rapidly than with random sampling. We conclude that the UR approach with replica DFT atomic energy is useful for modeling huge GBs and will be essential for modeling other structural defects.
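Why does the UR approach not need atomic energies to choose candidates? In Bayesian linear regression, the posterior covariance depends only on the descriptors, never on the observed energies. The sketch below assumes an isotropic Gaussian prior (variance `prior_var`) and Gaussian noise (`noise_var`). It greedily picks the descriptor with the largest predictive variance and then applies a rank-one (Sherman-Morrison) covariance update. A whole batch can therefore be chosen before any DFT job runs. This greedy variance rule is an illustrative stand-in, not necessarily the exact UR criterion of the paper.

```python
def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rank_one_update(cov, x, noise_var):
    """Posterior covariance after observing descriptor x (no label needed).

    Sigma_new = Sigma - (Sigma x)(Sigma x)^T / (noise_var + x^T Sigma x), Sigma symmetric.
    """
    cx = mat_vec(cov, x)
    denom = noise_var + dot(x, cx)
    n = len(x)
    return [[cov[i][j] - cx[i] * cx[j] / denom for j in range(n)] for i in range(n)]


def select_batch(candidates, prior_var, noise_var, batch_size):
    """Greedy max-variance batch selection for Bayesian linear regression."""
    d = len(candidates[0])
    cov = [[prior_var if i == j else 0.0 for j in range(d)] for i in range(d)]
    chosen = []
    for _ in range(batch_size):
        remaining = [i for i in range(len(candidates)) if i not in chosen]
        best = max(remaining, key=lambda i: dot(candidates[i], mat_vec(cov, candidates[i])))
        chosen.append(best)
        cov = rank_one_update(cov, candidates[best], noise_var)
    return chosen


descriptors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(select_batch(descriptors, prior_var=1.0, noise_var=0.1, batch_size=2))
```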
Submitted 17 April, 2020; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Active learning for level set estimation under cost-dependent input uncertainty
Authors:
Yu Inatsu,
Masayuki Karasuyama,
Keiichi Inoue,
Ichiro Takeuchi
Abstract:
As part of a quality control process in manufacturing, it is often necessary to test whether all parts of a product satisfy a required property, with as few inspections as possible. When multiple inspection apparatuses with different costs and precision exist, it is desirable that testing can be carried out cost-effectively by properly controlling the trade-off between the costs and the precision. In this paper, we formulate this as a level set estimation (LSE) problem under cost-dependent input uncertainty, where LSE is a type of active learning for estimating the level set, i.e., the subset of the input space in which an unknown function value is greater or smaller than a pre-determined threshold. Then, we propose a new algorithm for LSE under cost-dependent input uncertainty with a theoretical convergence guarantee. We demonstrate the effectiveness of the proposed algorithm by applying it to synthetic and real datasets.
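The basic LSE step, in the spirit of the classical confidence-bound LSE algorithm, labels each input from its credible interval. A point goes above the threshold if the lower bound exceeds it, below if the upper bound falls short of it, and stays undetermined otherwise. The next query is the undetermined point with the largest ambiguity. This sketch assumes the lower and upper bounds are already computed from a surrogate model. It deliberately omits the cost-dependent input uncertainty that the paper addresses.

```python
def classify_level_set(lower, upper, threshold):
    """Label each point as 'above', 'below', or 'undetermined' from its bounds."""
    labels = []
    for lo, hi in zip(lower, upper):
        if lo > threshold:
            labels.append("above")
        elif hi < threshold:
            labels.append("below")
        else:
            labels.append("undetermined")
    return labels


def next_query(lower, upper, threshold):
    """Undetermined point with the largest ambiguity min(hi - t, t - lo), or None."""
    best_index, best_ambiguity = None, -1.0
    for i, (lo, hi) in enumerate(zip(lower, upper)):
        if lo <= threshold <= hi:
            ambiguity = min(hi - threshold, threshold - lo)
            if ambiguity > best_ambiguity:
                best_index, best_ambiguity = i, ambiguity
    return best_index


lower, upper = [0.6, -1.0, 0.2], [1.0, -0.2, 0.8]
print(classify_level_set(lower, upper, threshold=0.5), next_query(lower, upper, 0.5))
```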
Submitted 13 September, 2019;
originally announced September 2019.
-
Multi-objective Bayesian Optimization using Pareto-frontier Entropy
Authors:
Shinya Suzuki,
Shion Takeno,
Tomoyuki Tamura,
Kazuki Shitara,
Masayuki Karasuyama
Abstract:
This paper studies entropy-based multi-objective Bayesian optimization (MBO). Entropy search is a successful approach to Bayesian optimization. However, for MBO, existing entropy-based methods ignore the trade-off among objectives or introduce unreliable approximations. We propose a novel entropy-based MBO called Pareto-frontier entropy search (PFES) by considering the entropy of the Pareto frontier, which is an essential notion of optimality in the multi-objective problem. Our entropy can incorporate the trade-off relation of the optimal values, and further, we derive an analytical formula without introducing additional approximations or simplifications to the standard entropy search setting. We also show that our entropy computation is practically feasible by using a recursive decomposition technique known from studies of Pareto hyper-volume computation. Besides the usual MBO setting, in which all the objectives are simultaneously observed, we also consider the "decoupled" setting, in which the objective functions can be observed separately. PFES can easily adapt to the decoupled setting by considering the entropy of the marginal density for each output dimension. This approach incorporates dependency among objectives conditioned on the Pareto frontier, which is ignored by the existing method. Our numerical experiments show the effectiveness of PFES on several benchmark datasets.
Submitted 10 February, 2020; v1 submitted 31 May, 2019;
originally announced June 2019.
-
Statistically Discriminative Sub-trajectory Mining
Authors:
Vo Nguyen Le Duy,
Takuto Sakuma,
Taiju Ishiyama,
Hiroki Toda,
Kazuya Nishi,
Masayuki Karasuyama,
Yuta Okubo,
Masayuki Sunaga,
Yasuo Tabei,
Ichiro Takeuchi
Abstract:
We study the problem of discriminative sub-trajectory mining. Given two groups of trajectories, the goal of this problem is to extract moving patterns in the form of sub-trajectories that are more similar to sub-trajectories of one group and less similar to those of the other. We propose a new method called Statistically Discriminative Sub-trajectory Mining (SDSM) for this problem. An advantage of the SDSM method is that the statistical significance of the extracted sub-trajectories is properly controlled, in the sense that the probability of finding a false positive sub-trajectory is smaller than a specified significance threshold alpha (e.g., 0.05), which is indispensable when the method is used in scientific or social studies in noisy environments. Finding such statistically discriminative sub-trajectories from a massive trajectory dataset is both computationally and statistically challenging. In the SDSM method, we resolve these difficulties by introducing a tree representation among sub-trajectories and running an efficient permutation-based statistical inference method on the tree. To the best of our knowledge, SDSM is the first method that can efficiently extract statistically discriminative sub-trajectories from a massive trajectory dataset. We illustrate the effectiveness and scalability of the SDSM method by applying it to a real-world dataset with 1,000,000 trajectories, which contains 16,723,602,505 sub-trajectories.
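Keeping "the probability of finding a false positive sub-trajectory" below alpha across many candidate patterns is a family-wise error rate (FWER) guarantee. One generic permutation approach is the max-statistic (Westfall-Young style) adjustment sketched below. It shuffles the group labels, records the largest statistic over all patterns, and compares each observed statistic with that null distribution. Several details are assumptions for illustration: the per-trajectory pattern scores, the difference-of-means statistic, and the fixed seed. The sketch also enumerates patterns explicitly, which is exactly what SDSM's tree representation is designed to avoid at scale.

```python
import random


def mean_diff(scores, labels):
    """Absolute difference of mean scores between label-1 and label-0 trajectories."""
    a = [s for s, l in zip(scores, labels) if l == 1]
    b = [s for s, l in zip(scores, labels) if l == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def fwer_adjusted_p_values(pattern_scores, labels, n_permutations=1000, seed=0):
    """Max-statistic permutation p-values, one per pattern.

    pattern_scores: list of per-pattern score lists (one score per trajectory),
    e.g. 1 if the trajectory contains a similar sub-trajectory (assumption).
    """
    rng = random.Random(seed)
    observed = [mean_diff(scores, labels) for scores in pattern_scores]
    exceed = [0] * len(pattern_scores)
    perm_labels = list(labels)
    for _ in range(n_permutations):
        rng.shuffle(perm_labels)
        max_stat = max(mean_diff(scores, perm_labels) for scores in pattern_scores)
        for k, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[k] += 1
    return [(e + 1) / (n_permutations + 1) for e in exceed]


labels = [1] * 10 + [0] * 10
patterns = [[1] * 10 + [0] * 10, [1, 0] * 10]  # discriminative vs. uninformative
print(fwer_adjusted_p_values(patterns, labels, n_permutations=200))
```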
Submitted 5 May, 2019;
originally announced May 2019.
-
Multi-fidelity Bayesian Optimization with Max-value Entropy Search and its parallelization
Authors:
Shion Takeno,
Hitoshi Fukuoka,
Yuhki Tsukada,
Toshiyuki Koyama,
Motoki Shiga,
Ichiro Takeuchi,
Masayuki Karasuyama
Abstract:
In a standard setting of Bayesian optimization (BO), the objective function evaluation is assumed to be highly expensive. Multi-fidelity Bayesian optimization (MFBO) accelerates BO by incorporating lower-fidelity observations available at a lower sampling cost. In this paper, we focus on the information-based approach, which is a popular and empirically successful approach in BO. For MFBO, however, existing information-based methods are plagued by difficulty in estimating the information gain. We propose an approach based on max-value entropy search (MES), which greatly facilitates computations by considering the entropy of the optimal function value instead of the optimal input point. We show that, in our multi-fidelity MES (MF-MES), most of the additional computations, compared with the usual MES, reduce to analytical computations. Although an additional numerical integration is necessary for the information across different fidelities, it is only over a one-dimensional space and can therefore be performed efficiently and accurately. Further, we also propose a parallelization of MF-MES. Since there exists a variety of different sampling costs, queries typically occur asynchronously in MFBO. We show that similarly simple computations can be derived for asynchronous parallel MFBO. We demonstrate the effectiveness of our approach by using benchmark datasets and a real-world application to materials science data.
Submitted 12 February, 2020; v1 submitted 24 January, 2019;
originally announced January 2019.
-
Safe Triplet Screening for Distance Metric Learning
Authors:
Tomoki Yoshida,
Ichiro Takeuchi,
Masayuki Karasuyama
Abstract:
We study safe screening for metric learning. Distance metric learning can optimize a metric over a set of triplets, each of which is defined by a pair of same-class instances and an instance of a different class. However, the number of possible triplets is huge even for a small dataset. Our safe triplet screening identifies triplets that can be safely removed from the optimization problem without losing optimality. Compared with existing safe screening studies, triplet screening is particularly significant because of (1) the huge number of possible triplets, and (2) the semi-definite constraint in the optimization. We derive several variants of screening rules and analyze their relationships. Numerical experiments on benchmark datasets demonstrate the effectiveness of safe triplet screening.
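To see how quickly the number of triplets grows, count them directly from the class labels. The sketch counts ordered triplets (anchor i, same-class partner j ≠ i, different-class instance l). With that convention, a class of size c contributes c(c - 1)(n - c) triplets. Counting unordered same-class pairs instead would halve the total; the paper's exact convention is not stated in the abstract.

```python
from collections import Counter


def count_triplets(labels):
    """Number of ordered (i, j, l) with labels[i] == labels[j], i != j, labels[l] != labels[i]."""
    counts = Counter(labels)
    n = len(labels)
    return sum(c * (c - 1) * (n - c) for c in counts.values())


print(count_triplets([0, 0, 1, 1]))
# 1,000 instances split evenly over 10 classes already yield ~8.9e7 triplets
print(count_triplets([k // 100 for k in range(1000)]))
```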
Submitted 5 October, 2018; v1 submitted 12 February, 2018;
originally announced February 2018.
-
Exploring a potential energy surface by machine learning for characterizing atomic transport
Authors:
Kenta Kanamori,
Kazuaki Toyoura,
Junya Honda,
Kazuki Hattori,
Atsuto Seko,
Masayuki Karasuyama,
Kazuki Shitara,
Motoki Shiga,
Akihide Kuwabara,
Ichiro Takeuchi
Abstract:
We propose a machine-learning method for evaluating the potential barrier governing atomic transport based on the preferential selection of dominant points for atomic transport. The proposed method generates numerous random samples of the entire potential energy surface (PES) from a probabilistic Gaussian process model of the PES, which enables defining the likelihood of the dominant points. The robustness and efficiency of the method are demonstrated on a dozen model cases of proton diffusion in oxides, in comparison with the conventional nudged elastic band method.
Submitted 18 January, 2018; v1 submitted 10 October, 2017;
originally announced October 2017.
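The sampling idea can be illustrated in a deliberately simplified 1D setting, which is an assumption and not the paper's setup: treat a grid as a transport path between two minima, so the barrier is the highest energy along it, draw many functions from a GP posterior of the energy, and record how often each grid point is the maximum. The RBF kernel, its length scale, the noise level and all names below are hypothetical choices; the paper works with the full multidimensional PES.

```python
import numpy as np


def rbf(a, b, length):
    """Squared-exponential kernel with unit variance (an assumed choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_train, y_train, x_grid, length=0.3, noise=1e-4):
    """Zero-mean GP posterior mean and covariance of the energy on x_grid."""
    K = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    Ks = rbf(x_grid, x_train, length)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    v = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, rbf(x_grid, x_grid, length) - v.T @ v


def barrier_location_probs(x_train, y_train, x_grid, n_samples=2000, seed=0):
    """Fraction of posterior sample paths whose maximum (the barrier along this
    hypothetical 1D path) falls at each grid point."""
    mean, cov = gp_posterior(x_train, y_train, x_grid)
    rng = np.random.default_rng(seed)
    jitter = 1e-6 * np.eye(len(x_grid))  # keeps the covariance numerically PSD
    paths = rng.multivariate_normal(mean, cov + jitter, size=n_samples)
    counts = np.bincount(paths.argmax(axis=1), minlength=len(x_grid))
    return counts / n_samples
```

Grid points with a high probability are natural candidates for the next expensive energy evaluation, which is the role the "likelihood of the dominant points" plays in the abstract.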
-
Knowledge-Transfer based Cost-effective Search for Interface Structures: A Case Study on fcc-Al [110] Tilt Grain Boundary
Authors:
Tomohiro Yonezu,
Tomoyuki Tamura,
Ichiro Takeuchi,
Masayuki Karasuyama
Abstract:
Determining the atomic configuration of an interface is one of the most important issues in materials science research. Although theoretical simulations are effective tools, an exhaustive search is computationally prohibitive owing to the many degrees of freedom of the interface structure. In the interface structure search, multiple energy surfaces created by a variety of orientation angles need to be explored, and the computational costs for different angles vary substantially owing to significant variations in the supercell sizes. In this paper, we introduce two machine-learning concepts, called transfer learning and cost-sensitive search, to the interface-structure search. As a case study, we demonstrate the effectiveness of our method, called cost-sensitive multi-task Bayesian optimization (CMB), using the fcc-Al [110] tilt grain boundary. Four microscopic parameters, namely the three-dimensional rigid-body translation and the number of atomic columns, are optimized by transferring knowledge of energy surfaces among different orientation angles. We show that transferring knowledge of different energy surfaces can accelerate the structure search, and that considering the cost variations further improves the total efficiency.
Submitted 10 October, 2018; v1 submitted 10 August, 2017;
originally announced August 2017.
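Only the cost-sensitive part of the search is sketched here. The sketch assumes that a model (in CMB, a multi-task GP shared across orientation angles) already supplies a posterior mean μ and standard deviation σ of the energy for each candidate, and it ranks candidates by expected improvement per unit cost. Dividing EI by cost is a common heuristic and an assumption here, not necessarily the exact CMB criterion; the candidate labels and names are hypothetical.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for minimization; best = lowest energy observed so far."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def select_cost_sensitive(candidates, best):
    """candidates: (label, mu, sigma, cost) tuples; returns the label with the
    largest EI per unit cost (cost-sensitive heuristic, an assumption)."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best) / c[3])[0]
```

With equal predicted improvement, an angle whose supercell is ten times cheaper to evaluate is queried first, which is the kind of cost variation the abstract describes.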
-
Safe Pattern Pruning: An Efficient Approach for Predictive Pattern Mining
Authors:
Kazuya Nakagawa,
Shinya Suzumura,
Masayuki Karasuyama,
Koji Tsuda,
Ichiro Takeuchi
Abstract:
In this paper, we study predictive pattern mining problems, where the goal is to construct a predictive model based on a subset of predictive patterns in the database. Our main contribution is to introduce a novel method called safe pattern pruning (SPP) for a class of predictive pattern mining problems. The SPP method allows us to efficiently find a superset of all the predictive patterns in the database that are needed for the optimal predictive model. The advantage of the SPP method over existing boosting-type methods is that the former can find the superset by a single search over the database, while the latter require multiple searches. The SPP method is inspired by recent developments in safe feature screening. To extend the idea of safe feature screening to predictive pattern mining, we derive a novel pruning rule, called the safe pattern pruning (SPP) rule, that can be used for searching over the tree defined among patterns in the database. The SPP rule has the property that, if a node corresponding to a pattern in the database is pruned by the SPP rule, then it is guaranteed that none of the patterns corresponding to its descendant nodes are needed for the optimal predictive model. We apply the SPP method to graph mining and item-set mining problems and demonstrate its computational advantage.
Submitted 14 February, 2016;
originally announced February 2016.
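The descendant guarantee can be illustrated for item-set mining with binary pattern features, under stated assumptions: a dual reference point θ̂ and a radius r such that the optimal dual θ* lies in the ball B(θ̂, r), and a formulation in which a pattern t can be active only if |θ*ᵀx_t| = 1 (as in a λ-scaled Lasso dual). Because every descendant t' of t satisfies 0 ≤ x_t' ≤ x_t, one gets |θ*ᵀx_t'| ≤ max(Σ_{θ̂_i>0} θ̂_i x_it, −Σ_{θ̂_i<0} θ̂_i x_it) + r·sqrt(Σ_i x_it), so a node whose bound is below 1 can be pruned with its whole subtree. This is a derivation-based sketch with hypothetical names, not the paper's code.

```python
import numpy as np


def spp_bound(theta_hat, radius, x_t):
    """Bound on |theta^T x_t'| over the ball and over all descendants t' of t.

    Assumes binary features with x_t' <= x_t elementwise (anti-monotonicity).
    """
    pos = theta_hat[theta_hat > 0] @ x_t[theta_hat > 0]
    neg = -(theta_hat[theta_hat < 0] @ x_t[theta_hat < 0])
    return max(pos, neg) + radius * np.sqrt(x_t.sum())


def itemset_feature(data, itemset):
    """x_t[i] = 1 if transaction i contains every item of the itemset."""
    return np.all(data[:, list(itemset)], axis=1).astype(float)


def mine_with_spp(data, theta_hat, radius, threshold=1.0):
    """Depth-first search over itemsets; returns the patterns that survive pruning.

    threshold = 1 assumes the scaled dual formulation described above.
    """
    n_items = data.shape[1]
    kept = []

    def dfs(itemset, last):
        for j in range(last + 1, n_items):
            child = itemset + (j,)
            x = itemset_feature(data, child)
            if x.sum() == 0 or spp_bound(theta_hat, radius, x) < threshold:
                continue  # the whole subtree rooted at `child` is pruned
            kept.append(child)
            dfs(child, j)

    dfs((), -1)
    return kept
```

One search over the enumeration tree yields the superset of candidate patterns; only these need to enter the subsequent model fit.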
-
Simultaneous Safe Screening of Features and Samples in Doubly Sparse Modeling
Authors:
Atsushi Shibagaki,
Masayuki Karasuyama,
Kohei Hatano,
Ichiro Takeuchi
Abstract:
The problem of learning a sparse model can be conceptually interpreted as the process of identifying active features/samples and then optimizing the model over them. Recently introduced safe screening allows us to identify a part of the non-active features/samples. So far, safe screening has been studied separately for either feature screening or sample screening. In this paper, we introduce a new approach for safely screening features and samples simultaneously by alternately iterating feature and sample screening steps. A significant advantage of considering them simultaneously rather than individually is a synergy effect: the results of the previous safe feature screening can be exploited to improve the next safe sample screening, and vice versa. We first theoretically investigate the synergy effect and then illustrate the practical advantage through extensive numerical experiments on problems with large numbers of features and samples.
Submitted 8 February, 2016;
originally announced February 2016.
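One direction of the synergy can be seen with a generic ball-based sample rule, sketched under assumptions: the optimal primal vector w* lies in a ball B(ŵ, r), and a sample with margin y_i w*ᵀx_i > 1 is non-active (as for hinge-type losses). If feature screening has already shown w*_j = 0 for a set S of features, w* also lies in the intersection of the ball with that subspace, which is a smaller ball centered at ŵ restricted to the remaining features with radius sqrt(r² − ‖ŵ_S‖²). Since the set is smaller, the worst-case margin can only increase, so sample screening screens at least as many samples. The function name and loss assumptions are hypothetical, not the paper's exact rules.

```python
import numpy as np


def sample_screen(w_hat, radius, X, y, screened_features=None):
    """Indices of samples with y_i w^T x_i > 1 for every w in the ball
    ||w - w_hat|| <= radius (assumed to contain the optimal w*).

    If w*_j = 0 is already known for `screened_features`, the ball is
    intersected with that subspace before computing the worst-case margin.
    """
    keep = np.ones(X.shape[1], dtype=bool)
    r = radius
    if screened_features is not None:
        keep[screened_features] = False
        # radius of the ball's cross-section on the subspace w_S = 0
        r = np.sqrt(max(radius**2 - np.sum(w_hat[~keep] ** 2), 0.0))
    margins = y * (X[:, keep] @ w_hat[keep])
    lower = margins - r * np.linalg.norm(X[:, keep], axis=1)
    return np.flatnonzero(lower > 1.0)
```

The reverse direction (removed samples tightening the next feature screening) follows the same pattern in the dual and is not shown.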
-
A machine learning-based selective sampling procedure for identifying the low energy region in a potential energy surface: a case study on proton conduction in oxides
Authors:
Kazuaki Toyoura,
Daisuke Hirano,
Atsuto Seko,
Motoki Shiga,
Akihide Kuwabara,
Masayuki Karasuyama,
Kazuki Shitara,
Ichiro Takeuchi
Abstract:
In this paper, we propose a selective sampling procedure to preferentially evaluate a potential energy surface (PES) in a part of the configuration space governing a physical property of interest. The proposed sampling procedure is based on a machine learning method called the Gaussian process (GP), which is used to construct a statistical model of the PES for identifying the region of interest in the configuration space. We demonstrate the efficacy of the proposed procedure for atomic diffusion and ionic conduction, specifically proton conduction in a well-studied proton-conducting oxide, barium zirconate (BaZrO3). The results of the demonstration study indicate that our procedure can efficiently identify the low-energy region characterizing proton conduction in the host crystal lattice, and that the descriptors used for the statistical PES model strongly influence the performance.
Submitted 3 December, 2015; v1 submitted 2 December, 2015;
originally announced December 2015.
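The abstract does not state the selection criterion, so the sketch below swaps in a widely used heuristic for locating a region below an energy threshold, the "straddle" score β·σ(x) − |μ(x) − τ|, which favors configurations whose energy is both uncertain and close to the threshold τ. The GP posterior mean μ and standard deviation σ are assumed given; β = 1.96, τ and the names are illustrative assumptions.

```python
import numpy as np


def straddle_scores(mu, sigma, threshold, beta=1.96):
    """Straddle heuristic (a swapped-in criterion) for {x : E(x) < threshold}.

    High scores mark points where it is unclear on which side of the
    threshold the energy lies.
    """
    return beta * np.asarray(sigma) - np.abs(np.asarray(mu) - threshold)


def next_sample(mu, sigma, threshold):
    """Index of the configuration to evaluate next."""
    return int(np.argmax(straddle_scores(mu, sigma, threshold)))
```

Points confidently far above or below the threshold get negative scores and are skipped, so evaluations concentrate around the boundary of the low-energy region.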
-
Homotopy Continuation Approaches for Robust SV Classification and Regression
Authors:
Shinya Suzumura,
Kohei Ogawa,
Masashi Sugiyama,
Masayuki Karasuyama,
Ichiro Takeuchi
Abstract:
In support vector machine (SVM) applications with unreliable data that contain a portion of outliers, the non-robustness of SVMs often causes considerable performance deterioration. Although many approaches for improving the robustness of SVMs have been studied, two major challenges remain in robust SVM learning. First, robust learning algorithms are essentially formulated as non-convex optimization problems. It is thus important to develop a non-convex optimization method for robust SVM that can find a good local optimal solution. The second practical issue is how to tune the hyperparameter that controls the balance between robustness and efficiency. Unfortunately, due to the non-convexity, robust SVM solutions with slightly different hyperparameter values can be significantly different, which makes model selection highly unstable. In this paper, we address these two issues simultaneously by introducing a novel homotopy approach to non-convex robust SVM learning. Our basic idea is to introduce parametrized formulations of robust SVM that bridge the standard SVM and fully robust SVM via a parameter that represents the influence of outliers. We characterize the necessary and sufficient conditions of the local optimal solutions of robust SVM, and develop an algorithm that can trace a path of local optimal solutions when the influence of outliers is gradually decreased. An advantage of our homotopy approach is that it can be interpreted as simulated annealing, a common approach for finding a good local optimal solution in non-convex optimization problems. In addition, our homotopy method allows stable and efficient model selection based on the path of local optimal solutions. The empirical performance of the proposed approach is demonstrated through extensive numerical experiments on both robust classification and regression problems.
Submitted 12 July, 2015;
originally announced July 2015.
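The abstract does not spell out the parametrization. One simple family with the described property, used here purely as an illustrative assumption, interpolates linearly between the hinge loss (standard SVM) and a ramp loss that stops growing below a margin s (fully robust SVM). For margins below s the slope is −θ, so θ directly controls how much outliers influence the solution; the cut-off s = −1 and the names are hypothetical.

```python
import numpy as np


def hinge(m):
    """Standard SVM loss on the margin m = y f(x)."""
    return np.maximum(0.0, 1.0 - m)


def ramp(m, s=-1.0):
    """Hinge loss capped at 1 - s, so margins below s (outliers) add no slope."""
    return np.minimum(hinge(m), 1.0 - s)


def bridged_loss(m, theta, s=-1.0):
    """Illustrative family (an assumption, not necessarily the paper's):
    theta = 1 gives the hinge loss, theta = 0 the ramp loss."""
    return theta * hinge(m) + (1.0 - theta) * ramp(m, s)
```

A homotopy method in this spirit starts at θ = 1, where the problem is convex, and decreases θ toward 0 while warm-starting each solve from the previous local optimum.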
-
Safe Feature Pruning for Sparse High-Order Interaction Models
Authors:
Kazuya Nakagawa,
Shinya Suzumura,
Masayuki Karasuyama,
Koji Tsuda,
Ichiro Takeuchi
Abstract:
Taking into account high-order interactions among covariates is valuable in many practical regression problems. This is, however, a computationally challenging task because the number of high-order interaction features to be considered would be extremely large unless the number of covariates is sufficiently small. In this paper, we propose a novel efficient algorithm for LASSO-based sparse learning of such high-order interaction models. Our basic strategy for reducing the number of features is to employ the idea of the recently proposed safe feature screening (SFS) rule. An SFS rule has the property that, if a feature satisfies the rule, then the feature is guaranteed to be non-active in the LASSO solution, meaning that it can be safely screened out prior to the LASSO training process. If a large number of features can be screened out before training the LASSO, the computational cost and the memory requirement can be dramatically reduced. However, applying such an SFS rule to each of the extremely large number of high-order interaction features would be computationally infeasible. Our key idea for solving this computational issue is to exploit the underlying tree structure among high-order interaction features. Specifically, we introduce a pruning condition called the safe feature pruning (SFP) rule, which has the property that, if the rule is satisfied at a certain node of the tree, then all the high-order interaction features corresponding to its descendant nodes are guaranteed to be non-active at the optimal solution. Our algorithm is extremely efficient, making it possible to work, e.g., with 3rd-order interactions of 10,000 original covariates, where the number of possible high-order interaction features is greater than 10^{12}.
Submitted 26 June, 2015;
originally announced June 2015.
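The tree structure can be sketched for product interactions, assuming covariates rescaled to [0, 1] so that multiplying in another covariate never increases a feature value (0 ≤ x_t' ≤ x_t for every descendant t' of node t). With a LASSO dual ball B(θ̂, r) assumed to contain the optimal dual, and a feature active only if |θ*ᵀx_t| = 1, the node-level bound max(Σ_{θ̂_i>0} θ̂_i x_it, −Σ_{θ̂_i<0} θ̂_i x_it) + r‖x_t‖ covers the whole subtree, because ‖x_t'‖ ≤ ‖x_t‖. The names and the order limit are hypothetical; this is not the paper's implementation.

```python
import numpy as np


def sfp_bound(theta_hat, radius, x_t):
    """Bound on |theta^T x_t'| for theta in the ball and every descendant t' of t.

    Valid when descendants satisfy 0 <= x_t' <= x_t elementwise, which holds for
    products of covariates scaled to [0, 1] (an assumption).
    """
    pos = np.sum(np.clip(theta_hat, 0.0, None) * x_t)
    neg = -np.sum(np.clip(theta_hat, None, 0.0) * x_t)
    return max(pos, neg) + radius * np.linalg.norm(x_t)


def surviving_interactions(Z, theta_hat, radius, max_order=3):
    """Depth-first search over interaction features (index tuples) up to max_order."""
    n_cov = Z.shape[1]
    kept = []

    def dfs(idx, x_t):
        start = idx[-1] + 1 if idx else 0
        for j in range(start, n_cov):
            child, x_child = idx + (j,), x_t * Z[:, j]
            if sfp_bound(theta_hat, radius, x_child) < 1.0:
                continue  # safe feature pruning: drop the whole subtree
            kept.append(child)
            if len(child) < max_order:
                dfs(child, x_child)

    dfs((), np.ones(Z.shape[0]))
    return kept
```

Only the surviving interaction features need to be materialized for LASSO training; everything below a pruned node is never constructed.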
-
Regularization Path of Cross-Validation Error Lower Bounds
Authors:
Atsushi Shibagaki,
Yoshiki Suzuki,
Masayuki Karasuyama,
Ichiro Takeuchi
Abstract:
Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performance. Nevertheless, current practice of regularization parameter tuning is more of an art than a science; e.g., it is hard to tell how many grid points would be needed in cross-validation (CV) to obtain a solution with sufficiently small CV error. In this paper, we propose a novel framework for computing a lower bound of the CV errors as a function of the regularization parameter, which we call the regularization path of CV error lower bounds. The proposed framework can provide a theoretical approximation guarantee on a set of solutions, in the sense of how far the CV error of the current best solution could be from the best possible CV error over the entire range of regularization parameters. We demonstrate through numerical experiments that a theoretically guaranteed choice of the regularization parameter in the above sense is possible with reasonable computational costs.
Submitted 22 June, 2015; v1 submitted 8 February, 2015;
originally announced February 2015.
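A minimal sketch of how a lower bound on validation errors can follow from a region containing the solution, assuming a linear classifier and a ball B(w̃, r) known to contain the optimal weights at some regularization parameter value (for instance derived from a solution at a nearby value; that derivation is not shown). A validation sample is surely misclassified if even the most favorable w in the ball gives it a negative margin, y_i w̃ᵀx_i + r‖x_i‖ < 0, and counting such samples bounds the validation error from below. The function name is hypothetical.

```python
import numpy as np


def cv_error_lower_bound(w_center, radius, X_val, y_val):
    """Number of validation errors guaranteed for every w with ||w - w_center|| <= radius.

    Sample i is surely misclassified if its best-case margin is still negative:
    y_i w_center^T x_i + radius * ||x_i|| < 0.
    """
    best_margin = y_val * (X_val @ w_center) + radius * np.linalg.norm(X_val, axis=1)
    return int(np.sum(best_margin < 0.0))
```

A smaller radius gives a tighter bound; repeating this for regions covering a range of regularization parameters yields a piecewise lower-bound curve over that range.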
-
Suboptimal Solution Path Algorithm for Support Vector Machine
Authors:
Masayuki Karasuyama,
Ichiro Takeuchi
Abstract:
We consider a suboptimal solution path algorithm for the Support Vector Machine. The solution path algorithm is an effective tool for solving a sequence of parametrized optimization problems in machine learning. The solutions on the path provided by this algorithm are very accurate, and they satisfy the optimality conditions more strictly than other SVM optimization algorithms. In many machine learning applications, however, this strict optimality is often unnecessary, and it adversely affects computational efficiency. Our algorithm can generate the path of suboptimal solutions within an arbitrary user-specified tolerance level. It allows us to control the trade-off between the accuracy of the solution and the computational cost. Moreover, we show that our suboptimal solutions can be interpreted as the solution of a perturbed optimization problem derived from the original one. We provide some theoretical analyses of our algorithm based on this novel interpretation. The experimental results also demonstrate the effectiveness of our algorithm.
Submitted 2 May, 2011;
originally announced May 2011.
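The abstract does not give the exact form of the tolerance. One common way to formalize "suboptimal within a user-specified tolerance" for the SVM, used here as an assumption, is to relax each KKT condition of the dual by ε, with ε = 0 recovering exact optimality. The check below takes dual variables α_i and margins y_i f(x_i); names are hypothetical.

```python
import numpy as np


def satisfies_relaxed_kkt(alpha, margins, C, eps):
    """Check eps-relaxed SVM KKT conditions (an assumed formalization).

    alpha_i == 0     ->  y_i f(x_i) >= 1 - eps
    0 < alpha_i < C  ->  |y_i f(x_i) - 1| <= eps
    alpha_i == C     ->  y_i f(x_i) <= 1 + eps
    """
    alpha = np.asarray(alpha, dtype=float)
    m = np.asarray(margins, dtype=float)
    at_zero = np.isclose(alpha, 0.0)
    at_c = np.isclose(alpha, C)
    free = ~(at_zero | at_c)
    ok = np.empty(alpha.shape, dtype=bool)
    ok[at_zero] = m[at_zero] >= 1.0 - eps
    ok[at_c] = m[at_c] <= 1.0 + eps
    ok[free] = np.abs(m[free] - 1.0) <= eps
    return bool(ok.all())
```

Under such a relaxation, a path algorithm can take larger steps between breakpoints as long as the current solution keeps passing the check, which is the accuracy/cost trade-off the abstract describes.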
-
Multi-parametric Solution-path Algorithm for Instance-weighted Support Vector Machines
Authors:
Masayuki Karasuyama,
Naoyuki Harada,
Masashi Sugiyama,
Ichiro Takeuchi
Abstract:
An instance-weighted variant of the support vector machine (SVM) has attracted considerable attention recently because it is useful in various machine learning tasks such as non-stationary data analysis, heteroscedastic data modeling, transfer learning, learning to rank, and transduction. An important challenge in these scenarios is to overcome the computational bottleneck: instance weights often change dynamically or adaptively, and thus the weighted SVM solutions must be repeatedly computed. In this paper, we develop an algorithm that can efficiently and exactly update the weighted SVM solutions for arbitrary changes of instance weights. Technically, this contribution can be regarded as an extension of the conventional solution-path algorithm for a single regularization parameter to multiple instance-weight parameters. However, this extension gives rise to a significant problem: breakpoints (at which the solution path turns) have to be identified in a high-dimensional space. To facilitate this, we introduce a parametric representation of instance weights. We also provide a geometric interpretation in weight space using the notion of a critical region: a polyhedron in which the current affine solution remains optimal. We then find breakpoints at the intersections of the solution path and the boundaries of the polyhedra. Through extensive experiments on various practical applications, we demonstrate the usefulness of the proposed algorithm.
Submitted 1 November, 2010; v1 submitted 24 September, 2010;
originally announced September 2010.
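Within a critical region the solution is affine in the weight parameters, so moving the weights along a line changes the solution linearly until one of the inequalities defining the polyhedron becomes tight; that point is the next breakpoint. A generic ratio-test sketch, assuming the current critical region is available as {w : Aw ≤ b} (building A and b from the SVM optimality conditions is not shown, and the names are hypothetical):

```python
import numpy as np


def next_breakpoint(A, b, w0, d, tol=1e-12):
    """Largest step t >= 0 such that w0 + t*d stays in {w : A w <= b}.

    Returns (t, index of the constraint that becomes tight), or (inf, None)
    if moving along d never leaves the region. Assumes A @ w0 <= b.
    """
    slack = b - A @ w0  # distance to each boundary at the start point
    rate = A @ d        # how fast each constraint's slack is consumed
    best_t, best_i = np.inf, None
    for i in np.flatnonzero(rate > tol):
        t = slack[i] / rate[i]
        if t < best_t:
            best_t, best_i = t, int(i)
    return best_t, best_i
```

At the returned breakpoint, the tight constraint identifies which instance changes status, after which a new critical region (and a new affine solution) would be computed and the line search continued.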