Tail Bounds on the Runtime of Categorical Compact Genetic Algorithm

Authors: Ryoki Hamano, Kento Uchida, Shinichi Shirakawa, Daiki Morinaga, Youhei Akimoto

Abstract: The majority of theoretical analyses of evolutionary algorithms in the discrete domain focus on binary optimization algorithms, even though black-box optimization on the categorical domain has a lot of practical applications. In this paper, we consider a probabilistic model-based algorithm using the family of categorical distributions as its underlying distribution and set the sample size to two. We term this specific algorithm the categorical compact genetic algorithm (ccGA). The ccGA can be considered an extension of the compact genetic algorithm (cGA), which is an efficient binary optimization algorithm. We theoretically analyze the dependency of the runtime on the number of possible categories $K$, the number of dimensions $D$, and the learning rate $η$. We investigate the tail bound of the runtime on two typical linear functions on the categorical domain: categorical OneMax (COM) and KVal. We derive that the runtimes on COM and KVal are $O(\sqrt{D} \ln (DK) / η)$ and $Θ(D \ln K/ η)$ with high probability, respectively. Our analysis is a generalization of that of the cGA on the binary domain.

Submitted 10 July, 2024; originally announced July 2024.
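
The sampling-and-update loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the update rule (add η times the difference of the better and worse one-hot samples), the tie handling, the clip-and-renormalize step, and the COM definition (count of dimensions that pick category 0) are all assumptions made here, and the names `categorical_onemax` and `ccga_sketch` are hypothetical.

```python
import numpy as np


def categorical_onemax(x):
    """Assumed COM definition: number of dimensions that select category 0."""
    return int(np.sum(x == 0))


def ccga_sketch(D, K, eta, max_iters, seed=0):
    """Illustrative cGA-style loop over D categorical variables with K categories."""
    rng = np.random.default_rng(seed)
    theta = np.full((D, K), 1.0 / K)  # one uniform categorical distribution per dimension
    rows = np.arange(D)
    for t in range(1, max_iters + 1):
        # Sample size two: draw two candidate solutions from the current model.
        a = np.array([rng.choice(K, p=theta[d]) for d in range(D)])
        b = np.array([rng.choice(K, p=theta[d]) for d in range(D)])
        fa, fb = categorical_onemax(a), categorical_onemax(b)
        if fa != fb:  # assumption: skip the update on ties
            better, worse = (a, b) if fa > fb else (b, a)
            # Shift probability mass toward the better sample's categories.
            theta[rows, better] += eta
            theta[rows, worse] -= eta
            # Assumption: keep each row a valid distribution by clipping and renormalizing.
            theta = np.clip(theta, 0.0, 1.0)
            theta /= theta.sum(axis=1, keepdims=True)
        if np.all(theta[:, 0] >= 1.0 - 1e-12):
            return t, theta  # model has concentrated on the assumed optimum
    return max_iters, theta
```

Calling `ccga_sketch(D=5, K=3, eta=0.1, max_iters=300)` returns the iteration count reached and the final D×K parameter matrix; the runtime bounds quoted in the abstract concern how that count scales with D, K, and η.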

arXiv:2406.16506

Natural Gradient Interpretation of Rank-One Update in CMA-ES

Authors: Ryoki Hamano, Shinichi Shirakawa, Masahiro Nomura

Abstract: The covariance matrix adaptation evolution strategy (CMA-ES) is a stochastic search algorithm using a multivariate normal distribution for continuous black-box optimization. In addition to strong empirical results, part of the CMA-ES can be described by a stochastic natural gradient method and can be derived from the information geometric optimization (IGO) framework. However, there are some components of the CMA-ES, such as the rank-one update, for which the theoretical understanding is limited. While the rank-one update modifies the covariance matrix to increase the likelihood of generating a solution in the direction of the evolution path, this idea has been difficult to formulate and interpret as a natural gradient method, unlike the rank-$μ$ update. In this work, we provide a new interpretation of the rank-one update in the CMA-ES from the perspective of the natural gradient with prior distribution. First, we propose maximum a posteriori IGO (MAP-IGO), which is the IGO framework extended to incorporate a prior distribution. Then, we derive the rank-one update from the MAP-IGO by setting the prior distribution based on the idea that the promising mean vector should exist in the direction of the evolution path. Moreover, the newly derived rank-one update is extensible, where an additional term appears in the update for the mean vector. We empirically investigate the properties of the additional term using various benchmark functions.

Submitted 24 June, 2024; originally announced June 2024.

Comments: This paper has been accepted for presentation at PPSN2024

arXiv:2405.11471

CMA-ES with Adaptive Reevaluation for Multiplicative Noise

Authors: Kento Uchida, Kenta Nishihara, Shinichi Shirakawa

Abstract: The covariance matrix adaptation evolution strategy (CMA-ES) is a powerful optimization method for continuous black-box optimization problems. Several noise-handling methods have been proposed to bring out the optimization performance of the CMA-ES on noisy objective functions. The adaptations of the population size and the learning rate are two major approaches that perform well under additive Gaussian noise. The reevaluation technique is another technique that evaluates each solution multiple times. In this paper, we discuss the difference between those methods from the perspective of stochastic relaxation that considers the maximization of the expected utility function. We derive that the set of maximizers of the noise-independent utility, which is used in the reevaluation technique, certainly contains the optimal solution, while the noise-dependent utility, which is used in the population size and learning rate adaptations, does not satisfy it under multiplicative noise. Based on the discussion, we develop the reevaluation adaptation CMA-ES (RA-CMA-ES), which computes two update directions using half of the evaluations and adapts the number of reevaluations based on the estimated correlation of those two update directions. The numerical simulation shows that the RA-CMA-ES outperforms the comparative method under multiplicative noise, maintaining competitive performance under additive noise.

Submitted 19 May, 2024; originally announced May 2024.

Comments: This paper has been accepted as a full paper at GECCO2024

arXiv:2405.10534

CMA-ES for Safe Optimization

Authors: Kento Uchida, Ryoki Hamano, Masahiro Nomura, Shota Saito, Shinichi Shirakawa

Abstract: In several real-world applications in medical and control engineering, there are unsafe solutions whose evaluations involve inherent risk. This optimization setting is known as safe optimization and is formulated as a specialized type of constrained optimization problem with constraints for safety functions. Safe optimization requires performing efficient optimization without evaluating unsafe solutions. A few studies have proposed optimization methods for safe optimization based on Bayesian optimization and the evolutionary algorithm. However, Bayesian optimization-based methods often struggle to achieve superior solutions, and the evolutionary algorithm-based method fails to effectively reduce unsafe evaluations. This study focuses on CMA-ES as an efficient evolutionary algorithm and proposes an optimization method termed safe CMA-ES. The safe CMA-ES is designed to achieve both safety and efficiency in safe optimization. The safe CMA-ES estimates the Lipschitz constants of safety functions transformed with the distribution parameters using the maximum norm of the gradient in Gaussian process regression. Subsequently, the safe CMA-ES projects the samples to the nearest point in the safe region constructed with the estimated Lipschitz constants. The numerical simulation using the benchmark functions shows that the safe CMA-ES successfully performs optimization, suppressing the unsafe evaluations, while the existing methods struggle to significantly reduce the unsafe evaluations.

Submitted 17 May, 2024; originally announced May 2024.

Comments: This paper has been accepted as a full paper at GECCO2024

arXiv:2405.09962

doi 10.1145/3638529.3654198

CatCMA : Stochastic Optimization for Mixed-Category Problems

Authors: Ryoki Hamano, Shota Saito, Masahiro Nomura, Kento Uchida, Shinichi Shirakawa

Abstract: Black-box optimization problems often require simultaneously optimizing different types of variables, such as continuous, integer, and categorical variables. Unlike integer variables, categorical variables do not necessarily have a meaningful order, and the discretization approach of continuous variables does not work well. Although several Bayesian optimization methods can deal with mixed-category black-box optimization (MC-BBO), they suffer from a lack of scalability to high-dimensional problems and internal computational cost. This paper proposes CatCMA, a stochastic optimization method for MC-BBO problems, which employs the joint probability distribution of multivariate Gaussian and categorical distributions as the search distribution. CatCMA updates the parameters of the joint probability distribution in the natural gradient direction. CatCMA also incorporates the acceleration techniques used in the covariance matrix adaptation evolution strategy (CMA-ES) and the stochastic natural gradient method, such as step-size adaptation and learning rate adaptation. In addition, we restrict the ranges of the categorical distribution parameters by margin to prevent premature convergence and analytically derive a promising margin setting. Numerical experiments show that the performance of CatCMA is superior and more robust to problem dimensions compared to state-of-the-art Bayesian optimization algorithms.

Submitted 16 May, 2024; originally announced May 2024.

Comments: This paper has been accepted for presentation at GECCO2024

arXiv:2305.00849

(1+1)-CMA-ES with Margin for Discrete and Mixed-Integer Problems

Authors: Yohei Watanabe, Kento Uchida, Ryoki Hamano, Shota Saito, Masahiro Nomura, Shinichi Shirakawa

Abstract: The covariance matrix adaptation evolution strategy (CMA-ES) is an efficient continuous black-box optimization method. The CMA-ES possesses many attractive features, including invariance properties and a well-tuned default hyperparameter setting. Moreover, several components to specialize the CMA-ES have been proposed, such as noise handling and constraint handling. To utilize these advantages in mixed-integer optimization problems, the CMA-ES with margin has been proposed. The CMA-ES with margin prevents the premature convergence of discrete variables by the margin correction, in which the distribution parameters are modified to leave the generation probability for changing the discrete variable. The margin correction has been applied to ($μ/μ_\mathrm{w}$,$λ$)-CMA-ES, while this paper introduces the margin correction into (1+1)-CMA-ES, an elitist version of CMA-ES. The (1+1)-CMA-ES is often advantageous for unimodal functions and can be computationally less expensive. To tackle the performance deterioration on mixed-integer optimization, we use the discretized elitist solution as the mean of the sampling distribution and modify the margin correction not to move the elitist solution. The numerical simulation using benchmark functions on mixed-integer, integer, and binary domains shows that (1+1)-CMA-ES with margin outperforms the CMA-ES with margin and is better than or comparable with several methods specialized to a particular search domain.

Submitted 1 May, 2023; originally announced May 2023.

arXiv:2303.18031

Simple Domain Generalization Methods are Strong Baselines for Open Domain Generalization

Authors: Masashi Noguchi, Shinichi Shirakawa

Abstract: In real-world applications, a machine learning model is required to handle open-set recognition (OSR), where unknown classes appear during the inference, in addition to a domain shift, where the distribution of data differs between the training and inference phases. Domain generalization (DG) aims to handle the domain shift situation where the target domain of the inference phase is inaccessible during model training. Open domain generalization (ODG) takes into account both DG and OSR. Domain-Augmented Meta-Learning (DAML) is a method targeting ODG but has a complicated learning process. On the other hand, although various DG methods have been proposed, they have not been evaluated in ODG situations. This work comprehensively evaluates existing DG methods in ODG and shows that two simple DG methods, CORrelation ALignment (CORAL) and Maximum Mean Discrepancy (MMD), are competitive with DAML in several cases. In addition, we propose simple extensions of CORAL and MMD by introducing the techniques used in DAML, such as ensemble learning and Dirichlet mixup data augmentation. The experimental evaluation demonstrates that the extended CORAL and MMD can perform comparably to DAML with lower computational costs. This suggests that the simple DG methods and their simple extensions are strong baselines for ODG. The code used in the experiments is available at https://github.com/shiralab/OpenDG-Eval.

Submitted 31 March, 2023; originally announced March 2023.

Comments: 18 pages, 5 figures

arXiv:2212.09260

Marginal Probability-Based Integer Handling for CMA-ES Tackling Single-and Multi-Objective Mixed-Integer Black-Box Optimization

Authors: Ryoki Hamano, Shota Saito, Masahiro Nomura, Shinichi Shirakawa

Abstract: This study targets the mixed-integer black-box optimization (MI-BBO) problem where continuous and integer variables should be optimized simultaneously. The CMA-ES, our focus in this study, is a population-based stochastic search method that samples solution candidates from a multivariate Gaussian distribution (MGD), which shows excellent performance in continuous BBO. The parameters of MGD, mean and (co)variance, are updated based on the evaluation value of candidate solutions in the CMA-ES. If the CMA-ES is applied to the MI-BBO with straightforward discretization, however, the variance corresponding to the integer variables becomes much smaller than the granularity of the discretization before reaching the optimal solution, which leads to the stagnation of the optimization. In particular, when binary variables are included in the problem, this stagnation is more likely to occur because the granularity of the discretization becomes wider, and the existing integer handling for the CMA-ES does not address this stagnation. To overcome these limitations, we propose a simple integer handling for the CMA-ES based on lower-bounding the marginal probabilities associated with the generation of integer variables in the MGD. The numerical experiments on the MI-BBO benchmark problems demonstrate the efficiency and robustness of the proposed method. Furthermore, in order to demonstrate the generality of the idea of the proposed method, in addition to the single-objective optimization case, we incorporate it into multi-objective CMA-ES and verify its performance on bi-objective mixed-integer benchmark problems.

Submitted 11 January, 2024; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: Camera-ready version for ACM Transactions on Evolutionary Learning and Optimization (TELO). This paper is an extended version of the work presented in arXiv:2205.13482

arXiv:2208.13968

Neural Architecture Search for Improving Latency-Accuracy Trade-off in Split Computing

Authors: Shoma Shimizu, Takayuki Nishio, Shota Saito, Yoichi Hirose, Chen Yen-Hsiu, Shinichi Shirakawa

Abstract: This paper proposes a neural architecture search (NAS) method for split computing. Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems. In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks. Thus, the architecture of the neural network model significantly impacts the communication payload size, model accuracy, and computational load. In this paper, we address the challenge of optimizing neural network architecture for split computing. To this end, we propose NASC, which jointly explores optimal model architecture and a split point to achieve higher accuracy while meeting latency requirements (i.e., smaller total latency of computation and communication than a certain threshold). NASC employs a one-shot NAS that does not require repeating model training for a computationally efficient architecture search. Our performance evaluation using hardware (HW)-NAS-Bench of benchmark data demonstrates that the proposed NASC can improve the "communication latency and model accuracy" trade-off, i.e., reduce the latency by approximately 40-60% from the baseline, with slight accuracy degradation.

Submitted 29 August, 2022; originally announced August 2022.

arXiv:2207.10334

Efficient Search of Multiple Neural Architectures with Different Complexities via Importance Sampling

Authors: Yuhei Noda, Shota Saito, Shinichi Shirakawa

Abstract: Neural architecture search (NAS) aims to automate architecture design processes and improve the performance of deep neural networks. Platform-aware NAS methods consider both performance and complexity and can find well-performing architectures with low computational resources. Although ordinary NAS methods result in tremendous computational costs owing to the repetition of model training, one-shot NAS, which trains the weights of a supernetwork containing all candidate architectures only once during the search process, has been reported to result in a lower search cost. This study focuses on the architecture complexity-aware one-shot NAS that optimizes the objective function composed of the weighted sum of two metrics, such as the predictive performance and number of parameters. In existing methods, the architecture search process must be run multiple times with different coefficients of the weighted sum to obtain multiple architectures with different complexities. This study aims at reducing the search cost associated with finding multiple architectures. The proposed method uses multiple distributions to generate architectures with different complexities and updates each distribution using the samples obtained from multiple distributions based on importance sampling. The proposed method allows us to obtain multiple architectures with different complexities in a single architecture search, thereby reducing the search cost. The proposed method is applied to the architecture search of convolutional neural networks on the CIFAR-10 and ImageNet datasets. Consequently, compared with baseline methods, the proposed method finds multiple architectures with varying complexities while requiring less computational effort.

Submitted 21 July, 2022; originally announced July 2022.

Comments: Accepted as a conference paper at the 31st International Conference on Artificial Neural Networks (ICANN 2022). The final authenticated publication will be available in the Springer Lecture Notes in Computer Science (LNCS)

arXiv:2205.13482

doi 10.1145/3512290.3528827

CMA-ES with Margin: Lower-Bounding Marginal Probability for Mixed-Integer Black-Box Optimization

Authors: Ryoki Hamano, Shota Saito, Masahiro Nomura, Shinichi Shirakawa

Abstract: This study targets the mixed-integer black-box optimization (MI-BBO) problem where continuous and integer variables should be optimized simultaneously. The CMA-ES, our focus in this study, is a population-based stochastic search method that samples solution candidates from a multivariate Gaussian distribution (MGD), which shows excellent performance in continuous BBO. The parameters of MGD, mean and (co)variance, are updated based on the evaluation value of candidate solutions in the CMA-ES. If the CMA-ES is applied to the MI-BBO with straightforward discretization, however, the variance corresponding to the integer variables becomes much smaller than the granularity of the discretization before reaching the optimal solution, which leads to the stagnation of the optimization. In particular, when binary variables are included in the problem, this stagnation is more likely to occur because the granularity of the discretization becomes wider, and the existing modification to the CMA-ES does not address this stagnation. To overcome these limitations, we propose a simple modification of the CMA-ES based on lower-bounding the marginal probabilities associated with the generation of integer variables in the MGD. The numerical experiments on the MI-BBO benchmark problems demonstrate the efficiency and robustness of the proposed method.

Submitted 12 January, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Nominated for the best paper of GECCO'22 ENUM Track. We have corrected the error of Algorithm 1 in the Appendix. In addition, an extended version is published at arXiv:2212.09260 that describes support for the multi-objective MI-BBO
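
The idea of lower-bounding a marginal probability can be illustrated for a single binary variable. The sketch below is a deliberately simplified, one-dimensional illustration and not the paper's algorithm: it assumes the variable is obtained by thresholding a Gaussian coordinate at 0.5, assumes that only the mean is corrected, and uses the hypothetical names `lower_bound_binary_marginal`, `prob_other_side`, and `alpha` (the minimum probability of sampling the other binary value).

```python
from statistics import NormalDist


def lower_bound_binary_marginal(mean, sigma, alpha, threshold=0.5):
    """Move the mean toward the threshold until the other side has probability >= alpha."""
    # The farther side has probability alpha exactly when the mean sits
    # sigma * Phi^{-1}(1 - alpha) away from the threshold.
    max_dist = sigma * NormalDist().inv_cdf(1.0 - alpha)
    dist = mean - threshold
    clipped = max(-max_dist, min(max_dist, dist))
    return threshold + clipped


def prob_other_side(mean, sigma, threshold=0.5):
    """Probability of sampling on the opposite side of the threshold from the mean."""
    p_above = 1.0 - NormalDist(mean, sigma).cdf(threshold)
    return min(p_above, 1.0 - p_above)
```

For instance, with `sigma=0.1` and `alpha=0.05`, a mean that has drifted to -3.0 is pulled back to roughly 0.336, so the value 1 is still sampled about 5% of the time, whereas a mean of 0.45 is left unchanged because it already satisfies the bound.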

arXiv:2203.15292

doi 10.1145/3512290.3528778

A Two-phase Framework with a Bézier Simplex-based Interpolation Method for Computationally Expensive Multi-objective Optimization

Authors: Ryoji Tanabe, Youhei Akimoto, Ken Kobayashi, Hiroshi Umeki, Shinichi Shirakawa, Naoki Hamada

Abstract: This paper proposes a two-phase framework with a Bézier simplex-based interpolation method (TPB) for computationally expensive multi-objective optimization. The first phase in TPB aims to approximate a few Pareto optimal solutions by optimizing a sequence of single-objective scalar problems. The first phase in TPB can fully exploit a state-of-the-art single-objective derivative-free optimizer. The second phase in TPB utilizes a Bézier simplex model to interpolate the solutions obtained in the first phase. The second phase in TPB fully exploits the fact that a Bézier simplex model can approximate the Pareto optimal solution set by exploiting its simplex structure when a given problem is simplicial. We investigate the performance of TPB on the 55 bi-objective BBOB problems. The results show that TPB performs significantly better than HMO-CMA-ES and some state-of-the-art meta-model-based optimizers.

Submitted 29 March, 2022; originally announced March 2022.

Comments: This is an accepted version of a paper published in the proceedings of GECCO 2022

arXiv:2110.10165

NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters

Authors: Yoichi Hirose, Nozomu Yoshinari, Shinichi Shirakawa

Abstract: The benchmark datasets for neural architecture search (NAS) have been developed to alleviate the computationally expensive evaluation process and ensure a fair comparison. Recent NAS benchmarks only focus on architecture optimization, although the training hyperparameters affect the obtained model performances. Building the benchmark dataset for joint optimization of architecture and training hyperparameters is essential to further NAS research. The existing NAS-HPO-Bench is a benchmark for joint optimization, but it does not consider the network connectivity design as done in modern NAS algorithms. This paper introduces the first benchmark dataset for joint optimization of network connections and training hyperparameters, which we call NAS-HPO-Bench-II. We collect the performance data of 4K cell-based convolutional neural network architectures trained on the CIFAR-10 dataset with different learning rate and batch size settings, resulting in the data of 192K configurations. The dataset includes the exact data for 12 epoch training. We further build the surrogate model predicting the accuracies after 200 epoch training to provide the performance data of longer training epoch. By analyzing NAS-HPO-Bench-II, we confirm the dependency between architecture and training hyperparameters and the necessity of joint optimization. Finally, we demonstrate the benchmarking of the baseline optimization algorithms using NAS-HPO-Bench-II.

Submitted 19 October, 2021; originally announced October 2021.

Comments: 16 pages, 6 figures. Accepted at ACML2021 (long oral). API is available at https://github.com/yoichii/nashpobench2api

arXiv:1907.06341

doi 10.1007/978-3-030-30484-3_33

Controlling Model Complexity in Probabilistic Model-Based Dynamic Optimization of Neural Network Structures

Authors: Shota Saito, Shinichi Shirakawa

Abstract: A method of simultaneously optimizing both the structure of neural networks and the connection weights in a single training loop can reduce the enormous computational cost of neural architecture search. We focus on the probabilistic model-based dynamic neural network structure optimization that considers the probability distribution of structure parameters and simultaneously optimizes both the distribution parameters and connection weights based on gradient methods. Since the existing algorithm searches for the structures that only minimize the training loss, this method might find overly complicated structures. In this paper, we propose the introduction of a penalty term to control the model complexity of obtained structures. We formulate a penalty term using the number of weights or units and derive its analytical natural gradient. The proposed method minimizes the objective function augmented with the penalty term based on the stochastic gradient descent. We apply the proposed method in the unit selection of a fully-connected neural network and the connection selection of a convolutional neural network. The experimental results show that the proposed method can control model complexity while maintaining performance.

Submitted 15 July, 2019; originally announced July 2019.

Comments: Accepted as a conference paper at the 28th International Conference on Artificial Neural Networks (ICANN 2019). The final authenticated publication will be available in the Springer Lecture Notes in Computer Science (LNCS). 13 pages

arXiv:1905.08537

Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search

Authors: Youhei Akimoto, Shinichi Shirakawa, Nozomu Yoshinari, Kento Uchida, Shota Saito, Kouhei Nishida

Abstract: High sensitivity of neural architecture search (NAS) methods against their input such as step-size (i.e., learning rate) and search space prevents practitioners from applying them out-of-the-box to their own problems, even though their purpose is to automate a part of the tuning process. Aiming at a fast, robust, and widely-applicable NAS, we develop a generic optimization framework for NAS. We turn a coupled optimization of connection weights and neural architecture into a differentiable optimization by means of stochastic relaxation. It accepts arbitrary search space (widely-applicable) and enables the use of a gradient-based simultaneous optimization of weights and architecture (fast). We propose a stochastic natural gradient method with an adaptive step-size mechanism built upon our theoretical investigation (robust). Despite its simplicity and no problem-dependent parameter tuning, our method exhibited near state-of-the-art performances with low computational budgets both on image classification and inpainting tasks.

Submitted 21 May, 2019; originally announced May 2019.

Comments: Accepted to ICML 2019. Code is available at https://github.com/shirakawas/ASNG-NAS

arXiv:1809.06517 [pdf, other]

Parameterless Stochastic Natural Gradient Method for Discrete Optimization and its Application to Hyper-Parameter Optimization for Neural Network

Authors: Kouhei Nishida, Hernan Aguirre, Shota Saito, Shinichi Shirakawa, Youhei Akimoto

Abstract: Black box discrete optimization (BBDO) appears in wide range of engineering tasks. Evolutionary or other BBDO approaches have been applied, aiming at automating necessary tuning of system parameters, such as hyper parameter tuning of machine learning based systems when being installed for a specific task. However, automation is often jeopardized by the need of strategy parameter tuning for BBDO algorithms. An expert with the domain knowledge must undergo time-consuming strategy parameter tuning. This paper proposes a parameterless BBDO algorithm based on information geometric optimization, a recent framework for black box optimization using stochastic natural gradient. Inspired by some theoretical implications, we develop an adaptation mechanism for strategy parameters of the stochastic natural gradient method for discrete search domains. The proposed algorithm is evaluated on commonly used test problems. It is further extended to two examples of simultaneous optimization of the hyper parameters and the connection weights of deep learning models, leading to a faster optimization than the existing approaches without any effort of parameter tuning.

Submitted 17 September, 2018; originally announced September 2018.

arXiv:1805.12388 [pdf, ps, other]

Sample Reuse via Importance Sampling in Information Geometric Optimization

Authors: Shinichi Shirakawa, Youhei Akimoto, Kazuki Ouchi, Kouzou Ohara

Abstract: In this paper we propose a technique to reduce the number of function evaluations, which is often the bottleneck of the black-box optimization, in the information geometric optimization (IGO) that is a generic framework of the probability model-based black-box optimization algorithms and generalizes several well-known evolutionary algorithms, such as the population-based incremental learning (PBIL) and the pure rank-$μ$ update covariance matrix adaptation evolution strategy (CMA-ES). In each iteration, the IGO algorithms update the parameters of the probability distribution to the natural gradient direction estimated by Monte-Carlo with the samples drawn from the current distribution. Our strategy is to reuse previously generated and evaluated samples based on the importance sampling. It is a technique to reduce the estimation variance without introducing a bias in Monte-Carlo estimation. We apply the sample reuse technique to the PBIL and the pure rank-$μ$ update CMA-ES and empirically investigate its effect. The experimental results show that the sample reuse helps to reduce the number of function evaluations on many benchmark functions for both the PBIL and the pure rank-$μ$ update CMA-ES. Moreover, we demonstrate how to combine the importance sampling technique with a variant of the CMA-ES involving an algorithmic component that is not derived in the IGO framework.

Submitted 31 May, 2018; originally announced May 2018.

Comments: 24 pages

arXiv:1801.07650 [pdf, other]

Dynamic Optimization of Neural Network Structures Using Probabilistic Modeling

Authors: Shinichi Shirakawa, Yasushi Iwata, Youhei Akimoto

Abstract: Deep neural networks (DNNs) are powerful machine learning models and have succeeded in various artificial intelligence tasks. Although various architectures and modules for the DNNs have been proposed, selecting and designing the appropriate network structure for a target problem is a challenging task. In this paper, we propose a method to simultaneously optimize the network structure and weight parameters during neural network training. We consider a probability distribution that generates network structures, and optimize the parameters of the distribution instead of directly optimizing the network structure. The proposed method can apply to the various network structure optimization problems under the same framework. We apply the proposed method to several structure optimization problems such as selection of layers, selection of unit types, and selection of connections using the MNIST, CIFAR-10, and CIFAR-100 datasets. The experimental results show that the proposed method can find the appropriate and competitive network structures.

Submitted 23 January, 2018; originally announced January 2018.

Comments: To appear in the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 9 pages

arXiv:1704.00764 [pdf, other]

A Genetic Programming Approach to Designing Convolutional Neural Network Architectures

Authors: Masanori Suganuma, Shinichi Shirakawa, Tomoharu Nagao

Abstract: The convolutional neural network (CNN), which is one of the deep learning models, has seen much success in a variety of computer vision tasks. However, designing CNN architectures still requires expert knowledge and a lot of trial and error. In this paper, we attempt to automatically construct CNN architectures for an image classification task based on Cartesian genetic programming (CGP). In our method, we adopt highly functional modules, such as convolutional blocks and tensor concatenation, as the node functions in CGP. The CNN structure and connectivity represented by the CGP encoding method are optimized to maximize the validation accuracy. To evaluate the proposed method, we constructed a CNN architecture for the image classification task with the CIFAR-10 dataset. The experimental result shows that the proposed method can be used to automatically find the competitive CNN architecture compared with state-of-the-art models.

Submitted 11 August, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

Comments: This is the revised version of the GECCO 2017 paper. The code of our method is available at https://github.com/sg-nm/cgp-cnn

Showing 1–19 of 19 results for author: Shirakawa, S