Search | arXiv e-print repository

Deep learning-based auto-segmentation of paraganglioma for growth monitoring

Authors: E. M. C. Sijben, J. C. Jansen, M. de Ridder, P. A. N. Bosman, T. Alderliesten

Abstract: Volume measurement of a paraganglioma (a rare neuroendocrine tumor that typically forms along major blood vessels and nerve pathways in the head and neck region) is crucial for monitoring and modeling tumor growth in the long term. However, in clinical practice, using available tools to do these measurements is time-consuming and suffers from tumor-shape assumptions and observer-to-observer variation. Growth modeling could play a significant role in solving a decades-old dilemma (stemming from uncertainty regarding how the tumor will develop over time). By giving paraganglioma patients treatment, severe symptoms can be prevented. However, treating patients who do not actually need it comes at the cost of unnecessary possible side effects and complications. Improved measurement techniques could enable growth model studies with a large amount of tumor volume data, possibly giving valuable insights into how these tumors develop over time. Therefore, we propose an automated tumor volume measurement method based on a deep learning segmentation model using no-new-UNnet (nnUNet). We assess the performance of the model based on visual inspection by a senior otorhinolaryngologist and several quantitative metrics by comparing model outputs with manual delineations, including a comparison with variation in manual delineation by multiple observers. Our findings indicate that the automatic method performs (at least) equal to manual delineation. Finally, using the created model, and a linking procedure that we propose to track the tumor over time, we show how additional volume measurements affect the fit of known growth functions.
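
Illustrative note (not part of the listing): the abstract above derives tumor volume from a segmentation. A minimal sketch of that last step, assuming a binary mask and a known voxel spacing, is to count foreground voxels and multiply by the volume of one voxel. The function name `mask_volume_ml`, the nested-list mask layout, and the spacing values are hypothetical and are not taken from the paper.

```python
def mask_volume_ml(mask, spacing_mm):
    """Volume of a binary segmentation mask in millilitres.

    mask:       nested lists indexed [slice][row][col] holding 0 or 1
                (hypothetical layout, chosen to avoid external dependencies).
    spacing_mm: (dz, dy, dx) voxel spacing in millimetres.
    """
    # Count the foreground voxels.
    voxel_count = sum(value for slice_ in mask for row in slice_ for value in row)
    # Volume of one voxel in cubic millimetres.
    dz, dy, dx = spacing_mm
    voxel_mm3 = dz * dy * dx
    # 1 mL equals 1000 mm^3.
    return voxel_count * voxel_mm3 / 1000.0


# Toy mask: 2 slices of 2x2 voxels with 4 foreground voxels in total.
toy_mask = [[[1, 1], [0, 1]], [[0, 0], [1, 0]]]
print(mask_volume_ml(toy_mask, (2.0, 1.0, 1.0)))  # 4 voxels * 2 mm^3 = 0.008 mL
```

Unlike caliper-based estimates, voxel counting makes no assumption about tumor shape, which is the limitation of existing tools that the abstract mentions.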

Submitted 19 March, 2024; originally announced April 2024.

arXiv:2404.06557 [pdf, other]

doi 10.1145/3638529.3654125

Temporal True and Surrogate Fitness Landscape Analysis for Expensive Bi-Objective Optimisation

Authors: C. J. Rodriguez, S. L. Thomson, T. Alderliesten, P. A. N. Bosman

Abstract: Many real-world problems have expensive-to-compute fitness functions and are multi-objective in nature. Surrogate-assisted evolutionary algorithms are often used to tackle such problems. Despite this, literature about analysing the fitness landscapes induced by surrogate models is limited, and even non-existent for multi-objective problems. This study addresses this critical gap by comparing landscapes of the true fitness function with those of surrogate models for multi-objective functions. Moreover, it does so temporally by examining landscape features at different points in time during optimisation, in the vicinity of the population at that point in time. We consider the BBOB bi-objective benchmark functions in our experiments. The results of the fitness landscape analysis reveal significant differences between true and surrogate features at different time points during optimisation. Despite these differences, the true and surrogate landscape features still show high correlations between each other. Furthermore, this study identifies which landscape features are related to search and demonstrates that both surrogate and true landscape features are capable of predicting algorithm performance. These findings indicate that temporal analysis of the landscape features may help to facilitate the design of surrogate switching approaches to improve performance in multi-objective optimisation.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06240 [pdf, other]

Hyperparameter-Free Medical Image Synthesis for Sharing Data and Improving Site-Specific Segmentation

Authors: Alexander Chebykin, Peter A. N. Bosman, Tanja Alderliesten

Abstract: Sharing synthetic medical images is a promising alternative to sharing real images that can improve patient privacy and data security. To get good results, existing methods for medical image synthesis must be manually adjusted when they are applied to unseen data. To remove this manual burden, we introduce a Hyperparameter-Free distributed learning method for automatic medical image Synthesis, Sharing, and Segmentation called HyFree-S3. For three diverse segmentation settings (pelvic MRIs, lung X-rays, polyp photos), the use of HyFree-S3 results in improved performance over training only with site-specific data (in the majority of cases). The hyperparameter-free nature of the method should make data synthesis and sharing easier, potentially leading to an increase in the quantity of available data and consequently the quality of the models trained that may ultimately be applied in the clinic. Our code is available at https://github.com/AwesomeLemon/HyFree-S3

Submitted 9 April, 2024; originally announced April 2024.

Comments: Accepted at MIDL 2024

arXiv:2403.14224 [pdf, other]

Stitching for Neuroevolution: Recombining Deep Neural Networks without Breaking Them

Authors: Arthur Guijt, Dirk Thierens, Tanja Alderliesten, Peter A. N. Bosman

Abstract: Traditional approaches to neuroevolution often start from scratch. This becomes prohibitively expensive in terms of computational and data requirements when targeting modern, deep neural networks. Using a warm start could be highly advantageous, e.g., using previously trained networks, potentially from different sources. This moreover enables leveraging the benefits of transfer learning (in particular vastly reduced training effort). However, recombining trained networks is non-trivial because architectures and feature representations typically differ. Consequently, a straightforward exchange of layers tends to lead to a performance breakdown. We overcome this by matching the layers of parent networks based on their connectivity, identifying potential crossover points. To correct for differing feature representations between these layers we employ stitching, which merges the networks by introducing new layers at crossover points. To train the merged network, only stitching layers need to be considered. New networks can then be created by selecting a subnetwork by choosing which stitching layers to (not) use. Assessing their performance is efficient as only their evaluation on data is required. We experimentally show that our approach enables finding networks that represent novel trade-offs between performance and computational cost, with some even dominating the original networks.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 10 pages, submitted to GECCO 2024

arXiv:2403.11173 [pdf, other]

Multi-Objective Evolutionary Neural Architecture Search for Recurrent Neural Networks

Authors: Reinhard Booysen, Anna Sergeevna Bosman

Abstract: Artificial neural network (NN) architecture design is a nontrivial and time-consuming task that often requires a high level of human expertise. Neural architecture search (NAS) serves to automate the design of NN architectures and has proven to be successful in automatically finding NN architectures that outperform those manually designed by human experts. NN architecture performance can be quantified based on multiple objectives, which include model accuracy and some NN architecture complexity objectives, among others. The majority of modern NAS methods that consider multiple objectives for NN architecture performance evaluation are concerned with automated feed forward NN architecture design, which leaves multi-objective automated recurrent neural network (RNN) architecture design unexplored. RNNs are important for modeling sequential datasets, and prominent within the natural language processing domain. It is often the case in real world implementations of machine learning and NNs that a reasonable trade-off is accepted for marginally reduced model accuracy in favour of lower computational resources demanded by the model. This paper proposes a multi-objective evolutionary algorithm-based RNN architecture search method. The proposed method relies on approximate network morphisms for RNN architecture complexity optimisation during evolution. The results show that the proposed method is capable of finding novel RNN architectures with comparable performance to state-of-the-art manually designed RNN architectures, but with reduced computational demand.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.09739 [pdf, other]

The Carbon Isotopic Ratio and Planet Formation

Authors: Edwin A. Bergin, Arthur Bosman, Richard Teague, Jenny Calahan, Karen Willacy, L. Ilsedore Cleeves, Kamber Schwarz, Ke Zhang, Simon Bruderer

Abstract: We present the first detection of 13CCH in a protoplanetary disk (TW Hya). Using observations of C2H we measure CCH/13CCH = 65 +/- 20 in gas with a CO isotopic ratio of 12CO/13CO = 21 +/- 5 (Yoshida et al. 2022a). The TW Hya disk exhibits a gas phase C/O that exceeds unity and C2H is the tracer of this excess carbon. We confirm that the TW Hya gaseous disk exhibits two separate carbon isotopic reservoirs as noted previously (Yoshida et al. 2022a). We explore two theoretical solutions for the development of this dichotomy. One model represents TW Hya today with a protoplanetary disk exposed to a cosmic ray ionization rate that is below interstellar as consistent with current estimates. We find that this model does not have sufficient ionization in cold (T < 40 K) layers to activate carbon isotopic fractionation. The second model investigates a younger TW Hya protostellar disk exposed to an interstellar cosmic ray ionization rate. We find that the younger model has sources of ionization deeper in a colder disk that generates two independent isotopic reservoirs. One reservoir is 12C-enriched carried by methane/hydrocarbon ices and the other is 13C-enriched carried by gaseous CO. The former potentially provides a source of methane/hydrocarbon ices to power the chemistry that generates the anomalously strong C2H emission in this (and other) disk systems in later stages. The latter provides a source of gaseous 13C-rich material to generate isotopic enrichments in forming giant planets as recently detected in the super-Jupiter TYC 8998-760-1 b by Zhang et al. (2021).

Submitted 13 March, 2024; originally announced March 2024.

Comments: 16 pages, 8 figures, accepted by the Astrophysical Journal

arXiv:2402.16658 [pdf, other]

Multi-Objective Learning for Deformable Image Registration

Authors: Monika Grewal, Henrike Westerveld, Peter A. N. Bosman, Tanja Alderliesten

Abstract: Deformable image registration (DIR) involves optimization of multiple conflicting objectives; however, not many existing DIR algorithms are multi-objective (MO). Further, while there has been progress in the design of deep learning algorithms for DIR, there is no work in the direction of MO DIR using deep learning. In this paper, we fill this gap by combining a recently proposed approach for MO training of neural networks with a well-known deep neural network for DIR and create a deep learning based MO DIR approach. We evaluate the proposed approach for DIR of pelvic magnetic resonance imaging (MRI) scans. We experimentally demonstrate that the proposed MO DIR approach -- providing multiple registration outputs for each patient that each correspond to a different trade-off between the objectives -- has additional desirable properties from a clinical use point-of-view as compared to providing a single DIR output. The experiments also show that the proposed MO DIR approach provides a better spread of DIR outputs across the entire trade-off front than simply training multiple neural networks with weights for each objective sampled from a grid of possible values.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.12510 [pdf, ps, other]

Function Class Learning with Genetic Programming: Towards Explainable Meta Learning for Tumor Growth Functionals

Authors: E. M. C. Sijben, J. C. Jansen, P. A. N. Bosman, T. Alderliesten

Abstract: Paragangliomas are rare, primarily slow-growing tumors for which the underlying growth pattern is unknown. Therefore, determining the best care for a patient is hard. Currently, if no significant tumor growth is observed, treatment is often delayed, as treatment itself is not without risk. However, by doing so, the risk of (irreversible) adverse effects due to tumor growth may increase. Being able to predict the growth accurately could assist in determining whether a patient will need treatment during their lifetime and, if so, the timing of this treatment. The aim of this work is to learn the general underlying growth pattern of paragangliomas from multiple tumor growth data sets, in which each data set contains a tumor's volume over time. To do so, we propose a novel approach based on genetic programming to learn a function class, i.e., a parameterized function that can be fit anew for each tumor. We do so in a unique, multi-modal, multi-objective fashion to find multiple potentially interesting function classes in a single run. We evaluate our approach on a synthetic and a real-world data set. By analyzing the resulting function classes, we can effectively explain the general patterns in the data.

Submitted 9 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.12183 [pdf, other]

MultiFIX: An XAI-friendly feature inducing approach to building models from multimodal data

Authors: Mafalda Malafaia, Thalea Schlender, Peter A. N. Bosman, Tanja Alderliesten

Abstract: In the health domain, decisions are often based on different data modalities. Thus, when creating prediction models, multimodal fusion approaches that can extract and combine relevant features from different data modalities can be highly beneficial. Furthermore, it is important to understand how each modality impacts the final prediction, especially in high-stakes domains, so that these models can be used in a trustworthy and responsible manner. We propose MultiFIX: a new interpretability-focused multimodal data fusion pipeline that explicitly induces separate features from different data types that can subsequently be combined to make a final prediction. An end-to-end deep learning architecture is used to train a predictive model and extract representative features of each modality. Each part of the model is then explained using explainable artificial intelligence techniques. Attention maps are used to highlight important regions in image inputs. Inherently interpretable symbolic expressions, learned with GP-GOMEA, are used to describe the contribution of tabular inputs. The fusion of the extracted features to predict the target label is also replaced by a symbolic expression, learned with GP-GOMEA. Results on synthetic problems demonstrate the strengths and limitations of MultiFIX. Lastly, we apply MultiFIX to a publicly available dataset for the detection of malignant skin lesions.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 8 pages, 9 figures

arXiv:2402.12175 [pdf, other]

Learning Discretized Bayesian Networks with GOMEA

Authors: Damy M. F. Ha, Tanja Alderliesten, Peter A. N. Bosman

Abstract: Bayesian networks model relationships between random variables under uncertainty and can be used to predict the likelihood of events and outcomes while incorporating observed evidence. From an eXplainable AI (XAI) perspective, such models are interesting as they tend to be compact. Moreover, captured relations can be directly inspected by domain experts. In practice, data is often real-valued. Unless assumptions of normality can be made, discretization is often required. The optimal discretization, however, depends on the relations modelled between the variables. This complicates learning Bayesian networks from data. For this reason, most literature focuses on learning conditional dependencies between sets of variables, called structure learning. In this work, we extend an existing state-of-the-art structure learning approach based on the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) to jointly learn variable discretizations. The proposed Discretized Bayesian Network GOMEA (DBN-GOMEA) obtains similar or better results than the current state-of-the-art when tasked to retrieve randomly generated ground-truth networks. Moreover, leveraging a key strength of evolutionary algorithms, we can straightforwardly perform DBN learning multi-objectively. We show how this enables incorporating expert knowledge in a uniquely insightful fashion, finding multiple DBNs that trade off complexity, accuracy, and the difference with a pre-determined expert network.

Submitted 19 February, 2024; originally announced February 2024.

Comments: The code is available at: https://github.com/damyha/dbn_gomea

arXiv:2402.10757 [pdf, other]

Fitness-based Linkage Learning and Maximum-Clique Conditional Linkage Modelling for Gray-box Optimization with RV-GOMEA

Authors: Georgios Andreadis, Tanja Alderliesten, Peter A. N. Bosman

Abstract: For many real-world optimization problems it is possible to perform partial evaluations, meaning that the impact of changing a few variables on a solution's fitness can be computed very efficiently. It has been shown that such partial evaluations can be excellently leveraged by the Real-Valued GOMEA (RV-GOMEA) that uses a linkage model to capture dependencies between problem variables. Recently, conditional linkage models were introduced for RV-GOMEA, expanding its state-of-the-art performance even to problems with overlapping dependencies. However, that work assumed that the dependency structure is known a priori. Fitness-based linkage learning techniques have previously been used to detect dependencies during optimization, but only for non-conditional linkage models. In this work, we combine fitness-based linkage learning and conditional linkage modelling in RV-GOMEA. In addition, we propose a new way to model overlapping dependencies in conditional linkage models to maximize the joint sampling of fully interdependent groups of variables. We compare the resulting novel variant of RV-GOMEA to other variants of RV-GOMEA and VkD-CMA on 12 problems with varying degree of overlapping dependencies. We find not only that the new RV-GOMEA performs best on most problems, but also that the overhead of learning the conditional linkage models during optimization is often negligible.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.09854 [pdf, other]

Improving the efficiency of GP-GOMEA for higher-arity operators

Authors: Thalea Schlender, Mafalda Malafaia, Tanja Alderliesten, Peter A. N. Bosman

Abstract: Deploying machine learning models into sensitive domains in our society requires these models to be explainable. Genetic Programming (GP) can offer a way to evolve inherently interpretable expressions. GP-GOMEA is a form of GP that has been found particularly effective at evolving expressions that are accurate yet of limited size and, thus, promote interpretability. Despite this strength, a limitation of GP-GOMEA is that it is template-based. This negatively affects its scalability regarding the arity of operators that can be used, since with increasing operator arity, an increasingly large part of the template tends to go unused. In this paper, we therefore propose two enhancements to GP-GOMEA: (i) semantic subtree inheritance, which performs additional variation steps that consider the semantic context of a subtree, and (ii) greedy child selection, which explicitly considers parts of the template that in standard GP-GOMEA remain unused. We compare different versions of GP-GOMEA regarding search enhancements on a set of continuous and discontinuous regression problems, with varying tree depths and operator sets. Experimental results show that both proposed search enhancements have a generally positive impact on the performance of GP-GOMEA, especially when the set of operators to choose from is large and contains higher-arity operators.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.08255 [pdf, other]

Distal Interference: Exploring the Limits of Model-Based Continual Learning

Authors: Heinrich van Deventer, Anna Sergeevna Bosman

Abstract: Continual learning is the sequential learning of different tasks by a machine learning model. Continual learning is known to be hindered by catastrophic interference or forgetting, i.e. rapid unlearning of earlier learned tasks when new tasks are learned. Despite their practical success, artificial neural networks (ANNs) are prone to catastrophic interference. This study analyses how gradient descent and overlapping representations between distant input points lead to distal interference and catastrophic interference. Distal interference refers to the phenomenon where training a model on a subset of the domain leads to non-local changes on other subsets of the domain. This study shows that uniformly trainable models without distal interference must be exponentially large. A novel antisymmetric bounded exponential layer B-spline ANN architecture named ABEL-Spline is proposed that can approximate any continuous function, is uniformly trainable, has polynomial computational complexity, and provides some guarantees for distal interference. Experiments are presented to demonstrate the theoretical properties of ABEL-Splines. ABEL-Splines are also evaluated on benchmark regression problems. It is concluded that the weaker distal interference guarantees in ABEL-Splines are insufficient for model-only continual learning. It is conjectured that continual learning with polynomial complexity models requires augmentation of the training data or algorithm.

Submitted 13 February, 2024; originally announced February 2024.

MSC Class: 68T07 ACM Class: I.5.1

arXiv:2401.16867 [pdf, other]

A Tournament of Transformation Models: B-Spline-based vs. Mesh-based Multi-Objective Deformable Image Registration

Authors: Georgios Andreadis, Joas I. Mulder, Anton Bouter, Peter A. N. Bosman, Tanja Alderliesten

Abstract: The transformation model is an essential component of any deformable image registration approach. It provides a representation of physical deformations between images, thereby defining the range and realism of registrations that can be found. Two types of transformation models have emerged as popular choices: B-spline models and mesh models. Although both models have been investigated in detail, a direct comparison has not yet been made, since the models are optimized using very different optimization methods in practice. B-spline models are predominantly optimized using gradient-descent methods, while mesh models are typically optimized using finite-element method solvers or evolutionary algorithms. Multi-objective optimization methods, which aim to find a diverse set of high-quality trade-off registrations, are increasingly acknowledged to be important in deformable image registration. Since these methods search for a diverse set of registrations, they can provide a more complete picture of the capabilities of different transformation models, making them suitable for a comparison of models. In this work, we conduct the first direct comparison between B-spline and mesh transformation models, by optimizing both models with the same state-of-the-art multi-objective optimization method, the Multi-Objective Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (MO-RV-GOMEA). The combination with B-spline transformation models, moreover, is novel. We experimentally compare both models on two different registration problems that are both based on pelvic CT scans of cervical cancer patients, featuring large deformations. Our results, on three cervical cancer patients, indicate that the choice of transformation model can have a profound impact on the diversity and quality of achieved registration outcomes.

Submitted 30 January, 2024; originally announced January 2024.

Comments: Pre-print for the SPIE Medical Imaging: Image Processing Conference

arXiv:2401.07372 [pdf, other]

Self and mixed delta-moves on algebraically split links

Authors: Anthony Bosman, Devin Garcia, Justyce Goode, Yamil Kas-Danouche, Davielle Smith

Abstract: A delta-move is a local move on a link diagram. The delta-Gordian distance between links measures the minimum number of delta-moves needed to move between link diagrams. A self delta-move only involves a single component of a link whereas a mixed delta-move involves multiple (2 or 3) components. We prove that two links are mixed delta-equivalent precisely when they have the same pairwise linking number; we also give a number of results on how (mixed/self) delta-moves relate to classical link invariants including the Arf invariant and crossing number. This allows us to produce a graph showing links related by a self delta-move for algebraically split links with up to 9 crossings. For these links we also introduce and calculate the delta-splitting number and mixed delta-splitting number, that is, the minimum number of delta-moves needed to separate the components of the link.

Submitted 14 January, 2024; originally announced January 2024.

Comments: 10 pages, 10 figures, 3 tables

arXiv:2307.15621 [pdf, other]

Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search

Authors: Alexander Chebykin, Arkadiy Dushatskiy, Tanja Alderliesten, Peter A. N. Bosman

Abstract: In this work, we show that simultaneously training and mixing neural networks is a promising way to conduct Neural Architecture Search (NAS). For hyperparameter optimization, reusing the partially trained weights allows for efficient search, as was previously demonstrated by the Population Based Training (PBT) algorithm. We propose PBT-NAS, an adaptation of PBT to NAS where architectures are improved during training by replacing poorly-performing networks in a population with the result of mixing well-performing ones and inheriting the weights using the shrink-perturb technique. After PBT-NAS terminates, the created networks can be directly used without retraining. PBT-NAS is highly parallelizable and effective: on challenging tasks (image generation and reinforcement learning) PBT-NAS achieves superior performance compared to baselines (random search and mutation-based PBT).
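
Illustrative note (not part of the listing): shrink-perturb, as generally described in the warm-starting literature, scales inherited weights toward zero and adds a small random perturbation. The sketch below assumes flat weight lists and standard-normal noise. The function name `shrink_perturb` and the default coefficients are hypothetical, and the abstract does not state the values or the noise source used in PBT-NAS.

```python
import random


def shrink_perturb(weights, shrink=0.4, perturb=0.1, seed=None):
    """Return shrink * w + perturb * noise for each inherited weight.

    weights: flat list of floats (hypothetical representation of a layer).
    shrink:  factor in [0, 1] that pulls inherited weights toward zero.
    perturb: scale of the added noise; the noise is drawn from N(0, 1) here,
             which is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return [shrink * w + perturb * rng.gauss(0.0, 1.0) for w in weights]


# With perturb=0.0 the operation reduces to pure shrinking.
print(shrink_perturb([1.0, -2.0, 0.5], shrink=0.5, perturb=0.0))  # [0.5, -1.0, 0.25]
```

Shrinking keeps part of what the parent network learned, while the perturbation restores some of the plasticity of a freshly initialized network, which is one plausible reason the technique suits weight inheritance after mixing.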

Submitted 28 July, 2023; originally announced July 2023.

Comments: 10 pages, 7 figures. Accepted at ECAI 2023
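
The shrink-perturb step named in the abstract above can be illustrated with a minimal sketch. It assumes the common formulation in which inherited weights are scaled toward zero and a scaled copy of freshly initialized weights is added. The function names, the coefficients `shrink` and `perturb`, and the flat weight lists are illustrative assumptions, not the PBT-NAS implementation.

```python
import random


def shrink_perturb(weights, fresh_weights, shrink=0.5, perturb=0.1):
    """Return shrink * w + perturb * f for each inherited/fresh weight pair.

    Assumed formulation: the coefficients are hypothetical defaults,
    not values taken from the paper.
    """
    if len(weights) != len(fresh_weights):
        raise ValueError("weight vectors must have the same length")
    return [shrink * w + perturb * f for w, f in zip(weights, fresh_weights)]


def random_init(n, scale=1.0, seed=0):
    """Draw n freshly initialized weights from a zero-mean Gaussian (assumed initializer)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, scale) for _ in range(n)]


# Weights inherited from a well-performing parent network (toy values).
inherited = [1.0, -2.0, 0.5]
child = shrink_perturb(inherited, random_init(len(inherited)))
print(child)
```

Shrinking keeps part of what the parent learned, while the perturbation restores some of the plasticity of a freshly initialized network.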

arXiv:2306.16090 [pdf, other]

doi 10.1145/3583133.3596321

Empirical Loss Landscape Analysis of Neural Network Activation Functions

Authors: Anna Sergeevna Bosman, Andries Engelbrecht, Marde Helbig

Abstract: Activation functions play a significant role in neural network design by enabling non-linearity. The choice of activation function was previously shown to influence the properties of the resulting loss landscape. Understanding the relationship between activation functions and loss landscape properties is important for neural architecture and training algorithm design. This study empirically investigates neural network loss landscapes associated with hyperbolic tangent, rectified linear unit, and exponential linear unit activation functions. Rectified linear unit is shown to yield the most convex loss landscape, and exponential linear unit is shown to yield the least flat loss landscape, and to exhibit superior generalisation performance. The presence of wide and narrow valleys in the loss landscape is established for all activation functions, and the narrow valleys are shown to correlate with saturated neurons and implicitly regularised network configurations.

Submitted 28 June, 2023; originally announced June 2023.

Comments: Accepted for publication in Genetic and Evolutionary Computation Conference Companion, July 15-19, 2023, Lisbon, Portugal

arXiv:2306.01436 [pdf, other]

Multi-Objective Population Based Training

Authors: Arkadiy Dushatskiy, Alexander Chebykin, Tanja Alderliesten, Peter A. N. Bosman

Abstract: Population Based Training (PBT) is an efficient hyperparameter optimization algorithm. PBT is a single-objective algorithm, but many real-world hyperparameter optimization problems involve two or more conflicting objectives. In this work, we therefore introduce a multi-objective version of PBT, MO-PBT. Our experiments on diverse multi-objective hyperparameter optimization problems (Precision/Recall, Accuracy/Fairness, Accuracy/Adversarial Robustness) show that MO-PBT outperforms random search, single-objective PBT, and the state-of-the-art multi-objective hyperparameter optimization algorithm MO-ASHA.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.06246 [pdf, other]

doi 10.1145/3583133.3596361

A Joint Python/C++ Library for Efficient yet Accessible Black-Box and Gray-Box Optimization with GOMEA

Authors: Anton Bouter, Peter A. N. Bosman

Abstract: Exploiting knowledge about the structure of a problem can greatly benefit the efficiency and scalability of an Evolutionary Algorithm (EA). Model-Based EAs (MBEAs) are capable of doing this by explicitly modeling the problem structure. The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is among the state-of-the-art of MBEAs due to its use of a linkage model and the optimal mixing variation operator. Especially in a Gray-Box Optimization (GBO) setting that allows for partial evaluations, i.e., the relatively efficient evaluation of a partial modification of a solution, GOMEA is known to excel. Such GBO settings are known to exist in various real-world applications to which GOMEA has successfully been applied. In this work, we introduce the GOMEA library, making existing GOMEA code in C++ accessible through Python, which serves as a centralized way of maintaining and distributing code of GOMEA for various optimization domains. Moreover, it allows for the straightforward definition of BBO as well as GBO fitness functions within Python, which are called from the C++ optimization code for each required (partial) evaluation. We describe the structure of the GOMEA library and how it can be used, and we show its performance in both GBO and Black-Box Optimization (BBO).

Submitted 10 May, 2023; originally announced May 2023.

arXiv:2304.02933 [pdf, other]

doi 10.1007/978-3-031-27524-1_19

Convolutional neural networks for crack detection on flexible road pavements

Authors: Hermann Tapamo, Anna Bosman, James Maina, Emile Horak

Abstract: Flexible road pavements deteriorate primarily due to traffic and adverse environmental conditions. Cracking is the most common deterioration mechanism; the surveying thereof is typically conducted manually using internationally defined classification standards. In South Africa, the use of high-definition video images has been introduced, which allows for safer road surveying. However, surveying is still a tedious manual process. Automation of the detection of defects such as cracks would allow for faster analysis of road networks and potentially reduce human bias and error. This study performs a comparison of six state-of-the-art convolutional neural network models for the purpose of crack detection. The models are pretrained on the ImageNet dataset, and fine-tuned using a new real-world binary crack dataset consisting of 14000 samples. The effects of dataset augmentation are also investigated. Of the six models trained, five achieved accuracy above 97%. The highest recorded accuracy was 98%, achieved by the ResNet and VGG16 models. The dataset is available at the following URL: https://zenodo.org/record/7795975

Submitted 6 April, 2023; originally announced April 2023.

Comments: A version of this paper is published in the Proceedings of the 14th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2022). Lecture Notes in Networks and Systems, vol 648. Springer, Cham

arXiv:2303.16912 [pdf, other]

Training Feedforward Neural Networks with Bayesian Hyper-Heuristics

Authors: Arné Schreuder, Anna Bosman, Andries Engelbrecht, Christopher Cleghorn

Abstract: The process of training feedforward neural networks (FFNNs) can benefit from an automated process where the best heuristic to train the network is sought out automatically by means of a high-level probabilistic-based heuristic. This research introduces a novel population-based Bayesian hyper-heuristic (BHH) that is used to train feedforward neural networks (FFNNs). The performance of the BHH is compared to that of ten popular low-level heuristics, each with different search behaviours. The chosen heuristic pool consists of classic gradient-based heuristics as well as meta-heuristics (MHs). The empirical process is executed on fourteen datasets consisting of classification and regression problems with varying characteristics. The BHH is shown to be able to train FFNNs well and provide an automated method for finding the best heuristic to train the FFNNs at various stages of the training process.

Submitted 29 March, 2023; originally announced March 2023.

arXiv:2303.15543 [pdf, other]

The Impact of Asynchrony on Parallel Model-Based EAs

Authors: Arthur Guijt, Dirk Thierens, Tanja Alderliesten, Peter A. N. Bosman

Abstract: In a parallel EA one can strictly adhere to the generational clock, and wait for all evaluations in a generation to be done. However, this idle time limits the throughput of the algorithm and wastes computational resources. Alternatively, an EA can be made asynchronous parallel. However, EAs using classic recombination and selection operators (GAs) are known to suffer from an evaluation time bias, which also influences the performance of the approach. Model-Based Evolutionary Algorithms (MBEAs) are more scalable than classic GAs by virtue of capturing the structure of a problem in a model. If this model is learned through linkage learning based on the population, the learned model may also capture biases. Thus, if an asynchronous parallel MBEA is also affected by an evaluation time bias, this could result in learned models being less suited to solving the problem, reducing performance. Therefore, in this work, we study the impact and presence of evaluation time biases on MBEAs in an asynchronous parallelization setting, and compare this to the biases in GAs. We find that a modern MBEA, GOMEA, is unaffected by evaluation time biases, while the more classical MBEA, ECGA, is affected, much like GAs are.

Submitted 27 March, 2023; originally announced March 2023.

Comments: 9 pages, 3 figures, 3 tables, submitted to GECCO 2023

arXiv:2303.11501 [pdf, other]

Convolutions, Transformers, and their Ensembles for the Segmentation of Organs at Risk in Radiation Treatment of Cervical Cancer

Authors: Vangelis Kostoulas, Peter A. N. Bosman, Tanja Alderliesten

Abstract: Segmentation of regions of interest in images of patients is a crucial step in many medical procedures. Deep neural networks have proven to be particularly adept at this task. However, a key question is what type of deep neural network to choose, and whether making a certain choice makes a difference. In this work, we will answer this question for the task of segmentation of the Organs At Risk (OARs) in radiation treatment of cervical cancer (i.e., bladder, bowel, rectum, sigmoid) in Magnetic Resonance Imaging (MRI) scans. We compare several state-of-the-art models belonging to different architecture categories, as well as a few new models that combine aspects of several state-of-the-art models, to see if the results one gets are markedly different. We visualize model predictions, create all possible ensembles of models by averaging their output probabilities, and calculate the Dice Coefficient between predictions of models, in order to understand the differences between them and the potential of possible combinations. The results show that small improvements in metrics can be achieved by advancing and merging architectures, but the predictions of the models are quite similar (most models achieve on average more than 0.8 Dice Coefficient when compared to the outputs of other models). However, the results from the ensemble experiments indicate that the best results are obtained when the best performing models from every category of the architectures are combined.

Submitted 20 March, 2023; originally announced March 2023.
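
The two operations named in the abstract above, ensembling by averaging output probabilities and comparing predictions with the Dice Coefficient, can be sketched as follows. This is a minimal illustration on flattened toy masks; the function names, the 0.5 threshold, and the convention that two empty masks score 1.0 are assumptions, not details taken from the paper.

```python
def dice(mask_a, mask_b):
    """Dice Coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(mask_a) + sum(mask_b)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def ensemble(prob_maps, threshold=0.5):
    """Average per-voxel probabilities over models, then threshold to a binary mask."""
    n_models = len(prob_maps)
    return [1 if sum(ps) / n_models >= threshold else 0 for ps in zip(*prob_maps)]


# Toy foreground probabilities from two hypothetical models for three voxels.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.3]
combined = ensemble([model_a, model_b])
print(combined)
print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
```

Averaging probabilities before thresholding lets a confident model outvote an uncertain one, which a majority vote over binary masks would not capture.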

arXiv:2303.09627 [pdf, other]

Denoising Diffusion Post-Processing for Low-Light Image Enhancement

Authors: Savvas Panagiotou, Anna S. Bosman

Abstract: Low-light image enhancement (LLIE) techniques attempt to increase the visibility of images captured in low-light scenarios. However, as a result of enhancement, a variety of image degradations such as noise and color bias are revealed. Furthermore, each particular LLIE approach may introduce a different form of flaw within its enhanced results. To combat these image degradations, post-processing denoisers have widely been used, which often yield oversmoothed results lacking detail. We propose using a diffusion model as a post-processing approach, and we introduce Low-light Post-processing Diffusion Model (LPDM) in order to model the conditional distribution between under-exposed and normally-exposed images. We apply LPDM in a manner which avoids the computationally expensive generative reverse process of typical diffusion models, and post-process images in one pass through LPDM. Extensive experiments demonstrate that our approach outperforms competing post-processing denoisers by increasing the perceptual quality of enhanced low-light images on a variety of challenging low-light datasets. Source code is available at https://github.com/savvaki/LPDM.

Submitted 24 June, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

arXiv:2303.08124 [pdf, other]

Bi-objective optimization of organ properties for the simulation of intracavitary brachytherapy applicator placement in cervical cancer

Authors: Cedric J. Rodriguez, Stephanie M. de Boer, Peter A. N. Bosman, Tanja Alderliesten

Abstract: Validation of deformable image registration techniques is extremely important, but hard, especially when complex deformations or content mismatch are involved. These complex deformations and content mismatch, for example, occur after the placement of an applicator for brachytherapy for cervical cancer. Virtual phantoms could enable the creation of validation data sets with ground truth deformations that simulate the large deformations that occur between image acquisitions. However, the quality of the multi-organ Finite Element Method (FEM)-based simulations is dependent on the patient-specific external forces and mechanical properties assigned to the organs. A common approach to calibrate these simulation parameters is through optimization, finding the parameter settings that optimize the match between the outcome of the simulation and reality. When considering inherently simplified organ models, we hypothesize that the optimal deformations of one organ cannot be achieved with a single parameter setting without compromising the optimality of the deformation of the surrounding organs. This means that there will be a trade-off between the optimal deformations of adjacent organs, such as the vagina-uterus and bladder. This work therefore proposes and evaluates a multi-objective optimization approach where the trade-off between organ deformations can be assessed after optimization. We showcase what the extent of the trade-off looks like when bi-objectively optimizing the patient-specific mechanical properties and external forces of the vagina-uterus and bladder for FEM-based simulations.

Submitted 22 February, 2023; originally announced March 2023.

arXiv:2303.04873 [pdf, other]

MOREA: a GPU-accelerated Evolutionary Algorithm for Multi-Objective Deformable Registration of 3D Medical Images

Authors: Georgios Andreadis, Peter A. N. Bosman, Tanja Alderliesten

Abstract: Finding a realistic deformation that transforms one image into another, in case large deformations are required, is considered a key challenge in medical image analysis. Having a proper image registration approach to achieve this could unleash a number of applications requiring information to be transferred between images. Clinical adoption is currently hampered by many existing methods requiring extensive configuration effort before each use, or not being able to (realistically) capture large deformations. A recent multi-objective approach that uses the Multi-Objective Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (MO-RV-GOMEA) and a dual-dynamic mesh transformation model has shown promise, exposing the trade-offs inherent to image registration problems and modeling large deformations in 2D. This work builds on this promise and introduces MOREA: the first evolutionary algorithm-based multi-objective approach to deformable registration of 3D images capable of tackling large deformations. MOREA includes a 3D biomechanical mesh model for physical plausibility and is fully GPU-accelerated. We compare MOREA to two state-of-the-art approaches on abdominal CT scans of 4 cervical cancer patients, with the latter two approaches configured for the best results per patient. Without requiring per-patient configuration, MOREA significantly outperforms these approaches on 3 of the 4 patients that represent the most difficult cases.

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2302.10661 [pdf, other]

Clinically Acceptable Segmentation of Organs at Risk in Cervical Cancer Radiation Treatment from Clinically Available Annotations

Authors: Monika Grewal, Dustin van Weersel, Henrike Westerveld, Peter A. N. Bosman, Tanja Alderliesten

Abstract: Deep learning models benefit from training with a large dataset (labeled or unlabeled). Following this motivation, we present an approach to learn a deep learning model for the automatic segmentation of Organs at Risk (OARs) in cervical cancer radiation treatment from a large clinically available dataset of Computed Tomography (CT) scans containing data inhomogeneity, label noise, and missing annotations. We employ simple heuristics for automatic data cleaning to minimize data inhomogeneity and label noise. Further, we develop a semi-supervised learning approach utilizing a teacher-student setup, annotation imputation, and uncertainty-guided training to learn in the presence of missing annotations. Our experimental results show that learning from a large dataset with our approach yields a significant improvement in the test performance despite missing annotations in the data. Further, the contours generated from the segmentation masks predicted by our model are found to be equally clinically acceptable as manually generated contours.

Submitted 21 February, 2023; originally announced February 2023.

Comments: submitted to MIDL 2023 conference

arXiv:2302.07646 [pdf, other]

Genetic Micro-Programs for Automated Software Testing with Large Path Coverage

Authors: Jarrod Goschen, Anna Sergeevna Bosman, Stefan Gruner

Abstract: Ongoing progress in computational intelligence (CI) has led to an increased desire to apply CI techniques for the purpose of improving software engineering processes, particularly software testing. Existing state-of-the-art automated software testing techniques focus on utilising search algorithms to discover input values that achieve high execution path coverage. These algorithms are trained on the same code that they intend to test, requiring instrumentation and lengthy search times to test each software component. This paper outlines a novel genetic programming framework, where the evolved solutions are not input values, but micro-programs that can repeatedly generate input values to efficiently explore a software component's input parameter domain. We also argue that our approach can be generalised so as to be applied to many different software systems, and is thus not specific to merely the particular software component on which it was trained.

Submitted 14 February, 2023; originally announced February 2023.

Comments: A version of this paper has been accepted for publication in CEC'22

arXiv:2302.07238 [pdf, other]

Cauchy Loss Function: Robustness Under Gaussian and Cauchy Noise

Authors: Thamsanqa Mlotshwa, Heinrich van Deventer, Anna Sergeevna Bosman

Abstract: In supervised machine learning, the choice of loss function implicitly assumes a particular noise distribution over the data. For example, the frequently used mean squared error (MSE) loss assumes a Gaussian noise distribution. The choice of loss function during training and testing affects the performance of artificial neural networks (ANNs). It is known that MSE may yield substandard performance in the presence of outliers. The Cauchy loss function (CLF) assumes a Cauchy noise distribution, and is therefore potentially better suited for data with outliers. This paper aims to determine the extent of robustness and generalisability of the CLF as compared to MSE. CLF and MSE are assessed on a few handcrafted regression problems, and a real-world regression problem with artificially simulated outliers, in the context of ANN training. CLF yielded results that were either comparable to or better than the results yielded by MSE, with a few notable exceptions.

Submitted 14 February, 2023; originally announced February 2023.

Comments: A version of this paper was accepted for publication in SACAIR'22
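
The contrast between MSE and a Cauchy-type loss described in the abstract above can be made concrete with a short sketch. It assumes the widely used form L(r) = (c²/2) · log(1 + (r/c)²) for a residual r with scale parameter c; the exact form and scale used in the paper may differ, and the function names are illustrative.

```python
import math


def mse_loss(y_true, y_pred):
    """Mean squared error: every residual contributes quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cauchy_loss(y_true, y_pred, c=1.0):
    """Assumed Cauchy loss: mean of (c^2 / 2) * log(1 + (r / c)^2) over residuals r."""
    return sum(
        0.5 * c * c * math.log1p(((t - p) / c) ** 2)
        for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Two residuals: one ordinary (1.0) and one outlier (10.0).
targets = [0.0, 0.0]
predictions = [1.0, 10.0]
print(mse_loss(targets, predictions))
print(cauchy_loss(targets, predictions))
```

For small residuals the assumed Cauchy loss behaves like half the squared error, while for large residuals it grows only logarithmically, so a single outlier dominates the average far less than it does under MSE.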

arXiv:2302.03726 [pdf, other]

doi 10.3847/2041-8213/acb651

A potential site for wide-orbit giant planet formation in the IM Lup disk

Authors: Arthur Bosman, Johan Appelgren, Edwin A. Bergin, Michiel Lambrechts, Anders Johansen

Abstract: The radial transport, or drift, of dust has taken a critical role in giant planet formation theory. However, it has been challenging to identify dust drift pile ups in the hard-to-observe inner disk. We find that the IM Lup disk shows evidence that it has been shaped by an episode of dust drift. Using radiative transfer and dust dynamical modeling we study the radial and vertical dust distribution. We find that high dust drift rates exceeding 110 M_earth/Myr are necessary to explain both the dust and CO observations. Furthermore, the bulk of the large dust present in the inner 20 au needs to be vertically extended, implying high turbulence alpha_z > 10^{-3} and small grains (0.2-1 mm). We suggest that this increased level of particle stirring is consistent with the inner dust-rich disk undergoing turbulence triggered by the vertical shear instability. The conditions in the IM Lup disk imply that giant planet formation through pebble accretion is only effective outside 20 au. If such an early, high turbulence inner region is a natural consequence of high dust drift rates, then this has major implications for understanding the formation regions of giant planets including Jupiter and Saturn.

Submitted 7 February, 2023; originally announced February 2023.

Comments: 12 pages, 9 figures, Accepted by Astrophysical Journal Letters

arXiv:2212.05539 [pdf, other]

UV-driven Chemistry as a Signpost for Late-stage Planet Formation

Authors: Jenny K. Calahan, Edwin A. Bergin, Arthur D. Bosman, Evan Rich, Sean M. Andrews, Jennifer B. Bergner, L. Ilsedore Cleeves, Viviana V. Guzman, Jane Huang, John D. Ilee, Charles J. Law, Romane Le Gal, Karin I. Oberg, Richard Teague, Catherine Walsh, David J. Wilner, Ke Zhang

Abstract: The chemical reservoir within protoplanetary disks has a direct impact on planetary compositions and the potential for life. A long-lived carbon- and nitrogen-rich chemistry at cold temperatures (<=50K) is observed within cold and evolved planet-forming disks. This is evidenced by bright emission from small organic radicals in 1-10 Myr aged systems that would otherwise have frozen out onto grains within 1 Myr. We explain how the chemistry of a planet-forming disk evolves from a cosmic-ray/X-ray-dominated regime to an ultraviolet-dominated chemical equilibrium. This, in turn, will bring about a temporal transition in the chemical reservoir from which planets will accrete. This photochemically dominated gas phase chemistry develops as dust evolves via growth, settling and drift, and the small grain population is depleted from the disk atmosphere. A higher gas-to-dust mass ratio allows for deeper penetration of ultraviolet photons, which is coupled with a carbon-rich gas (C/O > 1) to form carbon-bearing radicals and ions. This further results in gas phase formation of organic molecules, which then would be accreted by any actively forming planets present in the evolved disk.

Submitted 11 December, 2022; originally announced December 2022.

Comments: Accepted to Nature Astronomy, Published Dec 8th 2022

arXiv:2211.00731 [pdf, other]

Comparision Of Adversarial And Non-Adversarial LSTM Music Generative Models

Authors: Moseli Mots'oehli, Anna Sergeevna Bosman, Johan Pieter De Villiers

Abstract: Algorithmic music composition is a way of composing musical pieces with minimal to no human intervention. While recurrent neural networks are traditionally applied to many sequence-to-sequence prediction tasks, including successful implementations of music composition, their standard supervised learning approach based on input-to-output mapping leads to a lack of note variety. These models can therefore be seen as potentially unsuitable for tasks such as music generation. Generative adversarial networks learn the generative distribution of data and lead to varied samples. This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data. The resulting music samples are evaluated by human listeners, and their preferences are recorded. The evaluation indicates that adversarial training produces more aesthetically pleasing music.

Submitted 1 November, 2022; originally announced November 2022.

Comments: Submitted to a 2023 conference, 20 pages, 13 figures

arXiv:2209.08216 [pdf, other]

doi 10.3847/1538-3881/aca80b

The kinematics and excitation of infrared water vapor emission from planet-forming disks: results from spectrally-resolved surveys and guidelines for JWST spectra

Authors: Andrea Banzatti, Klaus M. Pontoppidan, José Pérez Chávez, Colette Salyk, Lindsey Diehl, Simon Bruderer, Greg J. Herczeg, Andres Carmona, Ilaria Pascucci, Sean Brittain, Stanley Jensen, Sierra Grant, Ewine F. van Dishoeck, Inga Kamp, Arthur D. Bosman, Karin I. Öberg, Geoff A. Blake, Michael R. Meyer, Eric Gaidos, Adwin Boogert, John T. Rayner, Caleb Wheeler

Abstract: This work presents ground-based spectrally-resolved water emission at R = 30000-100000 over infrared wavelengths covered by JWST (2.9-12.8 $μ$m). Two new surveys with iSHELL and VISIR are combined with previous spectra from CRIRES and TEXES to cover parts of multiple ro-vibrational and rotational bands observable within telluric transmission bands, for a total of $\approx160$ spectra and 85 disks (30 of which are JWST targets in Cycle 1). The general expectation of a range of regions and excitation conditions traced by infrared water spectra is for the first time supported by the combined kinematics and excitation as spectrally resolved at multiple wavelengths. The main findings from this analysis are: 1) water lines are progressively narrower from the ro-vibrational bands at 2-9 $μ$m to the rotational lines at 12 $μ$m, and partly match the broad (BC) and narrow (NC) emission components, respectively, as extracted from ro-vibrational CO spectra; 2) rotation diagrams of resolved water lines from upper level energies of 4000-9500 K show vertical spread and curvatures indicative of optically thick emission ($\approx 10^{18}$ cm$^{-2}$) from a range of excitation temperatures ($\approx 800$-1100 K); 3) the new 5 $μ$m spectra demonstrate that slab model fits to the rotational lines at $> 10$ $μ$m strongly over-predict the ro-vibrational emission bands at $< 9$ $μ$m, implying non-LTE vibrational excitation. We discuss these findings in the context of emission from a disk surface and a molecular inner disk wind, and provide a list of guidelines to support the analysis of spectrally-unresolved JWST spectra.

Submitted 16 November, 2022; v1 submitted 16 September, 2022; originally announced September 2022.

Comments: Accepted for publication in AJ

arXiv:2208.13789 [pdf, other]

doi 10.1051/0004-6361/202142181

The formation of CO$_2$ through consumption of gas-phase CO on vacuum-UV irradiated water ice

Authors: J. Terwisscha van Scheltinga, N. F. W. Ligterink, A. D. Bosman, M. R. Hogerheijde, H. Linnartz

Abstract: [Abridged] Observations of protoplanetary disks suggest that they are depleted in gas-phase CO. It has been proposed that gas-phase CO is chemically consumed and converted into less volatile species through gas-grain processes. Observations of interstellar ices reveal a CO$_2$ component within H$_2$O ice, suggesting co-formation. The aim of this work is to experimentally verify the interaction of gas-phase CO with solid-state OH radicals above the sublimation temperature of CO. Amorphous solid water (ASW) is deposited at 15 K and followed by vacuum-UV (VUV) irradiation to dissociate H$_2$O and create OH radicals. Gas-phase CO is simultaneously admitted and only adsorbs with a short residence time on the ASW. Products in the solid state are studied with infrared spectroscopy and, once released into the gas phase, with mass spectrometry. Results show that gas-phase CO is converted into CO$_2$, with an efficiency of 7-27%, when interacting with VUV irradiated ASW. Between 40 and 90 K, CO$_2$ production is constant; above 90 K, O$_2$ production takes over. In the temperature range of 40-60 K, the CO$_2$ remains in the solid state, while at temperatures $\geq$ 70 K the formed CO$_2$ is released into the gas phase. We conclude that gas-phase CO reacts with solid-state OH radicals above its sublimation temperature. This gas-phase CO and solid-state OH radical interaction could explain the observed CO$_2$ embedded in water-rich ices. It may also contribute to the observed lack of gas-phase CO in planet-forming disks, as previously suggested.
Our experiments indicate a lower water ice dissociation efficiency than originally adopted in model descriptions of planet-forming disks and molecular clouds. Incorporation of the reduced water ice dissociation and increased binding energy of CO on water ice surfaces in these models would allow investigation of this gas-grain interaction to its full extent.

Submitted 29 August, 2022; originally announced August 2022.

Comments: Accepted for publication in Astronomy & Astrophysics

Journal ref: A&A 666, A35 (2022)

arXiv:2208.05388 [pdf, other]

ATLAS: Universal Function Approximator for Memory Retention

Authors: Heinrich van Deventer, Anna Bosman

Abstract: Artificial neural networks (ANNs), despite their universal function approximation capability and practical success, are subject to catastrophic forgetting. Catastrophic forgetting refers to the abrupt unlearning of a previous task when a new task is learned. It is an emergent phenomenon that hinders continual learning. Existing universal function approximation theorems for ANNs guarantee function approximation ability, but do not predict catastrophic forgetting. This paper presents a novel universal approximation theorem for multi-variable functions using only single-variable functions and exponential functions. Furthermore, we present ATLAS: a novel ANN architecture based on the new theorem. It is shown that ATLAS is a universal function approximator capable of some memory retention, and continual learning. The memory of ATLAS is imperfect, with some off-target effects during continual learning, but it is well-behaved and predictable. An efficient implementation of ATLAS is provided. Experiments are conducted to evaluate both the function approximation and memory retention capabilities of ATLAS.

Submitted 10 August, 2022; originally announced August 2022.

MSC Class: 68T07 ACM Class: I.5.1

arXiv:2207.09027 [pdf, other]

doi 10.3847/2041-8213/ac822b

Water shielding in the terrestrial planet-forming zone: Implication for inner disk organics

Authors: Sara E. Duval, Arthur D. Bosman, Edwin A. Bergin

Abstract: The chemical composition of the inner region of protoplanetary disks can trace the composition of planetary building material. The exact elemental composition of the inner disk has not yet been measured and tensions between models and observations still exist. Recent advancements have shown UV-shielding to be able to increase emission of organics. Here, we expand on these models and investigate how UV-shielding may impact chemical composition in the inner 5 au. In this work, we use the model from arXiv:2204.07108 and expand it with a larger chemical network. We focus on the chemical abundances in the upper disk atmosphere where the effects of water UV-shielding are most prominent and molecular lines originate. We find rich carbon and nitrogen chemistry with enhanced abundances of C2H2, CH4, HCN, CH3CN, and NH3 by > 3 orders of magnitude. This is caused by the self-shielding of H2O, which locks oxygen in water. This subsequently results in a suppression of oxygen-containing species like CO and CO2. The increase in C2H2 seen in the model with the inclusion of water UV-shielding allows us to explain the observed C2H2 abundance without resorting to elevated C/O ratios, as water UV-shielding induces an effectively oxygen-poor environment in oxygen-rich gas. Thus, water UV-shielding is important for reproducing the observed abundances of hydrocarbons and nitriles. From our model result, species like CH4, NH3, and NO are expected to be observable with the James Webb Space Telescope (JWST).

Submitted 18 July, 2022; originally announced July 2022.

Comments: 8 pages, 3 figures, accepted to ApJL

arXiv:2207.04063 [pdf, other]

doi 10.3847/2041-8213/ac7e55

Water UV-Shielding in the Terrestrial Planet-Forming Zone: Implications for Oxygen-18 Isotope Anomalies in H2-18O Infrared Emission and Meteorites

Authors: Jenny K. Calahan, Edwin A. Bergin, Arthur D. Bosman

Abstract: An understanding of the abundance and distribution of water vapor in the innermost region of protoplanetary disks is key to understanding the origin of habitable worlds and planetary systems. Past observations have shown H2O to be abundant and a major carrier of elemental oxygen in disk surface layers that lie within the inner few au of the disk. The combination of high abundance and strong radiative transitions leads to emission lines that are optically thick across the infrared spectral range. Its rarer isotopologue H2-18O traces deeper into this layer and will trace the full content of the planet forming zone. In this work, we explore the relative distribution of H2-16O and H2-18O within a model that includes water self-shielding from the destructive effects of ultraviolet radiation. In this Letter we show that there is an enhancement in the relative H2-18O abundance high up in the warm molecular layer within 0.1-10 au due to self-shielding of CO, C18O, and H2O. Most transitions of H2-18O that can be observed with JWST will partially emit from this layer, making it essential to take into account how H2O self-shielding may affect the H2O to H2-18O ratio. Additionally, this reservoir of H2-18O-enriched gas in combination with the vertical "cold finger" effect might provide a natural mechanism to account for oxygen isotopic anomalies found in meteoritic material in the solar system.

Submitted 8 July, 2022; originally announced July 2022.

Comments: 8 pages, 4 figures, accepted to ApJ Letters

arXiv:2207.02236 [pdf, other]

doi 10.3847/2041-8213/ac7d9f

Water UV-shielding in the terrestrial planet-forming zone: Implications for carbon dioxide emission

Authors: Arthur D. Bosman, Edwin A. Bergin, Jenny K. Calahan, Sara E. Duval

Abstract: Carbon dioxide is an important tracer of the chemistry and physics in the terrestrial planet forming zone. Using a thermo-chemical model that has been tested against the mid-infrared water emission, we re-interpret the CO2 emission as observed with Spitzer. We find that both water UV-shielding and extra chemical heating significantly reduce the total CO2 column in the emitting layer. Water UV-shielding is the more efficient effect, reducing the CO2 column by $\sim$ 2 orders of magnitude. These lower CO2 abundances lead to CO2-to-H2O flux ratios that are closer to the observed values, but CO2 emission is still too bright, especially in relative terms. Invoking the depletion of elemental oxygen outside of the water mid-plane iceline more strongly impacts the CO2 emission than it does the H2O emission, bringing the CO2-to-H2O emission in line with the observed values. We conclude that the CO2 emission observed with Spitzer-IRS is coming from a thin layer in the photosphere of the disk, similar to the strong water lines. Below this layer, we expect CO2 not to be present except when replenished by a physical process. This would be visible in the $^{13}$CO2 spectrum as well as certain $^{12}$CO2 features that can be observed by JWST-MIRI.

Submitted 7 July, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: 8 pages, 4 figures, accepted for publication in ApJL

arXiv:2206.06903 [pdf, other]

A Local Optima Network Analysis of the Feedforward Neural Architecture Space

Authors: Isak Potgieter, Christopher W. Cleghorn, Anna S. Bosman

Abstract: This study investigates the use of local optima network (LON) analysis, a derivative of the fitness landscape of candidate solutions, to characterise and visualise the neural architecture space. The search space of feedforward neural network architectures with up to three layers, each with up to 10 neurons, is fully enumerated by evaluating trained model performance on a selection of data sets. Extracted LONs, while heterogeneous across data sets, all exhibit simple global structures, with single global funnels in all cases but one. These results yield an early indication that LONs may provide a viable paradigm by which to analyse and optimise neural architectures.

Submitted 2 June, 2022; originally announced June 2022.

Comments: A version of this paper has been accepted for publication at IJCNN'22

arXiv:2205.06376 [pdf, other]

KASAM: Spline Additive Models for Function Approximation

Authors: Heinrich van Deventer, Pieter Janse van Rensburg, Anna Bosman

Abstract: Neural networks have been criticised for their inability to perform continual learning due to catastrophic forgetting and rapid unlearning of a past concept when a new concept is introduced. Catastrophic forgetting can be alleviated by specifically designed models and training techniques. This paper outlines a novel Spline Additive Model (SAM). SAM exhibits intrinsic memory retention with sufficient expressive power for many practical tasks, but is not a universal function approximator. SAM is extended with the Kolmogorov-Arnold representation theorem to a novel universal function approximator, called the Kolmogorov-Arnold Spline Additive Model (KASAM). The memory retention, expressive power and limitations of SAM and KASAM are illustrated analytically and empirically. SAM exhibited robust but imperfect memory retention, with small regions of overlapping interference in sequential learning tasks. KASAM exhibited greater susceptibility to catastrophic forgetting. KASAM in combination with pseudo-rehearsal training techniques exhibited superior performance in regression tasks and memory retention.

Submitted 12 May, 2022; originally announced May 2022.

MSC Class: 68T07 (Primary) ACM Class: I.5.1

arXiv:2204.12244 [pdf, other]

doi 10.1007/978-3-030-93314-2_11

Hybridised Loss Functions for Improved Neural Network Generalisation

Authors: Matthew C. Dickson, Anna S. Bosman, Katherine M. Malan

Abstract: Loss functions play an important role in the training of artificial neural networks (ANNs), and can affect the generalisation ability of the ANN model, among other properties. Specifically, it has been shown that the cross entropy and sum squared error loss functions result in different training dynamics, and exhibit different properties that are complementary to one another. It has previously been suggested that a hybrid of the cross entropy and sum squared error loss functions could combine the advantages of the two functions, while limiting their disadvantages. The effectiveness of such hybrid loss functions is investigated in this study. It is shown that hybridisation of the two loss functions improves the generalisation ability of the ANNs on all problems considered. The hybrid loss function that starts training with the sum squared error loss function and later switches to the cross entropy error loss function is shown to either perform the best on average, or to not be significantly different from the best loss function tested for all problems considered. This study shows that the minima discovered by the sum squared error loss function can be further exploited by switching to the cross entropy error loss function. It can thus be concluded that hybridisation of the two loss functions could lead to better performance in ANNs.

Submitted 26 April, 2022; originally announced April 2022.

Comments: Edited version of this paper appears in the proceedings of the PAAISS'22 conference
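
The switching hybrid described in the abstract above (start with sum squared error, later switch to cross entropy) can be illustrated with a minimal Python sketch. The abstract does not say when or how the switch is triggered, so the `switch_epoch` parameter and all function names here are hypothetical and are not the authors' implementation.

```python
import math


def sum_squared_error(targets, outputs):
    """Sum of squared differences between targets and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


def cross_entropy(targets, outputs, eps=1e-12):
    """Categorical cross entropy; outputs are clipped at eps to avoid log(0)."""
    return -sum(t * math.log(max(o, eps)) for t, o in zip(targets, outputs))


def hybrid_loss(targets, outputs, epoch, switch_epoch):
    """Use SSE before switch_epoch and cross entropy from switch_epoch onward.

    switch_epoch is an assumed, user-chosen hyperparameter.
    """
    if epoch < switch_epoch:
        return sum_squared_error(targets, outputs)
    return cross_entropy(targets, outputs)


if __name__ == "__main__":
    targets = [0.0, 1.0, 0.0]
    outputs = [0.2, 0.7, 0.1]
    print(hybrid_loss(targets, outputs, epoch=0, switch_epoch=10))
    print(hybrid_loss(targets, outputs, epoch=10, switch_epoch=10))
```

In a real training loop the same dispatch would sit wherever the loss is computed per batch; a hard switch at a fixed epoch is only one of several possible hybridisation schedules.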

arXiv:2204.12159 [pdf, other]

Coefficient Mutation in the Gene-pool Optimal Mixing Evolutionary Algorithm for Symbolic Regression

Authors: Marco Virgolin, Peter A. N. Bosman

Abstract: Currently, the genetic programming version of the gene-pool optimal mixing evolutionary algorithm (GP-GOMEA) is among the top-performing algorithms for symbolic regression (SR). A key strength of GP-GOMEA is its way of performing variation, which dynamically adapts to the emergence of patterns in the population. However, GP-GOMEA lacks a mechanism to optimize coefficients. In this paper, we study how fairly simple approaches for optimizing coefficients can be integrated into GP-GOMEA. In particular, we considered two variants of Gaussian coefficient mutation. We performed experiments using different settings on 23 benchmark problems, and used machine learning to estimate what aspects of coefficient mutation matter most. We find that the most important aspect is that the number of coefficient mutation attempts needs to be commensurate with the number of mixing operations that GP-GOMEA performs. We applied GP-GOMEA with the best-performing coefficient mutation approach to the data sets of SRBench, a large SR benchmark, for which a ground-truth underlying equation is known. We find that coefficient mutation can substantially help in re-discovering the underlying equation, but only when no noise is added to the target variable. In the presence of noise, GP-GOMEA with coefficient mutation discovers alternative but similarly-accurate equations.

Submitted 26 April, 2022; originally announced April 2022.

Comments: preprint of paper accepted at GECCO 2022 Workshop on Symbolic Regression
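
As a rough illustration of what Gaussian coefficient mutation can look like, the Python sketch below perturbs each coefficient of an expression with zero-mean Gaussian noise. The abstract above does not specify its two variants, so the per-coefficient mutation probability `prob`, the magnitude-relative standard deviation `rel_sigma`, and the function name are all assumptions made for illustration only.

```python
import random


def mutate_coefficients(coeffs, prob=0.5, rel_sigma=0.1, rng=None):
    """Return a copy of coeffs with Gaussian noise added to some entries.

    Each coefficient is mutated independently with probability `prob`.
    The noise standard deviation is `rel_sigma * |c|` (an assumed choice),
    so a coefficient that is exactly zero is left unchanged.
    """
    rng = rng or random.Random()
    mutated = []
    for c in coeffs:
        if rng.random() < prob:
            c = c + rng.gauss(0.0, rel_sigma * abs(c))
        mutated.append(c)
    return mutated


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    print(mutate_coefficients([1.5, -3.0, 0.25], prob=1.0, rng=rng))
```

In a GP-GOMEA-style loop, a mutated expression would typically be kept only if it does not worsen fitness; that acceptance step is omitted here.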

arXiv:2204.07108 [pdf, other]

doi 10.3847/2041-8213/ac66ce

Water UV-shielding in the terrestrial planet-forming zone: Implications from water emission

Authors: Arthur D. Bosman, Edwin A. Bergin, Jenny Calahan, Sara E. Duval

Abstract: Mid-infrared spectroscopy is one of the few ways to observe the composition of the terrestrial planet forming zone, the inner few au, of proto-planetary disks. The species currently detected in the disk atmosphere, for example CO, CO2, H2O and C2H2, are theoretically enough to constrain the C/O ratio in the disk surface. However, thermo-chemical models have difficulties in reproducing the full array of detected species in the mid-infrared simultaneously. In an effort to get closer to the observed spectra, we have included water UV-shielding as well as more efficient chemical heating into the thermo-chemical code Dust And Lines. We find that both are required to match the observed emission spectrum. Efficient chemical heating, in addition to traditional heating from UV photons, is necessary to elevate the temperature of the water emitting layer to match the observed excitation temperature of water. We find that water UV-shielding stops UV photons from reaching deep into the disk, cooling down the lower layers with higher column. These two effects create a hot emitting layer of water with a column of 1-10$\times 10^{18}$ cm$^{-2}$. This is only 1-10% of the water column above the dust $\tau=1$ surface at mid-infrared wavelengths in the models and represents <1% of the total water column.

Submitted 20 April, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: 8 pages, 4 figures, accepted to ApJL

arXiv:2204.03666 [pdf, other]

doi 10.1051/0004-6361/202243229

Gas temperature structure across transition disk cavities

Authors: M. Leemker, A. S. Booth, E. F. van Dishoeck, A. F. Pérez-Sánchez, J. Szulágyi, A. D. Bosman, S. Bruderer, S. Facchini, M. R. Hogerheijde, T. Paneque-Carreño, J. A. Sturm

Abstract: [Abridged] Most disks observed at high angular resolution show substructures. Knowledge about the gas surface density and temperature is essential to understand these. The aim of this work is to constrain the gas temperature and surface density in two transition disks: LkCa15 and HD 169142. We use new ALMA observations of the $^{13}$CO $J=6-5$ transition together with archival $J=2-1$ data of $^{12}$CO, $^{13}$CO and C$^{18}$O to observationally constrain the gas temperature and surface density. Furthermore, we use the thermochemical code DALI to model the temperature and density structure of a typical transition disk. The $6-5/2-1$ line ratio in LkCa15 constrains the gas temperature in the emitting layers inside the dust cavity to be up to 65 K, warmer than in the outer disk at 20-30 K. For HD 169142, the peak brightness temperature constrains the gas in the dust cavity to be 170 K, whereas that in the outer disk is only 100 K. Models also show that a more luminous central star, a lower abundance of PAHs and the absence of a dusty inner disk increase the temperature of the emitting layers and hence the line ratio in the gas cavity. The gas column density in the LkCa15 dust cavity drops by a factor >2 compared to the outer disk, with an additional drop of an order of magnitude inside the gas cavity at 10 AU. In the case of HD 169142, the gas column density drops by a factor of 200$-$500 inside the gas cavity, which could be due to a massive companion of several M$_{\mathrm{J}}$.
The broad dust-depleted gas region from 10-68 AU for LkCa15 may imply several lower mass planets. This work demonstrates that knowledge of the gas temperature is important to determine the gas surface density and thus whether planets, and if so what kind of planets, are most likely carving the dust cavities.

Submitted 7 April, 2022; originally announced April 2022.

Comments: Accepted for publication in Astronomy & Astrophysics

Journal ref: A&A 663, A23 (2022)

arXiv:2204.02046 [pdf, other]

Less is More: A Call to Focus on Simpler Models in Genetic Programming for Interpretable Machine Learning

Authors: Marco Virgolin, Eric Medvet, Tanja Alderliesten, Peter A. N. Bosman

Abstract: Interpretability can be critical for the safe and responsible use of machine learning models in high-stakes applications. So far, evolutionary computation (EC), in particular in the form of genetic programming (GP), represents a key enabler for the discovery of interpretable machine learning (IML) models. In this short paper, we argue that research in GP for IML needs to focus on searching in the space of low-complexity models, by investigating new kinds of search strategies and recombination methods. Moreover, based on our experience of bringing research into clinical practice, we believe that research should strive to design better ways of modeling and pursuing interpretability, for the obtained solutions to ultimately be most useful.

Submitted 5 April, 2022; originally announced April 2022.

arXiv:2203.13347 [pdf, ps, other]

Multi-modal multi-objective model-based genetic programming to find multiple diverse high-quality models

Authors: E. M. C. Sijben, T. Alderliesten, P. A. N. Bosman

Abstract: Explainable artificial intelligence (XAI) is an important and rapidly expanding research topic. The goal of XAI is to gain trust in a machine learning (ML) model through clear insights into how the model arrives at its predictions. Genetic programming (GP) is often cited as being uniquely well-suited to contribute to XAI because of its capacity to learn (small) symbolic models that have the potential to be interpreted. Nevertheless, like many ML algorithms, GP typically results in a single best model. However, in practice, the best model in terms of training error may well not be the most suitable one as judged by a domain expert for various reasons, including overfitting, multiple different models existing that have similar accuracy, and unwanted errors on particular data points due to typical accuracy measures like mean squared error. Hence, to increase chances that domain experts deem a resulting model plausible, it becomes important to be able to explicitly search for multiple, diverse, high-quality models that trade off different meanings of accuracy. In this paper, we achieve exactly this with a novel multi-modal multi-tree multi-objective GP approach that extends a modern model-based GP algorithm known as GP-GOMEA that is already effective at searching for small expressions.

Submitted 24 March, 2022; originally announced March 2022.

arXiv:2203.09214 [pdf, other]

Obtaining Smoothly Navigable Approximation Sets in Bi-Objective Multi-Modal Optimization

Authors: Renzo J. Scholman, Anton Bouter, Leah R. M. Dickhoff, Tanja Alderliesten, Peter A. N. Bosman

Abstract: Even if a Multi-modal Multi-Objective Evolutionary Algorithm (MMOEA) is designed to find solutions well spread over all locally optimal approximation sets of a Multi-modal Multi-objective Optimization Problem (MMOP), there is a risk that the found set of solutions is not smoothly navigable because the solutions belong to various niches, reducing the insight for decision makers. To tackle this issue, a new MMOEA is proposed: the Multi-Modal Bézier Evolutionary Algorithm (MM-BezEA), which produces approximation sets that cover individual niches and exhibit inherent decision-space smoothness as they are parameterized by Bézier curves. MM-BezEA combines the concepts behind the recently introduced BezEA and MO-HillVallEA to find all locally optimal approximation sets. When benchmarked against the MMOEAs MO_Ring_PSO_SCD and MO-HillVallEA on MMOPs with linear Pareto sets, MM-BezEA was found to perform best in terms of best hypervolume.

Submitted 4 July, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: Updated to correct format

arXiv:2203.08851 [pdf, other]

Adaptive Objective Configuration in Bi-Objective Evolutionary Optimization for Cervical Cancer Brachytherapy Treatment Planning

Authors: Leah R. M. Dickhoff, Ellen M. Kerkhof, Heloisa H. Deuzeman, Carien L. Creutzberg, Tanja Alderliesten, Peter A. N. Bosman

Abstract: The Multi-Objective Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (MO-RV-GOMEA) has been proven effective and efficient in solving real-world problems. A prime example is optimizing treatment plans for prostate cancer brachytherapy, an internal form of radiation treatment, for which equally important clinical aims from a base protocol are grouped into two objectives and bi-objectively optimized. This use of MO-RV-GOMEA was recently successfully introduced into clinical practice. Brachytherapy can also play an important role in treating cervical cancer. However, using the same approach to optimize treatment plans often does not immediately lead to clinically desirable results. Concordantly, medical experts indicate that they use additional aims beyond the cervix base protocol. Moreover, these aims have different priorities and can be patient-specifically adjusted. For this reason, we propose a novel adaptive objective configuration method to use with MO-RV-GOMEA so that we can accommodate additional aims of this nature. Based on results using only the base protocol, in consultation with medical experts, we configured key additional aims. We show how, for 10 patient cases, the new approach achieves the intended result, properly taking into account the additional aims. Consequently, plans resulting from the new approach are preferred by medical specialists in 8/10 cases.

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2203.08680 [pdf, other]

GPU-Accelerated Parallel Gene-pool Optimal Mixing in a Gray-Box Optimization Setting

Authors: Anton Bouter, Peter A. N. Bosman

Abstract: In a Gray-Box Optimization (GBO) setting that allows for partial evaluations, the fitness of an individual can be updated efficiently after a subset of its variables has been modified. This enables more efficient evolutionary optimization with the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) due to its key strength: Gene-pool Optimal Mixing (GOM). For each solution, GOM performs variation for many (small) sets of variables. To improve efficiency even further, parallel computing can be leveraged. For EAs, typically, this comprises population-wise parallelization. However, unless population sizes are large, this offers limited gains. For large GBO problems, parallelizing GOM-based variation holds greater speed-up potential, regardless of population size. However, this potential cannot be directly exploited because of dependencies between variables. We show how graph coloring can be used to group sets of variables that can undergo variation in parallel without violating dependencies. We test the performance of a CUDA implementation of parallel GOM on a Graphics Processing Unit (GPU) for the Max-Cut problem, a well-known problem for which the dependency structure can be controlled. We find that, for sufficiently large graphs with limited connectivity, finding high-quality solutions can be achieved up to 100 times faster, showcasing the great potential of our approach.

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2203.05970 [pdf, other]

Solving Multi-Structured Problems by Introducing Linkage Kernels into GOMEA

Authors: Arthur Guijt, Dirk Thierens, Tanja Alderliesten, Peter A. N. Bosman

Abstract: Model-Based Evolutionary Algorithms (MBEAs) can be highly scalable by virtue of linkage (or variable interaction) learning. This requires, however, that the linkage model can capture the exploitable structure of a problem. Usually, a single type of linkage structure is attempted to be captured using models such as a linkage tree. However, in practice, problems may exhibit multiple linkage structures. This is for instance the case in multi-objective optimization when the objectives have different linkage structures. This cannot be modelled sufficiently well when using linkage models that aim at capturing a single type of linkage structure, deteriorating the advantages brought by MBEAs. Therefore, here, we introduce linkage kernels, whereby a linkage structure is learned for each solution over its local neighborhood. We implement linkage kernels into the MBEA known as GOMEA that was previously found to be highly scalable when solving various problems. We further introduce a novel benchmark function called Best-of-Traps (BoT) that has an adjustable degree of different linkage structures. On both BoT and a worst-case scenario-based variant of the well-known MaxCut problem, we experimentally find a vast performance improvement of linkage-kernel GOMEA over GOMEA with a single linkage tree as well as the MBEA known as DSMGA-II.

Submitted 11 March, 2022; originally announced March 2022.

Comments: 10 pages, 6 figures, submitted to GECCO 2022

Showing 1–50 of 128 results for author: Bosman, A