
Large Language Models for Tuning Evolution Strategies

Abstract: Large Language Models (LLMs) exhibit world knowledge and inference capabilities, making them powerful tools for various applications. This paper proposes a feedback loop mechanism that leverages these capabilities to tune Evolution Strategies (ES) parameters effectively. The mechanism involves a structured process of providing programming instructions, executing the corresponding code, and conducting thorough analysis. This process is specifically designed for the optimization of ES parameters. The method operates through an iterative cycle, ensuring continuous refinement of the ES parameters. First, LLMs process the instructions to generate or modify the code. The code is then executed, and the results are meticulously logged. Subsequent analysis of these results provides insights that drive further improvements. An experiment on tuning the learning rates of ES using the LLaMA3 model demonstrates the feasibility of this approach. This research illustrates how LLMs can be harnessed to improve ES algorithms' performance and suggests broader applications for similar feedback loop mechanisms in various domains.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2401.09862 [pdf, other]

Evolutionary Multi-Objective Optimization of Large Language Model Prompts for Balancing Sentiments

Authors: Jill Baumann, Oliver Kramer

Abstract: The advent of large language models (LLMs) such as ChatGPT has attracted considerable attention in various domains due to their remarkable performance and versatility. As the use of these models continues to grow, the importance of effective prompt engineering has come to the fore. Prompt optimization emerges as a crucial challenge, as it has a direct impact on model performance and the extraction of relevant information. Recently, evolutionary algorithms (EAs) have shown promise in addressing this issue, paving the way for novel optimization strategies. In this work, we propose an evolutionary multi-objective (EMO) approach specifically tailored for prompt optimization called EMO-Prompts, using sentiment analysis as a case study. We use sentiment analysis capabilities as our experimental targets. Our results demonstrate that EMO-Prompts effectively generates prompts capable of guiding the LLM to produce texts embodying two conflicting emotions simultaneously.

Submitted 18 January, 2024; originally announced January 2024.

Comments: Accepted in EvoApps at EvoStar 2024

arXiv:2110.14535 [pdf, other]

Comparing Heuristics, Constraint Optimization, and Reinforcement Learning for an Industrial 2D Packing Problem

Authors: Stefan Böhm, Martin Neumayer, Oliver Kramer, Alexander Schiendorfer, Alois Knoll

Abstract: Cutting and Packing problems occur in different industries with a direct impact on the revenue of businesses. Generally, the goal in Cutting and Packing is to assign a set of smaller objects to a set of larger objects. To solve Cutting and Packing problems, practitioners can resort to heuristic and exact methodologies. Lately, machine learning is increasingly used for solving such problems. This paper considers a 2D packing problem from the furniture industry, where a set of wooden workpieces must be assigned to different modules of a trolley in the most space-saving way. We present an experimental setup to compare heuristics, constraint optimization, and deep reinforcement learning for the given problem. The methodologies used and their results are collated in terms of their solution quality and runtime. In the given use case, a greedy heuristic produces optimal results and outperforms the other approaches in terms of runtime. Constraint optimization also produces optimal results but requires more time to perform. The deep reinforcement learning approach did not always produce optimal or even feasible solutions. While we assume this could be remedied with more training, considering the good results with the heuristic, deep reinforcement learning seems to be a bad fit for the given use case.

Submitted 27 October, 2021; originally announced October 2021.

arXiv:2105.02636 [pdf, other]

Estimating Presentation Competence using Multimodal Nonverbal Behavioral Cues

Authors: Ömer Sümer, Cigdem Beyan, Fabian Ruth, Olaf Kramer, Ulrich Trautwein, Enkelejda Kasneci

Abstract: Public speaking and presentation competence plays an essential role in many areas of social interaction in our educational, professional, and everyday life. Since our intention during a speech can differ from what is actually understood by the audience, the ability to appropriately convey our message requires a complex set of skills. Presentation competence is cultivated in the early school years and continuously developed over time. One approach that can promote efficient development of presentation competence is the automated analysis of human behavior during a speech based on visual and audio features and machine learning. Furthermore, this analysis can be used to suggest improvements and the development of skills related to presentation competence. In this work, we investigate the contribution of different nonverbal behavioral cues, namely, facial, body pose-based, and audio-related features, to estimate presentation competence. The analyses were performed on videos of 251 students while the automated assessment is based on manual ratings according to the Tübingen Instrument for Presentation Competence (TIP). Our classification results reached the best performance with early fusion in the same dataset evaluation (accuracy of 71.25%) and late fusion of speech, face, and body pose features in the cross dataset evaluation (accuracy of 78.11%). Similarly, regression results performed the best with fusion strategies.

Submitted 6 May, 2021; originally announced May 2021.

arXiv:2006.03132 [pdf, other]

doi 10.1007/978-3-030-58285-2_22

Earnings Prediction with Deep Learning

Authors: Lars Elend, Sebastian A. Tideman, Kerstin Lopatta, Oliver Kramer

Abstract: In the financial sector, a reliable forecast of the future financial performance of a company is of great importance for investors' investment decisions. In this paper we compare long short-term memory (LSTM) networks to temporal convolutional networks (TCNs) in the prediction of future earnings per share (EPS). The experimental analysis is based on quarterly financial reporting data and daily stock market returns. For a broad sample of US firms, we find that LSTMs outperform the naive persistent model with up to 30.0% more accurate predictions, while TCNs achieve an improvement of 30.8%. Both types of networks are at least as accurate as analysts and exceed them by up to 12.2% (LSTM) and 13.2% (TCN).

Submitted 12 October, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: 7 pages, 4 figures, 2 tables

Journal ref: LNCS 12325 (2020) 267-274

arXiv:2005.02666 [pdf, other]

doi 10.1007/978-3-030-58115-2_25

Evolutionary Multi-Objective Design of SARS-CoV-2 Protease Inhibitor Candidates

Authors: Tim Cofala, Lars Elend, Philip Mirbach, Jonas Prellberg, Thomas Teusch, Oliver Kramer

Abstract: Computational drug design based on artificial intelligence is an emerging research area. At the time of writing this paper, the world suffers from an outbreak of the coronavirus SARS-CoV-2. A promising way to stop the virus replication is via protease inhibition. We propose an evolutionary multi-objective algorithm (EMOA) to design potential protease inhibitors for SARS-CoV-2's main protease. Based on the SELFIES representation, the EMOA maximizes the binding of candidate ligands to the protein using the docking tool QuickVina 2, while at the same time taking into account further objectives like drug-likeliness or the fulfillment of filter constraints. The experimental part analyzes the evolutionary process and discusses the inhibitor candidates.

Submitted 18 May, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: 15 pages, 7 figures, submitted to PPSN 2020

ACM Class: I.2.8; J.3

Journal ref: LNCS 12270 (2020) 357-371

arXiv:2003.10159 [pdf, other]

Learned Weight Sharing for Deep Multi-Task Learning by Natural Evolution Strategy and Stochastic Gradient Descent

Authors: Jonas Prellberg, Oliver Kramer

Abstract: In deep multi-task learning, weights of task-specific networks are shared between tasks to improve performance on each single one. Since the question of which weights to share between layers is difficult to answer, human-designed architectures often share everything but a last task-specific layer. In many cases, this simplistic approach severely limits performance. Instead, we propose an algorithm to learn the assignment between a shared set of weights and task-specific layers. To optimize the non-differentiable assignment and at the same time train the differentiable weights, learning takes place via a combination of natural evolution strategy and stochastic gradient descent. The end results are task-specific networks that share weights but allow independent inference. They achieve lower test errors than baselines and methods from literature on three multi-task learning datasets.

Submitted 23 March, 2020; originally announced March 2020.

Comments: Accepted at IJCNN 2020

arXiv:1906.09020 [pdf, other]

Acute Lymphoblastic Leukemia Classification from Microscopic Images using Convolutional Neural Networks

Authors: Jonas Prellberg, Oliver Kramer

Abstract: Examining blood microscopic images for leukemia is necessary when expensive equipment for flow cytometry is unavailable. Automated systems can ease the burden on medical experts for performing this examination and may be especially helpful to quickly screen a large number of patients. We present a simple, yet effective classification approach using a ResNeXt convolutional neural network with Squeeze-and-Excitation modules. The approach was evaluated in the C-NMC online challenge and achieves a weighted F1-score of 88.91% on the test set. Code is available at https://github.com/jprellberg/isbi2019cancer

Submitted 1 April, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

arXiv:1806.09819 [pdf, other]

Limited Evaluation Evolutionary Optimization of Large Neural Networks

Authors: Jonas Prellberg, Oliver Kramer

Abstract: Stochastic gradient descent is the most prevalent algorithm to train neural networks. However, other approaches such as evolutionary algorithms are also applicable to this task. Evolutionary algorithms bring unique trade-offs that are worth exploring, but computational demands have so far restricted exploration to small networks with few parameters. We implement an evolutionary algorithm that executes entirely on the GPU, which allows efficient batch evaluation of a whole population of networks. Within this framework, we explore the limited evaluation evolutionary algorithm for neural network training and find that its batch evaluation idea comes with a large accuracy trade-off. In further experiments, we explore crossover operators and find that unprincipled random uniform crossover performs extremely well. Finally, we train a network with 92k parameters on MNIST using an EA and achieve 97.6% test accuracy compared to 98% test accuracy on the same network trained with Adam. Code is available at https://github.com/jprellberg/gpuea.

Submitted 26 June, 2018; originally announced June 2018.

Comments: Accepted at KI 2018

arXiv:1806.08099 [pdf, other]

Lamarckian Evolution of Convolutional Neural Networks

Authors: Jonas Prellberg, Oliver Kramer

Abstract: Convolutional neural networks belong to the most successful image classifiers, but the adaptation of their network architecture to a particular problem is computationally expensive. We show that an evolutionary algorithm saves training time during the network architecture optimization if learned network weights are inherited over generations by Lamarckian evolution. Experiments on typical image datasets show similar or significantly better test accuracies and improved convergence speeds compared to two different baselines without weight inheritance. On CIFAR-10 and CIFAR-100 a 75% improvement in data efficiency is observed.

Submitted 19 December, 2018; v1 submitted 21 June, 2018; originally announced June 2018.

Comments: Accepted at PPSN 2018

arXiv:1805.05760 [pdf, other]

Multi-label Classification of Surgical Tools with Convolutional Neural Networks

Authors: Jonas Prellberg, Oliver Kramer

Abstract: Automatic tool detection from surgical imagery has a multitude of useful applications, such as real-time computer assistance for the surgeon. Using the successful residual network architecture, a system that can distinguish 21 different tools in cataract surgery videos is created. The videos are provided as part of the 2017 CATARACTS challenge and pose difficulties found in many real-world datasets, for example a strong class imbalance. The construction of the detection system is guided by a wide array of experiments that explore different design decisions.

Submitted 15 May, 2018; originally announced May 2018.

Comments: Accepted at IJCNN 2018

arXiv:1709.03247 [pdf, other]

Evolution of Convolutional Highway Networks

Authors: Oliver Kramer

Abstract: Convolutional highways are deep networks based on multiple stacked convolutional layers for feature preprocessing. We introduce an evolutionary algorithm (EA) for optimization of the structure and hyperparameters of convolutional highways and demonstrate the potential of this optimization setting on the well-known MNIST data set. The (1+1)-EA employs Rechenberg's mutation rate control and a niching mechanism to overcome local optima. An experimental study shows that the EA is capable of improving the state-of-the-art network contribution and of evolving highway networks from scratch.

Submitted 11 September, 2017; originally announced September 2017.

Comments: 8 pages, 4 figures
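
Note: The abstract above names a (1+1)-EA with Rechenberg's mutation rate control. As a rough illustration only, the following minimal sketch shows a (1+1) strategy with Rechenberg's 1/5 success rule on a toy continuous problem. It is not the paper's implementation: the sphere objective, the adaptation window of 10 generations, and the factor 0.82 are illustrative assumptions, and the paper applies the idea to network structure and hyperparameters rather than to real-valued vectors.

```python
import random


def sphere(x):
    """Toy objective (an assumption for illustration): sum of squares."""
    return sum(v * v for v in x)


def one_plus_one_es(f, x0, sigma=1.0, generations=200, window=10,
                    factor=0.82, seed=0):
    """(1+1) strategy with Rechenberg's 1/5 success rule (minimization).

    Every `window` generations the step size sigma is increased if more
    than 1/5 of the mutations were successful and decreased if fewer were.
    """
    rng = random.Random(seed)
    parent = list(x0)
    parent_fit = f(parent)
    successes = 0
    for g in range(1, generations + 1):
        # Gaussian mutation of the single parent.
        child = [v + sigma * rng.gauss(0.0, 1.0) for v in parent]
        child_fit = f(child)
        # Plus selection: the child replaces the parent only if it is better.
        if child_fit < parent_fit:
            parent, parent_fit = child, child_fit
            successes += 1
        # Step-size adaptation by the 1/5 success rule.
        if g % window == 0:
            rate = successes / window
            if rate > 0.2:
                sigma /= factor
            elif rate < 0.2:
                sigma *= factor
            successes = 0
    return parent, parent_fit, sigma


if __name__ == "__main__":
    best, fit, sigma = one_plus_one_es(sphere, [3.0, -2.0, 1.0])
    print(best, fit, sigma)
```

Because of plus selection, the returned fitness can never be worse than that of the starting point; how fast it improves depends on the assumed window and factor.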

arXiv:1405.1234 [pdf, ps, other]

A Novel Approach to the Common Due-Date Problem on Single and Parallel Machines

Authors: Abhishek Awasthi, Jörg Lässig, Oliver Kramer

Abstract: This paper presents a novel idea for the general case of the Common Due-Date (CDD) scheduling problem. The problem is about scheduling a certain number of jobs on single or parallel machines where all the jobs possess different processing times but a common due-date. The objective of the problem is to minimize the total penalty incurred due to earliness or tardiness of the job completions. This work presents exact polynomial algorithms for optimizing a given job sequence for single and identical parallel machines with the run-time complexities of $O(n \log n)$ for both cases, where $n$ is the number of jobs. Besides, we show that our approach for the parallel machine case is also suitable for non-identical parallel machines. We prove the optimality for the single machine case and the runtime complexities of both. Henceforth, we extend our approach to one particular dynamic case of the CDD and conclude the chapter with our results for the benchmark instances provided in the OR-library.

Submitted 6 May, 2014; originally announced May 2014.

Comments: Book Chapter 22 pages

arXiv:1311.2880 [pdf, ps, other]

doi 10.1109/CSE.2013.14

Aircraft Landing Problem: Efficient Algorithm for a Given Landing Sequence

Authors: Abhishek Awasthi, Oliver Kramer, Jörg Lässig

Abstract: In this paper, we investigate a special case of the static aircraft landing problem (ALP) with the objective to optimize landing sequences and landing times for a set of airplanes. The problem is to land the planes on one or multiple runways within a time window as close as possible to the preferable target landing time, maintaining a safety distance constraint. The objective of this well-known NP-hard optimization problem is to minimize the sum of the total penalty incurred by all the aircraft for arriving earlier or later than their preferred landing times. For a problem variant that optimizes a given feasible landing sequence for the single runway case, we present an exact polynomial algorithm and prove the run-time complexity to lie in $O(N^3)$, where $N$ is the number of aircraft. The proposed algorithm returns the optimal solution for the ALP for a given feasible landing sequence on a single runway for a common practical case of the ALP described in the paper. Furthermore, we propose a strategy for the ALP with multiple runways and present our results for all the benchmark instances with single and multiple runways, while comparing them to previous results in the literature.

Submitted 26 October, 2013; originally announced November 2013.

Comments: 16th IEEE International Conference on Computational Science and Engineering (CSE 2013)

arXiv:1311.2879 [pdf, ps, other]

Common Due-Date Problem: Exact Polynomial Algorithms for a Given Job Sequence

Authors: Abhishek Awasthi, Jörg Lässig, Oliver Kramer

Abstract: This paper considers the problem of scheduling jobs on single and parallel machines where all the jobs possess different processing times but a common due date. There is a penalty involved with each job if it is processed earlier or later than the due date. The objective of the problem is to find the assignment of jobs to machines, the processing sequence of jobs and the time at which they are processed, which minimizes the total penalty incurred due to tardiness or earliness of the jobs. This work presents exact polynomial algorithms for optimizing a given job sequence for single and parallel machines with the run-time complexities of $O(n \log n)$ and $O(mn^2 \log n)$ respectively, where $n$ is the number of jobs and $m$ the number of machines. The algorithms take a sequence consisting of all the jobs $(J_i, i=1,2,\dots,n)$ as input and distribute the jobs to machines (for $m>1$) along with their best completion times so as to get the least possible total penalty for this sequence. We prove the optimality for the single machine case and the runtime complexities of both. Henceforth, we present the results for the benchmark instances and compare with previous work for single and parallel machine cases, up to $200$ jobs.

Submitted 26 October, 2013; originally announced November 2013.

Comments: 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing
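
Note: The objective described in the abstract above (a penalty for each job that completes earlier or later than the common due date) can be made concrete with a small sketch. The code below only evaluates the total penalty of a fixed job sequence on a single machine; it is not the paper's $O(n \log n)$ algorithm. The per-job weights `alpha` and `beta`, the absence of idle time between jobs, and the example numbers are assumptions made for illustration.

```python
from itertools import accumulate


def total_penalty(processing, alpha, beta, due_date, start=0):
    """Total earliness/tardiness penalty of a fixed job sequence.

    processing[i]: processing time of the i-th job in the sequence
    alpha[i]:      assumed penalty per time unit of earliness for job i
    beta[i]:       assumed penalty per time unit of tardiness for job i
    due_date:      common due date shared by all jobs
    start:         start time of the first job (jobs run back to back)
    """
    # Completion time of each job is the start plus the running sum.
    completions = [start + c for c in accumulate(processing)]
    penalty = 0
    for c, a, b in zip(completions, alpha, beta):
        penalty += a * max(0, due_date - c)  # earliness term
        penalty += b * max(0, c - due_date)  # tardiness term
    return penalty


if __name__ == "__main__":
    # Completions are 2, 5, 6 against a due date of 5:
    # the first job is 3 units early, the last job is 1 unit late.
    print(total_penalty([2, 3, 1], [1, 1, 1], [2, 2, 2], due_date=5))
```

Shifting `start` moves the whole sequence relative to the due date, which is the degree of freedom that an optimizer for a given sequence would exploit.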

arXiv:1107.3600 [pdf, other]

Unsupervised K-Nearest Neighbor Regression

Authors: Oliver Kramer

Abstract: In many scientific disciplines structures in high-dimensional data have to be found, e.g., in stellar spectra, in genome data, or in face recognition tasks. In this work we present a novel approach to non-linear dimensionality reduction. It is based on fitting K-nearest neighbor regression to the unsupervised regression framework for learning of low-dimensional manifolds. Similar to related approaches that are mostly based on kernel methods, unsupervised K-nearest neighbor (UNN) regression optimizes latent variables w.r.t. the data space reconstruction error employing the K-nearest neighbor heuristic. The problem of optimizing latent neighborhoods is difficult to solve, but the UNN formulation allows the design of efficient strategies that iteratively embed latent points to fixed neighborhood topologies. UNN is well suited for sorting of high-dimensional data. The iterative variants are analyzed experimentally.

Submitted 26 September, 2011; v1 submitted 18 July, 2011; originally announced July 2011.

Comments: 4 pages, 12 figures

Showing 1–16 of 16 results for author: Kramer, O