-
Large Language Models for Tuning Evolution Strategies
Authors:
Oliver Kramer
Abstract:
Large Language Models (LLMs) exhibit world knowledge and inference capabilities, making them powerful tools for various applications. This paper proposes a feedback loop mechanism that leverages these capabilities to tune Evolution Strategies (ES) parameters effectively. The mechanism involves a structured process of providing programming instructions, executing the corresponding code, and conducting thorough analysis. This process is specifically designed for the optimization of ES parameters. The method operates through an iterative cycle, ensuring continuous refinement of the ES parameters. First, LLMs process the instructions to generate or modify the code. The code is then executed, and the results are meticulously logged. Subsequent analysis of these results provides insights that drive further improvements. An experiment on tuning the learning rates of ES using the LLaMA3 model demonstrates the feasibility of this approach. This research illustrates how LLMs can be harnessed to improve ES algorithms' performance and suggests broader applications for similar feedback loop mechanisms in various domains.
Submitted 16 May, 2024;
originally announced May 2024.
-
Evolutionary Multi-Objective Optimization of Large Language Model Prompts for Balancing Sentiments
Authors:
Jill Baumann,
Oliver Kramer
Abstract:
The advent of large language models (LLMs) such as ChatGPT has attracted considerable attention in various domains due to their remarkable performance and versatility. As the use of these models continues to grow, the importance of effective prompt engineering has come to the fore. Prompt optimization emerges as a crucial challenge, as it has a direct impact on model performance and the extraction of relevant information. Recently, evolutionary algorithms (EAs) have shown promise in addressing this issue, paving the way for novel optimization strategies. In this work, we propose an evolutionary multi-objective (EMO) approach specifically tailored for prompt optimization called EMO-Prompts, using sentiment analysis as a case study. We use sentiment analysis capabilities as our experimental targets. Our results demonstrate that EMO-Prompts effectively generates prompts capable of guiding the LLM to produce texts embodying two conflicting emotions simultaneously.
Submitted 18 January, 2024;
originally announced January 2024.
-
Comparing Heuristics, Constraint Optimization, and Reinforcement Learning for an Industrial 2D Packing Problem
Authors:
Stefan Böhm,
Martin Neumayer,
Oliver Kramer,
Alexander Schiendorfer,
Alois Knoll
Abstract:
Cutting and Packing problems occur in different industries with a direct impact on the revenue of businesses. Generally, the goal in Cutting and Packing is to assign a set of smaller objects to a set of larger objects. To solve Cutting and Packing problems, practitioners can resort to heuristic and exact methodologies. Lately, machine learning is increasingly used for solving such problems. This paper considers a 2D packing problem from the furniture industry, where a set of wooden workpieces must be assigned to different modules of a trolley in the most space-saving way. We present an experimental setup to compare heuristics, constraint optimization, and deep reinforcement learning for the given problem. The methodologies used and their results are compared in terms of solution quality and runtime. In the given use case, a greedy heuristic produces optimal results and outperforms the other approaches in terms of runtime. Constraint optimization also produces optimal results but requires more time to run. The deep reinforcement learning approach did not always produce optimal or even feasible solutions. While we assume this could be remedied with more training, considering the good results with the heuristic, deep reinforcement learning seems to be a bad fit for the given use case.
Submitted 27 October, 2021;
originally announced October 2021.
-
Estimating Presentation Competence using Multimodal Nonverbal Behavioral Cues
Authors:
Ömer Sümer,
Cigdem Beyan,
Fabian Ruth,
Olaf Kramer,
Ulrich Trautwein,
Enkelejda Kasneci
Abstract:
Public speaking and presentation competence play an essential role in many areas of social interaction in our educational, professional, and everyday life. Since our intention during a speech can differ from what is actually understood by the audience, the ability to appropriately convey our message requires a complex set of skills. Presentation competence is cultivated in the early school years and continuously developed over time. One approach that can promote efficient development of presentation competence is the automated analysis of human behavior during a speech based on visual and audio features and machine learning. Furthermore, this analysis can be used to suggest improvements and the development of skills related to presentation competence. In this work, we investigate the contribution of different nonverbal behavioral cues, namely, facial, body pose-based, and audio-related features, to estimate presentation competence. The analyses were performed on videos of 251 students, while the automated assessment is based on manual ratings according to the Tübingen Instrument for Presentation Competence (TIP). Our classification results reached the best performance with early fusion in the same dataset evaluation (accuracy of 71.25%) and late fusion of speech, face, and body pose features in the cross dataset evaluation (accuracy of 78.11%). Similarly, regression results performed the best with fusion strategies.
Submitted 6 May, 2021;
originally announced May 2021.
-
Earnings Prediction with Deep Learning
Authors:
Lars Elend,
Sebastian A. Tideman,
Kerstin Lopatta,
Oliver Kramer
Abstract:
In the financial sector, a reliable forecast of the future financial performance of a company is of great importance for investors' investment decisions. In this paper we compare long short-term memory (LSTM) networks to temporal convolutional networks (TCNs) in the prediction of future earnings per share (EPS). The experimental analysis is based on quarterly financial reporting data and daily stock market returns. For a broad sample of US firms, we find that LSTMs outperform the naive persistent model with up to 30.0% more accurate predictions, while TCNs achieve an improvement of 30.8%. Both types of networks are at least as accurate as analysts and exceed them by up to 12.2% (LSTM) and 13.2% (TCN).
Submitted 12 October, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Evolutionary Multi-Objective Design of SARS-CoV-2 Protease Inhibitor Candidates
Authors:
Tim Cofala,
Lars Elend,
Philip Mirbach,
Jonas Prellberg,
Thomas Teusch,
Oliver Kramer
Abstract:
Computational drug design based on artificial intelligence is an emerging research area. At the time of writing this paper, the world suffers from an outbreak of the coronavirus SARS-CoV-2. A promising way to stop the virus replication is via protease inhibition. We propose an evolutionary multi-objective algorithm (EMOA) to design potential protease inhibitors for SARS-CoV-2's main protease. Based on the SELFIES representation, the EMOA maximizes the binding of candidate ligands to the protein using the docking tool QuickVina 2, while at the same time taking into account further objectives like drug-likeness or the fulfillment of filter constraints. The experimental part analyzes the evolutionary process and discusses the inhibitor candidates.
Submitted 18 May, 2020; v1 submitted 6 May, 2020;
originally announced May 2020.
-
Learned Weight Sharing for Deep Multi-Task Learning by Natural Evolution Strategy and Stochastic Gradient Descent
Authors:
Jonas Prellberg,
Oliver Kramer
Abstract:
In deep multi-task learning, weights of task-specific networks are shared between tasks to improve performance on each single one. Since the question of which weights to share between layers is difficult to answer, human-designed architectures often share everything but a last task-specific layer. In many cases, this simplistic approach severely limits performance. Instead, we propose an algorithm to learn the assignment between a shared set of weights and task-specific layers. To optimize the non-differentiable assignment and at the same time train the differentiable weights, learning takes place via a combination of natural evolution strategy and stochastic gradient descent. The end result is a set of task-specific networks that share weights but allow independent inference. They achieve lower test errors than baselines and methods from literature on three multi-task learning datasets.
Submitted 23 March, 2020;
originally announced March 2020.
-
Acute Lymphoblastic Leukemia Classification from Microscopic Images using Convolutional Neural Networks
Authors:
Jonas Prellberg,
Oliver Kramer
Abstract:
Examining blood microscopic images for leukemia is necessary when expensive equipment for flow cytometry is unavailable. Automated systems can ease the burden on medical experts for performing this examination and may be especially helpful to quickly screen a large number of patients. We present a simple, yet effective classification approach using a ResNeXt convolutional neural network with Squeeze-and-Excitation modules. The approach was evaluated in the C-NMC online challenge and achieves a weighted F1-score of 88.91% on the test set. Code is available at https://github.com/jprellberg/isbi2019cancer
Submitted 1 April, 2020; v1 submitted 21 June, 2019;
originally announced June 2019.
-
Limited Evaluation Evolutionary Optimization of Large Neural Networks
Authors:
Jonas Prellberg,
Oliver Kramer
Abstract:
Stochastic gradient descent is the most prevalent algorithm to train neural networks. However, other approaches such as evolutionary algorithms are also applicable to this task. Evolutionary algorithms bring unique trade-offs that are worth exploring, but computational demands have so far restricted exploration to small networks with few parameters. We implement an evolutionary algorithm that executes entirely on the GPU, which allows us to efficiently batch-evaluate a whole population of networks. Within this framework, we explore the limited evaluation evolutionary algorithm for neural network training and find that its batch evaluation idea comes with a large accuracy trade-off. In further experiments, we explore crossover operators and find that unprincipled random uniform crossover performs extremely well. Finally, we train a network with 92k parameters on MNIST using an EA and achieve 97.6 % test accuracy compared to 98 % test accuracy on the same network trained with Adam. Code is available at https://github.com/jprellberg/gpuea.
Submitted 26 June, 2018;
originally announced June 2018.
-
Lamarckian Evolution of Convolutional Neural Networks
Authors:
Jonas Prellberg,
Oliver Kramer
Abstract:
Convolutional neural networks are among the most successful image classifiers, but the adaptation of their network architecture to a particular problem is computationally expensive. We show that an evolutionary algorithm saves training time during the network architecture optimization if learned network weights are inherited over generations by Lamarckian evolution. Experiments on typical image datasets show similar or significantly better test accuracies and improved convergence speeds compared to two different baselines without weight inheritance. On CIFAR-10 and CIFAR-100 a 75 % improvement in data efficiency is observed.
Submitted 19 December, 2018; v1 submitted 21 June, 2018;
originally announced June 2018.
-
Multi-label Classification of Surgical Tools with Convolutional Neural Networks
Authors:
Jonas Prellberg,
Oliver Kramer
Abstract:
Automatic tool detection from surgical imagery has a multitude of useful applications, such as real-time computer assistance for the surgeon. Using the successful residual network architecture, a system that can distinguish 21 different tools in cataract surgery videos is created. The videos are provided as part of the 2017 CATARACTS challenge and pose difficulties found in many real-world datasets, for example a strong class imbalance. The construction of the detection system is guided by a wide array of experiments that explore different design decisions.
Submitted 15 May, 2018;
originally announced May 2018.
-
Evolution of Convolutional Highway Networks
Authors:
Oliver Kramer
Abstract:
Convolutional highways are deep networks based on multiple stacked convolutional layers for feature preprocessing. We introduce an evolutionary algorithm (EA) for optimization of the structure and hyperparameters of convolutional highways and demonstrate the potential of this optimization setting on the well-known MNIST data set. The (1+1)-EA employs Rechenberg's mutation rate control, and a niching mechanism adapts the optimization approach to overcome local optima. An experimental study shows that the EA is capable of improving the state-of-the-art network contribution and of evolving highway networks from scratch.
Submitted 11 September, 2017;
originally announced September 2017.
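The abstract above mentions a (1+1)-EA with Rechenberg's mutation rate control. As a rough illustration of that general idea only, the following minimal sketch applies the classic 1/5th success rule to a continuous toy problem. It is not the paper's implementation: the sphere objective, the function names, and the parameters `window` and `factor` are all assumptions chosen for the example, and the niching mechanism is not shown.

```python
import random


def sphere(x):
    """Toy objective (assumption, not from the paper): sum of squares."""
    return sum(v * v for v in x)


def one_plus_one_es(f, x0, sigma=1.0, iterations=1000, window=10,
                    factor=0.85, seed=0):
    """(1+1)-ES with a 1/5th success rule for step-size control.

    Every `window` iterations the success rate is compared to 1/5:
    a higher rate increases sigma, a lower rate decreases it.
    """
    rng = random.Random(seed)
    parent = list(x0)
    parent_fit = f(parent)
    successes = 0
    for t in range(1, iterations + 1):
        # Gaussian mutation of the single parent
        child = [v + sigma * rng.gauss(0.0, 1.0) for v in parent]
        child_fit = f(child)
        # Elitist selection: keep the child only if it is strictly better
        if child_fit < parent_fit:
            parent, parent_fit = child, child_fit
            successes += 1
        # Step-size adaptation at the end of each observation window
        if t % window == 0:
            rate = successes / window
            if rate > 0.2:
                sigma /= factor
            elif rate < 0.2:
                sigma *= factor
            successes = 0
    return parent, parent_fit, sigma


if __name__ == "__main__":
    best, best_fit, final_sigma = one_plus_one_es(sphere, [5.0] * 5)
    print(best_fit, final_sigma)
```

Because selection is elitist, the returned fitness can never be worse than that of the starting point; how quickly it improves depends on the assumed `window` and `factor` settings.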
-
A Novel Approach to the Common Due-Date Problem on Single and Parallel Machines
Authors:
Abhishek Awasthi,
Jörg Lässig,
Oliver Kramer
Abstract:
This paper presents a novel idea for the general case of the Common Due-Date (CDD) scheduling problem. The problem is about scheduling a certain number of jobs on single or parallel machines where all the jobs possess different processing times but a common due-date. The objective of the problem is to minimize the total penalty incurred due to earliness or tardiness of the job completions. This work presents exact polynomial algorithms for optimizing a given job sequence for single and identical parallel machines with the run-time complexities of $O(n \log n)$ for both cases, where $n$ is the number of jobs. Besides, we show that our approach for the parallel machine case is also suitable for non-identical parallel machines. We prove the optimality for the single machine case and the runtime complexities of both. Furthermore, we extend our approach to one particular dynamic case of the CDD and conclude the chapter with our results for the benchmark instances provided in the OR-library.
Submitted 6 May, 2014;
originally announced May 2014.
-
Aircraft Landing Problem: Efficient Algorithm for a Given Landing Sequence
Authors:
Abhishek Awasthi,
Oliver Kramer,
Jörg Lässig
Abstract:
In this paper, we investigate a special case of the static aircraft landing problem (ALP) with the objective to optimize landing sequences and landing times for a set of airplanes. The problem is to land the planes on one or multiple runways within a time window as close as possible to the preferable target landing time, maintaining a safety distance constraint. The objective of this well-known NP-hard optimization problem is to minimize the sum of the total penalty incurred by all the aircraft for arriving earlier or later than their preferred landing times. For a problem variant that optimizes a given feasible landing sequence for the single runway case, we present an exact polynomial algorithm and prove the run-time complexity to lie in $O(N^3)$, where $N$ is the number of aircraft. The proposed algorithm returns the optimal solution for the ALP for a given feasible landing sequence on a single runway for a common practical case of the ALP described in the paper. Furthermore, we propose a strategy for the ALP with multiple runways and present our results for all the benchmark instances with single and multiple runways, while comparing them to previous results in the literature.
Submitted 26 October, 2013;
originally announced November 2013.
-
Common Due-Date Problem: Exact Polynomial Algorithms for a Given Job Sequence
Authors:
Abhishek Awasthi,
Jörg Lässig,
Oliver Kramer
Abstract:
This paper considers the problem of scheduling jobs on single and parallel machines where all the jobs possess different processing times but a common due date. There is a penalty involved with each job if it is processed earlier or later than the due date. The objective of the problem is to find the assignment of jobs to machines, the processing sequence of jobs and the time at which they are processed, which minimizes the total penalty incurred due to tardiness or earliness of the jobs. This work presents exact polynomial algorithms for optimizing a given job sequence for single and parallel machines with the run-time complexities of $O(n \log n)$ and $O(mn^2 \log n)$ respectively, where $n$ is the number of jobs and $m$ the number of machines. The algorithms take a sequence consisting of all the jobs $(J_i, i=1,2,\dots,n)$ as input and distribute the jobs to machines (for $m>1$) along with their best completion times so as to get the least possible total penalty for this sequence. We prove the optimality for the single machine case and the runtime complexities of both. Finally, we present the results for the benchmark instances and compare with previous work for single and parallel machine cases, up to $200$ jobs.
Submitted 26 October, 2013;
originally announced November 2013.
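To make the objective in the abstract above concrete, here is a minimal sketch that only evaluates the total earliness/tardiness penalty of a given job sequence on a single machine. It is not the paper's $O(n \log n)$ algorithm. The per-job weights `alpha` (earliness) and `beta` (tardiness), the assumption that jobs run back to back without idle time from a chosen `start`, and all names are illustrative assumptions.

```python
def cdd_penalty(processing, alpha, beta, due_date, start=0):
    """Total penalty of a fixed job sequence on one machine.

    processing: processing times in sequence order
    alpha, beta: assumed per-job earliness and tardiness weights
    due_date:    the common due date shared by all jobs
    start:       start time of the first job (no idle time afterwards)
    """
    total = 0
    completion = start
    for p, a, b in zip(processing, alpha, beta):
        completion += p  # completion time of the current job
        earliness = max(0, due_date - completion)
        tardiness = max(0, completion - due_date)
        total += a * earliness + b * tardiness
    return total


if __name__ == "__main__":
    # Three jobs, common due date 6, schedule starting at time 0
    print(cdd_penalty([3, 2, 4], [1, 2, 1], [2, 1, 3], due_date=6))
```

With such an evaluator, different start times for the same sequence can be compared directly; finding the best one efficiently is what the exact algorithms described in the abstract address.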
-
Unsupervised K-Nearest Neighbor Regression
Authors:
Oliver Kramer
Abstract:
In many scientific disciplines, structures in high-dimensional data have to be found, e.g., in stellar spectra, in genome data, or in face recognition tasks. In this work we present a novel approach to non-linear dimensionality reduction. It is based on fitting K-nearest neighbor regression to the unsupervised regression framework for learning of low-dimensional manifolds. Similar to related approaches that are mostly based on kernel methods, unsupervised K-nearest neighbor (UNN) regression optimizes latent variables w.r.t. the data space reconstruction error employing the K-nearest neighbor heuristic. The problem of optimizing latent neighborhoods is difficult to solve, but the UNN formulation allows the design of efficient strategies that iteratively embed latent points to fixed neighborhood topologies. UNN is well suited for sorting of high-dimensional data. The iterative variants are analyzed experimentally.
Submitted 26 September, 2011; v1 submitted 18 July, 2011;
originally announced July 2011.