-
RoboMorph: Evolving Robot Morphology using Large Language Models
Authors:
Kevin Qiu,
Krzysztof Ciebiera,
Paweł Fijałkowski,
Marek Cygan,
Łukasz Kuciński
Abstract:
We introduce RoboMorph, an automated approach for generating and optimizing modular robot designs using large language models (LLMs) and evolutionary algorithms. In this framework, we represent each robot design as a grammar and leverage the capabilities of LLMs to navigate the extensive robot design space, which is traditionally time-consuming and computationally demanding. By integrating automatic prompt design and a reinforcement learning based control algorithm, RoboMorph iteratively improves robot designs through feedback loops. Our experimental results demonstrate that RoboMorph can successfully generate nontrivial robots that are optimized for a single terrain while showcasing improvements in morphology over successive evolutions. Our approach demonstrates the potential of using LLMs for data-driven and modular robot design, providing a promising methodology that can be extended to other domains with similar design frameworks.
Submitted 11 July, 2024;
originally announced July 2024.
-
Scaling Laws for Fine-Grained Mixture of Experts
Authors:
Jakub Krajewski,
Jan Ludziejewski,
Kamil Adamczewski,
Maciej Pióro,
Michał Krutul,
Szymon Antoniak,
Kamil Ciebiera,
Krystian Król,
Tomasz Odrzygóźdź,
Piotr Sankowski,
Marek Cygan,
Sebastian Jaszczur
Abstract:
Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models. In this work, we analyze their scaling properties, incorporating an expanded range of variables. Specifically, we introduce a new hyperparameter, granularity, whose adjustment enables precise control over the size of the experts. Building on this, we establish scaling laws for fine-grained MoE, taking into account the number of training tokens, model size, and granularity. Leveraging these laws, we derive the optimal training configuration for a given computational budget. Our findings not only show that MoE models consistently outperform dense Transformers but also highlight that the efficiency gap between dense and MoE models widens as we scale up the model size and training budget. Furthermore, we demonstrate that the common practice of setting the size of experts in MoE to mirror the feed-forward layer is not optimal at almost any computational budget.
Submitted 12 February, 2024;
originally announced February 2024.
-
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Authors:
Maciej Pióro,
Kamil Ciebiera,
Krystian Król,
Jan Ludziejewski,
Michał Krutul,
Jakub Krajewski,
Szymon Antoniak,
Piotr Miłoś,
Marek Cygan,
Sebastian Jaszczur
Abstract:
State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based Large Language Models, including recent state-of-the-art open models. We propose that to unlock the potential of SSMs for scaling, they should be combined with MoE. We showcase this on Mamba, a recent SSM-based model that achieves remarkable performance. Our model, MoE-Mamba, outperforms both Mamba and baseline Transformer-MoE. In particular, MoE-Mamba reaches the same performance as Mamba in $2.35\times$ fewer training steps while preserving the inference performance gains of Mamba against Transformer.
Submitted 26 February, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Grasping Student: semi-supervised learning for robotic manipulation
Authors:
Piotr Krzywicki,
Krzysztof Ciebiera,
Rafał Michaluk,
Inga Maziarz,
Marek Cygan
Abstract:
Gathering real-world data from the robot quickly becomes a bottleneck when constructing a robot learning system for grasping. In this work, we design a semi-supervised grasping system that, on top of a small sample of robot experience, takes advantage of images of products to be picked, which are collected without any interactions with the robot. We validate our findings both in the simulation and in the real world. In the regime of a small number of robot training samples, taking advantage of the unlabeled data allows us to achieve performance at the level of 10-fold bigger dataset size used by the baseline. The code and datasets used in the paper will be released at https://github.com/nomagiclab/grasping-student.
Submitted 8 March, 2023;
originally announced March 2023.
-
Approximation Algorithms for Steiner Tree Problems Based on Universal Solution Frameworks
Authors:
Krzysztof Ciebiera,
Piotr Godlewski,
Piotr Sankowski,
Piotr Wygocki
Abstract:
This paper summarizes the work on implementing a few solutions for the Steiner Tree problem which we undertook in the PAAL project. The main focus of the project is the development of generic implementations of approximation algorithms together with universal solution frameworks. In particular, we have implemented the Zelikovsky 11/6-approximation using a local search framework, and the 1.39-approximation by Byrka et al. using an iterative rounding framework. These two algorithms are experimentally compared with the greedy 2-approximation, with the exact but exponential-time Dreyfus-Wagner algorithm, as well as with results given by state-of-the-art local search techniques by Uchoa and Werneck. The results of this paper are twofold. On one hand, we demonstrate that high-level algorithmic concepts can be designed and efficiently used in C++. On the other hand, we show that the above algorithms, which have good theoretical guarantees, give decent results in practice but are inferior to state-of-the-art heuristic approaches.
Submitted 28 October, 2014;
originally announced October 2014.