\sidecaptionvpos

figurec

Text2Robot:
Evolutionary Robot Design from Text Descriptions

Ryan P. Ringel^∗ Zachary S. Charlick^∗ Jiaxun Liu^∗ Boxi Xia Boyuan Chen
Duke University
https://generalroboticslab.com/Text2Robot

Abstract

Robot design has traditionally been costly and labor-intensive. Despite advancements in automated processes, it remains challenging to navigate a vast design space while producing physically manufacturable robots. We introduce Text2Robot, a framework that converts user text specifications and performance preferences into physical quadrupedal robots. Within minutes, Text2Robot can use text-to-3D models to provide strong initializations of diverse morphologies. Within a day, our geometric processing algorithms and body-control co-optimization produce a walking robot by explicitly considering real-world electronics and manufacturability. Text2Robot enables rapid prototy** and opens new opportunities for robot design with generative models.

^*^*footnotetext: These authors contributed equally to this work

Keywords: Robot Design, Generative Model, Legged Robots

1 Introduction

Refer to caption — Figure 1: Text2Robot creates physical robots from user-specified text prompts and performance preferences while considering real-world electronics and manufacturability.

For over half a century, robot design has been a costly and labor-intensive process, requiring extensive human efforts from initial sketches to detailed modeling, prototy**, controller design, manufacturing, and testing. This traditional approach has significant limitations, such as prohibitive costs, lengthy development cycles, and constraints on innovation bounded by human imagination and manual capabilities. However, advancements in automated robot design [1, 2, 3, 4] promise to revolutionize this landscape. By automating key aspects of the design process, we can drastically reduce development time and costs, allowing industries to rapidly produce specialized robots and enabling engineers to establish efficient manufacturing processes. Researchers also benefit by quickly innovating desired hardware platforms. Ultimately, automating robot design not only enables rapid prototy** but also expands the realm of possible innovations, surpassing the boundaries of what human designers can envision and create.

One major challenge in automating robot design is navigating the vast and intricate design space. Traditional engineering design is time-intensive and demands considerable technical expertise. While advancements in control engineering [5, 6, 7, 8, 9] and machine learning [10, 11, 12, 13, 14] have enabled automatic training of robot policies, designing morphologies remains laborious. Human designers typically spend months conceptualizing, designing, and fabricating a robot, balancing cost, manufacturability, and performance. Previous automation attempts often simplify the design space by using large repeating modules [1, 15, 16, 17] or voxel representations [18, 19]. Though innovative, these designs are slow due to the need to search within a vast design space and do not consider real-world fabrication, resulting in theoretically sound but impractical designs to produce.

Automated methods for robot design predominately involve Evolutionary Algorithms (EAs) inspired by natural evolution [20]. However, EA-based approaches are inherently slow, starting with random solutions and iterating through hundreds of generations. Moreover, existing solutions do not scale well with increasing design complexity [21] and face significant challenges in balancing multiple objectives, such as control and morphology co-optimization [17, 22]. Consequently, while these solutions may excel in simulations, they frequently fail to address practical issues such as sim2real transfer [16, 17, 18] and manufacturability. Problems like high current draw from unrealistic degrees of freedom [16, 17] or complex morphologies that are difficult to manufacture [16, 17, 18] hinder the transition from theoretical designs to practical and producible robots.

We present Text2Robot (Fig. 2), a comprehensive “A-to-Z” framework from user text specifications to physical walking robots. Our approach utilizes recent advancements in text-to-3D generative models to create initial mesh designs, which are subsequently converted into kinetic robot models through our geometric processing algorithms. Within minutes we can generate a design, within an hour, a robot trained in simulation, and within a day, a fabricated walking robot. Our system not only fulfills users’ aesthetic preferences but further optimizes the designs using an evolutionary algorithm to incorporate other performance preferences. Our key insight is that text-to-3D generative models can provide a much stronger starting point for the evolutionary algorithm, significantly accelerating the optimization process. Experiments in both simulation and the physical world demonstrate our ability to specify both aesthetic qualities and performance metrics, such as velocity tracking and energy efficiency. Overall, our approach introduces the creative and artistic nature of generative models to automated robot design with fast prototy** from just a text prompt, and has the potential to open up novel opportunities in rapid design and manufacturing.

2 Related Work

Generative Models for Design Generative AI aims to enhance design efficiency [23, 24] and reduce labor costs [25, 26] by automating tasks [27] or portions of the workflows [28, 29, 30]. Recent advances in generative models have led to the integration of generative adversarial networks (GANs) [31, 32], diffusion models [33, 34], and transformers [35, 36] into the generative design process. Previous research has utilized generative models for domain-specific code creation [37, 38] or parameter-based physical designs [39, 40]. Such systems often heavily rely on human feedback and tuning during the design process. In contrast, our work focuses on directly enabling artificial intelligence-generated content (AIGC) for physically embodied and functional robot design.

Text-to-3D Models Recent advancements in text-to-3D generative models [41] utilize pre-trained text-to-image diffusion models to optimize Neural Radiance Fields (NeRF) [42, 43], text-to-3D shape embeddings in a GANs framework [44], or explicit and hybrid scene representations [45] to create 3D generated content. Despite their growing capabilities, current text-to-3D models are primarily used for visualization in graphics and not for the creation of functioning machines. Our work incorporates 3D AIGC into physical robot design while considering electrical components and manufacturability constraints.

Automated Robot Design Automated robot design often involves co-optimizing control and morphology using classical methods and analytical dynamics [46, 47, 48, 49]. These approaches rely on highly parameterized designs and specified models, which limit creativity and practicality in real-world environments. Recent studies employ auto-differentiable hardware policies [50, 51, 52] for co-optimization, but their designs focus on robots with limited degrees of freedom and often overlook real-world manufacturability and electronics. Evolutionary Algorithms (EA) have been widely used [53] to explore vast design spaces by combining modular building blocks [1, 15, 16, 54] or voxels [18, 19, 22] through genetic operators to co-evolve morphology and control to produce complex and unexpected [55] designs. However, EA-based approaches are computationally expensive, typically starting from random solutions and requiring numerous generations to produce effective designs. Our work leverages the open-ended nature of EA-based approaches for design evolution but drastically accelerates the process through strong initializations by incorporating generative models.

3 The Text2Robot Framework

Text2Robot (Fig. 2) generates a physical walking quadrupedal robot that caters to a user’s text description and performance priorities, such as energy efficiency or velocity tracking accuracy. There are four major components in Text2Robot: (1) A generative model to create static 3D meshes of robots given user-specified texts. (2) A set of geometric processing algorithms to convert the static meshes into kinetic models, including the necessary components for fabrication. (3) An optimization process based on evolutionary algorithms and reinforcement learning to further optimize the robot morphologies and walking policies according to the user’s performance preferences. (4) A final optimized robot is quickly 3D-printed and assembled.

3.1 Mesh Generation from Text Prompts

Given a text prompt specifying the aesthetic of the robot, we generate a 3D mesh of the robot using a text-to-3D model. In this paper, we used one of the state-of-the-art models, Meshy [56], that takes in the text prompt and produces several candidate meshes. One high-level assumption in this paper is to demonstrate our framework by automating the design of quadrupedal robots with eight motors. Having such constraints mimics the typical real-world design requirements without the loss of generality of our framework. We implemented a structured prompt design based on specified user descriptions to ensure Meshy consistently outputs quadrupedal meshes.

The user provides a text description of their desired robot in one to three words, which we incorporate into the following prompt format: <Quadrupedal walking robot resembling a "User-Provided Description">. The generated candidate meshes are then manually filtered according to the following constraints: (1) the mesh must be continuous without disjoint bodies; (2) the mesh must exhibit bilateral symmetry; and (3) the mesh must include four legs.

3.2 Kinetic Robot Model from Static Meshes

Current text-to-3D models only produce static meshes for visualization purposes. Our key challenge is to automatically convert such static meshes into kinetic robot models. Importantly, unlike most simulated robot models that are simplified for fast simulation, to automatically transfer our designs to physical functioning robots, our generated robot model should also consider real-world manufacturing factors such as the placement of electronic components, wire connections, physical collisions at joints, limits of the number of motors, and manufacturability.

Mesh Repair and Preprocessing Due to the lack of realistic constraints on the text-to-3D models, the generated mesh can have errors such as being non-watertight. We first call the mesh repair API through Fusion 360 [57] to repair the mesh for the downstream workflow. We then leverage the mesh conversion operation to convert the mesh to an organic BREP (Boundary Representation) body. We scale the BREP body to a volume of $6300${\mathrm{cm}}^{3}$$ to unify the initial mass of all robots.

Joint Allocation Deciding a set of feasible joint positions for a quadrupedal robot with eight motors purely from the mesh model is difficult. Inspired by natural quadrupedal animals, we assume that our robots have four legs and two movable joints for each leg. In other words, our robots have four shoulder joints and four knee joints, dividing each leg into an upper and lower leg.

We determine the position and orientation of each joint based on the mesh model’s geometric features (Fig. 3 A and B). The body’s origin is defined as the center of mass, assuming uniform mass distribution. Starting from the origin along the $+y$ and $-y$ direction, our algorithm creates vertical slices parallel to the $xz$ plane and records the cross-sectional area at each step. The slice closest to the origin with the minimum cross-sectional area is mirrored to the other side of the origin. The two slicing planes separate the mesh model into four legs and one body, defined as the base link. The origins of the four shoulder joints correspond to the centroids of their intersection profiles, with $z$ -axes perpendicular to the slicing plane and $y$ -axes pointing downwards. Similarly, knee joints are located at the slice planes with maximum cross-sectional area, traversing along the $xy$ plane for each leg. We traverse from the bottom of the base link and stop $2$\mathrm{cm}$$ from the ground plane to ensure space for motor placement. The origins of the knee joints correspond to the centroids of each intersection profile, with $z$ -axes perpendicular to their slicing planes and $x$ -axes parallel to the slicing planes pointing in the robot’s facing direction. We follow the right-hand coordinate systems.

Electronic Component Placement To create physically realistic robots for real-world manufacturing and walking, we need to consider the placements of electronic components, including

actuators, batteries, and controllers. As in Fig. 4, servo motors are housed in a 3D-printed box with snap-in pegs inspired by toy building blocks. Motors can be rotated in their casings to achieve the desired axis of rotation during assembly. A larger box encases other necessary electronics, such as the motor controller, Raspberry Pi, and battery, and can be easily slid into a central channel in the base link. To avoid self-collision and reserve space for motor and electronics modules, we offset the eight limbs by $4$\mathrm{cm}$$ and cut extrusions.

By this step, the robot’s kinetic model is automatically defined by our algorithms, and our pipeline exports the model in the Unified Robotics Description Format (URDF) [58]. To enlarge the design space for optimization in Sec. 3.3, we scale each leg in $0.5$\mathrm{cm}$$ increments to generate nine additional models. We further augment these models to thirty variants by defining each joint’s rotational axis along the $x,y,\textrm{or}~{}z$ axis of the joint coordinate.

3.3 Co-optimization of Morphology and Walking Policy

We propose an evolutionary algorithm (Fig. 5A) with a dual-loop architecture to optimize both robot morphology and control policy while simultaneously incorporating user preferences. The inner loop employs reinforcement learning to assess a robot’s capacity for acquiring a walking policy. The outer loop utilizes genetic operators to evolve the robot morphology from a design repository.

An initial population of robots will be trained for locomotion to track changing velocities along $x-y-yaw$ directions. The robot observations include the previous actions, base linear and angular velocities, joint positions and velocities, the gravity vector projected onto the robot’s coordinate system, and the target body velocity commands. The policy outputs the joint position offsets that will be converted to torques with a PD controller. The reward function consists of a baseline reward ( $r_{\textrm{baseline}}$ ) and optional user-adjustable reward terms based on preference. $r_{\textrm{baseline}}$ minimizes velocity tracking error, maintains a plausible robot pose, and penalizes excessive joint torque, accelerations, frequent step** behavior, and abrupt action changes. The user-specified components are the linear velocity tracking reward and joint power penalty. The per-step reward can be defined as:

r=\underbrace{\alpha_{1}e^{-0.25{{\left\|\mathbf{v}^{*}_{xy}-\mathbf{v}_{xy}% \right\|}^{2}}}}_{\text{linear velocity tracking}}+\underbrace{\alpha_{2}\sum_% {i=0}^{n}{\left\|\dot{q}_{i}\tau_{i}\right\|}}_{\text{joint power}}+r_{\text{% baseline}}

(1)

where $\mathbf{v}^{*}_{xy}$ is the commanded base linear velocity in $xy$ direction, and $\mathbf{v}_{xy}$ is the actual base linear velocity, $\dot{\mathbf{q}}$ is the joint velocities, $\mathbf{\tau}$ is the joint torques, $n$ is the number of joints and $\alpha_{1}$ and $\alpha_{2}$ are the weights of the respective reward terms. $r_{\text{baseline}}$ are listed in the supplementary material.

We select the top $100$ robots for each generation and create another $100$ new robots with genetic operators. 50 of the new robots are created through mutations, and the other 50 are created through crossovers. Crossover is achieved by duplicating a random robot in the current generation and choosing with a 50% chance to swap a joint or a limb with another random robot in the current generation. For the mutations, the robot can undergo a change in limb length, limb shape, body shape, joint axis or remain the same with the possibility of 15%, 15%, 25%, 40%, and 5%, respectively. A constraint of symmetry is imposed on the robot based on the observation of natural animal characteristics and the practicality of our implementation. To achieve this, the same change is applied to all legs or joints to unify the joint type and length or shape of the limb within the same body level. Although a larger design space can be achieved without this constraint, this ensures our robot has a stable starting pose at the beginning of the inner loop training and improves the training quality.

By defining the fitness score based on different evaluation metrics, our outer loop can prioritize different performances for robot selection and optimize both morphology and walking policy toward user preference. We design two criteria for users to prioritize: (1) Velocity tracking. We scale the velocity tracking reward by 20 and add it to the total reward. (2) Energy efficiency. We scale the energy penalty by 10 and add it to the total reward.

Implementation Details We trained our multi-directional walking policy using the Proximal Policy Optimization (PPO) algorithm [59], following the implementation of parallel reinforcement learning described in [60, 61]. Each robot was trained for $2.46\times 10^{7}$ steps ( a few minutes) before evaluation. We extended the open-source IsaacGymEnvs simulation environment [62] for our parallel training with $4096$ environments per robot. Both our actor and critic networks employ three fully connected layers with dimensions of [512, 256, 128] and ELU activation functions. All training were performed on servers with NVIDIA RTX A6000, NVIDIA A100, and NVIDIA GeForce RTX 3090 GPUs. Comprehensive details on the specific training hyperparameters are in the supplementary material.

3.4 Physical Assembly

All robots were printed on the Creality CR-10 Smart Pro 3D printer due to its large printing bed. We use the Hiwonder HTD-45H High Voltage Serial Bus Servo to actuate our robots. We list detailed information on other electric components in supplementary material. The resulting robot assembly can be completed in minutes due to our modular design and careful considerations of electronic component placement and manufacturability.

4 Experiment

In this section, we aim to evaluate Text2Robot from various aspects. First, we assess its ability to generate robot designs that align with diverse user-specified aesthetics. Second, we highlight the key advantages of integrating generative models into the automated robot design workflow. Third, we evaluate the performance of co-optimizing body morphology and control policy while prioritizing different performance metrics as specified by users. Finally, we demonstrate the real-world applicability of Text2Robot by fabricating and showcasing the physical robots.

4.1 Aesthetic Specification Matching

To assess the effectiveness of Text2Robot in generating robot designs that align with diverse user-specified aesthetics, we evaluated the initial stages of the framework using a variety of input text prompts. This evaluation aimed to demonstrate the capabilities of our chosen generative model in the context of robot design. We tested sixteen diverse prompts, encompassing both animal-inspired descriptions (e.g., “dog”, “frog”) and more creative concepts (e.g., “bread”, “can”, “shoe”). The qualitative results in Fig. 6A demonstrate that the generated designs can capture the essence of the input descriptions while adhering to the fundamental structure of a quadrupedal robot. We further investigated achieving subtle variations in generated morphologies by tweaking prompts, producing unique robot designs for specific species like “Bug”, “Dog”, and “Frog”, as shown in Fig. 6B. For each of the above designs, we exported $30$ robots in total by our geometric processing stage with various leg lengths and joint orientations for subsequent experiments. Text2Robot can generate a robot mesh within $1\sim 2~{}$\mathrm{m}\mathrm{i}\mathrm{n}\mathrm{s}$$ and convert it into a URDF model in $30~{}$\mathrm{s}\mathrm{e}\mathrm{c}\mathrm{s}$$ .

4.2 Advantages of Incorporating Text-to-3D Generative Models

We hypothesize that one key advantage of integrating text-to-3D models is to provide a much stronger initialization for evolutionary design. Therefore, we compared with RoboGrammar [16] which enables a wide range of symmetrical robot designs based on simple primitive geometries as a baseline. To ensure a fair comparison, we filtered robots generated with their recursive graph grammar to only include quadrupedal robots with two joints per leg and with similar body lengths. We created URDF files for their robot and assigned the weights to match the weights of our robots. We then trained 150 RoboGrammar robots and 150 of our generated designs for the same amount of steps for a walking policy. Our 150 robots are randomly selected from 600 robots augmented from the sixteen prompts (Fig. 6A) and another four prompts in the bug species (Fig. 6B). Fig. 7 shows that our initialized robots outperformed the baseline initialized robots by a large margin.

We found most RoboGrammar robots suffer from unnatural placement and orientation of joints and links. Limbs on the same side are often too close together which compromises balance, and links protrude in the same direction as the joint axis, making the limb ineffective for locomotion. In contrast, Text2Robot leverages the knowledge of the physical world embedded within a text-to-3D generative model to produce quadrupedal robots with appropriate leg lengths, link and body proportions, and stable static initial postures.

4.3 Morphology and Walking Policy Co-optimization

Single Species Optimization We first co-optimized morphology and control with similar morphologies to “Bug”, “Frog” and “Dog” as shown in Fig 6. Each species contains five variants of morphology from similar prompts. In each of the three trials, we used the 150 robots augmented from the five robots as the initial population in our EA, and all robots were optimized based on the general reward in Eq. 1. Fig. 8 shows the emergence of adequate walking policies within only a few generations, indicating the effectiveness of our method in generating walking quadrupedal robots. We continued to evolve the robots for 20 generations, and the performance was further improved with evolved morphologies.

Increasing Diversity We then investigated the effect of increasing diversity in the robot bank. We used the 600 robots augmented from the sixteen prompts and an additional four prompts in the bug species. The final selected robot achieved higher rewards compared to the robot optimized with similar morphologies (Fig. 8). Text2Robot enables higher-quality designs simply by expanding the diversity of text descriptions. This demonstrates the potential of our method to scale up and achieve better performance with more diverse and creative text prompts.

4.4 Co-optimization with Preferences

We evaluate Text2Robot to optimize robot designs according to user-supplied priorities: energy or velocity tracking. We used all robots with their augmented sets from Fig. 6. We adjusted the calculation of the fitness score by scaling the reward with additional amplified energy or velocity contribution. Fig. 9 shows the results of diverse species based on each preference under 50 generations of evolution. The results of a single species are listed in the supplementary material.

In the optimization for robots from single species or diverse species, our method is able to optimize the robot performance per generation while considering the performance priorities. The results show a strong correlation between the top-performing robot and the targeted performance criteria, and the other performance criteria, which were not being optimized, appeared more random and sporadic. These results demonstrate that our robot is optimized to meet user preferences. We found the robots selected from velocity optimization generally have longer and wider bodies, while the robots selected from energy optimization have lower body weights. We also find that the performance of the diverse robot bank is better than that of single-species banks, indicating the scalability of our method.

4.5 Co-optimization for Rough Terrain

We applied Text2Robot to rough terrain and observed that it effectively informed evolved robot morphologies with higher-performing foot shapes. We used the robots from the diverse bank in Fig. 6 and applied a VHACD decomposition to increase the simulated realism of foot contact. Following prior curriculum design [62], robots are trained to progress through increasingly challenging terrains (smooth slope, rough slope, stairs, discrete, and step** stones)

(Fig. 10). Analysis of selected robot morphologies reveals a correlation between foot shape and terrain type, as illustrated in Fig. 10. Arched feet, favored in flat terrain trials, provide enhanced stability, speed, and efficient energy transfer due to their curvature. This advantage, however, hinges on predictable surface contact dynamics. We observed that the large foot dimensions of arched feet increase the risk of snagging on uneven terrain. Conversely, simple rounded feet, chosen for rough terrain, demonstrated superior adaptability to unpredictable surfaces, promoting stability and balance.

4.6 Physical Walking Robot

We selected the highest-performing “Bug” and “Frog” from the single-species optimization and the two robots from the diverse bank optimized for velocity tracking and energy efficiency to demonstrate our ability to fabricate the generated designs. Each robot required approximately a day to manufacture, but assembly (Fig. 11) was completed within minutes. We show a primitive Sim2Real transfer by simply playing the trained locomotion in simulation and directly executing the joint positions on the real robot. As shown in Fig. 11, our real robots successfully transfer the walking policy learned in simulation to the real world and achieve sufficient performance in locomotion and speed. This further validates the practicability and robustness of our design from our Text2Robot pipeline.

5 Conclusion

We introduce Text2Robot, which generates a physical quadrupedal walking robot from text prompts to match user-specified aesthetic and performance preferences. Text2Robot leverages generative models for stronger initialization than traditional methods, while converting visual meshes to movable robots with considerations in electronics and real-world manufacturability. Both simulated and physical experiments show Text2Robot’s ability to co-optimize morphology and control to produce physically functional machines.

Limitations and Future Work Text2Robot presents several opportunities for improvement in future research. While our current pipeline focuses on designing quadrupedal robots, future work can extend the scope to robots with varying numbers of joints or other types of electromechanical machines. Additionally, the current framework still requires manual assembly. Integrating our method with automated assembly algorithms to construct physical robots would be a significant advancement. Furthermore, our text prompts remain fixed once optimization begins. Exploring a feedback mechanism to refine mesh generation from the text-to-3D model based on reward signals could offer greater flexibility.

Acknowledgments

This work is supported by DARPA FoundSci program under award HR00112490372, by ARL STRONG program under awards W911NF2320182 and W911NF2220113.

References

Lipson and Pollack [2000] H. Lipson and J. B. Pollack. Automatic design and manufacture of robotic lifeforms. Nature, 406(6799):974–978, 2000.
Hornby et al. [2003] G. S. Hornby, H. Lipson, and J. B. Pollack. Generative representations for the automated design of modular physical robots. IEEE transactions on Robotics and Automation, 19(4):703–719, 2003.
Matthews et al. [2023] D. Matthews, A. Spielberg, D. Rus, S. Kriegman, and J. Bongard. Efficient automatic design of robots. Proceedings of the National Academy of Sciences, 120(41):e2305180120, 2023.
Wang et al. [2019] T. Wang, Y. Zhou, S. Fidler, and J. Ba. Neural graph evolution: Towards efficient automatic robot design. arXiv preprint arXiv:1906.05370, 2019.
Gehring et al. [2013] C. Gehring, S. Coros, M. Hutter, M. Bloesch, M. A. Hoepflinger, and R. Siegwart. Control of dynamic gaits for a quadrupedal robot. In 2013 IEEE international conference on Robotics and automation, pages 3287–3292. IEEE, 2013.
Carpentier and Wieber [2021] J. Carpentier and P.-B. Wieber. Recent progress in legged robots locomotion control. Current Robotics Reports, 2(3):231–238, 2021.
Ding et al. [2021] Y. Ding, A. Pandala, C. Li, Y.-H. Shin, and H.-W. Park. Representation-free model predictive control for dynamic motions in quadrupeds. IEEE Transactions on Robotics, 37(4):1154–1171, 2021.
Bledt et al. [2018] G. Bledt, M. J. Powell, B. Katz, J. Di Carlo, P. M. Wensing, and S. Kim. Mit cheetah 3: Design and control of a robust, dynamic quadruped robot. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2245–2252. IEEE, 2018.
Farshidian et al. [2017] F. Farshidian, M. Neunert, A. W. Winkler, G. Rey, and J. Buchli. An efficient optimal planning and control framework for quadrupedal locomotion. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 93–100, 2017. doi:10.1109/ICRA.2017.7989016.
Wang et al. [2012] S. Wang, W. Chaovalitwongse, and R. Babuska. Machine learning algorithms in bipedal robot control. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(5):728–743, 2012.
Lee et al. [2020] J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter. Learning quadrupedal locomotion over challenging terrain. Science robotics, 5(47):eabc5986, 2020.
Tsounis et al. [2020] V. Tsounis, M. Alge, J. Lee, F. Farshidian, and M. Hutter. Deepgait: Planning and control of quadrupedal gaits using deep reinforcement learning. IEEE Robotics and Automation Letters, 5(2):3699–3706, 2020. doi:10.1109/LRA.2020.2979660.
Ibarz et al. [2021] J. Ibarz, J. Tan, C. Finn, M. Kalakrishnan, P. Pastor, and S. Levine. How to train your robot with deep reinforcement learning: lessons we have learned. The International Journal of Robotics Research, 40(4-5):698–721, 2021.
Miki et al. [2022] T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter. Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics, 7(62):eabk2822, 2022.
Strgar et al. [2024] L. Strgar, D. Matthews, T. Hummer, and S. Kriegman. Evolution and learning in differentiable robots. arXiv preprint arXiv:2405.14712, 2024.
Zhao et al. [2020] A. Zhao, J. Xu, M. Konaković-Luković, J. Hughes, A. Spielberg, D. Rus, and W. Matusik. Robogrammar: graph grammar for terrain-optimized robot design. ACM Transactions on Graphics (TOG), 39(6):1–16, 2020.
Gupta et al. [2021] A. Gupta, S. Savarese, S. Ganguli, and L. Fei-Fei. Embodied intelligence via learning and evolution. Nature Communications, 12(1):5721, 2021. ISSN 2041-1723. doi:10.1038/s41467-021-25874-z. URL https://doi.org/10.1038/s41467-021-25874-z.
Cheney et al. [2014] N. Cheney, R. MacCurdy, J. Clune, and H. Lipson. Unshackling evolution: Evolving soft robots with multiple materials and a powerful generative encoding. SIGEVOLUTION, 7:11–23, 2014.
Cheney et al. [2015] N. Cheney, J. Bongard, and H. Lipson. Evolving soft robots in tight spaces. In Proceedings of the 2015 annual conference on Genetic and Evolutionary Computation, pages 935–942, 2015.
Bäck and Schwefel [1993] T. Bäck and H. Schwefel. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, 1(1):1–23, 1993.
Thierens [1999] D. Thierens. Scalability problems of simple genetic algorithms. Evolutionary Computation, 7(4):331–352, 1999. doi:10.1162/evco.1999.7.4.331.
Cheney et al. [2016] N. Cheney, V. Sunspiral, J. Bongard, and H. Lipson. On the difficulty of co-optimizing morphology and control in evolved virtual creatures. Artificial Life Conference Proceedings, 13:226–233, 2016.
Regenwetter et al. [2022] L. Regenwetter, A. H. Nobari, and F. Ahmed. Deep generative models in engineering design: A review. Journal of Mechanical Design, 144(7):071704, 2022.
Akande et al. [2024] T. O. Akande, O. O. Alabi, and J. B. Oyinloye. A review of generative models for 3d vehicle wheel generation and synthesis. Journal of Computing Theories and Applications, 1(4):368–385, 2024.
Makatura et al. [2023] L. Makatura, M. Foshey, B. Wang, F. HähnLein, P. Ma, B. Deng, M. Tjandrasuwita, A. Spielberg, C. E. Owens, P. Y. Chen, et al. How can large language models help humans in design and manufacturing? arXiv preprint arXiv:2307.14377, 2023.
Feuerriegel et al. [2024] S. Feuerriegel, J. Hartmann, C. Janiesch, and P. Zschech. Generative ai. Business & Information Systems Engineering, 66(1):111–126, 2024.
Kazi et al. [2017] R. H. Kazi, T. Grossman, H. Cheong, A. Hashemi, and G. W. Fitzmaurice. Dreamsketch: Early stage 3d design explorations with sketching and generative design. In UIST, volume 14, pages 401–414, 2017.
Sanchez-Lengeling and Aspuru-Guzik [2018] B. Sanchez-Lengeling and A. Aspuru-Guzik. Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400):360–365, 2018.
Buonamici et al. [2020] F. Buonamici, M. Carfagni, R. Furferi, Y. Volpe, L. Governi, et al. Generative design: an explorative study. Computer-Aided Design and Applications, 18(1):144–155, 2020.
Liu et al. [2023] V. Liu, J. Vermeulen, G. Fitzmaurice, and J. Matejka. 3dall-e: Integrating text-to-image ai in 3d design workflows. ACM Transactions on Graphics, 41:1955–1977, 2023.
Liu et al. [2018] Z. Liu, D. Zhu, S. P. Rodrigues, K.-T. Lee, and W. Cai. Generative model for the inverse design of metasurfaces. Nano letters, 18(10):6570–6576, 2018.
Oh et al. [2019] S. Oh, Y. Jung, S. Kim, I. Lee, and N. Kang. Deep generative design: Integration of topology optimization and generative models. Journal of Mechanical Design, 141(11):111405, 2019.
Luo et al. [2022] S. Luo, Y. Su, X. Peng, S. Wang, J. Peng, and J. Ma. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. Advances in Neural Information Processing Systems, 35:9754–9767, 2022.
Yang et al. [2023] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M.-H. Yang. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys, 56(4):1–39, 2023.
Wu et al. [2021] R. Wu, C. Xiao, and C. Zheng. Deepcad: A deep generative network for computer-aided design models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6772–6782, 2021.
Siddiqui et al. [2023] Y. Siddiqui, A. Alliegro, A. Artemov, T. Tommasi, D. Sirigatti, V. Rosov, A. Dai, and M. Nießner. Meshgpt: Generating triangle meshes with decoder-only transformers. arXiv preprint arXiv:2311.15475, 2023.
Nordmann et al. [2014] A. Nordmann, N. Hochgeschwender, and S. Wrede. A survey on domain-specific languages in robotics. In International conference on simulation, modeling, and programming for autonomous robots, pages 195–206. Springer, 2014.
Chen et al. [2021] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
Abdullah and Kamara [2013] H. Abdullah and J. Kamara. Parametric design procedures: a new approach to generative-form in the conceptual design phase. In AEI 2013: Building Solutions for Architectural Engineering, pages 334–343. 2013.
Hornby and Pollack [2001] G. S. Hornby and J. B. Pollack. The advantages of generative grammatical encodings for physical design. In Proceedings of the 2001 congress on evolutionary computation (ieee cat. no. 01th8546), volume 1, pages 600–607. IEEE, 2001.
Liu et al. [2024] J. Liu, X. Huang, T. Huang, L. Chen, Y. Hou, S. Tang, Z. Liu, W. Ouyang, W. Zuo, J. Jiang, et al. A comprehensive survey on 3d content generation. arXiv preprint arXiv:2402.01166, 2024.
Lin et al. [2023] C.-H. Lin, J. Gao, L. Tang, T. Takikawa, X. Zeng, X. Huang, K. Kreis, S. Fidler, M.-Y. Liu, and T.-Y. Lin. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 300–309, 2023.
Poole et al. [2022] B. Poole, A. Jain, J. T. Barron, and B. Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
Chen et al. [2019] K. Chen, C. B. Choy, M. Savva, A. X. Chang, T. Funkhouser, and S. Savarese. Text2shape: Generating shapes from natural language by learning joint embeddings. In Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part III 14, pages 100–116. Springer, 2019.
Chen et al. [2023] R. Chen, Y. Chen, N. Jiao, and K. Jia. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22246–22256, 2023.
Park and Asada [1994] J.-H. Park and H. Asada. Concurrent design optimization of mechanical structure and control for high speed robots. Journal of Dynamic Systems, Measurement, and Control, 116(3):344–356, 1994.
Paul and Bongard [2001] C. Paul and J. Bongard. The road less travelled: morphology in the optimization of biped robot locomotion. In Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No.01CH37180), volume 1, pages 226–232 vol.1, 2001. doi:10.1109/IROS.2001.973363.
Geijtenbeek et al. [2013] T. Geijtenbeek, M. Van De Panne, and A. F. Van Der Stappen. Flexible muscle-based locomotion for bipedal creatures. ACM Transactions on Graphics (TOG), 32(6):1–11, 2013.
Ha et al. [2018] S. Ha, S. Coros, A. Alspach, J. Kim, and K. Yamane. Computational co-optimization of design parameters and motion trajectories for robotic systems. The International Journal of Robotics Research, 37(13-14):1521–1536, 2018.
Chen et al. [2020] T. Chen, Z. He, and M. Ciocarlie. Hardware as policy: Mechanical and computational co-optimization using deep reinforcement learning. arXiv preprint arXiv:2008.04460, 2020.
Xu et al. [2021] J. Xu, T. Chen, L. Zlokapa, M. Foshey, W. Matusik, S. Sueda, and P. Agrawal. An End-to-End Differentiable Framework for Contact-Aware Robot Design. In Proceedings of Robotics: Science and Systems, Virtual, July 2021. doi:10.15607/RSS.2021.XVII.008.
Zhanpeng and Matei [2024] H. Zhanpeng and C. Matei. Morph: Design co-optimization with reinforcement learning via a differentiable hardware model proxy. IEEE Robotics and Automation, 2024.
Doncieux et al. [2015] S. Doncieux, N. Bredeche, J.-B. Mouret, and A. E. Eiben. Evolutionary robotics: what, why, and where to. Frontiers in Robotics and AI, 2:4, 2015.
Alattas et al. [2019] R. J. Alattas, S. Patel, and T. M. Sobh. Evolutionary modular robotics: Survey and analysis. Journal of Intelligent & Robotic Systems, 95:815–828, 2019.
Lehman et al. [2020] J. Lehman, J. Clune, D. Misevic, C. Ofria, K. O. Ellefsen, J.-B. Mouret, and A. Bernatskiy. The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation and artificial life research communities. Artificial Life, 26:274–306, 2020.
[56] Meshy. URL https://www.meshy.ai/.
[57] Autodesk fusion 360. URL https://www.autodesk.com/products/fusion-360.
Kitamura [2020] T. Kitamura. Fusion2urdf. https://github.com/syuntoku14/fusion2urdf, 2020.
Schulman et al. [2017] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
Rudin et al. [2022] N. Rudin, D. Hoeller, P. Reist, and M. Hutter. Learning to walk in minutes using massively parallel deep reinforcement learning. In Conference on Robot Learning, pages 91–100. PMLR, 2022.
Makoviichuk and Makoviychuk [2021] D. Makoviichuk and V. Makoviychuk. rl-games: A high-performance framework for reinforcement learning. https://github.com/Denys88/rl_games, May 2021.
Makoviychuk et al. [2021] V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, et al. Isaac gym: High performance gpu-based physics simulation for robot learning. arXiv preprint arXiv:2108.10470, 2021.

Supplementary Material

A. Single Species Optimization

We show performance optimization results for single-species robot banks, ’Bug’, ’Frog’, and ’Dog,’ and compare performance to that of the diverse bank. As in our diverse bank performance optimization experiment, we adjusted the fitness score calculation with an additional velocity reward or energy cost through 50 generations of evolution to demonstrate the effect of an input performance preference on final designs. As shown in Fig. 13, our results show a large correlation between the selected bot and the prioritized performance criteria. Fig. 12 shows the visualization of the selected robots, their average final rewards, and physical characteristics.

B. Reinforcement Learning Details

We show the detailed training parameters of the inner reinforcement learning loop in Tab. 1 , and the defination of symbols and baseline reward in Tab. 2 and Tab. 3.

Hyper-parameters	Values
Dense network shape	[512, 256, 128]
Dense network activation	elu
Discount factor	0.99
GAE discount factor	0.95
PPO loss clip range	0.2
Entropy coefficient	0.001
Learning rate $\alpha$	adaptive
Batch size	98304 (4096x24)
Mini-batch size	16384 (4096x4)
Mini epochs	5
Critic loss coefficient	2
KL-divergence threshold	0.008

Table 1: PPO hyper parameters

Base linear velocity	$\mathbf{v}$
Base angular velocity	$\omega$
Commanded base linear velocity	$\mathbf{v}^{*}$
Commanded base angular velocity	$\omega^{*}$
Joint positions	$\mathbf{q}$
Joint velocities	$\dot{\mathbf{q}}$
Joint accelerations	$\ddot{\mathbf{q}}$
Target joint positions	$\mathbf{q}^{*}$
Joint torques	$\mathbf{\tau}$
Number of joints	$n$
Number of feet	$n_{f}$
Feet air time	$t_{air}$
Feet stance time	$t_{stance}$
Base gravity	$\mathbf{g}_{b}$
Environment time step	$dt$

Table 2: Definition of symbols.

baseline reward terms	definition	weight $[*dt]$
Linear velocity tracking	$e^{-0.25{{\left\\|\mathbf{v}^{*}_{xy}-\mathbf{v}_{xy}\right\\|}^{2}}}$	$1$
Angular velocity tracking	$e^{-0.25{{\left\\|\mathbf{\omega}^{*}_{z}-\mathbf{\omega}_{z}\right\\|}^{2}}}$	$0.5$
Linear velocity penalty	$v^{2}_{z}$	$-4$
Angular velocity penalty	$\left\\|\mathbf{\omega}_{xy}\right\\|^{2}$	$-0.05$
Joint acceleration penalty	$\left\\|\ddot{\mathbf{q}}\right\\|^{2}$	$-5^{-7}$
Joint torque penalty	$\left\\|\mathbf{\tau}\right\\|^{2}$	$-2^{-5}$
Action rate penalty	$\left\\|\mathbf{\dot{a}}\right\\|^{2}$	$-5^{-5}$
orientation	$\left\\|\mathbf{g}_{b,xy}\right\\|^{2}$	-0.5
Feet air time	$\sum_{k=0}^{n_{f}}(t_{air,k}-0.5)$	$0.1$
Feet stance time	$\sum_{k=0}^{n_{f}}(t_{stance,k}-0.5)$	$0.1$

Table 3: Definition of baseline reward (

r_{\text{baseline}}

) terms. The baseline reward contains base velocity and orientation tracking terms, action rate penalty, and joint torque penalty; the air time and stance time reward encourages longer air time and stand time to promote a more natural and fluid walking gait.

C. Full Hardware Specifications

We manufacture our robots using Creality CR-10 Smart Pro 3D printers. Robots are assembled using the 3D printed parts, as well as various electronic components which are easily inserted in the design. A comprehensive list of materials used in the construction of our robot is outlined in Tab. 4.

Component	Model	Quantity per robot
Microcontroller	Raspberry Pi $4$ Model B	$1$
Battery	Povway $5200\ \text{mA}\ \text{Lipo}\ 3S\ 11.1V\ 50C$	$1$
Servo Motor	Hiwonder $HTD-45H$ High Voltage Serial Bus Servo $45\ \text{KG}$	$8$
DC to DC Power Converter	DROK $10A$ Synchronous Step-Down Voltage Regulator DC-DC $4-30V$ to $1.2-30V\ 12V$	$1$
PLA Filament	Ender PLA 3D Printer PLA Filament $1.75\ \text{mm}\ 1KG\ (2.2\ \text{lbs})\ \text{Spool PLA White}$	$1$
Motor Controller	Hiwonder TTL / USB Debugging Board	$1$

Table 4: List of materials used in the construction of physical robots

Text2Robot: Evolutionary Robot Design from Text Descriptions