
Text2Robot:
Evolutionary Robot Design from Text Descriptions

Ryan P. Ringel, Zachary S. Charlick, Jiaxun Liu, Boxi Xia, Boyuan Chen
Duke University
https://generalroboticslab.com/Text2Robot
Abstract

Robot design has traditionally been costly and labor-intensive. Despite advancements in automated processes, it remains challenging to navigate a vast design space while producing physically manufacturable robots. We introduce Text2Robot, a framework that converts user text specifications and performance preferences into physical quadrupedal robots. Within minutes, Text2Robot can use text-to-3D models to provide strong initializations of diverse morphologies. Within a day, our geometric processing algorithms and body-control co-optimization produce a walking robot by explicitly considering real-world electronics and manufacturability. Text2Robot enables rapid prototyping and opens new opportunities for robot design with generative models.

* These authors contributed equally to this work.

Keywords: Robot Design, Generative Model, Legged Robots

1 Introduction

Figure 1: Text2Robot creates physical robots from user-specified text prompts and performance preferences while considering real-world electronics and manufacturability.

For over half a century, robot design has been a costly and labor-intensive process, requiring extensive human effort from initial sketches to detailed modeling, prototyping, controller design, manufacturing, and testing. This traditional approach has significant limitations, such as prohibitive costs, lengthy development cycles, and constraints on innovation bounded by human imagination and manual capabilities. However, advancements in automated robot design [1, 2, 3, 4] promise to revolutionize this landscape. By automating key aspects of the design process, we can drastically reduce development time and costs, allowing industries to rapidly produce specialized robots and enabling engineers to establish efficient manufacturing processes. Researchers also benefit by quickly innovating desired hardware platforms. Ultimately, automating robot design not only enables rapid prototyping but also expands the realm of possible innovations, surpassing the boundaries of what human designers can envision and create.

One major challenge in automating robot design is navigating the vast and intricate design space. Traditional engineering design is time-intensive and demands considerable technical expertise. While advancements in control engineering [5, 6, 7, 8, 9] and machine learning [10, 11, 12, 13, 14] have enabled automatic training of robot policies, designing morphologies remains laborious. Human designers typically spend months conceptualizing, designing, and fabricating a robot, balancing cost, manufacturability, and performance. Previous automation attempts often simplify the design space by using large repeating modules [1, 15, 16, 17] or voxel representations [18, 19]. Though innovative, these methods are slow because they must search a vast design space, and they do not consider real-world fabrication, resulting in designs that are theoretically sound but impractical to produce.

Automated methods for robot design predominantly involve Evolutionary Algorithms (EAs) inspired by natural evolution [20]. However, EA-based approaches are inherently slow, starting with random solutions and iterating through hundreds of generations. Moreover, existing solutions do not scale well with increasing design complexity [21] and face significant challenges in balancing multiple objectives, such as control and morphology co-optimization [17, 22]. Consequently, while these solutions may excel in simulations, they frequently fail to address practical issues such as sim2real transfer [16, 17, 18] and manufacturability. Problems like high current draw from unrealistic degrees of freedom [16, 17] or complex morphologies that are difficult to manufacture [16, 17, 18] hinder the transition from theoretical designs to practical and producible robots.

We present Text2Robot (Fig. 2), a comprehensive “A-to-Z” framework from user text specifications to physical walking robots. Our approach utilizes recent advancements in text-to-3D generative models to create initial mesh designs, which are subsequently converted into kinetic robot models through our geometric processing algorithms. We can generate a design within minutes, train a robot in simulation within an hour, and fabricate a walking robot within a day. Our system not only fulfills users’ aesthetic preferences but further optimizes the designs using an evolutionary algorithm to incorporate other performance preferences. Our key insight is that text-to-3D generative models can provide a much stronger starting point for the evolutionary algorithm, significantly accelerating the optimization process. Experiments in both simulation and the physical world demonstrate our ability to specify both aesthetic qualities and performance metrics, such as velocity tracking and energy efficiency. Overall, our approach introduces the creative and artistic nature of generative models to automated robot design with fast prototyping from just a text prompt, and has the potential to open up novel opportunities in rapid design and manufacturing.

2 Related Work

Generative Models for Design. Generative AI aims to enhance design efficiency [23, 24] and reduce labor costs [25, 26] by automating tasks [27] or portions of workflows [28, 29, 30]. Recent advances in generative models have led to the integration of generative adversarial networks (GANs) [31, 32], diffusion models [33, 34], and transformers [35, 36] into the generative design process. Previous research has utilized generative models for domain-specific code creation [37, 38] or parameter-based physical designs [39, 40]. Such systems often rely heavily on human feedback and tuning during the design process. In contrast, our work focuses on directly enabling artificial intelligence-generated content (AIGC) for physically embodied and functional robot design.

Text-to-3D Models. Recent advancements in text-to-3D generative models [41] utilize pre-trained text-to-image diffusion models to optimize Neural Radiance Fields (NeRF) [42, 43], text-to-3D shape embeddings in a GAN framework [44], or explicit and hybrid scene representations [45] to create 3D generated content. Despite their growing capabilities, current text-to-3D models are primarily used for visualization in graphics and not for the creation of functioning machines. Our work incorporates 3D AIGC into physical robot design while considering electrical components and manufacturability constraints.

Automated Robot Design. Automated robot design often involves co-optimizing control and morphology using classical methods and analytical dynamics [46, 47, 48, 49]. These approaches rely on highly parameterized designs and specified models, which limit creativity and practicality in real-world environments. Recent studies employ auto-differentiable hardware policies [50, 51, 52] for co-optimization, but their designs focus on robots with limited degrees of freedom and often overlook real-world manufacturability and electronics. Evolutionary Algorithms (EAs) have been widely used [53] to explore vast design spaces by combining modular building blocks [1, 15, 16, 54] or voxels [18, 19, 22] through genetic operators to co-evolve morphology and control to produce complex and unexpected [55] designs. However, EA-based approaches are computationally expensive, typically starting from random solutions and requiring numerous generations to produce effective designs. Our work leverages the open-ended nature of EA-based approaches for design evolution but drastically accelerates the process through strong initializations by incorporating generative models.

3 The Text2Robot Framework

Figure 2: Overview of the four steps in the Text2Robot framework.

Text2Robot (Fig. 2) generates a physical walking quadrupedal robot that caters to a user’s text description and performance priorities, such as energy efficiency or velocity tracking accuracy. There are four major components in Text2Robot: (1) A generative model to create static 3D meshes of robots given user-specified texts. (2) A set of geometric processing algorithms to convert the static meshes into kinetic models, including the necessary components for fabrication. (3) An optimization process based on evolutionary algorithms and reinforcement learning to further optimize the robot morphologies and walking policies according to the user’s performance preferences. (4) Rapid 3D printing and assembly of the final optimized robot.

3.1 Mesh Generation from Text Prompts

Given a text prompt specifying the aesthetic of the robot, we generate a 3D mesh of the robot using a text-to-3D model. In this paper, we used one of the state-of-the-art models, Meshy [56], which takes in the text prompt and produces several candidate meshes. As a high-level scoping assumption, we demonstrate our framework by automating the design of quadrupedal robots with eight motors. Such constraints mimic typical real-world design requirements without loss of generality of our framework. We implemented a structured prompt design based on specified user descriptions to ensure Meshy consistently outputs quadrupedal meshes.

The user provides a text description of their desired robot in one to three words, which we incorporate into the following prompt format: <Quadrupedal walking robot resembling a "User-Provided Description">. The generated candidate meshes are then manually filtered according to the following constraints: (1) the mesh must be continuous without disjoint bodies; (2) the mesh must exhibit bilateral symmetry; and (3) the mesh must include four legs.
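As a minimal sketch of the structured prompt and the filtering checklist described above, the following Python snippet builds the prompt string and encodes the three mesh constraints. The names `build_prompt`, `MeshCandidate`, and `passes_filter` are hypothetical and introduced only for illustration; in our pipeline the filtering is performed manually, so the fields of `MeshCandidate` stand in for a human inspector's judgment rather than an automated check.

```python
from dataclasses import dataclass

# Prompt format quoted from the text; the user description is substituted in.
PROMPT_TEMPLATE = 'Quadrupedal walking robot resembling a "{description}"'


def build_prompt(description: str) -> str:
    """Insert a one-to-three-word user description into the structured prompt."""
    words = description.split()
    if not 1 <= len(words) <= 3:
        raise ValueError("description should contain one to three words")
    return PROMPT_TEMPLATE.format(description=" ".join(words))


@dataclass
class MeshCandidate:
    """Hypothetical record of what an inspector notes about a candidate mesh."""
    num_disjoint_bodies: int
    is_bilaterally_symmetric: bool
    num_legs: int


def passes_filter(mesh: MeshCandidate) -> bool:
    """Apply the three constraints: continuous, bilaterally symmetric, four legs."""
    return (
        mesh.num_disjoint_bodies == 1
        and mesh.is_bilaterally_symmetric
        and mesh.num_legs == 4
    )


if __name__ == "__main__":
    print(build_prompt("frog"))
    print(passes_filter(MeshCandidate(1, True, 4)))
```

Keeping the template in a single constant makes it straightforward to audit which prompt produced each candidate mesh.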

3.2 Kinetic Robot Model from Static Meshes

Current text-to-3D models only produce static meshes for visualization purposes. Our key challenge is to automatically convert such static meshes into kinetic robot models. Importantly, unlike most simulated robot models that are simplified for fast simulation, to automatically transfer our designs to physical functioning robots, our generated robot model should also consider real-world manufacturing factors such as the placement of electronic components, wire connections, physical collisions at joints, limits of the number of motors, and manufacturability.

Mesh Repair and Preprocessing. Due to the lack of realistic constraints on the text-to-3D models, the generated mesh can have errors such as being non-watertight. We first call the mesh repair API through Fusion 360 [57] to repair the mesh for the downstream workflow. We then leverage the mesh conversion operation to convert the mesh to an organic BREP (Boundary Representation) body. We scale the BREP body to a volume of $6300\,\mathrm{cm}^{3}$ to unify the initial mass of all robots.

Joint Allocation. Deciding a set of feasible joint positions for a quadrupedal robot with eight motors purely from the mesh model is difficult. Inspired by natural quadrupedal animals, we assume that our robots have four legs and two movable joints for each leg. In other words, our robots have four shoulder joints and four knee joints, dividing each leg into an upper and lower leg.

We determine the position and orientation of each joint based on the mesh model’s geometric features (Fig. 3 A and B). The body’s origin is defined as the center of mass, assuming uniform mass distribution. Starting from the origin along the $+y$ and $-y$ directions, our algorithm creates vertical slices parallel to the $xz$ plane and records the cross-sectional area at each step. The slice closest to the origin with the minimum cross-sectional area is mirrored to the other side of the origin. The two slicing planes separate the mesh model into four legs and one body, defined as the base link. The origins of the four shoulder joints correspond to the centroids of their intersection profiles, with $z$-axes perpendicular to the slicing plane and $y$-axes pointing downwards. Similarly, knee joints are located at the slice planes with maximum cross-sectional area, traversing along the $xy$ plane for each leg. We traverse from the bottom of the base link and stop $2\,\mathrm{cm}$ from the ground plane to ensure space for motor placement. The origins of the knee joints correspond to the centroids of each intersection profile, with $z$-axes perpendicular to their slicing planes and $x$-axes parallel to the slicing planes pointing in the robot’s facing direction. We follow the right-hand coordinate system convention.
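The shoulder-plane selection can be illustrated with a short Python sketch. It assumes the cross-sectional areas have already been sampled at signed offsets along the $y$ axis (in the actual pipeline these come from slicing the BREP body), and it reads "the slice closest to the origin with the minimum cross-sectional area" as: take the smallest area and break ties by distance to the origin. The function name `select_shoulder_planes` and the input format are assumptions made for illustration, not the implementation used in the paper.

```python
from typing import Iterable, Tuple


def select_shoulder_planes(
    samples: Iterable[Tuple[float, float]],
) -> Tuple[Tuple[float, float], float]:
    """Pick the pair of mirrored slicing planes for the shoulder joints.

    samples: (y_offset, cross_sectional_area) pairs measured from the
             center of mass along both the +y and -y directions.
    Returns ((-d, +d), area), where d is the distance of the selected
    slice from the origin and area is its cross-sectional area.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one slice sample is required")
    # Smallest area wins; among equal areas, prefer the slice nearest the origin.
    y_best, area_best = min(samples, key=lambda s: (s[1], abs(s[0])))
    d = abs(y_best)
    # Mirror the selected plane to the other side of the origin.
    return (-d, d), area_best


if __name__ == "__main__":
    profile = [(-3, 5.0), (-2, 2.0), (-1, 4.0), (1, 4.0), (2, 2.5), (3, 6.0)]
    print(select_shoulder_planes(profile))
```

Mirroring a single selected plane, rather than choosing one plane per side, is what enforces a symmetric split between the base link and the legs.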

Figure 3: Geometric Processing. (A) Heat maps to visualize the cross-section area created from the XZ plane (left) and XY plane (right). The arrow indicates the position of the selected slicing plane with the local minimum area or the local maximum area. (B) The selected planes for slicing the mesh model and the coordinate of the center of mass (left), the resulting nine robot bodies and demonstration of the joint coordinate system (middle), and the final robot model with extruded boxes to accommodate electronic components (right).

Electronic Component Placement. To create physically realistic robots for real-world manufacturing and walking, we need to consider the placement of electronic components, including actuators, batteries, and controllers. As shown in Fig. 4, servo motors are housed in a 3D-printed box with snap-in pegs inspired by toy building blocks. Motors can be rotated in their casings to achieve the desired axis of rotation during assembly. A larger box encases the other necessary electronics, such as the motor controller, Raspberry Pi, and battery, and can be easily slid into a central channel in the base link. To avoid self-collision and reserve space for the motor and electronics modules, we offset the eight limbs by $4\,\mathrm{cm}$ and cut extrusions.

Figure 4: The modular design for the motor and electronic components, the resulting robot design tailored for these modular components, and the final robot assembly.

By this step, the robot’s kinetic model is automatically defined by our algorithms, and our pipeline exports the model in the Unified Robot Description Format (URDF) [58]. To enlarge the design space for optimization in Sec. 3.3, we scale each leg in $0.5\,\mathrm{cm}$ increments to generate nine additional models. We further augment these ten models to thirty variants by defining each joint’s rotational axis along the $x$, $y$, or $z$ axis of the joint coordinate system.
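The count of thirty variants follows from ten leg-length models (the original plus nine scaled copies) combined with three joint-axis choices. The sketch below enumerates that grid in Python. It assumes the leg-length steps are applied as non-negative multiples of the increment starting from the original model and that one axis choice is shared by all joints of a variant; neither detail is specified above, and the dictionary keys are hypothetical names.

```python
from itertools import product

LEG_INCREMENT_CM = 0.5      # increment stated in the text
NUM_LENGTH_MODELS = 10      # original model plus nine additional models
JOINT_AXES = ("x", "y", "z")


def enumerate_variants():
    """Return the 10 x 3 grid of (leg length step, joint axis) variants."""
    variants = []
    for step, axis in product(range(NUM_LENGTH_MODELS), JOINT_AXES):
        variants.append(
            {
                "length_step": step,
                # Assumption: step k changes the leg length by k * 0.5 cm.
                "leg_delta_cm": step * LEG_INCREMENT_CM,
                "joint_axis": axis,
            }
        )
    return variants


if __name__ == "__main__":
    grid = enumerate_variants()
    print(len(grid), grid[0], grid[-1])
```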

3.3 Co-optimization of Morphology and Walking Policy

We propose an evolutionary algorithm (Fig. 5A) with a dual-loop architecture to optimize both robot morphology and control policy while simultaneously incorporating user preferences. The inner loop employs reinforcement learning to assess a robot’s capacity for acquiring a walking policy. The outer loop utilizes genetic operators to evolve the robot morphology from a design repository.

An initial population of robots will be trained for locomotion to track changing velocities along the $x$-$y$-yaw directions. The robot observations include the previous actions, base linear and angular velocities, joint positions and velocities, the gravity vector projected onto the robot’s coordinate system, and the target body velocity commands. The policy outputs the joint position offsets that will be converted to torques with a PD controller. The reward function consists of a baseline reward ($r_{\text{baseline}}$) and optional user-adjustable reward terms based on preference. $r_{\text{baseline}}$ minimizes velocity tracking error, maintains a plausible robot pose, and penalizes excessive joint torque, accelerations, frequent stepping behavior, and abrupt action changes. The user-specified components are the linear velocity tracking reward and joint power penalty. The per-step reward can be defined as:

$$r=\underbrace{\alpha_{1}e^{-0.25\left\|\mathbf{v}^{*}_{xy}-\mathbf{v}_{xy}\right\|^{2}}}_{\text{linear velocity tracking}}+\underbrace{\alpha_{2}\sum_{i=0}^{n}\left\|\dot{q}_{i}\tau_{i}\right\|}_{\text{joint power}}+r_{\text{baseline}}\tag{1}$$

where $\mathbf{v}^{*}_{xy}$ is the commanded base linear velocity in the $xy$ direction, $\mathbf{v}_{xy}$ is the actual base linear velocity, $\dot{\mathbf{q}}$ is the vector of joint velocities, $\boldsymbol{\tau}$ is the vector of joint torques, $n$ is the number of joints, and $\alpha_{1}$ and $\alpha_{2}$ are the weights of the respective reward terms. The terms of $r_{\text{baseline}}$ are listed in the supplementary material.
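To make Eq. 1 concrete, the following Python sketch evaluates the per-step reward for a single robot. It is a plain scalar version for illustration, not the batched simulator implementation. Since the joint power term is described as a penalty, the sketch assumes $\alpha_{2}$ is supplied as a non-positive weight; the function and argument names are hypothetical.

```python
import math
from typing import Sequence


def per_step_reward(
    v_cmd_xy: Sequence[float],
    v_xy: Sequence[float],
    joint_vel: Sequence[float],
    joint_torque: Sequence[float],
    alpha1: float,
    alpha2: float,
    r_baseline: float,
) -> float:
    """Evaluate Eq. 1 for one time step.

    alpha2 is assumed to be <= 0 so that joint power acts as a penalty.
    """
    # Squared norm of the planar velocity tracking error.
    err_sq = sum((c - v) ** 2 for c, v in zip(v_cmd_xy, v_xy))
    tracking = alpha1 * math.exp(-0.25 * err_sq)
    # Sum over joints of |joint velocity * joint torque|.
    power = alpha2 * sum(abs(qd * tau) for qd, tau in zip(joint_vel, joint_torque))
    return tracking + power + r_baseline


if __name__ == "__main__":
    r = per_step_reward(
        v_cmd_xy=(0.1, 0.0),
        v_xy=(0.1, 0.0),
        joint_vel=[0.0] * 8,
        joint_torque=[0.0] * 8,
        alpha1=1.0,
        alpha2=-0.01,
        r_baseline=0.5,
    )
    print(r)
```

With perfect tracking and no joint motion, the exponential equals one and the power term vanishes, so the reward reduces to $\alpha_{1}+r_{\text{baseline}}$.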

We select the top 100 robots for each generation and create another 100 new robots with genetic operators. Fifty of the new robots are created through mutations, and the other fifty are created through crossovers. Crossover is achieved by duplicating a random robot in the current generation and choosing, with a 50% chance, to swap a joint or a limb with another random robot in the current generation. For the mutations, the robot can undergo a change in limb length, limb shape, body shape, or joint axis, or remain the same, with probabilities of 15%, 15%, 25%, 40%, and 5%, respectively. A constraint of symmetry is imposed on the robot based on the observation of natural animal characteristics and the practicality of our implementation. To achieve this, the same change is applied to all legs or joints to unify the joint type and the length or shape of the limb within the same body level. Although a larger design space can be achieved without this constraint, it ensures our robot has a stable starting pose at the beginning of the inner loop training and improves the training quality.
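The outer-loop bookkeeping described above can be sketched in Python as follows. The mutation probabilities and the 100/50/50 split are taken from the text; the `mutate` and `crossover` callbacks are hypothetical placeholders for the actual morphology operators, and drawing parents uniformly from the selected robots is an assumption, since the text does not state which pool the operators sample from.

```python
import random

MUTATION_TYPES = ["limb_length", "limb_shape", "body_shape", "joint_axis", "none"]
MUTATION_PROBS = [0.15, 0.15, 0.25, 0.40, 0.05]  # probabilities from the text


def sample_mutation(rng: random.Random) -> str:
    """Draw one mutation type according to the stated probabilities."""
    return rng.choices(MUTATION_TYPES, weights=MUTATION_PROBS, k=1)[0]


def next_generation(scored, mutate, crossover, rng,
                    n_keep=100, n_mutate=50, n_cross=50):
    """Build the next population from (robot, fitness) pairs.

    mutate(robot, mutation_type) and crossover(robot_a, robot_b, rng) are
    hypothetical callbacks that return a new robot.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    survivors = [robot for robot, _ in ranked[:n_keep]]
    # Assumption: parents are drawn uniformly from the selected robots.
    mutants = [mutate(rng.choice(survivors), sample_mutation(rng))
               for _ in range(n_mutate)]
    children = [crossover(rng.choice(survivors), rng.choice(survivors), rng)
                for _ in range(n_cross)]
    return survivors + mutants + children


if __name__ == "__main__":
    rng = random.Random(0)
    population = [(i, float(i)) for i in range(120)]
    new_pop = next_generation(
        population,
        mutate=lambda robot, kind: ("mutant", robot, kind),
        crossover=lambda a, b, _rng: ("child", a, b),
        rng=rng,
    )
    print(len(new_pop))
```

Passing the random generator explicitly keeps a run reproducible, which helps when comparing fitness curves across trials.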

By defining the fitness score based on different evaluation metrics, our outer loop can prioritize different performances for robot selection and optimize both morphology and walking policy toward user preference. We design two criteria for users to prioritize: (1) Velocity tracking. We scale the velocity tracking reward by 20 and add it to the total reward. (2) Energy efficiency. We scale the energy penalty by 10 and add it to the total reward.
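A minimal sketch of the preference-dependent fitness score is given below. The scale factors 20 and 10 are those stated above; the function name, the argument names, and the convention that the energy term is passed as a non-positive penalty are assumptions made for illustration.

```python
from typing import Optional

VELOCITY_SCALE = 20.0  # scale on the velocity tracking reward
ENERGY_SCALE = 10.0    # scale on the energy penalty


def fitness_score(total_reward: float,
                  velocity_tracking_reward: float,
                  energy_penalty: float,
                  preference: Optional[str] = None) -> float:
    """Fitness used for selection in the outer loop.

    preference: "velocity", "energy", or None for the unscaled total reward.
    energy_penalty is assumed to be <= 0, so scaling it lowers the fitness
    of robots that consume more energy.
    """
    if preference is None:
        return total_reward
    if preference == "velocity":
        return total_reward + VELOCITY_SCALE * velocity_tracking_reward
    if preference == "energy":
        return total_reward + ENERGY_SCALE * energy_penalty
    raise ValueError(f"unknown preference: {preference!r}")


if __name__ == "__main__":
    print(fitness_score(5.0, 0.5, -0.2, "velocity"))
    print(fitness_score(5.0, 0.5, -0.2, "energy"))
```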

Figure 5: Morphology and Walking Policy Co-optimization. (A) The inner loop implements reinforcement learning to optimize the robot control policy, and the outer loop optimizes the robot morphologies through genetic operations. (B) Our genetic representation and examples of crossover and mutation operation.

Implementation Details. We trained our multi-directional walking policy using the Proximal Policy Optimization (PPO) algorithm [59], following the implementation of parallel reinforcement learning described in [60, 61]. Each robot was trained for $2.46\times 10^{7}$ steps (a few minutes) before evaluation. We extended the open-source IsaacGymEnvs simulation environment [62] for our parallel training with 4096 environments per robot. Both our actor and critic networks employ three fully connected layers with dimensions of [512, 256, 128] and ELU activation functions. All training was performed on servers with NVIDIA RTX A6000, NVIDIA A100, and NVIDIA GeForce RTX 3090 GPUs. Comprehensive details on the specific training hyperparameters are in the supplementary material.

3.4 Physical Assembly

All robots were printed on the Creality CR-10 Smart Pro 3D printer due to its large printing bed. We use the Hiwonder HTD-45H High Voltage Serial Bus Servo to actuate our robots. We list detailed information on the other electronic components in the supplementary material. The resulting robot assembly can be completed in minutes due to our modular design and careful consideration of electronic component placement and manufacturability.

4 Experiment

In this section, we aim to evaluate Text2Robot from various aspects. First, we assess its ability to generate robot designs that align with diverse user-specified aesthetics. Second, we highlight the key advantages of integrating generative models into the automated robot design workflow. Third, we evaluate the performance of co-optimizing body morphology and control policy while prioritizing different performance metrics as specified by users. Finally, we demonstrate the real-world applicability of Text2Robot by fabricating and showcasing the physical robots.

Figure 6: Generated Meshes and Corresponding User Descriptions. (A) Sixteen robot mesh models generated from our structured prompt with diverse user descriptions. (B) We used the same or similar descriptions to generate four other morphology variants for bug, frog, and dog robots.

4.1 Aesthetic Specification Matching

To assess the effectiveness of Text2Robot in generating robot designs that align with diverse user-specified aesthetics, we evaluated the initial stages of the framework using a variety of input text prompts. This evaluation aimed to demonstrate the capabilities of our chosen generative model in the context of robot design. We tested sixteen diverse prompts, encompassing both animal-inspired descriptions (e.g., “dog”, “frog”) and more creative concepts (e.g., “bread”, “can”, “shoe”). The qualitative results in Fig. 6A demonstrate that the generated designs can capture the essence of the input descriptions while adhering to the fundamental structure of a quadrupedal robot. We further investigated achieving subtle variations in generated morphologies by tweaking prompts, producing unique robot designs for specific species like “Bug”, “Dog”, and “Frog”, as shown in Fig. 6B. For each of the above designs, we exported 30 robots in total by our geometric processing stage with various leg lengths and joint orientations for subsequent experiments. Text2Robot can generate a robot mesh within 1 to 2 minutes and convert it into a URDF model in 30 seconds.

4.2 Advantages of Incorporating Text-to-3D Generative Models

Figure 7: Reward comparison of the Text2Robot and RoboGrammar robots and visualization of the best-performing and worst-performing robots.

We hypothesize that one key advantage of integrating text-to-3D models is to provide a much stronger initialization for evolutionary design. Therefore, we compared against RoboGrammar [16], which enables a wide range of symmetrical robot designs based on simple primitive geometries, as a baseline. To ensure a fair comparison, we filtered robots generated with their recursive graph grammar to only include quadrupedal robots with two joints per leg and with similar body lengths. We created URDF files for their robots and assigned weights to match those of our robots. We then trained 150 RoboGrammar robots and 150 of our generated designs for the same number of steps to learn a walking policy. Our 150 robots are randomly selected from 600 robots augmented from the sixteen prompts (Fig. 6A) and another four prompts in the bug species (Fig. 6B). Fig. 7 shows that our initialized robots outperformed the baseline initialized robots by a large margin.
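The population sizes above follow from simple bookkeeping: twenty prompts with thirty exported variants each give 600 robots, from which 150 are drawn at random. A short Python sketch of this sampling is shown below; the prompt labels, the function names, and the fixed seed are placeholders introduced for illustration.

```python
import random


def build_robot_bank(prompts, variants_per_prompt=30):
    """Pair every prompt with the index of each exported variant."""
    return [(prompt, k) for prompt in prompts for k in range(variants_per_prompt)]


def sample_initial_robots(bank, n=150, seed=0):
    """Draw n distinct robots uniformly at random from the bank."""
    return random.Random(seed).sample(bank, n)


if __name__ == "__main__":
    # Placeholder labels: sixteen diverse prompts plus four bug variants.
    prompts = [f"prompt_{i}" for i in range(16)] + [f"bug_{i}" for i in range(4)]
    bank = build_robot_bank(prompts)
    selected = sample_initial_robots(bank)
    print(len(bank), len(selected))
```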

We found that most RoboGrammar robots suffer from unnatural placement and orientation of joints and links. Limbs on the same side are often too close together, which compromises balance, and links protrude in the same direction as the joint axis, making the limb ineffective for locomotion. In contrast, Text2Robot leverages the knowledge of the physical world embedded within a text-to-3D generative model to produce quadrupedal robots with appropriate leg lengths, link and body proportions, and stable static initial postures.

4.3 Morphology and Walking Policy Co-optimization

Figure 8: General Reward Optimization. The reward of the best robot per generation and the morphology of the best robot in the last generation.

Single Species Optimization. We first co-optimized morphology and control with similar morphologies to “Bug”, “Frog”, and “Dog” as shown in Fig. 6. Each species contains five variants of morphology from similar prompts. In each of the three trials, we used the 150 robots augmented from the five robots as the initial population in our EA, and all robots were optimized based on the general reward in Eq. 1. Fig. 8 shows the emergence of adequate walking policies within only a few generations, indicating the effectiveness of our method in generating walking quadrupedal robots. We continued to evolve the robots for 20 generations, and the performance was further improved with evolved morphologies.

Increasing Diversity. We then investigated the effect of increasing diversity in the robot bank. We used the 600 robots augmented from the sixteen prompts and an additional four prompts in the bug species. The final selected robot achieved higher rewards compared to the robot optimized with similar morphologies (Fig. 8). Text2Robot enables higher-quality designs simply by expanding the diversity of text descriptions. This demonstrates the potential of our method to scale up and achieve better performance with more diverse and creative text prompts.

4.4 Co-optimization with Preferences

Figure 9: Optimization Based on Preference with Diverse Robot Species. The scaled reward represents the fitness score used in the evolutionary loop. It consists of the original reward received by the robots during inner loop training and the scaled contribution based on energy or velocity, depending on the user’s preference. The best-performing robots are marked with dark blue, and the rest of the top 100 robots are marked with light blue in the figure.

We evaluate Text2Robot’s ability to optimize robot designs according to user-supplied priorities: energy or velocity tracking. We used all robots with their augmented sets from Fig. 6. We adjusted the calculation of the fitness score by scaling the reward with an additional amplified energy or velocity contribution. Fig. 9 shows the results for diverse species based on each preference after 50 generations of evolution. The results for a single species are listed in the supplementary material.

In the optimization for robots from single species or diverse species, our method is able to optimize the robot performance per generation while considering the performance priorities. The results show a strong correlation between the top-performing robot and the targeted performance criteria, and the other performance criteria, which were not being optimized, appeared more random and sporadic. These results demonstrate that our robot is optimized to meet user preferences. We found the robots selected from velocity optimization generally have longer and wider bodies, while the robots selected from energy optimization have lower body weights. We also find that the performance of the diverse robot bank is better than that of single-species banks, indicating the scalability of our method.

4.5 Co-optimization for Rough Terrain

We applied Text2Robot to rough terrain and observed that it evolved robot morphologies with higher-performing foot shapes. We used the robots from the diverse bank in Fig. 6 and applied a VHACD decomposition to increase the simulated realism of foot contact. Following prior curriculum design [62], robots are trained to progress through increasingly challenging terrains (smooth slope, rough slope, stairs, discrete, and stepping stones) (Fig. 10). Analysis of selected robot morphologies reveals a correlation between foot shape and terrain type, as illustrated in Fig. 10. Arched feet, favored in flat terrain trials, provide enhanced stability, speed, and efficient energy transfer due to their curvature. This advantage, however, hinges on predictable surface contact dynamics. We observed that the large foot dimensions of arched feet increase the risk of snagging on uneven terrain. Conversely, simple rounded feet, chosen for rough terrain, demonstrated superior adaptability to unpredictable surfaces, promoting stability and balance.

Figure 10: Rough Terrain Optimization. (A) The selected robot traversing rough terrain. (B) Robots optimized for flat terrain (left and middle) evolve to have larger arcs for feet, while the robot evolved for rough terrain (right) has smaller, simple rounded feet.

4.6 Physical Walking Robot

We selected the highest-performing “Bug” and “Frog” from the single-species optimization and the two robots from the diverse bank optimized for velocity tracking and energy efficiency to demonstrate our ability to fabricate the generated designs. Each robot required approximately a day to manufacture, but assembly (Fig. 11) was completed within minutes. We show a primitive Sim2Real transfer by simply playing the trained locomotion in simulation and directly executing the joint positions on the real robot. As shown in Fig. 11, our real robots successfully transfer the walking policy learned in simulation to the real world and achieve sufficient performance in locomotion and speed. This further validates the practicability and robustness of our design from our Text2Robot pipeline.
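The open-loop transfer described above amounts to recording joint position targets in simulation and replaying them on the hardware at a fixed rate. The Python sketch below illustrates this replay loop under stated assumptions: `send_joint_positions` is a hypothetical callback standing in for the servo interface, the control period `dt` is a free parameter, and no feedback from the real robot is used.

```python
import time
from typing import Callable, Iterable, List, Sequence

NUM_JOINTS = 8  # four shoulder joints and four knee joints


def replay_joint_trajectory(
    trajectory: Iterable[Sequence[float]],
    send_joint_positions: Callable[[List[float]], None],
    dt: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Replay simulated joint position targets on the real robot, open loop.

    trajectory: one sequence of NUM_JOINTS target positions per control step,
                as recorded from the simulated policy rollout.
    send_joint_positions: hypothetical hardware callback for one command.
    dt: control period in seconds between consecutive commands.
    Returns the number of commands sent.
    """
    sent = 0
    for frame in trajectory:
        if len(frame) != NUM_JOINTS:
            raise ValueError(f"expected {NUM_JOINTS} joint targets, got {len(frame)}")
        send_joint_positions(list(frame))
        sleep(dt)  # hold the command for one control period
        sent += 1
    return sent


if __name__ == "__main__":
    log = []
    demo = [[0.0] * NUM_JOINTS, [0.1] * NUM_JOINTS]
    print(replay_joint_trajectory(demo, log.append, dt=0.0))
```

Injecting the `sleep` function makes the loop testable without real-time delays.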

Figure 11: Real Robot Performance. (A) The four real robots manufactured for sim2real experiments. (B) We play the best robot policy in simulation with a goal speed of $0.1\,\mathrm{m/s}$ in a straight direction, and the real robot executes the same position command in the real world.

5 Conclusion

We introduce Text2Robot, which generates a physical quadrupedal walking robot from text prompts to match user-specified aesthetic and performance preferences. Text2Robot leverages generative models for stronger initialization than traditional methods, while converting visual meshes to movable robots with consideration of electronics and real-world manufacturability. Both simulated and physical experiments show Text2Robot’s ability to co-optimize morphology and control to produce physically functional machines.

Limitations and Future Work. Text2Robot presents several opportunities for improvement in future research. While our current pipeline focuses on designing quadrupedal robots, future work can extend the scope to robots with varying numbers of joints or other types of electromechanical machines. Additionally, the current framework still requires manual assembly. Integrating our method with automated assembly algorithms to construct physical robots would be a significant advancement. Furthermore, our text prompts remain fixed once optimization begins. Exploring a feedback mechanism to refine mesh generation from the text-to-3D model based on reward signals could offer greater flexibility.

Acknowledgments

This work is supported by the DARPA FoundSci program under award HR00112490372 and by the ARL STRONG program under awards W911NF2320182 and W911NF2220113.

References

  • Lipson and Pollack [2000] H. Lipson and J. B. Pollack. Automatic design and manufacture of robotic lifeforms. Nature, 406(6799):974–978, 2000.
  • Hornby et al. [2003] G. S. Hornby, H. Lipson, and J. B. Pollack. Generative representations for the automated design of modular physical robots. IEEE transactions on Robotics and Automation, 19(4):703–719, 2003.
  • Matthews et al. [2023] D. Matthews, A. Spielberg, D. Rus, S. Kriegman, and J. Bongard. Efficient automatic design of robots. Proceedings of the National Academy of Sciences, 120(41):e2305180120, 2023.
  • Wang et al. [2019] T. Wang, Y. Zhou, S. Fidler, and J. Ba. Neural graph evolution: Towards efficient automatic robot design. arXiv preprint arXiv:1906.05370, 2019.
  • Gehring et al. [2013] C. Gehring, S. Coros, M. Hutter, M. Bloesch, M. A. Hoepflinger, and R. Siegwart. Control of dynamic gaits for a quadrupedal robot. In 2013 IEEE international conference on Robotics and automation, pages 3287–3292. IEEE, 2013.
  • Carpentier and Wieber [2021] J. Carpentier and P.-B. Wieber. Recent progress in legged robots locomotion control. Current Robotics Reports, 2(3):231–238, 2021.
  • Ding et al. [2021] Y. Ding, A. Pandala, C. Li, Y.-H. Shin, and H.-W. Park. Representation-free model predictive control for dynamic motions in quadrupeds. IEEE Transactions on Robotics, 37(4):1154–1171, 2021.
  • Bledt et al. [2018] G. Bledt, M. J. Powell, B. Katz, J. Di Carlo, P. M. Wensing, and S. Kim. Mit cheetah 3: Design and control of a robust, dynamic quadruped robot. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2245–2252. IEEE, 2018.
  • Farshidian et al. [2017] F. Farshidian, M. Neunert, A. W. Winkler, G. Rey, and J. Buchli. An efficient optimal planning and control framework for quadrupedal locomotion. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 93–100, 2017. doi:10.1109/ICRA.2017.7989016.
  • Wang et al. [2012] S. Wang, W. Chaovalitwongse, and R. Babuska. Machine learning algorithms in bipedal robot control. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(5):728–743, 2012.
  • Lee et al. [2020] J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter. Learning quadrupedal locomotion over challenging terrain. Science robotics, 5(47):eabc5986, 2020.
  • Tsounis et al. [2020] V. Tsounis, M. Alge, J. Lee, F. Farshidian, and M. Hutter. Deepgait: Planning and control of quadrupedal gaits using deep reinforcement learning. IEEE Robotics and Automation Letters, 5(2):3699–3706, 2020. doi:10.1109/LRA.2020.2979660.
  • Ibarz et al. [2021] J. Ibarz, J. Tan, C. Finn, M. Kalakrishnan, P. Pastor, and S. Levine. How to train your robot with deep reinforcement learning: lessons we have learned. The International Journal of Robotics Research, 40(4-5):698–721, 2021.
  • Miki et al. [2022] T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter. Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics, 7(62):eabk2822, 2022.
  • Strgar et al. [2024] L. Strgar, D. Matthews, T. Hummer, and S. Kriegman. Evolution and learning in differentiable robots. arXiv preprint arXiv:2405.14712, 2024.
  • Zhao et al. [2020] A. Zhao, J. Xu, M. Konaković-Luković, J. Hughes, A. Spielberg, D. Rus, and W. Matusik. Robogrammar: graph grammar for terrain-optimized robot design. ACM Transactions on Graphics (TOG), 39(6):1–16, 2020.
  • Gupta et al. [2021] A. Gupta, S. Savarese, S. Ganguli, and L. Fei-Fei. Embodied intelligence via learning and evolution. Nature Communications, 12(1):5721, 2021. ISSN 2041-1723. doi:10.1038/s41467-021-25874-z. URL https://doi.org/10.1038/s41467-021-25874-z.
  • Cheney et al. [2014] N. Cheney, R. MacCurdy, J. Clune, and H. Lipson. Unshackling evolution: Evolving soft robots with multiple materials and a powerful generative encoding. SIGEVOLUTION, 7:11–23, 2014.
  • Cheney et al. [2015] N. Cheney, J. Bongard, and H. Lipson. Evolving soft robots in tight spaces. In Proceedings of the 2015 annual conference on Genetic and Evolutionary Computation, pages 935–942, 2015.
  • Bäck and Schwefel [1993] T. Bäck and H. Schwefel. An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, 1(1):1–23, 1993.
  • Thierens [1999] D. Thierens. Scalability problems of simple genetic algorithms. Evolutionary Computation, 7(4):331–352, 1999. doi:10.1162/evco.1999.7.4.331.
  • Cheney et al. [2016] N. Cheney, V. Sunspiral, J. Bongard, and H. Lipson. On the difficulty of co-optimizing morphology and control in evolved virtual creatures. Artificial Life Conference Proceedings, 13:226–233, 2016.
  • Regenwetter et al. [2022] L. Regenwetter, A. H. Nobari, and F. Ahmed. Deep generative models in engineering design: A review. Journal of Mechanical Design, 144(7):071704, 2022.
  • Akande et al. [2024] T. O. Akande, O. O. Alabi, and J. B. Oyinloye. A review of generative models for 3d vehicle wheel generation and synthesis. Journal of Computing Theories and Applications, 1(4):368–385, 2024.
  • Makatura et al. [2023] L. Makatura, M. Foshey, B. Wang, F. HähnLein, P. Ma, B. Deng, M. Tjandrasuwita, A. Spielberg, C. E. Owens, P. Y. Chen, et al. How can large language models help humans in design and manufacturing? arXiv preprint arXiv:2307.14377, 2023.
  • Feuerriegel et al. [2024] S. Feuerriegel, J. Hartmann, C. Janiesch, and P. Zschech. Generative ai. Business & Information Systems Engineering, 66(1):111–126, 2024.
  • Kazi et al. [2017] R. H. Kazi, T. Grossman, H. Cheong, A. Hashemi, and G. W. Fitzmaurice. Dreamsketch: Early stage 3d design explorations with sketching and generative design. In UIST, volume 14, pages 401–414, 2017.
  • Sanchez-Lengeling and Aspuru-Guzik [2018] B. Sanchez-Lengeling and A. Aspuru-Guzik. Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400):360–365, 2018.
  • Buonamici et al. [2020] F. Buonamici, M. Carfagni, R. Furferi, Y. Volpe, L. Governi, et al. Generative design: an explorative study. Computer-Aided Design and Applications, 18(1):144–155, 2020.
  • Liu et al. [2023] V. Liu, J. Vermeulen, G. Fitzmaurice, and J. Matejka. 3dall-e: Integrating text-to-image ai in 3d design workflows. ACM Transactions on Graphics, 41:1955–1977, 2023.
  • Liu et al. [2018] Z. Liu, D. Zhu, S. P. Rodrigues, K.-T. Lee, and W. Cai. Generative model for the inverse design of metasurfaces. Nano letters, 18(10):6570–6576, 2018.
  • Oh et al. [2019] S. Oh, Y. Jung, S. Kim, I. Lee, and N. Kang. Deep generative design: Integration of topology optimization and generative models. Journal of Mechanical Design, 141(11):111405, 2019.
  • Luo et al. [2022] S. Luo, Y. Su, X. Peng, S. Wang, J. Peng, and J. Ma. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. Advances in Neural Information Processing Systems, 35:9754–9767, 2022.
  • Yang et al. [2023] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M.-H. Yang. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys, 56(4):1–39, 2023.
  • Wu et al. [2021] R. Wu, C. Xiao, and C. Zheng. Deepcad: A deep generative network for computer-aided design models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6772–6782, 2021.
  • Siddiqui et al. [2023] Y. Siddiqui, A. Alliegro, A. Artemov, T. Tommasi, D. Sirigatti, V. Rosov, A. Dai, and M. Nießner. Meshgpt: Generating triangle meshes with decoder-only transformers. arXiv preprint arXiv:2311.15475, 2023.
  • Nordmann et al. [2014] A. Nordmann, N. Hochgeschwender, and S. Wrede. A survey on domain-specific languages in robotics. In International conference on simulation, modeling, and programming for autonomous robots, pages 195–206. Springer, 2014.
  • Chen et al. [2021] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
  • Abdullah and Kamara [2013] H. Abdullah and J. Kamara. Parametric design procedures: a new approach to generative-form in the conceptual design phase. In AEI 2013: Building Solutions for Architectural Engineering, pages 334–343. 2013.
  • Hornby and Pollack [2001] G. S. Hornby and J. B. Pollack. The advantages of generative grammatical encodings for physical design. In Proceedings of the 2001 congress on evolutionary computation (ieee cat. no. 01th8546), volume 1, pages 600–607. IEEE, 2001.
  • Liu et al. [2024] J. Liu, X. Huang, T. Huang, L. Chen, Y. Hou, S. Tang, Z. Liu, W. Ouyang, W. Zuo, J. Jiang, et al. A comprehensive survey on 3d content generation. arXiv preprint arXiv:2402.01166, 2024.
  • Lin et al. [2023] C.-H. Lin, J. Gao, L. Tang, T. Takikawa, X. Zeng, X. Huang, K. Kreis, S. Fidler, M.-Y. Liu, and T.-Y. Lin. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 300–309, 2023.
  • Poole et al. [2022] B. Poole, A. Jain, J. T. Barron, and B. Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
  • Chen et al. [2019] K. Chen, C. B. Choy, M. Savva, A. X. Chang, T. Funkhouser, and S. Savarese. Text2shape: Generating shapes from natural language by learning joint embeddings. In Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part III 14, pages 100–116. Springer, 2019.
  • Chen et al. [2023] R. Chen, Y. Chen, N. Jiao, and K. Jia. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22246–22256, 2023.
  • Park and Asada [1994] J.-H. Park and H. Asada. Concurrent design optimization of mechanical structure and control for high speed robots. Journal of Dynamic Systems, Measurement, and Control, 116(3):344–356, 1994.
  • Paul and Bongard [2001] C. Paul and J. Bongard. The road less travelled: morphology in the optimization of biped robot locomotion. In Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No.01CH37180), volume 1, pages 226–232 vol.1, 2001. doi:10.1109/IROS.2001.973363.
  • Geijtenbeek et al. [2013] T. Geijtenbeek, M. Van De Panne, and A. F. Van Der Stappen. Flexible muscle-based locomotion for bipedal creatures. ACM Transactions on Graphics (TOG), 32(6):1–11, 2013.
  • Ha et al. [2018] S. Ha, S. Coros, A. Alspach, J. Kim, and K. Yamane. Computational co-optimization of design parameters and motion trajectories for robotic systems. The International Journal of Robotics Research, 37(13-14):1521–1536, 2018.
  • Chen et al. [2020] T. Chen, Z. He, and M. Ciocarlie. Hardware as policy: Mechanical and computational co-optimization using deep reinforcement learning. arXiv preprint arXiv:2008.04460, 2020.
  • Xu et al. [2021] J. Xu, T. Chen, L. Zlokapa, M. Foshey, W. Matusik, S. Sueda, and P. Agrawal. An End-to-End Differentiable Framework for Contact-Aware Robot Design. In Proceedings of Robotics: Science and Systems, Virtual, July 2021. doi:10.15607/RSS.2021.XVII.008.
  • Zhanpeng and Matei [2024] H. Zhanpeng and C. Matei. Morph: Design co-optimization with reinforcement learning via a differentiable hardware model proxy. IEEE Robotics and Automation, 2024.
  • Doncieux et al. [2015] S. Doncieux, N. Bredeche, J.-B. Mouret, and A. E. Eiben. Evolutionary robotics: what, why, and where to. Frontiers in Robotics and AI, 2:4, 2015.
  • Alattas et al. [2019] R. J. Alattas, S. Patel, and T. M. Sobh. Evolutionary modular robotics: Survey and analysis. Journal of Intelligent & Robotic Systems, 95:815–828, 2019.
  • Lehman et al. [2020] J. Lehman, J. Clune, D. Misevic, C. Ofria, K. O. Ellefsen, J.-B. Mouret, and A. Bernatskiy. The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation and artificial life research communities. Artificial Life, 26:274–306, 2020.
  • [56] Meshy. URL https://www.meshy.ai/.
  • [57] Autodesk fusion 360. URL https://www.autodesk.com/products/fusion-360.
  • Kitamura [2020] T. Kitamura. Fusion2urdf. https://github.com/syuntoku14/fusion2urdf, 2020.
  • Schulman et al. [2017] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • Rudin et al. [2022] N. Rudin, D. Hoeller, P. Reist, and M. Hutter. Learning to walk in minutes using massively parallel deep reinforcement learning. In Conference on Robot Learning, pages 91–100. PMLR, 2022.
  • Makoviichuk and Makoviychuk [2021] D. Makoviichuk and V. Makoviychuk. rl-games: A high-performance framework for reinforcement learning. https://github.com/Denys88/rl_games, May 2021.
  • Makoviychuk et al. [2021] V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, et al. Isaac gym: High performance gpu-based physics simulation for robot learning. arXiv preprint arXiv:2108.10470, 2021.

Supplementary Material

A. Single Species Optimization

We show performance optimization results for the single-species robot banks “Bug”, “Frog”, and “Dog”, and compare their performance to that of the diverse bank. As in our diverse bank performance optimization experiment, we adjusted the fitness score calculation with an additional velocity reward or energy cost through 50 generations of evolution to demonstrate the effect of an input performance preference on the final designs. As shown in Fig. 13, our results show a strong correlation between the selected robot and the prioritized performance criterion. Fig. 12 visualizes the selected robots, their average final rewards, and their physical characteristics.

Figure 12: (A) The morphology of the best robot in the last generation from the eight experiments with energy or velocity prioritized. Yellow represents the robots that prioritize the energy contribution and red represents the robots that prioritize the velocity contribution. (B) The unscaled reward of the eight best robots. The average length, width, height and weight of the eight robots.
Figure 13: The scaled reward is used as the fitness metric in the EA loop. It consists of the original reward received by the robots during inner loop training summed with the requested priority: (A) Scaled energy contribution, (B) Scaled velocity contribution. The best-performing robots are marked with dark blue, and the rest of the top 100 robots are marked with light blue in the figure.
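
The scaled fitness described in this caption can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the field names, the `scale` factor, and the convention of entering energy cost with a negative sign are hypothetical, and the numbers are made up purely to show how the priority changes the selected robot.

```python
from typing import Dict, List


def scaled_fitness(robot: Dict[str, float], priority: str, scale: float) -> float:
    """Original inner-loop reward plus a scaled priority contribution.

    priority == "velocity" adds a velocity reward; priority == "energy"
    subtracts an energy cost. Both conventions are assumptions.
    """
    if priority == "velocity":
        contribution = robot["velocity_reward"]
    elif priority == "energy":
        contribution = -robot["energy_cost"]
    else:
        raise ValueError(f"unknown priority: {priority}")
    return robot["reward"] + scale * contribution


def best_robot(bank: List[Dict[str, float]], priority: str, scale: float) -> Dict[str, float]:
    """Return the robot with the highest scaled fitness in the bank."""
    return max(bank, key=lambda r: scaled_fitness(r, priority, scale))


# Illustrative, made-up values for two candidate robots.
bank = [
    {"id": 0, "reward": 10.0, "velocity_reward": 4.0, "energy_cost": 5.0},
    {"id": 1, "reward": 9.0, "velocity_reward": 2.0, "energy_cost": 1.0},
]
fast = best_robot(bank, "velocity", scale=1.0)
frugal = best_robot(bank, "energy", scale=1.0)
```

With these toy numbers the velocity priority selects robot 0 and the energy priority selects robot 1, mirroring how the requested preference steers selection in the EA loop.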

B. Reinforcement Learning Details

We show the detailed training parameters of the inner reinforcement learning loop in Tab. 1, and the definitions of the symbols and the baseline reward in Tab. 2 and Tab. 3.

Hyper-parameter | Value
Dense network shape | [512, 256, 128]
Dense network activation | elu
Discount factor | 0.99
GAE discount factor | 0.95
PPO loss clip range | 0.2
Entropy coefficient | 0.001
Learning rate $\alpha$ | adaptive
Batch size | 98304 (4096 × 24)
Mini-batch size | 16384 (4096 × 4)
Mini epochs | 5
Critic loss coefficient | 2
KL-divergence threshold | 0.008
Table 1: PPO hyperparameters.
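
As a quick consistency check on Tab. 1, the sketch below collects the values in a plain dictionary and derives the number of mini-batches and gradient steps per update. The key names are illustrative rather than any library's configuration schema, and reading 4096 × 24 as "parallel environments × rollout steps" is an assumption. The `gae_advantages` helper is a textbook generalized advantage estimator using the two discount values from the table; it ignores episode terminations for brevity.

```python
from typing import List, Sequence

# Values from Tab. 1; key names are hypothetical.
PPO = {
    "dense_units": [512, 256, 128],
    "activation": "elu",
    "gamma": 0.99,            # discount factor
    "gae_lambda": 0.95,       # GAE discount factor
    "clip_range": 0.2,
    "entropy_coef": 0.001,
    "learning_rate": "adaptive",
    "num_envs": 4096,         # assumed meaning of the 4096 factor
    "horizon_length": 24,     # assumed meaning of the x24 factor
    "minibatch_size": 4096 * 4,
    "mini_epochs": 5,
    "critic_loss_coef": 2,
    "kl_threshold": 0.008,
}

batch_size = PPO["num_envs"] * PPO["horizon_length"]
minibatches_per_epoch = batch_size // PPO["minibatch_size"]
gradient_steps_per_update = minibatches_per_epoch * PPO["mini_epochs"]


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    gamma: float,
    lam: float,
) -> List[float]:
    """Generalized advantage estimation over one rollout (no terminations)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step TD error, then exponentially weighted accumulation.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


adv = gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0, PPO["gamma"], PPO["gae_lambda"])
```

Under these assumptions each policy update processes 98304 samples in 6 mini-batches per epoch, for 30 gradient steps in total.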
Base linear velocity | $\mathbf{v}$
Base angular velocity | $\omega$
Commanded base linear velocity | $\mathbf{v}^{*}$
Commanded base angular velocity | $\omega^{*}$
Joint positions | $\mathbf{q}$
Joint velocities | $\dot{\mathbf{q}}$
Joint accelerations | $\ddot{\mathbf{q}}$
Target joint positions | $\mathbf{q}^{*}$
Joint torques | $\tau$
Number of joints | $n$
Number of feet | $n_{f}$
Feet air time | $t_{air}$
Feet stance time | $t_{stance}$
Base gravity | $\mathbf{g}_{b}$
Environment time step | $dt$
Table 2: Definition of symbols.
Baseline reward term | Definition | Weight [$* dt$]
Linear velocity tracking | $e^{-0.25\left\|\mathbf{v}^{*}_{xy}-\mathbf{v}_{xy}\right\|^{2}}$ | $1$
Angular velocity tracking | $e^{-0.25\left\|\omega^{*}_{z}-\omega_{z}\right\|^{2}}$ | $0.5$
Linear velocity penalty | $v^{2}_{z}$ | $-4$
Angular velocity penalty | $\left\|\omega_{xy}\right\|^{2}$ | $-0.05$
Joint acceleration penalty | $\left\|\ddot{\mathbf{q}}\right\|^{2}$ | $-5^{-7}$
Joint torque penalty | $\left\|\tau\right\|^{2}$ | $-2^{-5}$
Action rate penalty | $\left\|\dot{\mathbf{a}}\right\|^{2}$ | $-5^{-5}$
Orientation | $\left\|\mathbf{g}_{b,xy}\right\|^{2}$ | $-0.5$
Feet air time | $\sum_{k=0}^{n_{f}}(t_{air,k}-0.5)$ | $0.1$
Feet stance time | $\sum_{k=0}^{n_{f}}(t_{stance,k}-0.5)$ | $0.1$
Table 3: Definition of baseline reward ($r_{\text{baseline}}$) terms. The baseline reward contains base velocity and orientation tracking terms, action rate penalty, and joint torque penalty; the air time and stance time rewards encourage longer air time and stance time to promote a more natural and fluid walking gait.
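
The terms in Tab. 3 can be combined into a single per-step reward as sketched below. This is a minimal reading of the table, not the authors' code: the state field names and the `dt` value in the usage example are hypothetical, the weights are transcribed literally as printed (for example, $-5^{-7}$ is evaluated as `-(5.0 ** -7)`), the feet sums run over every foot in the list, and the "[$* dt$]" column header is interpreted as multiplying each weight by the environment time step.

```python
import math
from typing import Dict, Sequence


def sq_norm(x: Sequence[float]) -> float:
    """Squared Euclidean norm."""
    return sum(v * v for v in x)


# Weights as printed in Tab. 3 (literal transcription).
WEIGHTS: Dict[str, float] = {
    "lin_vel_tracking": 1.0,
    "ang_vel_tracking": 0.5,
    "lin_vel_z": -4.0,
    "ang_vel_xy": -0.05,
    "joint_acc": -(5.0 ** -7),
    "joint_torque": -(2.0 ** -5),
    "action_rate": -(5.0 ** -5),
    "orientation": -0.5,
    "feet_air_time": 0.1,
    "feet_stance_time": 0.1,
}


def baseline_terms(s: dict) -> Dict[str, float]:
    """Unweighted reward terms; `s` uses hypothetical field names."""
    v_err = [s["v_cmd_xy"][i] - s["v"][i] for i in range(2)]
    w_err = s["omega_cmd_z"] - s["omega"][2]
    return {
        "lin_vel_tracking": math.exp(-0.25 * sq_norm(v_err)),
        "ang_vel_tracking": math.exp(-0.25 * w_err * w_err),
        "lin_vel_z": s["v"][2] ** 2,
        "ang_vel_xy": sq_norm(s["omega"][:2]),
        "joint_acc": sq_norm(s["q_ddot"]),
        "joint_torque": sq_norm(s["tau"]),
        "action_rate": sq_norm(s["a_dot"]),
        "orientation": sq_norm(s["g_b"][:2]),
        "feet_air_time": sum(t - 0.5 for t in s["t_air"]),
        "feet_stance_time": sum(t - 0.5 for t in s["t_stance"]),
    }


def baseline_reward(s: dict, dt: float) -> float:
    """Weighted sum of all terms, scaled by the time step dt."""
    terms = baseline_terms(s)
    return dt * sum(WEIGHTS[k] * terms[k] for k in WEIGHTS)


# Perfect tracking with no penalties: only the two tracking terms contribute.
state = {
    "v": [0.1, 0.0, 0.0], "v_cmd_xy": [0.1, 0.0],
    "omega": [0.0, 0.0, 0.0], "omega_cmd_z": 0.0,
    "q_ddot": [0.0] * 8, "tau": [0.0] * 8, "a_dot": [0.0] * 8,
    "g_b": [0.0, 0.0, -1.0],
    "t_air": [0.5] * 4, "t_stance": [0.5] * 4,
}
r = baseline_reward(state, dt=0.02)  # dt value is illustrative
```

In this idealized state the reward reduces to $(1 + 0.5)\,dt$, which makes it a convenient sanity check before adding the penalty terms.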

C. Full Hardware Specifications

We manufacture our robots using Creality CR-10 Smart Pro 3D printers. Robots are assembled using the 3D printed parts, as well as various electronic components which are easily inserted in the design. A comprehensive list of materials used in the construction of our robot is outlined in Tab. 4.

Component | Model | Quantity per robot
Microcontroller | Raspberry Pi 4 Model B | 1
Battery | Povway 5200 mA Lipo 3S 11.1V 50C | 1
Servo Motor | Hiwonder HTD-45H High Voltage Serial Bus Servo 45 KG | 8
DC to DC Power Converter | DROK 10A Synchronous Step-Down Voltage Regulator DC-DC 4-30V to 1.2-30V 12V | 1
PLA Filament | Ender PLA 3D Printer PLA Filament 1.75 mm 1KG (2.2 lbs) Spool PLA White | 1
Motor Controller | Hiwonder TTL / USB Debugging Board | 1
Table 4: List of materials used in the construction of physical robots