
Reinforcement Learning

Godot RL Agents Integration for Autonomous Driving

Author: Mikayla Lee

Overview

The Godot RL Agents integration enables reinforcement learning agents to train autonomous driving policies on the AER simulator. This system bridges the Godot physics simulator with RL training libraries, allowing standard RL agents to learn driving behaviors in a virtual environment.

The integration serves as the interface layer between:

  • Godot car physics simulator
  • Stable-Baselines3 PPO training algorithms
  • Future autonomous driving control systems

System Architecture

The RL training pipeline consists of:

  1. GodotEnv Wrapper - Connects RL agents to the Godot simulator
  2. Action Space - Defines steering and throttle controls (2D continuous space)
  3. Observation Space - Captures car telemetry (position, velocity, speed)
  4. PPO Training Model - Implements Proximal Policy Optimization for policy learning
  5. Reset/Step Functions - Handles episode management and action execution

The system uses the Godot RL Agents library to establish direct communication between the Python training environment and the Godot game engine, eliminating the need for a manual WebSocket implementation.

Implementation Details

Environment Configuration

class GodotCarEnv(GodotEnv):
    def __init__(self, env_path=None, port=11008, show_window=True, seed=0):
        # Forward connection settings to the base GodotEnv wrapper:
        # env_path points to the exported Godot executable, port is the
        # TCP port used for Python <-> Godot communication.
        super().__init__(
            env_path=env_path,
            port=port,
            show_window=show_window,
            seed=seed
        )

The environment wrapper inherits from GodotEnv and manages:

  • Connection to Godot executable on port 11008
  • Visual rendering control for training monitoring
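
As a sketch of how the wrapper might be constructed in practice (the `make_env` helper and its lazy import are illustrative, not part of the project code; `env_path` must point to an exported Godot executable):

```python
def make_env(env_path, port=11008, show_window=True, seed=0):
    """Build the Godot car environment.

    env_path is a caller-supplied path to the exported Godot build
    (placeholder here); the remaining arguments mirror the wrapper above.
    """
    # Imported lazily so this sketch loads even without godot-rl installed.
    from godot_rl.core.godot_env import GodotEnv
    return GodotEnv(env_path=env_path, port=port,
                    show_window=show_window, seed=seed)
```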

Action Space

Actions are represented as a 2D continuous vector [steering, throttle]:

Control    Range          Mapping
Steering   -1.0 to 1.0    -20° to +20°
Throttle   -1.0 to 1.0    Full brake to full gas
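
The mapping from the normalized action vector to physical controls can be sketched as follows (the `action_to_controls` helper is illustrative; the actual mapping lives on the Godot side):

```python
def action_to_controls(action):
    """Map a normalized [steering, throttle] action in [-1, 1]^2 to
    physical controls: steering angle in degrees and throttle fraction."""
    steering, throttle = action
    # Clamp to the declared action bounds before mapping.
    steering = max(-1.0, min(1.0, steering))
    throttle = max(-1.0, min(1.0, throttle))
    steering_deg = steering * 20.0   # -1.0 -> -20 deg, +1.0 -> +20 deg
    return steering_deg, throttle    # throttle: -1 full brake, +1 full gas
```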

Observation Space

The agent receives 7D telemetry data per timestep:

  • Position (x, y, z) - 3D coordinates in world space
  • Velocity (vx, vy, vz) - 3D velocity vector
  • Speed - Scalar magnitude of velocity
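
Assembling this 7D vector can be sketched as below (the `build_observation` helper is illustrative; in practice Godot emits the telemetry):

```python
import math

def build_observation(position, velocity):
    """Assemble the 7D observation [x, y, z, vx, vy, vz, speed],
    where speed is the scalar magnitude of the velocity vector."""
    speed = math.sqrt(sum(v * v for v in velocity))
    return list(position) + list(velocity) + [speed]
```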

Training Pipeline

The system uses Proximal Policy Optimization (PPO) with the following parameters:

Parameter          Value   Purpose
Learning Rate      3e-4    Gradient descent step size
Steps per Update   2048    Experience collection before update
Batch Size         64      Mini-batch size for optimization
Epochs             10      Training iterations per update
Gamma              0.99    Discount factor for future rewards
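
These values correspond directly to Stable-Baselines3 PPO keyword arguments; a sketch (model construction is commented out because it needs a live environment):

```python
# Stable-Baselines3 PPO keyword arguments matching the table above.
PPO_KWARGS = {
    "learning_rate": 3e-4,   # gradient descent step size
    "n_steps": 2048,         # experience collected before each update
    "batch_size": 64,        # mini-batch size for optimization
    "n_epochs": 10,          # training iterations per update
    "gamma": 0.99,           # discount factor for future rewards
}

# With a connected Godot environment this would become (sketch):
# from stable_baselines3 import PPO
# model = PPO("MlpPolicy", env, **PPO_KWARGS)
```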

Integration Workflow

The RL training loop operates as follows:

  1. Start:
    reset() resets car to starting position → returns initial observation

  2. Action Execution:
    Agent predicts action based on current observation → step(action) sends controls to Godot

  3. Simulation Update:
    Godot applies physics, updates car state → returns new observation and reward

  4. Learning:
    PPO updates policy based on collected experience → repeat until convergence
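
The loop above can be sketched against a stand-in environment (`StubCarEnv` and the fixed 100-step episode are invented here for illustration; a real run would use the Godot environment and PPO's own rollout collection):

```python
import random

class StubCarEnv:
    """Minimal stand-in with the reset()/step() shape described above."""
    def reset(self):
        self.t = 0
        return [0.0] * 7                       # initial 7D observation

    def step(self, action):
        self.t += 1
        obs = [random.random() for _ in range(7)]
        reward = 1.0                           # placeholder reward
        done = self.t >= 100                   # fixed-length episode
        return obs, reward, done

def run_episode(env, policy):
    obs, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(obs)                   # agent predicts from observation
        obs, reward, done = env.step(action)   # simulator applies physics
        total += reward
    return total

# A trivial "drive straight, full throttle" policy:
print(run_episode(StubCarEnv(), lambda obs: [0.0, 1.0]))  # prints 100.0
```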

This cycle runs for 100,000 timesteps during initial training, with the trained model saved for deployment.

Current Status & Next Steps

Completed:

  • Researched and integrated Godot RL Agents library
  • Implemented GodotEnv wrapper with action/observation spaces
  • Configured PPO training pipeline with Stable-Baselines3
  • Established reset/step function architecture

In Progress:

  • Creating basic test simulator in Godot to validate integration
  • Working with other DAQ team members to fully integrate the RL wrapper with the car physics simulation

Future Work:

  • Implement reward shaping for racing-specific behaviors (staying on track, lap time optimization)
  • Add termination conditions (crash detection, lap completion)
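
A hedged sketch of what such a shaped reward could look like (the weights, the `off_track` flag, and the crash penalty are illustrative placeholders, not the project's actual reward design):

```python
def shaped_reward(progress_m, off_track, crashed):
    """Illustrative racing reward: pay for forward progress along the
    track, penalize leaving it, and penalize crashes heavily."""
    if crashed:
        return -10.0                 # large terminal penalty on crash
    reward = 1.0 * progress_m        # meters advanced along track this step
    if off_track:
        reward -= 0.5                # discourage leaving the track
    return reward
```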

Dependencies

pip install godot-rl stable-baselines3
