Reinforcement Learning
Godot RL Agents Integration for Autonomous Driving
Author: Mikayla Lee
Overview
The Godot RL Agents integration lets reinforcement learning agents train autonomous driving policies in the AER simulator. It bridges the Godot physics simulator and RL training libraries, so standard RL agents can learn driving behaviors in a virtual environment.
The integration serves as the interface layer between:
- Godot car physics simulator
- Stable-Baselines3 PPO training algorithms
- Future autonomous driving control systems
System Architecture
The RL training pipeline consists of:
- GodotEnv Wrapper - Connects RL agents to the Godot simulator
- Action Space - Defines steering and throttle controls (2D continuous space)
- Observation Space - Captures car telemetry (position, velocity, speed)
- PPO Training Model - Implements Proximal Policy Optimization for policy learning
- Reset/Step Functions - Handles episode management and action execution
The system uses the Godot RL Agents library to establish direct communication between the Python training environment and the Godot game engine, eliminating the need for manual WebSocket implementation.
Implementation Details
Environment Configuration
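from godot_rl.core.godot_env import GodotEnv

class GodotCarEnv(GodotEnv):
    """Thin wrapper around GodotEnv for the car simulator."""

    def __init__(self, env_path=None, port=11008, show_window=True, seed=0):
        # Forward the connection and rendering settings to the base class.
        super().__init__(
            env_path=env_path,
            port=port,
            show_window=show_window,
            seed=seed,
        )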
The environment wrapper inherits from GodotEnv and manages:
- Connection to Godot executable on port 11008
- Visual rendering control for training monitoring
Action Space
Actions are represented as a 2D continuous vector [steering, throttle]:
| Control | Range | Mapping |
|---|---|---|
| Steering | -1.0 to 1.0 | -20° to +20° |
| Throttle | -1.0 to 1.0 | Full brake to full gas |
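The mapping in the table can be sketched as a small helper. This is an illustrative sketch, not the project's code: the names `action_to_controls` and `clamp`, the linear scaling, the clipping of out-of-range inputs, and the split of throttle into separate gas and brake values are all assumptions. Only the input ranges and the ±20° steering limit come from the table.

```python
MAX_STEERING_DEG = 20.0  # from the table: steering of ±1.0 maps to ±20°


def clamp(value, low=-1.0, high=1.0):
    """Clip a value into [low, high]."""
    return max(low, min(high, value))


def action_to_controls(action):
    """Convert a normalized [steering, throttle] action into physical commands.

    Hypothetical helper: assumes a linear map and clips out-of-range inputs.
    Returns (steering_angle_deg, gas, brake), with gas and brake in [0, 1].
    """
    steering, throttle = (clamp(float(a)) for a in action)
    steering_angle_deg = steering * MAX_STEERING_DEG
    gas = max(throttle, 0.0)     # positive throttle drives the car
    brake = max(-throttle, 0.0)  # negative throttle applies the brakes
    return steering_angle_deg, gas, brake
```

For example, `action_to_controls([0.5, -1.0])` yields a 10° steering angle with no gas and full brake.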
Observation Space
The agent receives 7D telemetry data per timestep:
- Position (x, y, z) - 3D coordinates in world space
- Velocity (vx, vy, vz) - 3D velocity vector
- Speed - Scalar magnitude of velocity
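As a rough illustration of how the seven values fit together, the sketch below packs a position and a velocity into one observation and derives speed as the Euclidean norm of the velocity. The function name `build_observation` and the element ordering are assumptions; the actual layout is whatever the Godot side sends.

```python
import math


def build_observation(position, velocity):
    """Pack position and velocity into the 7-element observation.

    Assumed ordering: [x, y, z, vx, vy, vz, speed].
    """
    if len(position) != 3 or len(velocity) != 3:
        raise ValueError("position and velocity must each have 3 components")
    speed = math.sqrt(sum(v * v for v in velocity))  # Euclidean norm of velocity
    return [*map(float, position), *map(float, velocity), speed]
```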
Training Pipeline
The system uses Proximal Policy Optimization (PPO) with the following parameters:
| Parameter | Value | Purpose |
|---|---|---|
| Learning Rate | 3e-4 | Gradient descent step size |
| Steps per Update | 2048 | Experience collection before update |
| Batch Size | 64 | Mini-batch size for optimization |
| Epochs | 10 | Training iterations per update |
| Gamma | 0.99 | Discount factor for future rewards |
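A minimal sketch of how these values could be passed to Stable-Baselines3 is shown below. The table's rows are mapped onto the `PPO` constructor arguments `learning_rate`, `n_steps`, `batch_size`, `n_epochs`, and `gamma`. The `"MlpPolicy"` choice, the `verbose` setting, and the helper names `build_model` and `gradient_steps_per_update` are assumptions, and `env` is assumed to already be wrapped in a form Stable-Baselines3 accepts.

```python
# Hyperparameters from the table, keyed by Stable-Baselines3 PPO argument names.
PPO_KWARGS = {
    "learning_rate": 3e-4,
    "n_steps": 2048,
    "batch_size": 64,
    "n_epochs": 10,
    "gamma": 0.99,
}


def gradient_steps_per_update(n_steps, batch_size, n_epochs, n_envs=1):
    """Number of mini-batch gradient steps PPO takes after each rollout."""
    minibatches_per_epoch = (n_steps * n_envs) // batch_size
    return minibatches_per_epoch * n_epochs


def build_model(env):
    """Create a PPO model for `env` (assumed to be SB3-compatible)."""
    from stable_baselines3 import PPO  # lazy import; requires stable-baselines3

    # "MlpPolicy" is an assumed choice for a flat telemetry vector.
    return PPO("MlpPolicy", env, verbose=1, **PPO_KWARGS)
```

With a single environment, each update splits the 2048 collected steps into 32 mini-batches of 64 and passes over them 10 times, giving 320 gradient steps per update.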
Integration Workflow
The RL training loop operates as follows:
- Start: reset() resets the car to its starting position → returns the initial observation
- Action Execution: the agent predicts an action from the current observation → step(action) sends the controls to Godot
- Simulation Update: Godot applies physics and updates the car state → returns the new observation and reward
- Learning: PPO updates the policy based on collected experience → repeat until convergence
This cycle runs for 100,000 timesteps during initial training, with the trained model saved for deployment.
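The reset/step cycle can be sketched without Godot by using a stand-in environment. Everything here is hypothetical scaffolding: `StubCarEnv` has no physics and always returns zeros, `random_policy` stands in for the learned agent, and the simplified `step` return of `(observation, reward, done)` is an assumption rather than the library's exact signature.

```python
import random


class StubCarEnv:
    """Stand-in for the Godot-backed environment (hypothetical, no physics)."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 7  # initial 7D observation

    def step(self, action):
        self.t += 1
        observation = [0.0] * 7
        reward = 0.0  # placeholder: the real reward comes from the simulator
        done = self.t >= self.episode_length
        return observation, reward, done


def random_policy(observation, rng):
    """Placeholder for the agent: samples [steering, throttle] in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]


def collect(env, total_timesteps, seed=0):
    """Run the reset/step cycle and return the collected transitions."""
    rng = random.Random(seed)
    transitions = []
    observation = env.reset()
    for _ in range(total_timesteps):
        action = random_policy(observation, rng)
        next_observation, reward, done = env.step(action)
        transitions.append((observation, action, reward, done))
        # Start a new episode when the current one ends.
        observation = env.reset() if done else next_observation
    return transitions
```

In the real pipeline the learning step would run on each batch of collected transitions; this sketch only covers experience collection and episode resets.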
Current Status & Next Steps
Completed:
- Researched and integrated Godot RL Agents library
- Implemented GodotEnv wrapper with action/observation spaces
- Configured PPO training pipeline with Stable-Baselines3
- Established reset/step function architecture
In Progress:
- Creating basic test simulator in Godot to validate integration
- Working with other DAQ members to fully integrate the RL wrapper with the car physics simulation
Future Work:
- Implement reward shaping for racing-specific behaviors (staying on track, lap time optimization)
- Add termination conditions (crash detection, lap completion)
Dependencies
pip install godot-rl stable-baselines3