A research-grade reinforcement learning system for adaptive traffic signal optimization. Trains RL agents (DQN, PPO, A2C) to control traffic light phase durations at a 4-way intersection, minimizing vehicle waiting times and queue lengths compared to a fixed-timer baseline.
- Project Overview
- Architecture
- Installation
- Quick Start
- Training
- Evaluation
- Results
- Project Structure
- Configuration
- Roadmap
Traditional traffic signals operate on fixed timers regardless of real-time traffic conditions. This project uses Deep Reinforcement Learning to train agents that observe queue lengths and signal states, then adaptively control signal phases to minimize congestion.
- Duration-based action space: agent chooses green phase duration (20/40/60/80 steps) rather than binary switch decisions
- Gym-compatible environment: clean TrafficEnv interface for easy algorithm swapping
- Three RL algorithms: DQN, PPO, A2C all benchmarked against fixed-timer baseline
- Reward function iteration: 5 reward versions developed to solve degenerate policy problems
- Pygame visualization: real-time demo with color-coded vehicles and traffic lights
- Demo mode: side-by-side Fixed Timer vs RL Agent comparison
┌───────────────────────────────────────────────────────────┐
│                    RL TRAINING SYSTEM                     │
│  train.py → Stable-Baselines3 (DQN / PPO / A2C)           │
└─────────────────────┬─────────────────────────────────────┘
                      │ action (per intersection)
                      ▼
┌───────────────────────────────────────────────────────────┐
│                   GYM ENVIRONMENT LAYER                   │
│  TrafficEnv.step() → observation | reward | done          │
│  observation.py: state vector builder                     │
│  reward.py: configurable reward functions                 │
└─────────────────────┬─────────────────────────────────────┘
                      │ dt tick
                      ▼
┌───────────────────────────────────────────────────────────┐
│                 TRAFFIC SIMULATION ENGINE                 │
│  RoadNetwork → Intersection → TrafficLight                │
│  VehicleSpawner → Vehicle (IDM-inspired motion)           │
└─────────────────────┬─────────────────────────────────────┘
                      │ pixel state
                      ▼
┌───────────────────────────────────────────────────────────┐
│                    VISUALIZATION LAYER                    │
│  PygameRenderer → road grid + vehicles + HUD              │
└───────────────────────────────────────────────────────────┘
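The simulation engine describes vehicle motion as IDM-inspired. For reference, the standard Intelligent Driver Model computes a follower's acceleration from its speed, the gap to the vehicle ahead, and the closing speed. The sketch below shows that textbook formula only; it is not taken from this project's `vehicle.py`, and the function name and all default parameter values are illustrative assumptions.

```python
import math


def idm_acceleration(v, gap, dv, v0=15.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4):
    """Textbook IDM acceleration (all defaults are assumed, not project values).

    v     : current speed of the follower
    gap   : bumper-to-bumper distance to the leader (must be > 0)
    dv    : closing speed, v_follower - v_leader
    v0    : desired free-flow speed
    T     : desired time headway
    a_max : maximum acceleration
    b     : comfortable deceleration
    s0    : minimum standstill gap
    """
    # Desired dynamic gap: standstill gap plus headway and braking terms.
    s_star = s0 + max(0.0, v * T + v * dv / (2 * math.sqrt(a_max * b)))
    # Free-road term pulls speed toward v0; interaction term brakes for the leader.
    return a_max * (1 - (v / v0) ** delta - (s_star / gap) ** 2)
```

With an effectively empty road, a stopped vehicle accelerates at roughly `a_max` and a vehicle already at `v0` holds its speed; approaching a stopped leader at short range yields strong braking.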
For each intersection (10-dimensional observation vector):
[queue_N, queue_S, queue_E, queue_W, # Stopped vehicles per direction (normalized)
count_N, count_S, count_E, count_W, # Total vehicles per direction (normalized)
phase_id, phase_timer] # Current signal phase and duration
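The vector layout above could be assembled along these lines. This is a minimal sketch, not the contents of `observation.py`: `ApproachState`, `build_observation`, and the normalization constants (`max_vehicles`, `max_timer`) are hypothetical, and scaling `phase_id` and `phase_timer` to [0, 1] is an assumption. The four-phase count follows the NS/EW/Yellow/AllRed state machine listed in the project structure.

```python
from dataclasses import dataclass

DIRECTIONS = ("N", "S", "E", "W")


@dataclass
class ApproachState:
    """Hypothetical snapshot of one approach to the intersection."""
    stopped: int  # vehicles currently stopped (queued)
    total: int    # all vehicles on the approach


def build_observation(approaches, phase_id, phase_timer,
                      max_vehicles=20, num_phases=4, max_timer=80):
    """Return the 10-element state vector: 4 queues, 4 counts, phase id, phase timer."""
    def norm(value, cap):
        # Clip to the cap, then scale into [0, 1].
        return min(value, cap) / cap

    queues = [norm(approaches[d].stopped, max_vehicles) for d in DIRECTIONS]
    counts = [norm(approaches[d].total, max_vehicles) for d in DIRECTIONS]
    phase = phase_id / (num_phases - 1)
    return queues + counts + [phase, norm(phase_timer, max_timer)]
```

Clipping before scaling keeps every entry in [0, 1] even when an approach holds more vehicles than the assumed cap.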
Discrete(4): agent selects green phase duration at the start of each phase:
0 → 20 steps green (short: high opposing traffic)
1 → 40 steps green (medium)
2 → 60 steps green (long)
3 → 80 steps green (very long: low opposing traffic)
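To illustrate the duration-based action space, the sketch below maps an action index to a green duration and lays out the resulting signal timeline. The names `GREEN_DURATIONS`, `action_to_duration`, and `build_schedule` are hypothetical, and so are the strict NS/EW alternation and the `yellow_steps` changeover length; only the four durations come from the list above.

```python
GREEN_DURATIONS = (20, 40, 60, 80)  # steps of green, indexed by action 0..3


def action_to_duration(action):
    """Translate a Discrete(4) action index into a green duration in steps."""
    if action not in range(len(GREEN_DURATIONS)):
        raise ValueError(f"action must be 0..{len(GREEN_DURATIONS) - 1}, got {action}")
    return GREEN_DURATIONS[action]


def build_schedule(actions, yellow_steps=5):
    """Return (phase, green_start, green_end) tuples for a sequence of actions.

    Assumes phases alternate NS, EW, NS, ... and that each green is followed
    by a fixed changeover of yellow_steps (an assumed value).
    """
    schedule = []
    t = 0
    for i, action in enumerate(actions):
        phase = "NS" if i % 2 == 0 else "EW"
        duration = action_to_duration(action)
        schedule.append((phase, t, t + duration))
        t += duration + yellow_steps
    return schedule
```

Because the agent acts once per phase rather than once per step, a 3000-step episode involves only a few dozen decisions, each committing the signal for 20 to 80 steps.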
reward = -(queue_penalty + 2.0 × imbalance_penalty)
Penalizes both total congestion and unequal treatment of directions. Five reward versions were developed during research (V1-V5).
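The reward formula leaves the two penalty terms undefined, so the sketch below fills them in with plausible stand-ins: `queue_penalty` as the sum of normalized per-direction queues, and `imbalance_penalty` as the spread between the longest and shortest normalized queue. Both definitions, the function name `compute_reward`, and the `max_queue` cap are assumptions, not the contents of `reward.py`; only the overall shape and the 2.0 weight come from the formula above.

```python
def compute_reward(queues, imbalance_weight=2.0, max_queue=20):
    """Return -(queue_penalty + imbalance_weight * imbalance_penalty).

    queues: mapping of direction -> number of stopped vehicles.
    The penalty definitions below are assumed for illustration.
    """
    normalized = [min(q, max_queue) / max_queue for q in queues.values()]
    queue_penalty = sum(normalized)                       # total congestion
    imbalance_penalty = max(normalized) - min(normalized)  # unequal treatment
    return -(queue_penalty + imbalance_weight * imbalance_penalty)


balanced = compute_reward({"N": 10, "S": 10, "E": 10, "W": 10})
starved = compute_reward({"N": 20, "S": 20, "E": 0, "W": 0})
print(balanced, starved)  # -2.0 -4.0
```

Both cases hold 40 stopped vehicles in total, yet the starved case scores twice as badly. This kind of term discourages a policy from clearing one axis while leaving the other waiting.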
git clone https://github.com/yourusername/rl-traffic-system.git
cd rl-traffic-system
uv sync

Requirements: Python 3.12, CUDA optional (CPU training supported)

uv run python demo.py
uv run python -m rl.train --algo DQN --timesteps 500000
uv run python -m rl.train --algo PPO --timesteps 500000
uv run python -m rl.train --algo A2C --timesteps 500000
uv run python -m rl.evaluate --compare --episodes 10
uv run python plot_results.py

Models, checkpoints, and logs are saved to experiments/<algo>_results/.

# Quick test run (1k steps)
uv run python -m rl.train --algo DQN --timesteps 1000
# Full training run
uv run python -m rl.train --algo DQN --timesteps 500000
# Heavy traffic experiment
uv run python -m rl.train --algo DQN --spawn-rate 0.10 --timesteps 500000

# Compare all trained algorithms vs fixed-timer baseline
uv run python -m rl.evaluate --compare --episodes 10
# Evaluate single algorithm with visualization
uv run python -m rl.evaluate --algo DQN --render --episodes 5

| Controller | Avg Waiting Time | Avg Queue | Throughput | vs Baseline |
|---|---|---|---|---|
| Fixed Timer | 15,248,102 | 33.02 | 898 veh | baseline |
| DQN | 14,048,423 | 31.30 | 924 veh | -7.9% |
| PPO | 13,808,619 | 31.09 | 924 veh | -9.4% |
| A2C | 13,663,147 | 30.76 | 923 veh | -10.4% |
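
All three RL algorithms outperform the fixed-timer baseline on average waiting time, average queue length, and throughput. A2C performed best on waiting time, with a 10.4% reduction relative to the baseline; the "vs Baseline" column reports this change in average waiting time.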
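
The "vs Baseline" percentages can be reproduced from the waiting-time column. The short check below uses only the figures in the table; the variable and function names are illustrative.

```python
BASELINE_WAIT = 15_248_102  # Fixed Timer, from the results table

rl_waits = {
    "DQN": 14_048_423,
    "PPO": 13_808_619,
    "A2C": 13_663_147,
}


def percent_change(value, baseline):
    """Relative change versus the baseline, in percent (negative = improvement)."""
    return (value - baseline) / baseline * 100


for name, wait in rl_waits.items():
    print(f"{name}: {percent_change(wait, BASELINE_WAIT):+.1f}%")
# DQN: -7.9%
# PPO: -9.4%
# A2C: -10.4%
```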
traffic-rl-project/
├── README.md
├── requirements.txt
│
├── env/
│   ├── traffic_env.py         # Gymnasium TrafficEnv (main interface)
│   ├── observation.py         # State vector builders
│   └── reward.py              # Reward function implementations
│
├── simulation/
│   ├── vehicle.py             # Vehicle dataclass + motion model
│   ├── road.py                # RoadNetwork (grid of intersections)
│   ├── intersection.py        # Per-intersection queue + routing
│   ├── traffic_light.py       # Phase state machine (NS/EW/Yellow/AllRed)
│   └── vehicle_spawner.py     # Edge vehicle spawning
│
├── rl/
│   ├── train.py               # Training entry point (CLI)
│   ├── evaluate.py            # Evaluation + comparison plots
│   └── algorithm_configs.py   # DQN / PPO / A2C hyperparameters
│
├── visualization/
│   └── pygame_renderer.py     # Real-time Pygame display
│
├── utils/
│   ├── config.py              # All project configuration
│   └── logger.py              # TensorBoard + CSV logger
│
└── experiments/
    ├── dqn_results/
    ├── ppo_results/
    └── a2c_results/

All parameters are centralized in utils/config.py:
class SimConfig:
    GRID_SIZE = 1                  # 1 = single intersection
    SPAWN_RATE = 0.05              # Vehicle spawn probability per step
    MIN_GREEN_DURATION = 15        # Minimum steps before phase can change
    MAX_STEPS_PER_EPISODE = 3000   # Episode length

class RLConfig:
    TOTAL_TIMESTEPS = 500_000
    ALGORITHM = "DQN"
    EVAL_FREQUENCY = 10_000

- Phase 1: Single intersection simulation engine
- Phase 1: Gym-compatible RL environment
- Phase 1: DQN, PPO, A2C training and evaluation
- Phase 1: Pygame visualization and expo demo
- Phase 1: Reward function iteration (V1-V5)
- Phase 2: Multi-intersection 3×3 city grid
- Phase 3: Vehicle turning logic
- Phase 4: Congestion propagation between intersections
- Phase 5: Multi-agent RL coordination

- Stable-Baselines3 for RL algorithm implementations
- Gymnasium for the environment interface standard
- Research inspiration: IntelliLight (Wei et al., 2018) and CoLight (Wei et al., 2019)