Jettgal09/rl-traffic-control

🚦 Reinforcement Learning Based Adaptive Traffic Signal Control System

Python PyTorch Stable-Baselines3 Gymnasium License: MIT

A research-grade reinforcement learning system for adaptive traffic signal optimization. Trains RL agents (DQN, PPO, A2C) to control traffic light phase durations at a 4-way intersection, minimizing vehicle waiting times and queue lengths compared to a fixed-timer baseline.


Project Overview

Traditional traffic signals operate on fixed timers regardless of real-time traffic conditions. This project uses Deep Reinforcement Learning to train agents that observe queue lengths and signal states, then adaptively control signal phases to minimize congestion.

Key Features

  • Duration-based action space: the agent chooses a green phase duration (20/40/60/80 steps) rather than making binary switch decisions
  • Gym-compatible environment: a clean TrafficEnv interface for easy algorithm swapping
  • Three RL algorithms: DQN, PPO, and A2C, all benchmarked against a fixed-timer baseline
  • Reward function iteration: 5 reward versions developed to solve degenerate policy problems
  • Pygame visualization: real-time demo with color-coded vehicles and traffic lights
  • Demo mode: side-by-side Fixed Timer vs RL Agent comparison

Architecture

┌──────────────────────────────────────────────────────────────┐
│                    RL TRAINING SYSTEM                        │
│  train.py  →  Stable-Baselines3 (DQN / PPO / A2C)            │
└──────────────────────┬───────────────────────────────────────┘
                       │  action (per intersection)
                       ▼
┌──────────────────────────────────────────────────────────────┐
│                  GYM ENVIRONMENT LAYER                       │
│  TrafficEnv.step() → observation | reward | done             │
│  observation.py    → state vector builder                    │
│  reward.py         → configurable reward functions           │
└──────────────────────┬───────────────────────────────────────┘
                       │  dt tick
                       ▼
┌──────────────────────────────────────────────────────────────┐
│              TRAFFIC SIMULATION ENGINE                       │
│  RoadNetwork → Intersection → TrafficLight                   │
│  VehicleSpawner → Vehicle (IDM-inspired motion)              │
└──────────────────────┬───────────────────────────────────────┘
                       │  pixel state
                       ▼
┌──────────────────────────────────────────────────────────────┐
│                  VISUALIZATION LAYER                         │
│  PygameRenderer → road grid + vehicles + HUD                 │
└──────────────────────────────────────────────────────────────┘

State Space

For each intersection (10-dimensional observation vector):

[queue_N, queue_S, queue_E, queue_W,    # Stopped vehicles per direction (normalized)
 count_N, count_S, count_E, count_W,    # Total vehicles per direction (normalized)
 phase_id, phase_timer]                  # Current signal phase and duration
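
As an illustration of the layout above, here is a minimal sketch of how such a vector could be assembled. The normalization caps `MAX_VEHICLES` and `MAX_PHASE_STEPS` and the function name `build_observation` are assumptions made for this example; the project's actual builder lives in `observation.py` and may normalize differently.

```python
from typing import Dict, List

DIRECTIONS = ("N", "S", "E", "W")
MAX_VEHICLES = 20      # assumed normalization cap (hypothetical, not from the project)
MAX_PHASE_STEPS = 80   # assumed cap: the longest green duration in the action space


def build_observation(queues: Dict[str, int], counts: Dict[str, int],
                      phase_id: int, phase_timer: int) -> List[float]:
    """Return a 10-element state vector in the order documented above."""
    def norm(value: int, cap: int) -> float:
        # Clip before dividing so every normalized feature stays in [0, 1].
        return min(value, cap) / cap

    obs = [norm(queues[d], MAX_VEHICLES) for d in DIRECTIONS]    # stopped vehicles
    obs += [norm(counts[d], MAX_VEHICLES) for d in DIRECTIONS]   # total vehicles
    obs += [float(phase_id), norm(phase_timer, MAX_PHASE_STEPS)]
    return obs


obs = build_observation(
    queues={"N": 5, "S": 0, "E": 10, "W": 2},
    counts={"N": 8, "S": 1, "E": 12, "W": 4},
    phase_id=0,
    phase_timer=40,
)
print(len(obs), obs)
```

Clipping keeps the features bounded even when a lane holds more vehicles than the assumed cap, which tends to make neural-network training more stable.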

Action Space

Discrete(4). The agent selects the green phase duration at the start of each phase:
  0 → 20 steps green  (short; high opposing traffic)
  1 → 40 steps green  (medium)
  2 → 60 steps green  (long)
  3 → 80 steps green  (very long; low opposing traffic)
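
The mapping from action index to green duration can be sketched as a simple lookup. The names `GREEN_DURATIONS` and `green_duration` are hypothetical; in a Gymnasium environment the corresponding space would typically be declared as `gymnasium.spaces.Discrete(4)`.

```python
GREEN_DURATIONS = (20, 40, 60, 80)  # green steps, indexed by action id (values from the list above)


def green_duration(action: int) -> int:
    """Translate a Discrete(4) action index into a green phase duration in steps."""
    if not 0 <= action < len(GREEN_DURATIONS):
        raise ValueError(f"action must be in [0, {len(GREEN_DURATIONS) - 1}], got {action}")
    return GREEN_DURATIONS[action]


for action in range(len(GREEN_DURATIONS)):
    print(action, "->", green_duration(action), "steps green")
```

Because the agent decides only once per phase, it makes far fewer decisions per episode than a per-step switch/keep policy would.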

Reward Function

reward = -(queue_penalty + 2.0 × imbalance_penalty)

Penalizes both total congestion and unequal treatment of directions. Five reward versions were developed during research (V1→V5).
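
A minimal sketch of the formula above follows. The README does not define the two penalty terms, so this example assumes `queue_penalty` is the sum of normalized per-direction queues and `imbalance_penalty` is the spread between the longest and shortest queue. Both definitions are illustrative guesses, not the contents of `reward.py`.

```python
from typing import Sequence

IMBALANCE_WEIGHT = 2.0  # weight taken from the formula above


def compute_reward(queues: Sequence[float]) -> float:
    """Negative cost combining total congestion and directional imbalance."""
    queue_penalty = sum(queues)                     # assumed: total normalized queue length
    imbalance_penalty = max(queues) - min(queues)   # assumed: spread between directions
    return -(queue_penalty + IMBALANCE_WEIGHT * imbalance_penalty)


balanced = compute_reward([0.25, 0.25, 0.25, 0.25])  # same total load, evenly spread
skewed = compute_reward([1.0, 0.0, 0.0, 0.0])        # same total load, one starved direction
print(balanced, skewed)
```

Under these assumptions, two states with the same total queue receive different rewards, which discourages policies that serve one direction indefinitely.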


Installation

git clone https://github.com/Jettgal09/rl-traffic-control.git
cd rl-traffic-control
uv sync

Requirements: Python 3.12, CUDA optional (CPU training supported)


Quick Start

Run Expo Demo (Fixed Timer vs RL Agent)

uv run python demo.py

Train an Agent

uv run python -m rl.train --algo DQN --timesteps 500000
uv run python -m rl.train --algo PPO --timesteps 500000
uv run python -m rl.train --algo A2C --timesteps 500000

Evaluate and Compare All Algorithms

uv run python -m rl.evaluate --compare --episodes 10

View Learning Curves

uv run python plot_results.py

Training

Models, checkpoints, and logs are saved to experiments/<algo>_results/.

# Quick test run (1k steps)
uv run python -m rl.train --algo DQN --timesteps 1000

# Full training run
uv run python -m rl.train --algo DQN --timesteps 500000

# Heavy traffic experiment
uv run python -m rl.train --algo DQN --spawn-rate 0.10 --timesteps 500000

Evaluation

# Compare all trained algorithms vs fixed-timer baseline
uv run python -m rl.evaluate --compare --episodes 10

# Evaluate single algorithm with visualization
uv run python -m rl.evaluate --algo DQN --render --episodes 5

Results

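| Controller  | Avg Waiting Time | Avg Queue | Throughput | Waiting Time vs Baseline |
|-------------|------------------|-----------|------------|--------------------------|
| Fixed Timer | 15,248,102       | 33.02     | 898 veh    | baseline                 |
| DQN         | 14,048,423       | 31.30     | 924 veh    | -7.9%                    |
| PPO         | 13,808,619       | 31.09     | 924 veh    | -9.4%                    |
| A2C         | 13,663,147       | 30.76     | 923 veh    | -10.4%                   |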

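All three RL algorithms outperform the fixed-timer baseline on waiting time, queue length, and throughput. A2C achieved the lowest waiting time, a 10.4% reduction relative to the baseline, while DQN and PPO recorded marginally higher throughput (924 vs 923 vehicles).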
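
The baseline-relative percentages can be reproduced from the reported average waiting times. The short check below uses only the numbers in the results; the helper name `pct_vs_baseline` is hypothetical.

```python
# Average waiting times as reported in the results above.
waiting_time = {
    "Fixed Timer": 15_248_102,
    "DQN": 14_048_423,
    "PPO": 13_808_619,
    "A2C": 13_663_147,
}


def pct_vs_baseline(value: float, baseline: float) -> float:
    """Relative change versus the baseline, in percent (negative means less waiting)."""
    return 100.0 * (value - baseline) / baseline


baseline = waiting_time["Fixed Timer"]
changes = {
    name: round(pct_vs_baseline(value, baseline), 1)
    for name, value in waiting_time.items()
    if name != "Fixed Timer"
}
print(changes)
```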

Project Structure

traffic-rl-project/
├── README.md
├── requirements.txt
│
├── env/
│   ├── traffic_env.py       # Gymnasium TrafficEnv (main interface)
│   ├── observation.py       # State vector builders
│   └── reward.py            # Reward function implementations
│
├── simulation/
│   ├── vehicle.py           # Vehicle dataclass + motion model
│   ├── road.py              # RoadNetwork (grid of intersections)
│   ├── intersection.py      # Per-intersection queue + routing
│   ├── traffic_light.py     # Phase state machine (NS/EW/Yellow/AllRed)
│   └── vehicle_spawner.py   # Edge vehicle spawning
│
├── rl/
│   ├── train.py             # Training entry point (CLI)
│   ├── evaluate.py          # Evaluation + comparison plots
│   └── algorithm_configs.py # DQN / PPO / A2C hyperparameters
│
├── visualization/
│   └── pygame_renderer.py   # Real-time Pygame display
│
├── utils/
│   ├── config.py            # All project configuration
│   └── logger.py            # TensorBoard + CSV logger
│
└── experiments/
    ├── dqn_results/
    ├── ppo_results/
    └── a2c_results/

Configuration

All parameters are centralized in utils/config.py:

class SimConfig:
    GRID_SIZE = 1                  # 1 = single intersection
    SPAWN_RATE = 0.05              # Vehicle spawn probability per step
    MIN_GREEN_DURATION = 15        # Minimum steps before phase can change
    MAX_STEPS_PER_EPISODE = 3000   # Episode length

class RLConfig:
    TOTAL_TIMESTEPS = 500_000
    ALGORITHM = "DQN"
    EVAL_FREQUENCY = 10_000
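
To give a sense of scale for these defaults, the sketch below derives a few quantities from them. It assumes that `EVAL_FREQUENCY` is measured in timesteps, that every episode runs the full `MAX_STEPS_PER_EPISODE`, and that `SPAWN_RATE` applies independently per spawn point per step. None of these is confirmed by the project, and the classes are re-declared here only to keep the example self-contained.

```python
class SimConfig:
    SPAWN_RATE = 0.05              # vehicle spawn probability per step
    MAX_STEPS_PER_EPISODE = 3000   # episode length


class RLConfig:
    TOTAL_TIMESTEPS = 500_000
    EVAL_FREQUENCY = 10_000


# Number of evaluation rounds during one training run (assumes timestep units).
num_evaluations = RLConfig.TOTAL_TIMESTEPS // RLConfig.EVAL_FREQUENCY

# Approximate number of full-length episodes seen during training.
approx_episodes = RLConfig.TOTAL_TIMESTEPS / SimConfig.MAX_STEPS_PER_EPISODE

# Expected spawns per spawn point per episode (assumes independent per-step draws).
expected_spawns = SimConfig.SPAWN_RATE * SimConfig.MAX_STEPS_PER_EPISODE

print(num_evaluations, round(approx_episodes, 1), round(expected_spawns, 1))
```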

Roadmap

  • Phase 1: Single intersection simulation engine
  • Phase 1: Gym-compatible RL environment
  • Phase 1: DQN, PPO, A2C training and evaluation
  • Phase 1: Pygame visualization and expo demo
  • Phase 1: Reward function iteration (V1→V5)
  • Phase 2: Multi-intersection 3×3 city grid
  • Phase 3: Vehicle turning logic
  • Phase 4: Congestion propagation between intersections
  • Phase 5: Multi-agent RL coordination

Acknowledgements

  • Stable-Baselines3 for RL algorithm implementations
  • Gymnasium for the environment interface standard
  • Research inspiration: IntelliLight (Wei et al., 2018) and CoLight (Wei et al., 2019)

About

Reinforcement Learning based adaptive traffic signal control system using Python, Pygame, PyTorch and Stable-Baselines3.
