
sushmasai1704-web/lunar-lander-ppo


LunarLander-v2 with PPO (Deep Reinforcement Learning)

Training a PPO agent to autonomously land a spacecraft using Deep Reinforcement Learning.

Environment

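  • Env: LunarLander-v2 (Gymnasium, the maintained successor to OpenAI Gym; Box2D physics)
  • Observation space: 8-dimensional vector (x/y position, x/y velocity, angle, angular velocity, and two boolean leg-contact flags)
  • Action space: 4 discrete actions (do nothing, fire left orientation engine, fire main engine, fire right orientation engine)
  • Goal: Land on the pad between the flags while using as little fuel as possible. An episode scoring ≥ 200 points counts as solved.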
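To make the observation and action layouts concrete, here is a small pure-Python helper that names the 8 observation components in the order Gymnasium documents them (landing pad at x = 0, y = 0). `ACTIONS`, `LanderState`, and `decode_observation` are illustrative names assumed for this sketch; they are not part of this repo or of Gymnasium.

```python
from typing import NamedTuple, Sequence

# Discrete action indices as documented for Gymnasium's LunarLander.
ACTIONS = {
    0: "do nothing",
    1: "fire left orientation engine",
    2: "fire main engine",
    3: "fire right orientation engine",
}


class LanderState(NamedTuple):
    """Named view of the 8-dimensional LunarLander observation (hypothetical helper)."""
    x: float                  # horizontal position; the landing pad is at x = 0
    y: float                  # vertical position
    vx: float                 # horizontal velocity
    vy: float                 # vertical velocity
    angle: float              # body angle
    angular_velocity: float
    left_leg_contact: bool    # True when the left leg touches the ground
    right_leg_contact: bool   # True when the right leg touches the ground


def decode_observation(obs: Sequence[float]) -> LanderState:
    """Split a raw observation (list or NumPy array) into named fields."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    *continuous, left, right = (float(v) for v in obs)
    # The environment reports leg contact as 1.0 / 0.0.
    return LanderState(*continuous, left > 0.5, right > 0.5)
```

For example, `decode_observation(obs).vy` gives the vertical speed, which a pilot (human or policy) must bring close to zero before touchdown.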

Algorithm: PPO (Proximal Policy Optimization)

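PPO is a policy-gradient method that clips the ratio between the new and old policy probabilities in its surrogate objective, which prevents destructively large policy updates. This keeps training stable while allowing several epochs of minibatch updates on each batch of collected experience, and it works for discrete action spaces like LunarLander's as well as for continuous control.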
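The clipping idea can be sketched in a few lines of plain Python. This computes only the policy term of PPO's loss (the negated clipped surrogate objective) for a batch of samples; real implementations such as Stable-Baselines3 also add value-function and entropy terms and work on tensors. `ppo_clip_loss` is a hypothetical helper, and `clip_eps=0.2` is Stable-Baselines3's default `clip_range`; the README does not state the value this repo uses.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative mean of PPO's clipped surrogate objective (a loss to minimise)."""
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms,
        # so moving the ratio outside [1 - eps, 1 + eps] earns no extra reward.
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)
```

With a positive advantage and a ratio of 1.5, the objective is capped at 1.2 x advantage; with a negative advantage and a ratio of 0.5, the clipped term (0.8 x advantage) is the smaller one and is kept, so the update gains nothing from pushing the probability down further.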

| Hyperparameter | Value |
| --- | --- |
| Learning rate | 3e-4 |
| Total timesteps | 500,000 |
| Batch size (minibatch) | 64 |
| Gamma (discount factor) | 0.999 |
| GAE lambda | 0.98 |
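Gamma and GAE lambda together control how PPO estimates advantages. A gamma of 0.999 gives an effective horizon of about 1 / (1 - 0.999) = 1000 steps, roughly a full episode (Gymnasium caps LunarLander-v2 episodes at 1000 steps), so the agent is credited for a safe landing that arrives long after a thruster decision. Below is a minimal pure-Python sketch of Generalized Advantage Estimation using the table's values as defaults. `compute_gae` is a hypothetical helper; Stable-Baselines3 performs the equivalent computation internally in its rollout buffer, and unlike this simplified version it bootstraps through time-limit truncations instead of treating every episode end as terminal.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.999,
    lam: float = 0.98,
) -> List[float]:
    """Generalized Advantage Estimation over one rollout, computed backwards in time.

    dones[t] is True if the episode ended after step t; last_value is the critic's
    estimate for the state following the final step of the rollout.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]  # TD error
        gae = delta + gamma * lam * not_done * gae  # exponentially weighted sum
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

Lambda trades bias for variance: `lam=0` reduces to one-step TD errors, while `lam=1` (with `gamma=1`) gives full Monte Carlo returns minus the value baseline.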

Results

Agent achieves mean reward > 200 after ~300k timesteps, consistently landing successfully.
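The README does not say how the mean reward is averaged. One common convention, assumed here rather than taken from the repo, is a rolling mean over the last 100 episodes. The hypothetical helper below returns the first timestep at which that rolling mean reaches the 200-point threshold, given (timestep, episode return) pairs such as those derived from a Stable-Baselines3 `Monitor` log, whose `r` column is the episode return and `l` the episode length (a running sum of `l` gives the timestep).

```python
from collections import deque
from typing import Iterable, Optional, Tuple


def first_solved_timestep(
    episodes: Iterable[Tuple[int, float]],
    threshold: float = 200.0,
    window: int = 100,
) -> Optional[int]:
    """Return the first timestep at which the mean of the last `window` episode
    returns reaches `threshold`, or None if it never does.

    `episodes` yields (timestep_at_episode_end, episode_return) pairs in order.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` returns
    for timestep, episode_return in episodes:
        recent.append(episode_return)
        if len(recent) == window and sum(recent) / window >= threshold:
            return timestep
    return None
```

Requiring a full window before testing avoids declaring success from a handful of lucky early landings.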

Training Curve

Setup & Run

pip install stable-baselines3==2.3.2 "gymnasium[box2d]==0.29.1"
python train.py       # Train the agent (~15-20 mins)
python evaluate.py    # Evaluate trained model
python plot_results.py  # Plot reward curve
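The scripts themselves are not shown here. As a rough sketch under the pinned versions above (Stable-Baselines3 2.3.2, Gymnasium 0.29.1), training and evaluation could look like the following. `ENV_ID`, `PPO_KWARGS`, `train`, `evaluate`, and the `ppo_lunarlander` save path are illustrative assumptions, not the repo's actual code. Only the four tabulated hyperparameters are set; everything else (for example `n_steps`, `n_epochs`, `clip_range`) stays at SB3 defaults, which may differ from what the repo uses.

```python
"""Hypothetical sketch of train/evaluate logic using Stable-Baselines3 2.x."""

ENV_ID = "LunarLander-v2"
TOTAL_TIMESTEPS = 500_000
# Values from the hyperparameter table; all other PPO settings stay at SB3 defaults.
PPO_KWARGS = {
    "learning_rate": 3e-4,
    "batch_size": 64,  # SB3's minibatch size
    "gamma": 0.999,
    "gae_lambda": 0.98,
}


def train(save_path: str = "ppo_lunarlander") -> None:
    # Heavy imports are deferred so the config above can be inspected without them.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make(ENV_ID)
    model = PPO("MlpPolicy", env, verbose=1, **PPO_KWARGS)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    model.save(save_path)  # SB3 appends .zip, writing ppo_lunarlander.zip
    env.close()


def evaluate(load_path: str = "ppo_lunarlander", n_episodes: int = 10) -> float:
    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.monitor import Monitor

    env = Monitor(gym.make(ENV_ID))  # Monitor records unmodified episode returns
    model = PPO.load(load_path)
    mean_reward, std_reward = evaluate_policy(
        model, env, n_eval_episodes=n_episodes, deterministic=True
    )
    print(f"mean reward {mean_reward:.1f} +/- {std_reward:.1f}")
    env.close()
    return mean_reward
```

Nothing runs on import; calling `train()` and then `evaluate()` performs a full run, and a mean reward of at least 200 over the evaluation episodes matches the "solved" criterion above.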

Stack

  • Python 3.11
  • PyTorch
  • Stable Baselines3
  • Gymnasium (Box2D)
  • Matplotlib

Demo

Agent Landing

About

PPO RL agent — LunarLander-v2, reward curve analysis, GIF demo
