This guide covers everything you need to know about training agents in Agent Arcade, from configuration to performance metrics.
Related Guides:
- For command reference, see CLI Reference
- For score submission, see Competition Guide
- For adding custom games, see Adding Games
total_timesteps: 1000000
learning_rate: 0.00025
buffer_size: 250000
learning_starts: 50000
batch_size: 256
exploration_fraction: 0.2
target_update_interval: 2000
frame_stack: 16
- total_timesteps: Total number of environment steps for training
- learning_rate: Rate at which the model updates its parameters
- buffer_size: Size of the replay buffer for experience replay
- learning_starts: Number of steps before starting gradient updates
- batch_size: Number of samples per gradient update
- exploration_fraction: Fraction of total_timesteps over which the exploration rate decays
- target_update_interval: Steps between target network updates
- frame_stack: Number of frames to stack for temporal information
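To make the effect of exploration_fraction concrete, the following is a minimal sketch of a linear epsilon schedule using the default values listed above. The linear shape and the start and end values (1.0 and 0.05) are assumptions chosen for illustration; this guide does not state which schedule Agent Arcade uses internally.

```python
def epsilon_at(step, total_timesteps=1_000_000, exploration_fraction=0.2,
               initial_eps=1.0, final_eps=0.05):
    """Exploration rate at a given step under a linear decay (assumed schedule).

    initial_eps and final_eps are illustrative values, not taken from this guide.
    """
    # Number of steps over which epsilon is annealed.
    decay_steps = exploration_fraction * total_timesteps
    # Fraction of the decay completed so far, capped at 1.0.
    progress = min(step / decay_steps, 1.0)
    return initial_eps + progress * (final_eps - initial_eps)


for step in (0, 100_000, 200_000, 500_000):
    print(step, round(epsilon_at(step), 3))
```

With total_timesteps at 1000000 and exploration_fraction at 0.2, the decay finishes after 200000 steps, and the agent acts with the final exploration rate for the remaining 80% of training.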
- Mean Episode Reward
  - Primary indicator of agent performance
  - Higher values indicate better gameplay
  - Compare against human benchmarks
- Success Rate
  - Percentage of episodes meeting target score
  - Key metric for competition evaluation
  - Used for reward multiplier calculation
- Episode Length
  - Number of steps per episode
  - Indicates efficiency and survival ability
  - Game-specific optimal ranges
- Learning Curves
  - Episode rewards over time
  - Loss function trends
  - Exploration rate decay
- Policy Statistics
  - Action distribution
  - Value function estimates
  - Policy entropy
- Resource Usage
  - Training FPS
  - Memory utilization
  - GPU/CPU usage
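The first three metrics above can be computed directly from per-episode results. The sketch below is illustrative only: the function name, the sample numbers, and the choice to count an episode as successful when its reward is greater than or equal to the threshold are assumptions, not Agent Arcade's actual evaluation code.

```python
from statistics import mean


def summarize_episodes(rewards, lengths, success_threshold):
    """Return mean reward, success rate, and mean episode length.

    An episode counts as a success when its reward reaches the
    threshold (>= is an assumption made for this sketch).
    """
    if not rewards or len(rewards) != len(lengths):
        raise ValueError("rewards and lengths must be non-empty and equal in size")
    successes = sum(1 for reward in rewards if reward >= success_threshold)
    return {
        "mean_reward": mean(rewards),
        "success_rate": successes / len(rewards),
        "mean_length": mean(lengths),
    }


# Made-up results from five evaluation episodes.
rewards = [18, 12, 21, 15, 9]
lengths = [1900, 2400, 1700, 2100, 2600]
summary = summarize_episodes(rewards, lengths, success_threshold=15)
print(summary)
```

Reporting the success rate next to the mean reward helps separate an agent that is consistently good from one that scores highly on only a few episodes.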
# Launch TensorBoard
tensorboard --logdir ./tensorboard/DQN_[game]_[timestamp]
Available metrics:
- Episode rewards
- Learning rate
- Loss curves
- Training FPS
- Network gradients
# Enable recording during training
agent-arcade train pong --record
# Record evaluation episodes
agent-arcade evaluate pong models/pong/final_model.zip --record
- Frame Processing
  - Normalize pixel values (0-1)
  - Stack frames for temporal information
  - Apply frame skipping for efficiency
- Network Architecture
  - Custom CNN feature extractor
  - Dual 512-unit fully connected layers
  - Apple Silicon (MPS) optimizations
- Training Stability
  - Gradient clipping
  - Learning rate scheduling
  - Reward scaling
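The frame processing steps listed above (normalization, stacking, and skipping) can be sketched as follows. This is a dependency-free illustration in which nested lists stand in for image arrays; the class and function names are hypothetical and do not correspond to Agent Arcade's internal wrappers.

```python
from collections import deque


def normalize(frame):
    """Scale 8-bit pixel values into the range 0-1."""
    return [[pixel / 255.0 for pixel in row] for row in frame]


def keep_every(frames, skip):
    """Frame skipping: keep one frame out of every `skip` frames."""
    return frames[::skip]


class FrameStacker:
    """Keep the most recent `num_frames` normalized frames as one observation."""

    def __init__(self, num_frames):
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame):
        # Fill the whole stack with the first frame so the observation
        # has a fixed size from the very first step of an episode.
        normalized = normalize(first_frame)
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(normalized)
        return list(self.frames)

    def push(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(normalize(frame))
        return list(self.frames)


stacker = FrameStacker(num_frames=4)
observation = stacker.reset([[0, 255]])
observation = stacker.push([[51, 102]])
print(len(observation), observation[-1])
```

Stacking gives the network access to motion, such as the direction of the ball in Pong, which a single frame cannot convey.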
- Poor Learning
  - Check learning rate
  - Verify reward scaling
  - Inspect network gradients
- Unstable Performance
  - Increase buffer size
  - Adjust batch size
  - Modify update frequency
- Resource Constraints
  - Enable frame skipping
  - Reduce batch size
  - Optimize replay buffer
- Recommended Settings
  total_timesteps: 500000
  learning_rate: 0.0001
  frame_stack: 4
- Target Score: 21
- Success Threshold: 15
- Recommended Settings
  total_timesteps: 1000000
  learning_rate: 0.00025
  frame_stack: 16
- Target Score: 1000
- Success Threshold: 500
- Recommended Settings
  total_timesteps: 2000000
  learning_rate: 0.0001
  frame_stack: 16
- Target Score: 15000
- Success Threshold: 10000
After training, models are organized as follows:
models/
├── pong/
│   ├── pong_final.zip          # Final trained model
│   ├── checkpoints/            # Periodic checkpoints
│   │   ├── checkpoint_100000.zip
│   │   ├── checkpoint_200000.zip
│   │   └── ...
│   └── videos/                 # Recorded gameplay
├── space_invaders/
│   └── ...
└── river_raid/
    └── ...
- Using SCP:
# Create local directory
mkdir -p ~/agent-arcade-models
# Copy all models and checkpoints
scp -r ubuntu@your-lambda-ip:~/agent-arcade/models/* ~/agent-arcade-models/
- Using Rsync (Recommended):
# Copy models with progress indication
rsync -avz --progress ubuntu@your-lambda-ip:~/agent-arcade/models/ ~/agent-arcade-models/
# Copy TensorBoard logs for local analysis
rsync -avz --progress ubuntu@your-lambda-ip:~/agent-arcade/tensorboard/ ~/agent-arcade-tensorboard/
After transferring, you can evaluate models locally:
# Basic evaluation
agent-arcade evaluate pong ~/agent-arcade-models/pong/pong_final.zip --episodes 10
# Evaluation with rendering
agent-arcade evaluate pong ~/agent-arcade-models/pong/pong_final.zip --episodes 10 --render
# Record evaluation videos
agent-arcade evaluate pong ~/agent-arcade-models/pong/pong_final.zip --episodes 5 --record
When you evaluate a model, the system automatically generates a cryptographically signed verification token. This token:
- Proves that your score was legitimately achieved through evaluation
- Contains game ID, account ID, score, and timestamp information
- Is required when submitting scores for staking or competitions
- Is stored in ~/.agent-arcade/verification_tokens/
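The general idea of a signed token can be illustrated with a short sketch. Agent Arcade's actual token format and signature scheme are not described in this guide, so everything below is an assumption: the field names, the JSON encoding, the sample account ID, and the use of HMAC-SHA256 from the Python standard library are stand-ins that show how a payload carrying these four fields can be checked for tampering.

```python
import hashlib
import hmac
import json


def _canonical_bytes(payload):
    """Serialize the payload deterministically so signatures are reproducible."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_token(payload, secret):
    """Attach an HMAC-SHA256 signature to the payload (illustrative scheme only)."""
    signature = hmac.new(secret, _canonical_bytes(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_token(token, secret):
    """Return True only if the payload still matches its signature."""
    expected = hmac.new(secret, _canonical_bytes(token["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["signature"])


# Hypothetical values for illustration.
secret = b"example-secret"
token = sign_token(
    {"game_id": "pong", "account_id": "example.testnet", "score": 18, "timestamp": 1700000000},
    secret,
)
print(verify_token(token, secret))

# Changing the score after signing invalidates the token.
tampered = {"payload": dict(token["payload"], score=21), "signature": token["signature"]}
print(verify_token(tampered, secret))
```

Because the signature covers the score, a token produced by evaluation cannot be edited to claim a higher score without the check failing.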
After evaluation, you can submit your verified score:
# First run an evaluation to generate a verification token
agent-arcade evaluate pong ~/agent-arcade-models/pong/pong_final.zip --episodes 10
# Then submit your score (must be done after evaluation)
agent-arcade stake submit pong 18
- Version Control
  - Keep checkpoints for different training runs
  - Document hyperparameters used
  - Track evaluation metrics
- Backup Strategy
  - Regular transfers from GPU instances
  - Keep multiple checkpoints
  - Document performance at each checkpoint
- Performance Documentation
  - Record final evaluation metrics
  - Save TensorBoard logs
  - Document hardware specifications used
- Model Sharing
  - Include configuration files
  - Document environment versions
  - Provide example evaluation scripts
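When keeping multiple checkpoints, it helps to list them in training order before evaluating or documenting each one. The following is a minimal sketch that assumes the models/ layout and the checkpoint_<steps>.zip naming shown earlier in this guide; the function name is hypothetical.

```python
import re
from pathlib import Path

# Matches file names such as checkpoint_100000.zip and captures the step count.
CHECKPOINT_PATTERN = re.compile(r"checkpoint_(\d+)\.zip")


def list_checkpoints(models_dir, game):
    """Return (steps, path) pairs for a game's checkpoints, sorted by step count."""
    checkpoint_dir = Path(models_dir) / game / "checkpoints"
    found = []
    if checkpoint_dir.is_dir():
        for path in checkpoint_dir.iterdir():
            match = CHECKPOINT_PATTERN.fullmatch(path.name)
            if match:
                found.append((int(match.group(1)), path))
    # Sort numerically so 200000 comes before 1000000.
    return sorted(found, key=lambda item: item[0])


# Prints nothing if the directory does not exist yet.
for steps, path in list_checkpoints(Path.home() / "agent-arcade-models", "pong"):
    print(f"{steps:>10}  {path}")
```

Sorting on the parsed step count rather than the file name avoids the lexicographic ordering in which checkpoint_1000000.zip would come before checkpoint_200000.zip.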