I don't understand why episode_rewards is negative when running ant_v3 with PPO.

Also, there are some parameter definitions that I don't quite understand, such as --gail.
Can anyone help explain this? Thank you very much.