I tried running mario_a2c.py, mario_ppo.py and mario_curio.py, but the reward does not improve with any of them.
Did you use the same hyperparameters as in the files for your evaluation (e.g. number of workers, learning rate)?
Which versions of the libraries did you use?
For instance, A2C without ICM (after 3M time-steps):
