Reproduce Rainbow Results

Hi there,

I’m trying to use Rainbow on my custom environment, but so far I haven’t been able to get it to work. To find out what is going wrong, I’ve tried reproducing the Rainbow results on both vanilla Space Invaders and vanilla Pong.

In both cases, the agent does not seem to train. For example, simply running “rllib train -f pong-deterministic-rainbow.yaml” from the tuned Rainbow example starts the reward at -21 and reaches only -20.75 by the one-millionth timestep. Note that I am running this without a GPU.

Even with a GPU, however, I haven’t been able to reproduce any of DeepMind’s Rainbow results on vanilla Space Invaders. This is in contrast to both DDQN and distributional DQN, whose results I’ve been able to reproduce successfully using RLlib’s algorithms.

Are there any particular reasons why Rainbow doesn’t seem to be working?

Thank you for your help!

Best,
Matt