DreamerV3 default CartPole example not learning?


Just trying out different algorithms on default Gym envs. PPO works fine with CartPole, getting to 500 reward in about 15 minutes.

However, DreamerV3, using the stock tuned_examples/dreamerv3/cartpole.py, doesn't seem to learn: the reward stays below 20 even after 30+ minutes of running.

Has anybody had the same experience? Is the RLlib implementation flawed, or am I missing something?

Also, as a side question, DreamerV3 uses a lot of RAM. CartPole takes about 16 GB, and atari_100k runs me out of memory (24 GB). Are the memory requirements documented anywhere? How much memory would I need to run atari_100k? And what about an XL model for atari_200M?
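For what it's worth, here is the back-of-envelope estimate I tried for the replay buffer's share of that memory. The storage format is an assumption on my part (uint8 64x64x3 frames for Atari, float32 length-4 vectors for CartPole, buffer capacity equal to the step budget), not something I checked against the RLlib source, so the real footprint from the model, optimizer state, and framework overhead will come on top of this:

```python
# Rough replay-buffer observation-memory estimate for DreamerV3-style training.
# ASSUMPTIONS (hypothetical, not verified against RLlib): Atari observations
# stored as uint8 64x64x3 frames, CartPole observations as 4 float32 values,
# and the buffer holds one observation per environment step.

def buffer_obs_bytes(num_steps: int, bytes_per_obs: int) -> int:
    """Bytes used by stored observations only (ignores actions, rewards,
    model parameters, optimizer state, and framework overhead)."""
    return num_steps * bytes_per_obs

atari_bytes_per_obs = 64 * 64 * 3      # uint8 image frame
cartpole_bytes_per_obs = 4 * 4         # 4-dim float32 observation

atari_100k = buffer_obs_bytes(100_000, atari_bytes_per_obs)
cartpole_100k = buffer_obs_bytes(100_000, cartpole_bytes_per_obs)

print(f"atari_100k observations:   ~{atari_100k / 1e9:.2f} GB")
print(f"cartpole 100k observations: ~{cartpole_100k / 1e6:.2f} MB")
```

Under these assumptions the raw observations for atari_100k come to only about 1.2 GB, so if 16 GB for CartPole is accurate, most of the usage must be elsewhere (model/world-model state, batching, or per-worker copies), which is why I'm asking where the requirements are documented.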