Hi,
Just trying out different algorithms on the default Gym envs. PPO works fine with CartPole, getting to 500 reward in about 15 minutes.
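For reference, this is roughly how I'm running the PPO baseline (a minimal sketch; the env name and loop length are just what I'm using, nothing special):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Plain PPO on CartPole; default config, nothing tuned.
config = PPOConfig().environment("CartPole-v1")
algo = config.build()

for _ in range(100):
    result = algo.train()
    # Older RLlib versions report "episode_reward_mean" at the top level;
    # newer ones nest the metric -- adjust the key for your Ray version.
    print(result.get("episode_reward_mean"))
```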
However, DreamerV3, using the stock tuned_examples/dreamerv3/cartpole.py, doesn't seem to learn at all, staying below 20 reward even after 30+ minutes of running.
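For what it's worth, my understanding is that the tuned example boils down to roughly this config (a sketch from reading the script; the model_size="XS" and training_ratio=1024 values are my reading of it, so correct me if that's wrong):

```python
from ray.rllib.algorithms.dreamerv3.dreamerv3 import DreamerV3Config

# Roughly what I believe tuned_examples/dreamerv3/cartpole.py sets up;
# the model_size and training_ratio values are my assumption.
config = (
    DreamerV3Config()
    .environment("CartPole-v1")
    .training(model_size="XS", training_ratio=1024)
)
algo = config.build()

for _ in range(1000):
    algo.train()
```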
Has anybody had the same experience? Is the RLlib implementation flawed, or am I missing something?
Also, as a side question: DreamerV3 uses a lot of RAM. CartPole takes about 16 GB, and atari_100k OOMs my machine (24 GB). Are the memory requirements documented anywhere? How much RAM would I need to run atari_100k? And what about an XL model for atari_200M?
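In case it matters for the memory question, this is where I understand the model size gets selected (again a sketch; the env string and training_ratio here are hypothetical placeholders, and I'm only assuming "XL" is what the atari_200M setup would use):

```python
from ray.rllib.algorithms.dreamerv3.dreamerv3 import DreamerV3Config

# Hypothetical sketch: bumping model_size to "XL" is the only change I'd
# expect for the larger Atari run; env and training_ratio are placeholders.
config = (
    DreamerV3Config()
    .environment("ALE/Pong-v5")  # illustrative env choice, not from the script
    .training(model_size="XL", training_ratio=64)
)
```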
Thanks!