PPO torch vs tf2


I would prefer to use tf2, because it is the framework I am more familiar with.
However, in my setup PPO seems to run well with torch; when I try running with tf2, every so often things stall between the worker / environment.

I am guessing this happens when the learner performs its update.
Is there any way to split the resources used for learning and for the rollout workers (assuming that is the issue)?

At the moment I have 16 CPU threads and 1 GPU.
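For reference, RLlib lets you divide resources between the learner (driver) and the rollout workers through its config. Below is a minimal sketch using the pre-2.0 config-dict style; the keys (`num_gpus`, `num_workers`, `num_cpus_per_worker`, `num_gpus_per_worker`) are real RLlib options, but the specific values are only illustrative for a 16-thread / 1-GPU machine and would need tuning:

```python
# Hypothetical resource split for a 16-CPU-thread, 1-GPU machine.
# The keys are standard RLlib config options; the values are illustrative.
config = {
    "framework": "tf2",
    "num_gpus": 1,             # GPU reserved for the learner / driver process
    "num_workers": 14,         # rollout workers, leaving threads for the driver
    "num_cpus_per_worker": 1,  # one CPU thread per rollout worker
    "num_gpus_per_worker": 0,  # workers sample on CPU only
}

# Sanity check: workers plus driver should not oversubscribe the machine.
total_worker_cpus = config["num_workers"] * config["num_cpus_per_worker"]
print(total_worker_cpus)
```

With a split like this, sampling runs entirely on CPU workers while the learner keeps the GPU to itself, which is one way to test whether learner updates are what stalls the workers.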

@NDR008 Thanks for posting. Pinging people from the RLlib team

cc: @arturn @gjoliver @kourosh @avnishn @Rohan138

Wait for me to give a more detailed update on this, because I’ve observed:
PPO torch vs TF: torch is fine, tf shows the stalling behaviour I mentioned earlier.
A3C: torch is fine, tf crashes.

I have changed from a 1650 Super 4GB GPU to a 3090 24GB, but saw no improvement.

I’m starting to wonder if it is a TensorFlow version issue (I’m on 2.6), or the way I’m configuring resources / learners / rollout workers.

@NDR008 I think you might be on to something regarding the TensorFlow version issue. We always run release tests on both torch and tf for PPO, and they run fine.
Here are our release requirements files:

Can you make sure you use these versions?