Hi @arturn
You are right that the tf2 error originates outside of RLlib. There seems to be some debate on the web about how to solve it (interested users may have a look here), and there appear to have been some updates to the TensorFlow installation page as well.
Downgrading to TensorFlow version 2.10 solves the issue though - at least for now.
The reason I am using the Impala algo is that it is asynchronous. In my custom environment, some state transitions are more computationally expensive than others and thus slower. Additionally, my cluster comprises different types of CPUs with different speeds. Hence, if I were using a synchronous algo like PPO, I would have rollout workers sitting idle, waiting for the slower ones to finish. As my custom environment is rather slow anyway, this would not be very efficient.
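To put a rough number on the idle-time argument, here is a minimal sketch in plain Python (no Ray involved). It assumes a simplified model in which a synchronous algo finishes each sampling round only when the slowest worker is done. The speed factors, the 10% share of expensive transitions, and the helper names `idle_fraction` and `simulate` are all made up for illustration; they are not measurements from my setup.

```python
import random


def idle_fraction(times_per_worker):
    """Fraction of total worker time spent waiting under synchronous sampling.

    times_per_worker[i][r] is the time worker i needs for sampling round r.
    Each round lasts as long as its slowest worker, so everyone else idles.
    """
    num_workers = len(times_per_worker)
    num_rounds = len(times_per_worker[0])
    busy = 0.0
    wall = 0.0
    for r in range(num_rounds):
        round_times = [worker[r] for worker in times_per_worker]
        busy += sum(round_times)
        wall += max(round_times) * num_workers
    return 1.0 - busy / wall


def simulate(num_workers=28, num_rounds=1000, seed=0):
    """Simulate mixed CPU speeds and occasional expensive transitions."""
    rng = random.Random(seed)
    # Hypothetical: three CPU types with different relative slowdowns.
    speed = [rng.choice([1.0, 1.5, 2.0]) for _ in range(num_workers)]
    # Hypothetical: 10% of rounds hit an expensive state transition (5x cost).
    times = [
        [speed[i] * (5.0 if rng.random() < 0.1 else 1.0) for _ in range(num_rounds)]
        for i in range(num_workers)
    ]
    return idle_fraction(times)


if __name__ == "__main__":
    print(f"idle fraction with synchronous rounds: {simulate():.2f}")
```

With an asynchronous algo, each worker simply starts its next rollout as soon as it finishes the previous one, so under the same model this waiting time largely disappears.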
Normally, I run with 28 workers with 4 envs per worker.
I think the “one worker” issue dates back to when I first investigated this while trying to migrate from Ray 2.1.0 to 2.2.0 and communicated with @sven1977 about it (more details here). Again, episode_reward_mean etc. kept being just nan, and I tried various rollout worker configurations, including using only one. In the process I also managed to “harass” the Impala algo enough to get the learner queue empty error. But that is not what's happening here. Nevertheless, I still think these issues are related, as my custom environment runs fine in Ray 2.1.0.
I have just run the reproduction code that I linked to above in Ray 2.3.1, after downgrading TensorFlow to 2.10. It runs fine, but episode_reward_mean etc. should start to show up at around 12k time steps and again they don't, although the algo resets the environments after termination/truncation, which should indicate that the episodes ended. Moreover, info["learner"], info["learner_queue"] etc. appear to me to be behaving as expected. See sample output below:
So something is clearly going on and learning appears to take place, but the RL-relevant metrics are not reported.
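For anyone who wants to check for the same symptom in their own runs, a minimal sketch of a helper that flags absent or NaN episode metrics in a training result dict. The helper name `missing_episode_metrics` and the example values are hypothetical (the dict is not copied from my output); the keys are the usual top-level RLlib result keys as far as I know.

```python
import math


def missing_episode_metrics(result, keys=("episode_reward_mean", "episode_len_mean")):
    """Return the metric keys that are absent or NaN in a result dict."""
    missing = []
    for key in keys:
        value = result.get(key)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing.append(key)
    return missing


# Made-up result dict mimicking the symptom: time steps advance,
# but no completed episodes are reported.
example_result = {
    "timesteps_total": 12000,
    "episodes_this_iter": 0,
    "episode_reward_mean": float("nan"),
    "episode_len_mean": float("nan"),
}

if __name__ == "__main__":
    print(missing_episode_metrics(example_result))
```

Looking at episodes_this_iter next to these keys may help tell "no episode finished yet" apart from "episodes finished but were never counted".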