I really wonder how this can happen: num_agent_steps is sometimes less than num_(env_)steps!?
E.g., see snippet of train result’s output:
In my multi-agent use case I deploy PolicyServer classes, since my simulator is an external one. I have two policies and two agents, and the agents in my env interact sequentially, i.e. they don't act simultaneously but always one at a time.
Thus, I'm astonished that num_agent_steps is sometimes less than num_(env_)steps; I would expect them to be equal. In my understanding, one call to
client.get_action means one step of the env, so why does this difference in the counts occur?
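To make the expectation concrete, here is a minimal sketch of the counting model described above (this is a hypothetical illustration with made-up counters, not RLlib's internal bookkeeping): with sequential agents, each env step involves exactly one acting agent, i.e. one get_action call, so under this model the two counts should always match.

```python
# Hypothetical counting model for a sequential two-agent env:
# exactly one agent acts per env step, so num_agent_steps
# should equal num_env_steps. (Illustration only; not RLlib internals.)

def run_episode(total_env_steps: int) -> tuple[int, int]:
    env_steps = 0
    agent_steps = 0
    agents = ["agent_0", "agent_1"]  # two agents, acting one at a time

    for t in range(total_env_steps):
        acting_agent = agents[t % 2]  # sequential: one agent per step
        # A single client.get_action(...) call for acting_agent
        # would happen here in the real PolicyClient setup.
        agent_steps += 1  # one agent acted
        env_steps += 1    # one env transition

    return env_steps, agent_steps

env_steps, agent_steps = run_episode(10)
print(env_steps, agent_steps)  # both 10 under this model
```

Under this model the counts can never diverge, which is exactly why the reported discrepancy is surprising.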