I really wonder how this can happen: num_agent_steps is sometimes less than num_(env_)steps!?
E.g., see this snippet from the train result output:
```
num_agent_steps_sampled: 127
num_agent_steps_trained: 127
num_steps_sampled: 128
num_steps_trained: 128
```
In my multi-agent use case I deploy the PolicyClient and PolicyServer classes, since my simulator is an external one. I have two policies and two agents, and the agents in my env interact sequentially, i.e. they don't act synchronously but always one at a time.
Thus, I'm astonished that num_agent_steps is sometimes less than num_(env_)steps; I would expect them to be the same. In my understanding, one call to client.get_action means one step of the env, so why can this difference in the numbers occur?
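
For clarity, here is a minimal sketch of what my interaction loop looks like. MySimulator is just a stand-in stub for my external simulator, and the server address and agent IDs are made up, but the PolicyClient calls (start_episode, get_action, log_returns, end_episode) are the standard ray.rllib.env.policy_client API:

```python
from ray.rllib.env.policy_client import PolicyClient


class MySimulator:
    """Stand-in stub for my external simulator: two agents that
    act strictly one at a time (turn-based)."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        # Only the currently acting agent appears in the obs dict.
        return {"agent_0": [0.0]}

    def step(self, action_dict):
        acting = next(iter(action_dict))
        self.t += 1
        nxt = f"agent_{self.t % 2}"  # turns alternate between the two agents
        done = self.t >= 10
        return {nxt: [float(self.t)]}, {acting: 0.0}, done, {acting: {}}


# Address is a placeholder; the server side runs a PolicyServerInput.
client = PolicyClient("http://localhost:9900", inference_mode="remote")
env = MySimulator()

episode_id = client.start_episode(training_enabled=True)
obs = env.reset()
done = False
while not done:
    # My assumption: one get_action call == one env step. The obs dict
    # holds exactly one agent because the agents take turns.
    actions = client.get_action(episode_id, obs)
    obs, rewards, done, infos = env.step(actions)
    client.log_returns(episode_id, rewards, info=infos)
client.end_episode(episode_id, obs)
```

Given this loop, I would expect every env step to also count as exactly one agent step, which is why the mismatch in the metrics above surprises me.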