Understanding stopping criterion

Hi, I was hoping someone could help me differentiate between “stop-timesteps”, “stop-iters” and “stop-rewards”. Specifically, what is the relationship between timesteps and iters? Eg if I increase timesteps by x, do I need to increase iters by y?
My env is multiagent. Is “stop-rewards” the mean of the two agents or the total?

One call to env.step() is one timestep. Iters is the number of batches your model will train on and the number of times your model weights will be updated (not counting minibatches).

1 Like

Also, the reward for multi-agent is the total sum (not the mean) over the agents.

1 Like