Understanding stopping criterion

kia · May 2, 2021, 9:04am

Hi, I was hoping someone could help me differentiate between “stop-timesteps”, “stop-iters” and “stop-rewards”. Specifically, what is the relationship between timesteps and iters? E.g., if I increase timesteps by x, do I need to increase iters by y?
My env is multi-agent. Is “stop-rewards” the mean of the two agents or the total?

smorad · May 3, 2021, 1:15pm

One call to env.step() is one timestep. Iters is the number of batches your model trains on, which is also the number of times its weights are updated (not counting minibatches).
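
To make the timesteps/iters relationship concrete, here is a minimal sketch in plain Python (no Ray import). It assumes the three flags are the ones from RLlib's example scripts, which as far as I know pass them to Tune as a stop dict with the keys `training_iteration`, `timesteps_total` and `episode_reward_mean`, and that the trial ends as soon as any one threshold is reached. The helper names and the figure of 4000 timesteps per iteration are hypothetical; the real number depends on your algorithm and config.

```python
import math


def build_stop(stop_iters, stop_timesteps, stop_reward):
    # Assumed mapping from the CLI flags to Tune's stop dict.
    return {
        "training_iteration": stop_iters,
        "timesteps_total": stop_timesteps,
        "episode_reward_mean": stop_reward,
    }


def should_stop(result, stop):
    # The trial ends when ANY listed metric reaches its threshold.
    return any(result[key] >= limit for key, limit in stop.items())


def iters_needed(stop_timesteps, timesteps_per_iter):
    # Assumption: every iteration collects the same number of env steps.
    return math.ceil(stop_timesteps / timesteps_per_iter)


stop = build_stop(stop_iters=200, stop_timesteps=100_000, stop_reward=150.0)

# With a hypothetical 4000 timesteps per iteration, the timestep limit
# is reached long before the 200-iteration limit.
print(iters_needed(100_000, 4000))
```

Under these assumptions the criteria are independent: raising stop-timesteps only requires raising stop-iters if the iteration limit would otherwise be hit first.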

sven1977 · May 4, 2021, 7:18am

Also, the reward for multi-agent is the total sum (not the mean) over the agents.
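
As a small illustration of sum versus mean, the sketch below adds up per-agent rewards over one episode in plain Python. The agent IDs, reward values and the `episode_reward` helper are made up for the example; it only mirrors the per-step `{agent_id: reward}` dicts a multi-agent env returns and is not RLlib's own bookkeeping code.

```python
def episode_reward(per_step_rewards):
    # per_step_rewards: one {agent_id: reward} dict per env.step() call.
    # The episode reward is the sum over all agents and all steps.
    return sum(sum(step.values()) for step in per_step_rewards)


# Hypothetical two-agent episode with two steps.
steps = [
    {"agent_0": 1.0, "agent_1": 2.0},
    {"agent_0": 0.5, "agent_1": 0.5},
]

total = episode_reward(steps)   # what the stopping threshold is compared against
per_agent_mean = total / 2      # NOT what is compared; shown for contrast
print(total, per_agent_mean)
```

So with two agents that should each reach a reward of R, the threshold would be set to roughly 2R. As far as I understand, the "mean" in the metric name refers to averaging over episodes, not over agents.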
