Num_sgd_iter and evaluation_interval

wangjunhe8127 · November 23, 2022, 12:40am

Hi, I'd like to know:

Are “evaluation_interval” and “num_sgd_iter” in PPO the same thing?
For example:
num_sgd_iter = 30
evaluation_interval = 3
Does that mean evaluation will run while a single train batch is being processed, because 3 < 30?

mannyv · November 23, 2022, 12:44am

No. The evaluation step happens separately and before the training step. num_sgd_iter specifies the number of optimization passes within one training step, while evaluation_interval counts training iterations. So with the example settings you provided there will be a total of 3 × 30 = 90 updates between each evaluation.
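As a rough sketch of that arithmetic (the function name is made up for illustration and is not part of RLlib), the count between evaluations is just the product of the two settings:

```python
def sgd_passes_between_evaluations(num_sgd_iter: int, evaluation_interval: int) -> int:
    """Optimization passes that run between two consecutive evaluations.

    Assumes evaluation_interval counts training iterations and that every
    training iteration runs num_sgd_iter passes over its train batch.
    """
    return num_sgd_iter * evaluation_interval


# Settings from the question above.
print(sgd_passes_between_evaluations(num_sgd_iter=30, evaluation_interval=3))  # 90
```

The two settings count different things, so comparing them directly (3 < 30) does not say anything about when evaluation runs.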

wangjunhe8127 · November 23, 2022, 1:09am

OK, and what about “min_train_timesteps_per_reporting”?

wangjunhe8127 · November 23, 2022, 1:36am

The problem is:

baseconfig:-----------------------------
"num_workers": 30,
"num_gpus": 8,
"min_train_timesteps_per_reporting": 30,
"min_sample_timesteps_per_reporting": 30,
"rollout_fragment_length": 21,
"train_batch_size": 360,
"batch_mode": "truncate_episodes",

evalconfig-----------------------
"evaluation_interval": 40,
"evaluation_duration": 1,
"evaluation_duration_unit": "episodes",
"evaluation_parallel_to_training": True,
"in_evaluation": False,
"evaluation_config": {
    # Example: overriding env_config, exploration, etc.:
    "env_config": {"train_name": "a good day", "use_acc": True, "record_num_episode": 10, "reward_yaml": " "},
    "explore": False,
    "callbacks": MyCallbacks,
},
"evaluation_num_workers": 1,
"custom_eval_function": None,
"always_attach_evaluation_results": False,

In my env, one episode is 200 steps, and in one training iteration each worker samples 21 steps. So I expect evaluation to run once a worker has collected 21*40 = 840 steps, which is about 4 episodes. But I find that it runs evaluation during the first episode, even though I set evaluation_parallel_to_training=False.

mannyv · November 23, 2022, 2:29am

Hi @wangjunhe8127,

With these settings, during the sample phase of the training loop RLlib will ask each rollout worker for new samples as many times as it needs to generate a training batch of size 360.

You have 30 workers and a rollout_fragment_length of 21, so after the first sample there will be enough steps (30*21=630) to fill a training batch, and it will train. At this point you will have completed one iteration of training, and you will have 30 environments that are each 21 steps into their first episode.

If every episode is exactly 200 steps, then it should take 10 training iterations (200/21 ≈ 9.5) before you finish 1 episode in each rollout worker; at this point you would have finished 30 episodes total. I would have expected you to have completed around 120 episodes total before the first evaluation (40 iterations of 21 steps is 840 steps, or about 4 episodes, per worker, across 30 workers).
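The numbers above can be reproduced with a small calculation. This is only a sketch: the function and the dictionary keys are invented for illustration, and it assumes one environment per worker and episodes of exactly 200 steps.

```python
import math


def sampling_summary(num_workers, rollout_fragment_length, train_batch_size,
                     episode_len, evaluation_interval):
    """Back-of-the-envelope sampling arithmetic (not RLlib code)."""
    # Steps collected when every worker returns one fragment.
    steps_per_round = num_workers * rollout_fragment_length
    # Sampling rounds needed to fill one train batch.
    rounds_per_iteration = math.ceil(train_batch_size / steps_per_round)
    # Steps each worker advances its environment per training iteration.
    worker_steps_per_iteration = rounds_per_iteration * rollout_fragment_length
    # Training iterations until each worker finishes its first episode.
    iterations_per_episode = math.ceil(episode_len / worker_steps_per_iteration)
    # Steps per worker, and completed episodes overall, before the first evaluation.
    worker_steps_before_eval = evaluation_interval * worker_steps_per_iteration
    episodes_before_eval = num_workers * (worker_steps_before_eval // episode_len)
    return {
        "steps_per_round": steps_per_round,
        "rounds_per_iteration": rounds_per_iteration,
        "iterations_per_episode": iterations_per_episode,
        "worker_steps_before_eval": worker_steps_before_eval,
        "episodes_before_eval": episodes_before_eval,
    }


summary = sampling_summary(num_workers=30, rollout_fragment_length=21,
                           train_batch_size=360, episode_len=200,
                           evaluation_interval=40)
print(summary)
```

With the posted settings this gives 630 steps per sampling round, 10 iterations per episode, 840 steps per worker before the first evaluation, and 120 completed training episodes.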

Two more points.

With a batch size of 360 you are not likely to make full use of 1 GPU, let alone 8.
With PPO, if you train with explore: True then you should evaluate with it on as well. Most people report much worse performance when they switch it off during evaluation.
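To make the explore point concrete, here is a generic illustration of the difference between sampling an action and always taking the most likely one. It is not RLlib's internal code; choose_action and the probabilities are made up, and it assumes a discrete action space where exploration means sampling from the policy's action distribution.

```python
import random


def choose_action(action_probs, explore, rng=random):
    """Pick an action index from a list of action probabilities."""
    actions = range(len(action_probs))
    if explore:
        # Stochastic: draw an action in proportion to its probability.
        return rng.choices(actions, weights=action_probs, k=1)[0]
    # Deterministic: always take the most probable action.
    return max(actions, key=lambda a: action_probs[a])


probs = [0.1, 0.7, 0.2]
rng = random.Random(0)
print([choose_action(probs, explore=True, rng=rng) for _ in range(10)])
print([choose_action(probs, explore=False) for _ in range(10)])
```

A policy trained on sampled actions can behave differently once it is forced to be deterministic, which is one possible reason for the worse evaluation results mentioned above.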

mannyv · November 23, 2022, 2:44am

@wangjunhe8127,

One other thing that might be relevant: the evaluation worker has a separate environment from the training workers, so if you are using a value in the environment to track the number of episodes, then seeing 1 episode would make sense, because that counter is only counting the evaluation episodes.
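A toy sketch of why such a counter can be misleading. EpisodeCountingEnv is a hypothetical stand-in, not your actual environment or an RLlib class; the point is only that each environment instance keeps its own count.

```python
class EpisodeCountingEnv:
    """Toy environment that counts how many episodes it has started."""

    def __init__(self) -> None:
        self.episodes_started = 0

    def reset(self) -> None:
        # Each reset marks the start of a new episode in this instance only.
        self.episodes_started += 1


# Hypothetical setup: 30 training environments plus 1 evaluation environment.
training_envs = [EpisodeCountingEnv() for _ in range(30)]
evaluation_env = EpisodeCountingEnv()

# Every training worker starts its first episode.
for env in training_envs:
    env.reset()

# Later, the evaluation worker starts its own first episode.
evaluation_env.reset()

print(sum(env.episodes_started for env in training_envs))  # 30
print(evaluation_env.episodes_started)                     # 1
```

The evaluation environment reports its first episode no matter how many episodes the training environments have already run.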

wangjunhe8127 · November 23, 2022, 4:02am

Thank you. I have two questions:
1) Do you mean I can use it to test?
I will test it, but for now I find that it executes evaluation in the first episode rather than the 4th.
2) I think that if I set explore = True, the actions will be stochastic?

wangjunhe8127 · November 24, 2022, 2:24am

Hi, I found something interesting: when I set sample_async=False, it runs evaluation after evaluation_interval training iterations. So is this a bug?
