Num_sgd_iter and evaluation_interval

Hi, I want to know that

  1. Are "evaluation_interval" and "num_sgd_iter" in PPO the same thing?
    For example, with
    evaluation_interval = 3
    does that mean that after I execute one train batch it will run evaluation, because 3 < 30?

Hi @wangjunhe8127,

No. The evaluation step happens separately, before the training step. num_sgd_iter specifies the number of SGD optimization passes within one training step. So with the example settings you provided there will be a total of 90 updates between evaluations.
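The relationship can be sketched with the example numbers from this thread (assuming num_sgd_iter = 30, which the "3 < 30" in the question suggests):

```python
# Hypothetical values from this thread: evaluate every 3 training
# iterations, with 30 SGD passes over each train batch.
evaluation_interval = 3
num_sgd_iter = 30

# SGD updates performed between two consecutive evaluations:
updates_between_evals = evaluation_interval * num_sgd_iter
print(updates_between_evals)  # 90
```

The two settings operate at different levels: evaluation_interval counts whole training iterations, while num_sgd_iter counts gradient passes inside each iteration.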

OK, and what about "min_train_timesteps_per_reporting"?

the problem is:

"num_gpus": 8,
"min_train_timesteps_per_reporting": 30,
"min_sample_timesteps_per_reporting": 30,
"batch_mode": "truncate_episodes",

"evaluation_interval": 40,
"evaluation_duration": 1,
"evaluation_duration_unit": "episodes",
"evaluation_parallel_to_training": True,
"in_evaluation": False,
"evaluation_config": {
    # Example: overriding env_config, exploration, etc.:
    "env_config": {"train_name": "a good day", "use_acc": True, "record_num_episode": 10, "reward_yaml": " "},
    "explore": False,
},
"evaluation_num_workers": 1,
"custom_eval_function": None,
"always_attach_evaluation_results": False,

In my env, one episode is 200 steps, and in one training iteration one worker will sample 21 steps. So after 21 * 40 = 840 steps, i.e. 4 episodes, it should run evaluation. But I find that it runs evaluation during the first episode, even though I set evaluation_parallel_to_training=False.

Hi @wangjunhe8127,

With these settings, during the sample phase of the training loop RLlib will ask each rollout worker for new samples as many times as it needs to generate a training batch of size 360.

You have 30 workers and a rollout_fragment_length of 21, so after the first sample round there will be enough steps (30 * 21 = 630) to fill a training batch, and it will train. At this point you will have completed one training iteration, and you will have 30 environments each 21 steps into their first episode.

If every episode is exactly 200 steps, then it should take 10 training iterations before each rollout worker finishes one episode; at that point you would have finished 30 episodes total. I would have expected you to complete around 120 episodes total before the first evaluation.
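The arithmetic above can be checked with a quick sketch (values assumed from this thread, not read from RLlib itself):

```python
import math

# Assumed values from the discussion above.
num_workers = 30
rollout_fragment_length = 21  # steps sampled per worker per iteration
train_batch_size = 360
episode_len = 200
evaluation_interval = 40      # training iterations between evaluations

# Steps collected in one sample round across all workers:
steps_per_iter = num_workers * rollout_fragment_length
assert steps_per_iter >= train_batch_size  # 630 >= 360, one round suffices

# Training iterations until each worker finishes its first episode:
iters_per_episode = math.ceil(episode_len / rollout_fragment_length)
print(iters_per_episode)  # 10

# Rough episode count before the first evaluation:
episodes_per_worker = evaluation_interval * rollout_fragment_length // episode_len
print(episodes_per_worker * num_workers)  # 120
```

This matches the numbers in the post: 630 steps per iteration, 10 iterations per episode, and roughly 120 episodes before evaluation fires at iteration 40.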

Two more points.

  1. With a batch size of 360 you are not likely to make full use of one GPU, let alone eight.

  2. With PPO, if you train with explore: True then you should evaluate with it on as well. Most people report much worse performance when they switch it off during evaluation.
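Point 2 corresponds to a small override of the config posted above (an illustrative fragment, not a complete config):

```python
# Keep the stochastic policy during evaluation as well, per point 2
# above ("explore": False in the original config is the likely culprit
# for degraded evaluation performance with PPO).
config_override = {
    "evaluation_config": {
        "explore": True,
    },
}
```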


One other thing that might be relevant: the evaluation worker has a separate environment from the training workers, so if you are using a value in the environment to track the number of episodes, then 1 episode would make sense because that value is only counting evaluation episodes.

Thank you, and I have two questions:
1) Do you mean I can use it to test?
I will test it, but for now I find that it runs evaluation in the first episode, rather than the 4th.
2) I think that if I set explore = True, the actions will be stochastic?

Hi, I found an interesting thing: when I set sample_async=False, evaluation runs during training after evaluation_interval iterations, as expected. So is this a bug?