The problem is:
baseconfig:-----------------------------
"num_workers": 30,
"num_gpus": 8,
"min_train_timesteps_per_reporting": 30,
"min_sample_timesteps_per_reporting": 30,
"rollout_fragment_length": 21,
"train_batch_size": 360,
"batch_mode": "truncate_episodes",
evalconfig-----------------------
"evaluation_interval": 40,
"evaluation_duration": 1,
"evaluation_duration_unit": "episodes",
"evaluation_parallel_to_training": True,
"in_evaluation": False,
"evaluation_config": {
    # Example: overriding env_config, exploration, etc:
    "env_config": {"train_name": "a good day", "use_acc": True, "record_num_episode": 10, "reward_yaml": " "},
    "explore": False,
    "callbacks": MyCallbacks,
},
"evaluation_num_workers": 1,
"custom_eval_function": None,
"always_attach_evaluation_results": False,
In my env, one episode is 200 steps, and in one training iteration each worker samples 21 steps. So I expected evaluation to run once a worker has reached 21*40 = 840 steps (about 4 episodes), but I find that it runs evaluation during the first episode, even though I set evaluation_parallel_to_training=False.
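
The arithmetic behind that expectation can be checked with a few lines of plain Python. This is only a sketch: it assumes that one training iteration makes each worker sample exactly one rollout fragment of 21 steps, which is how I understand it but have not verified. The variable names are just for illustration and nothing here calls RLlib.

```python
# Values copied from the config and the env description above.
rollout_fragment_length = 21   # steps sampled per worker per iteration (assumed)
evaluation_interval = 40       # evaluate every 40 training iterations
episode_length = 200           # steps per episode in my env

# Steps each worker has sampled when the first evaluation is expected.
steps_per_worker = rollout_fragment_length * evaluation_interval

# The same amount expressed in episodes of the env.
episodes_per_worker = steps_per_worker / episode_length

print(f"steps per worker before first eval: {steps_per_worker}")
print(f"episodes per worker before first eval: {episodes_per_worker:.1f}")
```

So under that assumption each worker should be a little past its 4th episode (840 / 200 = 4.2) when the first evaluation is triggered, not still in its first one.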