When I run PPO training, I see the callback method `on_episode_start` called once, but I don't see `on_episode_end` called at all.
When are `on_episode_start` and `on_episode_end` called?
Does this depend on the combination of algorithm config parameters, e.g. `count_steps_by="env_steps"` vs. `count_steps_by="agent_steps"`?
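To make the question concrete, here is a toy model of the behaviour I would expect. It is not RLlib's sampler code. It assumes that the start hook fires right after an env reset and that the end hook fires only when the env reports `terminated` or `truncated`. `ToyEnv`, `RecordingCallbacks` and `sample` are hypothetical names used only for illustration. Under those assumptions, sampling fewer steps than one episode's length produces a start event and no end event.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical env whose episodes end only by truncation at max_episode_steps."""
    max_episode_steps: int
    t: int = 0

    def reset(self) -> None:
        self.t = 0

    def step(self) -> tuple[bool, bool]:
        self.t += 1
        terminated = False  # this toy task never reaches a terminal state
        truncated = self.t >= self.max_episode_steps
        return terminated, truncated


@dataclass
class RecordingCallbacks:
    """Hypothetical stand-in for a callbacks object; records which hooks fired."""
    events: list[str] = field(default_factory=list)

    def on_episode_start(self) -> None:
        self.events.append("start")

    def on_episode_end(self) -> None:
        self.events.append("end")


def sample(env: ToyEnv, cb: RecordingCallbacks, num_steps: int, in_episode: bool) -> bool:
    """Collect num_steps env steps; return whether an episode is still running."""
    for _ in range(num_steps):
        if not in_episode:
            env.reset()
            cb.on_episode_start()  # assumed: fires right after a reset
            in_episode = True
        terminated, truncated = env.step()
        if terminated or truncated:
            cb.on_episode_end()  # assumed: fires only when the env reports done
            in_episode = False
    return in_episode


# One "iteration" of 200 steps inside a 1000-step episode: only the start hook fires.
env = ToyEnv(max_episode_steps=1000)
cb = RecordingCallbacks()
running = sample(env, cb, num_steps=200, in_episode=False)
print(cb.events)  # ['start']
```

In this model, four more 200-step calls are needed before the first episode is truncated and `"end"` appears.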
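This is my training loop:

```python
# `algo` (the built PPO algorithm) and `env_cfg` are created earlier in the script
for i in range(env_cfg['max_episode_steps']):
    # each algo.train() call is one training iteration, not one episode
    print(f"Iteration: {i}")
    result = algo.train()

checkpoint_dir = algo.save()
algo.stop()
```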
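Because each `algo.train()` call is a training iteration rather than an episode, a quick bit of arithmetic shows when the first episode could end. The sketch below assumes a single env, a fixed number of env steps collected per `train()` call, and episodes that end only by truncation at `max_episode_steps`. `first_end_iteration` and `steps_per_iteration` are hypothetical names, and real RLlib sampling (several env runners, vectorized envs, batch settings) can differ.

```python
import math


def first_end_iteration(max_episode_steps: int, steps_per_iteration: int) -> int:
    """1-based train() iteration in which the first episode ends.

    Hypothetical model: one env, a fixed number of env steps per train()
    call, and episodes that end only by truncation at max_episode_steps.
    """
    if max_episode_steps <= 0 or steps_per_iteration <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(max_episode_steps / steps_per_iteration)


# 1000-step episodes with 200 env steps sampled per train() call
print(first_end_iteration(1000, 200))   # 5
# the same episodes with 4000 env steps per call end within the first iteration
print(first_end_iteration(1000, 4000))  # 1
```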