Can Ray allow access to individual episodes?

Hi everyone,

I am using the PPOTrainer.train() call for training. It returns a result dictionary that corresponds to the collection of episodes in the current iteration. I need to access the individual episodes that happen during an iteration. Is there a way to pull out / save the one whole episode that had the shortest length?

Hi @Saurabh_Arora,

I am not exactly sure what you are asking, but a place to start is the results object returned by results = trainer.train(). There will be a key in there called “hist_stats”. Maybe what you are looking for is there. The other place you can look is the log directory for your training run, if you are using Tune. There will be a JSON file (I think result.json). In that file you should have lists of individual rewards and actions for each iteration.
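As a rough sketch of what reading "hist_stats" could look like: the helper below picks the shortest episode out of a result dictionary. The helper name shortest_episode is made up for illustration, and the "episode_lengths" and "episode_reward" keys are an assumption based on what older RLlib versions put in "hist_stats"; check your own result dictionary before relying on them. Note also that these lists only hold per-episode summary numbers, not the state-action sequence, and as far as I know they may cover a window of recent episodes rather than only the current iteration.

```python
def shortest_episode(result):
    """Return (index, length, reward) of the shortest episode in hist_stats.

    `result` is the dict returned by trainer.train(). The key names used
    here are assumptions; returns None if no episode lengths are present.
    """
    hist = result.get("hist_stats", {})
    lengths = hist.get("episode_lengths", [])  # assumed key name
    if not lengths:
        return None
    # Index of the smallest length (first one wins on ties).
    idx = min(range(len(lengths)), key=lengths.__getitem__)
    rewards = hist.get("episode_reward", [])  # assumed key name
    reward = rewards[idx] if idx < len(rewards) else None
    return idx, lengths[idx], reward
```

Typical use would be something like shortest_episode(trainer.train()) after each iteration.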

Hi @mannyv, an episode is a finite-length sequence of state-action pairs that ends either because it reaches an artificial limit on its length or because it reaches a terminal state.

The result has stats on the group of episodes in the current iteration. It does not show the individual episodes themselves:
episode_reward_max=max_reward,
episode_reward_min=min_reward,
episode_reward_mean=avg_reward,
episode_len_mean=avg_length,
episode_media=dict(episode_media),
episodes_this_iter=len(new_episodes),
policy_reward_min=policy_reward_min,
policy_reward_max=policy_reward_max,
policy_reward_mean=policy_reward_mean,
custom_metrics=dict(custom_metrics),
hist_stats=dict(hist_stats),
sampler_perf=dict(perf_stats),
off_policy_estimator=dict(estimators)

I am looking for a way to access the state-action sequence in each individual episode during trainer.train(). Any idea how to do that?

@sven1977, would you like to comment on this?

Take a look at the DefaultCallbacks class. It has callbacks for each episode step (on_episode_step) and for the end of an episode (on_episode_end).
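A minimal sketch of that idea, under some assumptions: the subclass name ShortestEpisodeCallbacks and the "trajectory" key are made up for illustration, the import path for DefaultCallbacks differs between Ray versions (both known paths are tried, with a plain object fallback so the snippet still runs without Ray), and episode.user_data, episode.last_observation_for() and episode.last_action_for() are taken from the older episode API, so newer RLlib versions may need different calls.

```python
try:
    from ray.rllib.algorithms.callbacks import DefaultCallbacks  # newer Ray
except ImportError:
    try:
        from ray.rllib.agents.callbacks import DefaultCallbacks  # older Ray
    except ImportError:
        DefaultCallbacks = object  # fallback: lets the sketch run without Ray


class ShortestEpisodeCallbacks(DefaultCallbacks):
    """Hypothetical callbacks that keep the shortest episode seen so far."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.shortest = None  # list of (observation, action) tuples

    def on_episode_start(self, *, episode, **kwargs):
        # user_data is a per-episode scratch dict (assumed episode API).
        episode.user_data["trajectory"] = []

    def on_episode_step(self, *, episode, **kwargs):
        # Record the most recent observation and action RLlib reports.
        episode.user_data.setdefault("trajectory", []).append(
            (episode.last_observation_for(), episode.last_action_for())
        )

    def on_episode_end(self, *, episode, **kwargs):
        traj = episode.user_data.get("trajectory", [])
        # Keep this episode only if it is shorter than the best so far.
        if self.shortest is None or len(traj) < len(self.shortest):
            self.shortest = list(traj)
```

In the older trainer API the class (not an instance) is passed through the config, e.g. config={"callbacks": ShortestEpisodeCallbacks}. With remote rollout workers the callbacks run in the worker processes, so self.shortest lives there rather than in the driver; writing the trajectory to disk in on_episode_end is one way around that.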

Thanks. I will look into it.