Can ray allow access to individual episodes?

Saurabh_Arora · August 30, 2021, 10:26am

Hi everyone,

I am using the PPOTrainer.train() call for training. It returns a result dictionary that corresponds to the collection of episodes in the current iteration. I need to access the individual episodes that occur during an iteration. Is there a way to pull out or save the one whole episode that had the shortest length?

mannyv · August 30, 2021, 12:12pm

Hi @Saurabh_Arora,

I am not exactly sure what you are asking, but a place to start is the results object returned by results = trainer.train(). It has a key called "hist_stats"; maybe what you are looking for is there. The other place to look is the log directory for your training run, if you are using Tune. There will be a JSON file (I think result.json). In that file you should have lists of individual rewards and actions for each iteration.
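As a rough illustration of the "hist_stats" suggestion, here is a minimal sketch that picks out the shortest episode from one result dictionary. It assumes the result contains hist_stats["episode_lengths"] and hist_stats["episode_reward"] as parallel lists, which is how RLlib 1.x appeared to report them; the helper name shortest_episode_summary is made up for this example. Note that these lists hold only per-episode lengths and returns (possibly over a smoothing window of recent episodes rather than strictly the current iteration), not the state-action sequences.

```python
def shortest_episode_summary(result):
    """Return (index, length, reward) of the shortest episode in a train() result.

    Assumes `result["hist_stats"]` holds parallel lists under the keys
    "episode_lengths" and "episode_reward"; returns None if no lengths exist.
    """
    hist = result.get("hist_stats", {})
    lengths = hist.get("episode_lengths", [])
    rewards = hist.get("episode_reward", [])
    if not lengths:
        return None
    # Index of the smallest length (first one wins on ties).
    idx = min(range(len(lengths)), key=lengths.__getitem__)
    reward = rewards[idx] if idx < len(rewards) else None
    return idx, lengths[idx], reward


# Example with a hand-written stand-in for the dict returned by trainer.train().
fake_result = {
    "hist_stats": {
        "episode_lengths": [200, 17, 45],
        "episode_reward": [200.0, 17.0, 45.0],
    }
}
print(shortest_episode_summary(fake_result))
```

This only identifies which episode was shortest; recovering the episode's actual transitions needs something that runs during sampling.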

Saurabh_Arora · August 31, 2021, 4:03pm

Hi @mannyv, an episode is a finite-length sequence of state-action pairs that ends either because it reaches an artificial limit on its length or because it reaches a terminal state.

The result has stats on the group of episodes in the current iteration. It does not show the individual episodes themselves:
episode_reward_max=max_reward,
episode_reward_min=min_reward,
episode_reward_mean=avg_reward,
episode_len_mean=avg_length,
episode_media=dict(episode_media),
episodes_this_iter=len(new_episodes),
policy_reward_min=policy_reward_min,
policy_reward_max=policy_reward_max,
policy_reward_mean=policy_reward_mean,
custom_metrics=dict(custom_metrics),
hist_stats=dict(hist_stats),
sampler_perf=dict(perf_stats),
off_policy_estimator=dict(estimators)

I am looking for a way to access the state-action sequence in each individual episode during trainer.train(). Any idea how to do that?

Saurabh_Arora · September 8, 2021, 2:08pm

@sven1977 , would you like to comment on this?

smorad · September 9, 2021, 12:31pm

Take a look at the DefaultCallbacks class. There should be callbacks for each episode step (on_episode_step) and for the end of an episode (on_episode_end).
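A minimal sketch of that idea, assuming the Ray 1.x callback API (DefaultCallbacks with keyword-only on_episode_start / on_episode_step / on_episode_end hooks, and an episode object exposing user_data, custom_metrics, last_observation_for() and last_action_for()). The class name TrajectoryRecorder and the class-level "shortest" store are hypothetical, and exactly which observation counts as "last" at step time should be checked against your RLlib version.

```python
try:
    # Import path in Ray 1.x; later releases moved this class.
    from ray.rllib.agents.callbacks import DefaultCallbacks
except ImportError:
    # Fallback so the sketch can be exercised without Ray installed.
    DefaultCallbacks = object


class TrajectoryRecorder(DefaultCallbacks):
    """Hypothetical callback that records one (observation, action) pair per step."""

    # Shortest trajectory seen so far. This is per process: with remote
    # rollout workers, each worker keeps its own copy.
    shortest = None

    def on_episode_start(self, *, episode, **kwargs):
        # user_data is scratch space attached to the episode object.
        episode.user_data["trajectory"] = []

    def on_episode_step(self, *, episode, **kwargs):
        # Store whatever the episode reports as its latest observation/action.
        episode.user_data["trajectory"].append(
            (episode.last_observation_for(), episode.last_action_for())
        )

    def on_episode_end(self, *, episode, **kwargs):
        trajectory = episode.user_data["trajectory"]
        best = TrajectoryRecorder.shortest
        if best is None or len(trajectory) < len(best):
            TrajectoryRecorder.shortest = list(trajectory)
        # Scalar custom metrics are what RLlib aggregates into the result dict.
        episode.custom_metrics["trajectory_len"] = len(trajectory)
```

In Ray 1.x the class itself would be passed through the trainer config, e.g. config={"callbacks": TrajectoryRecorder}. Because the class-level store lives in whichever process ran the episode, with num_workers > 0 you would likely need to write the trajectory to disk (or similar) inside on_episode_end instead of reading TrajectoryRecorder.shortest from the driver.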

Saurabh_Arora · September 22, 2021, 10:47am

Thanks. I will look into it.
