Log action sequence per episode

PhilippWillms · July 9, 2024, 8:51pm

How severe does this issue affect your experience of using Ray?

Low: It annoys or frustrates me for a moment.

Dear Ray community,

what is the best way to extract the sequence of actions leading to episode_max_reward?

My scenario is that I have an environment with fixed-length episodes. The order in which certain actions are taken determines the reward. Is there a smart way to log the actions taken in an episode from within the Ray Tune or Train API, so that the action sequence can be collected afterwards?

I am thinking of custom metrics and callbacks. A first attempt with on_episode_step() unfortunately failed, because the last_action_for() method is not available on the EpisodeV2 class, which PPO uses in Ray 2.10.0.

starkj · July 14, 2024, 1:31am

If you just want the sequence in order to assign a reward in the context of a single episode, then it can all be handled within the environment. Why not have step() append the incoming action to a member action list? The reward function can then see it at any time.
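A minimal sketch of that idea, assuming a Gymnasium-style interface where step() returns (obs, reward, terminated, truncated, info). The names ActionLoggingEnv, ToyFixedLengthEnv and the "action_sequence" info key are made up for illustration and are not part of Ray or Gymnasium:

```python
class ActionLoggingEnv:
    """Wraps any env with Gymnasium-style reset()/step() and records actions.

    Hypothetical helper for illustration; not an RLlib or Gymnasium class.
    """

    def __init__(self, env):
        self.env = env
        self.actions = []  # actions taken so far in the current episode

    def reset(self, **kwargs):
        self.actions = []  # start a fresh log for each episode
        return self.env.reset(**kwargs)

    def step(self, action):
        self.actions.append(action)
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            # Expose the full sequence once the episode is over.
            info = dict(info, action_sequence=list(self.actions))
        return obs, reward, terminated, truncated, info


class ToyFixedLengthEnv:
    """Stand-in env (assumption): 3 steps, reward 1.0 when action == step index."""

    def __init__(self):
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return 0, {}

    def step(self, action):
        reward = 1.0 if action == self.t else 0.0
        self.t += 1
        terminated = self.t >= 3
        return self.t, reward, terminated, False, {}
```

Wrapping the real environment this way keeps the action list available both to the reward logic (via self.actions) and to whatever reads the final info dict.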

PhilippWillms · July 15, 2024, 10:09am

As I said in the original post, visibility for the reward function is one thing. The other thing that comes to mind now is making the best solution easier to access. Currently, I have difficulty accessing it and the corresponding action sequence after training with a trainer built via PPOConfig.build().
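To make concrete what I mean by accessing the best solution, here is a framework-agnostic sketch. BestEpisodeTracker is a hypothetical helper, not an RLlib class; it only assumes that something (the environment at episode end, or a callback) can hand it the episode return together with the recorded actions:

```python
class BestEpisodeTracker:
    """Keeps the highest-return episode seen so far and its action sequence.

    Hypothetical helper for illustration; not part of RLlib.
    """

    def __init__(self):
        self.best_return = float("-inf")
        self.best_actions = None

    def update(self, episode_return, actions):
        """Record the episode if it beats the current best; return True if it did."""
        if episode_return > self.best_return:
            self.best_return = episode_return
            self.best_actions = list(actions)  # copy so later mutation cannot alter it
            return True
        return False
```

One caveat with this approach: when sampling runs in several worker processes, each environment copy would hold its own tracker, so the per-worker results would still need to be reported back (for example as a custom metric) and merged.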
