Log action sequence per episode

How severe does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Dear Ray community,

What is the best way to extract the sequence of actions that led to episode_max_reward?

My environment has fixed-length episodes, and the order in which certain actions are taken determines the reward. Is there a good way, within the Ray Tune or Train API, to log the actions taken in each episode so that the sequence can be collected afterwards?

I am thinking of custom metrics and callbacks. A first attempt with on_episode_step() unfortunately failed, because the last_action_for() method is not available in the EpisodeV2 class, which PPO uses in Ray 2.10.0.

If you just want the sequence in order to assign a reward in the context of a single episode, then it can all be handled within the environment. Why not have step() append each incoming action to a member list on the environment? The reward function can then see the full sequence at any time.
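
As a rough illustration of that idea, here is a minimal sketch in plain Python. The class ActionLoggingEnv, the helper run_episode, the "action_sequence" info key, and the ascending-order reward rule are all made up for the example. The class follows the Gymnasium-style reset()/step() signature but does not subclass anything, so a real environment would need its own observation and action spaces.

```python
class ActionLoggingEnv:
    """Hypothetical fixed-length environment that records its own actions."""

    def __init__(self, episode_length=4):
        self.episode_length = episode_length
        self.actions = []  # actions taken so far in the current episode

    def reset(self, *, seed=None, options=None):
        # Start a fresh action log for every episode.
        self.actions = []
        return 0, {}

    def step(self, action):
        # Record the incoming action before computing anything else.
        self.actions.append(action)
        terminated = len(self.actions) >= self.episode_length
        # The reward is only paid out at the end, based on the whole sequence.
        reward = self._reward() if terminated else 0.0
        # Expose a copy of the finished sequence via the info dict (assumed key).
        info = {"action_sequence": list(self.actions)} if terminated else {}
        obs = len(self.actions)  # placeholder observation: current step index
        return obs, reward, terminated, False, info

    def _reward(self):
        # Placeholder rule: 1.0 only if the actions were taken in ascending order.
        return 1.0 if self.actions == sorted(self.actions) else 0.0


def run_episode(env, actions):
    """Step through a list of actions; return the total reward and logged sequence."""
    env.reset()
    total, info = 0.0, {}
    for action in actions:
        _, reward, terminated, truncated, info = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total, info.get("action_sequence")
```

Calling run_episode(ActionLoggingEnv(episode_length=3), [0, 1, 2]) returns the total reward together with the recorded sequence. Whether the final info dict can be picked up from an RLlib callback in your Ray version is something I have not checked, so treat that part as an assumption.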