How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Dear Ray community,
What is the best way to extract the sequence of actions that led to episode_max_reward?
My environment has episodes of fixed length, and the order in which certain actions are taken determines the reward. Is there a good way, within the Ray Tune or Train API, to log the actions taken in each episode so that the sequence of actions can be collected afterwards?
I am thinking of custom metrics and callbacks. My first attempt with on_episode_step() failed because the last_action_for() method is not available in the EpisodeV2 class, which PPO uses in Ray 2.10.0.
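
One callback-free direction is to record the actions inside the environment itself, so nothing depends on which episode class the algorithm uses. Below is a minimal sketch in plain Python, assuming a Gymnasium-style reset()/step() interface. ActionRecorder and ToyOrderEnv are hypothetical names made up for illustration and are not part of Ray. Whether the episode_actions entry in the info dict can be read from a callback in Ray 2.10.0 is an assumption that has not been verified here.

```python
class ToyOrderEnv:
    """Hypothetical fixed-length env: action at step t earns 1.0 if it equals t."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t, {}

    def step(self, action):
        reward = 1.0 if action == self.t else 0.0
        self.t += 1
        truncated = self.t >= self.length  # the episode ends after a fixed number of steps
        return self.t, reward, False, truncated, {}


class ActionRecorder:
    """Hypothetical wrapper that remembers the action sequence of the best episode."""

    def __init__(self, env):
        self.env = env
        self._actions = []
        self._return = 0.0
        self.best_return = float("-inf")
        self.best_actions = []

    def reset(self, **kwargs):
        # Start a fresh action log for the new episode.
        self._actions = []
        self._return = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._actions.append(action)
        self._return += reward
        if terminated or truncated:
            # Keep the sequence only if this episode beat the best one so far.
            if self._return > self.best_return:
                self.best_return = self._return
                self.best_actions = list(self._actions)
            # Also expose the full sequence through the info dict of the last step.
            info = dict(info, episode_actions=list(self._actions))
        return obs, reward, terminated, truncated, info


# Usage: replay two fixed action sequences and compare them.
recorder = ActionRecorder(ToyOrderEnv(length=3))
for episode in ([2, 1, 0], [0, 1, 2]):
    recorder.reset()
    for a in episode:
        obs, reward, terminated, truncated, info = recorder.step(a)

print(recorder.best_return, recorder.best_actions)
```

Note that when training runs with several rollout workers, each worker would hold its own copy of such a wrapper, so every copy only knows the best episode it has seen itself. The per-worker results would still need to be gathered, for example through the info dict or custom metrics.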