How severely does this issue affect your experience of using Ray?
- Medium: It contributes significant difficulty to completing my task, but I can work around it.
Problem
I want to use the info dictionary in a custom policy to help choose an action. However, I noticed that info and obs are out of sync.
I subclassed `ray.rllib.examples.policy.random_policy.RandomPolicy` from here and overrode the method `compute_actions(self, obs_batch, *args, info_batch, **kwargs)`. There I noticed that the infos in `info_batch` were one step ahead of `obs_batch`.
To check this, I also added the callback `on_postprocess_trajectory(self, *, worker, episode, agent_id, policy_id, policies, postprocessed_batch, original_batches, **kwargs)` in order to have a look at `postprocessed_batch`. There I noticed that the infos were in sync with `new_obs` but one step ahead of `obs` (consistent with the previous paragraph).
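To make the alignment I'm seeing concrete, here is a minimal plain-Python sketch (no RLlib needed; `ToyEnv` and the field names are my own stand-ins) of how a Gym-style rollout naturally pairs each info dict with `new_obs` rather than `obs`, since both come out of the same `env.step()` call:

```python
# Toy stand-in for a Gym-style env: step() returns (obs, reward, done, info),
# where the info describes the transition that PRODUCED the returned obs.
class ToyEnv:
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return f"obs_{self.t}"

    def step(self, action):
        self.t += 1
        return f"obs_{self.t}", 0.0, self.t >= 3, {"step": self.t}


def rollout():
    env = ToyEnv()
    obs = env.reset()
    batch = {"obs": [], "new_obs": [], "infos": []}
    done = False
    while not done:
        new_obs, reward, done, info = env.step(action=None)
        # info and new_obs come from the same step() call, so they line up;
        # relative to obs, the infos therefore look "one ahead".
        batch["obs"].append(obs)
        batch["new_obs"].append(new_obs)
        batch["infos"].append(info)
        obs = new_obs
    return batch


batch = rollout()
for o, n, i in zip(batch["obs"], batch["new_obs"], batch["infos"]):
    print(o, n, i)
```

This matches what I see in `postprocessed_batch`: each row's `infos` entry corresponds to its `new_obs`, not its `obs`.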
So my question is: is this a bug or a feature? If it's not a bug, is it possible to use the info object during training?
Thanks in advance for your help.