How severely does this issue affect your experience of using Ray?
- Medium: It contributes significant difficulty to completing my task, but I can work around it.
Problem
I want to use the info dictionary in a custom policy to help choose an action. However, I noticed that info and obs are out of sync.
I subclassed `ray.rllib.examples.policy.random_policy.RandomPolicy` from here and overrode the method `compute_actions(self, obs_batch, *args, info_batch, **kwargs)`. There I noticed that the infos in `info_batch` were one step ahead of `obs_batch`.
To check this, I also added the callback `on_postprocess_trajectory(self, *, worker, episode, agent_id, policy_id, policies, postprocessed_batch, original_batches, **kwargs)` in order to have a look at `postprocessed_batch`. There I noticed that the infos were in sync with `new_obs` but one step ahead of `obs` (consistent with the previous paragraph).
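To make the alignment I'm seeing concrete, here is a minimal plain-Python sketch (no RLlib needed; `ToyEnv` and the field names are my own stand-ins) of how a Gym-style rollout naturally pairs each info dict with `new_obs` rather than `obs`, since both come out of the same `env.step()` call:

```python
# Toy stand-in for a Gym-style env: step() returns (obs, reward, done, info),
# where the info describes the transition that PRODUCED the returned obs.
class ToyEnv:
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return f"obs_{self.t}"

    def step(self, action):
        self.t += 1
        return f"obs_{self.t}", 0.0, self.t >= 3, {"step": self.t}


def rollout():
    env = ToyEnv()
    obs = env.reset()
    batch = {"obs": [], "new_obs": [], "infos": []}
    done = False
    while not done:
        new_obs, reward, done, info = env.step(action=None)
        # info and new_obs come from the same step() call, so they line up;
        # relative to obs, the infos therefore look "one ahead".
        batch["obs"].append(obs)
        batch["new_obs"].append(new_obs)
        batch["infos"].append(info)
        obs = new_obs
    return batch


batch = rollout()
for o, n, i in zip(batch["obs"], batch["new_obs"], batch["infos"]):
    print(o, n, i)
```

This matches what I see in `postprocessed_batch`: each row's `infos` entry corresponds to its `new_obs`, not its `obs`.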
So my question is: is this a bug or a feature? If it's not a bug, is it possible to use the info object during training?
Thanks in advance for your help.