Hi folks,
I am working on a bare-metal policy that works fine now. However, I need some clarity regarding the agent's `infos` returned from `compute_actions()`: I store agent infos in this dictionary. Is the `infos` dictionary indeed intended to carry this information?

What do I want? I want to carry agent infos from one timestep to the next and store them in the `SampleBatch` that gets written out to `output` in the `Trainer` configuration.
What have I tried? Simply returning the infos I want to store from `compute_actions()` does not work (I see only the environment `infos`). Adding a `ViewRequirement` for `"infos"` does not work either (then I do not even get the `infos` from the environment).
Any help bringing clarity to this is appreciated.
Best
Simon
Hey @Lars_Simon_Zehnder, if you access `"infos"` within your `compute_actions` method, you should see it then. E.g., in your custom `compute_actions_from_input_dict` (or `action_sampler_fn` or `action_distribution_fn`):
```python
def compute_actions_from_input_dict(self, input_dict, ...):
    ...
    # Access the `infos` key here, so it'll always show up
    # during action sampling.
    # After an initial test call, RLlib will always include
    # "infos" in the input dict from here on.
    infos = input_dict.get("infos")
    assert infos is not None
    ...
```
So, I did some research, and it turns out that some specific settings are needed in the Policy to make RLlib write out the `infos` from the agent to the `output`:

- It needs a `ViewRequirement` with the parameter `space` provided (e.g. a `Box` space like the default `space` in `ViewRequirement`), and
- `used_for_training=True`, otherwise the variable does not get collected for the `output`.
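Put together, the policy setup might look roughly like this. This is only a sketch: the exact `ViewRequirement` constructor arguments can differ between RLlib versions, and `MyPolicy`/`SomeBasePolicy` and the `Box` bounds are placeholders, not names from this thread:

```python
from gym.spaces import Box
from ray.rllib.policy.view_requirement import ViewRequirement


class MyPolicy(SomeBasePolicy):  # e.g. your Torch/TF Policy subclass
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Register a ViewRequirement for "infos" so the agent infos get
        # collected into the SampleBatch and written to `output`.
        self.view_requirements["infos"] = ViewRequirement(
            # A `space` must be provided, e.g. a Box space:
            space=Box(-1.0, 1.0, shape=()),
            # Without this flag, the column is dropped before the
            # batch is written to `output`:
            used_for_training=True,
        )
```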
See PR #18111 for an example.
I hope this saves others some time.