Hi folks,
I am working on a bare-metal policy that works fine now. However, I need some clarity regarding the agent’s infos from compute_actions():
I store agent infos in the infos dictionary that compute_actions() returns. Is that dictionary indeed intended to carry this information?
What do I want? I want to carry agent infos from one timestep to the next and store them in the SampleBatch that gets written to the output configured in the Trainer.
What have I tried? Simply outputting in compute_actions() the infos I want to store does not work (I see only the environment infos). Adding a ViewRequirement for 'infos' does not work either (I then do not get even the infos from the environment).
Any help is appreciated to bring clarity to my question.
Best
Simon
Hey @Lars_Simon_Zehnder, if you access “infos” within your compute_actions method, you should see it then. E.g., in your custom compute_actions_from_input_dict (or action_sampler_fn or action_distribution_fn):
    def compute_actions_from_input_dict(...):
        ...
        # Access the `infos` key here so it'll always show up during
        # action sampling. After an initial test call, RLlib will always
        # include "infos" in the input-dict from here on.
        infos = input_dict.get("infos")
        assert infos is not None
        ...
So, I did some research, and it turns out that it needs some specific settings in the Policy to make RLlib write out the infos from the agent to the output:
- It needs a ViewRequirement with the parameter space provided (e.g. a Box space like the default space in ViewRequirement()), and
- used_for_training=True, otherwise the variable does not get collected for the output.
See PR #18111 for an example.
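For anyone who wants a starting point, here is a minimal sketch of the two settings above. It is not taken from the PR. The key name "agent_infos", the shape, and both helper function names are made up for illustration, and the gym import assumes an RLlib version that still uses gym (newer releases use gymnasium instead).

```python
def agent_info_view_requirement_kwargs(space):
    """Return the two ViewRequirement settings described above.

    Hypothetical helper: it only bundles the keyword arguments.
    """
    return {
        # An explicit space describing the values the agent stores.
        "space": space,
        # Without this, the column is not collected for the output.
        "used_for_training": True,
    }


def add_agent_info_view_requirement(policy, key="agent_infos", shape=(2,)):
    """Register a view requirement for agent infos on a policy.

    `key` and `shape` are assumptions; pick whatever matches the data
    your policy actually emits.
    """
    # Local imports so the sketch can be read and loaded without RLlib.
    import numpy as np
    from gym.spaces import Box  # assumption: gym-based RLlib version
    from ray.rllib.policy.view_requirement import ViewRequirement

    space = Box(low=-np.inf, high=np.inf, shape=shape, dtype=np.float32)
    policy.view_requirements[key] = ViewRequirement(
        **agent_info_view_requirement_kwargs(space)
    )
    return policy.view_requirements[key]
```

You would call add_agent_info_view_requirement(self) once during your policy's setup, and then check the written output to confirm the column actually shows up for your RLlib version.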
I hope this saves others some time.