Read info dict keys and then add them as new entries of SampleBatch

@klausk55 I found another topic here that deals with something similar; here is @sven1977's answer:

In the initial call to the loss function (during Policy setup), you should see an all-0s train_batch being passed into the loss function (including all possible SampleBatch columns).
Then, if you access some column in your loss function, RLlib will detect this and provide that column in all subsequent calls.
So I just tried this:

  • Set a breakpoint in PPOTorchPolicy's loss function.
  • Run the rllib/agents/ppo/tests/test_ppo.py::test_ppo_compilation_and_lr_schedule test case with ray.init(local_mode=True).
  • For the initial test call to the loss, I see "infos" in train_batch properly initialized with 0s.
  • Then, if I access this column in the loss function to tell RLlib that "infos" is needed (e.g. by printing train_batch["infos"]), I do see this column in all subsequent loss calls as well (see the sketch after this list).

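For concreteness, here is a minimal sketch of what that access could look like in a custom loss. This assumes an RLlib ~1.x version where PPOTorchPolicy is built via the policy-template helpers (so it has .with_updates()) and where the default loss is exposed as ppo_surrogate_loss; the names differ across versions, but the point is simply that reading train_batch["infos"] once during the dummy pass is enough:

```python
# Sketch only: assumes RLlib ~1.x (policy-template PPOTorchPolicy,
# ppo_surrogate_loss as the default torch loss).
from ray.rllib.agents.ppo.ppo_torch_policy import (
    PPOTorchPolicy,
    ppo_surrogate_loss,
)
from ray.rllib.policy.sample_batch import SampleBatch


def loss_with_infos(policy, model, dist_class, train_batch):
    # Reading the column during the initial all-0s dummy pass is what
    # tells RLlib that "infos" is required; every subsequent train
    # batch will then carry the real env info dicts.
    infos = train_batch[SampleBatch.INFOS]
    print(infos)  # or actually use them in your loss computation
    # Fall back to PPO's default surrogate loss.
    return ppo_surrogate_loss(policy, model, dist_class, train_batch)


InfoAwarePPOPolicy = PPOTorchPolicy.with_updates(
    name="InfoAwarePPOPolicy",
    loss_fn=loss_with_infos,
)
```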
Following @sven1977's answer here, the same should also hold for the postprocess_trajectory() function in your policy:

You will get env infos automatically in your loss or postprocessing function (if these functions need this field, i.e. access it in a test pass).
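And here is the corresponding sketch for the postprocessing side, which is what the thread title is really after: pull a key out of each step's info dict and store it as its own SampleBatch column. Note that "my_metric" is a hypothetical key your env would write into its info dicts, and compute_gae_for_sample_batch is PPO's default postprocessing in ~1.x versions (other versions name this differently):

```python
# Sketch only: "my_metric" is a made-up info-dict key; adapt to your env.
import numpy as np

from ray.rllib.agents.ppo.ppo_torch_policy import PPOTorchPolicy
from ray.rllib.evaluation.postprocessing import compute_gae_for_sample_batch
from ray.rllib.policy.sample_batch import SampleBatch


def postprocess_with_infos(policy, sample_batch,
                           other_agent_batches=None, episode=None):
    # Accessing the column here signals RLlib to keep providing it.
    infos = sample_batch[SampleBatch.INFOS]
    # During the initial dummy pass "infos" is all 0s (not dicts),
    # hence the isinstance() guard.
    sample_batch["my_metric"] = np.array(
        [info.get("my_metric", 0.0) if isinstance(info, dict) else 0.0
         for info in infos],
        dtype=np.float32,
    )
    # Keep PPO's default GAE postprocessing.
    return compute_gae_for_sample_batch(
        policy, sample_batch, other_agent_batches, episode)


InfoPPOPolicy = PPOTorchPolicy.with_updates(
    name="InfoPPOPolicy",
    postprocess_fn=postprocess_with_infos,
)
```

Once the new column exists on the batch, your loss function can consume it like any other tensor column (e.g. train_batch["my_metric"]).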