Accessing info dicts in postprocessing callback

In the initial call to the loss function (during Policy setup), RLlib passes an all-0s dummy train_batch that contains all possible SampleBatch columns. If your loss function accesses some column during that call, RLlib detects the access and provides that column in all subsequent calls.
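For example, here is a minimal sketch of what such an access could look like, assuming the function-based PPOTorchPolicy from the rllib/agents era (where the loss lives in ppo_surrogate_loss and template-built policies expose with_updates); the names loss_with_infos and MyPPOTorchPolicy are mine:

```python
from ray.rllib.agents.ppo.ppo_torch_policy import (
    PPOTorchPolicy,
    ppo_surrogate_loss,
)


def loss_with_infos(policy, model, dist_class, train_batch):
    # Merely *reading* the column is enough: during the initial all-0s
    # dummy call, RLlib records the access and adds "infos" to the
    # Policy's view requirements, so the column shows up in every
    # subsequent train batch.
    _ = train_batch["infos"]
    # Defer to the original PPO loss; the loss value is unchanged.
    return ppo_surrogate_loss(policy, model, dist_class, train_batch)


# Hypothetical subclass; `with_updates` is provided by the policy template.
MyPPOTorchPolicy = PPOTorchPolicy.with_updates(
    name="MyPPOTorchPolicy",
    loss_fn=loss_with_infos,
)
```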

So I just tried this:

  • Set a breakpoint in PPOTorchPolicy's loss function.
  • Run the rllib/agents/ppo/tests/test_ppo.py::test_ppo_compilation_and_lr_schedule test case with ray.init(local_mode=True), so everything runs in a single process and the breakpoint actually triggers.
  • In the initial test call to the loss, I see "infos" in train_batch, properly initialized with 0s.
  • Then, if I access this column in the loss function to tell RLlib that "infos" is needed (e.g. by printing train_batch["infos"]), I also see this column in all subsequent loss calls (see the callback sketch below for tying this back to postprocessing).
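Tying this back to the topic title: once "infos" is part of the Policy's view requirements, the info dicts should also be visible in the postprocessing callback. A rough sketch using RLlib's DefaultCallbacks (the class name InfoCallbacks and the print are just for illustration):

```python
from ray.rllib.agents.callbacks import DefaultCallbacks


class InfoCallbacks(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        # With "infos" registered as a view requirement (see the loss
        # trick above), each postprocessed batch carries the env's info
        # dicts, one entry per timestep.
        print(postprocessed_batch["infos"])
```

You would then activate it via the trainer config, e.g. config={"callbacks": InfoCallbacks}.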