Accessing info dicts in postprocessing callback

In the initial call to the loss function (during Policy setup), RLlib passes an all-0s dummy train_batch that contains all possible SampleBatch columns. If your loss function accesses some column during that call, RLlib detects the access and provides that column in all subsequent calls.
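For example, here is a minimal sketch of what such an access could look like, assuming the function-based PPOTorchPolicy from the rllib/agents era (where the loss lives in ppo_surrogate_loss and template-built policies expose with_updates); the names loss_with_infos and MyPPOTorchPolicy are mine:

```python
from ray.rllib.agents.ppo.ppo_torch_policy import (
    PPOTorchPolicy,
    ppo_surrogate_loss,
)


def loss_with_infos(policy, model, dist_class, train_batch):
    # Merely *reading* the column is enough: during the initial all-0s
    # dummy call, RLlib records the access and adds "infos" to the
    # Policy's view requirements, so the column shows up in every
    # subsequent train batch.
    _ = train_batch["infos"]
    # Defer to the original PPO loss; the loss value is unchanged.
    return ppo_surrogate_loss(policy, model, dist_class, train_batch)


# Hypothetical subclass; `with_updates` is provided by the policy template.
MyPPOTorchPolicy = PPOTorchPolicy.with_updates(
    name="MyPPOTorchPolicy",
    loss_fn=loss_with_infos,
)
```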

So I just tried this:

  • Set a breakpoint in PPOTorchPolicy's loss function.
  • Run the rllib/agents/ppo/tests/test_ppo.py::test_ppo_compilation_and_lr_schedule test case with ray.init(local_mode=True), so everything runs in a single process and the breakpoint actually triggers.
  • In the initial test call to the loss, I see "infos" in train_batch, properly initialized with 0s.
  • Then, if I access this column in the loss function to tell RLlib that "infos" is needed (e.g. by printing train_batch["infos"]), I also see this column in all subsequent loss calls (see the callback sketch below for tying this back to postprocessing).
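Tying this back to the topic title: once "infos" is part of the Policy's view requirements, the info dicts should also be visible in the postprocessing callback. A rough sketch using RLlib's DefaultCallbacks (the class name InfoCallbacks and the print are just for illustration):

```python
from ray.rllib.agents.callbacks import DefaultCallbacks


class InfoCallbacks(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        # With "infos" registered as a view requirement (see the loss
        # trick above), each postprocessed batch carries the env's info
        # dicts, one entry per timestep.
        print(postprocessed_batch["infos"])
```

You would then activate it via the trainer config, e.g. config={"callbacks": InfoCallbacks}.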