Read info dict keys and then add them as new entries of SampleBatch

@klausk55 I found another topic here that deals with something similar; here is @sven1977's answer:

In the initial call to the loss function (during Policy setup), you should see an all-0s train_batch being passed into the loss function (including all possible SampleBatch columns).
Then, if you access some column in your loss function, RLlib will detect this and provide that column in all subsequent calls.
So I just tried this:

  • Set a breakpoint in PPOTorchPolicy's loss function.
  • Run the rllib/agents/ppo/tests/test_ppo.py::test_ppo_compilation_and_lr_schedule test case with ray.init(local_mode=True).
  • For the initial test call to the loss, I see "infos" in train_batch properly initialized with 0s.
  • Then, if I access this column in the loss function to tell RLlib that "infos" is needed (e.g. by printing train_batch["infos"]), I do see this column in all subsequent loss calls as well (see the sketch after this list).

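For concreteness, here is a minimal sketch of what that access could look like in a custom loss. This assumes an RLlib ~1.x version where PPOTorchPolicy is built via the policy-template helpers (so it has .with_updates()) and where the default loss is exposed as ppo_surrogate_loss; the names differ across versions, but the point is simply that reading train_batch["infos"] once during the dummy pass is enough:

```python
# Sketch only: assumes RLlib ~1.x (policy-template PPOTorchPolicy,
# ppo_surrogate_loss as the default torch loss).
from ray.rllib.agents.ppo.ppo_torch_policy import (
    PPOTorchPolicy,
    ppo_surrogate_loss,
)
from ray.rllib.policy.sample_batch import SampleBatch


def loss_with_infos(policy, model, dist_class, train_batch):
    # Reading the column during the initial all-0s dummy pass is what
    # tells RLlib that "infos" is required; every subsequent train
    # batch will then carry the real env info dicts.
    infos = train_batch[SampleBatch.INFOS]
    print(infos)  # or actually use them in your loss computation
    # Fall back to PPO's default surrogate loss.
    return ppo_surrogate_loss(policy, model, dist_class, train_batch)


InfoAwarePPOPolicy = PPOTorchPolicy.with_updates(
    name="InfoAwarePPOPolicy",
    loss_fn=loss_with_infos,
)
```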
Following @sven1977's answer here, the same should also hold for the postprocess_trajectory() function in your policy:

You will get env infos automatically in your loss or postprocessing function (if these functions need this field, i.e. access it in a test pass).
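And here is the corresponding sketch for the postprocessing side, which is what the thread title is really after: pull a key out of each step's info dict and store it as its own SampleBatch column. Note that "my_metric" is a hypothetical key your env would write into its info dicts, and compute_gae_for_sample_batch is PPO's default postprocessing in ~1.x versions (other versions name this differently):

```python
# Sketch only: "my_metric" is a made-up info-dict key; adapt to your env.
import numpy as np

from ray.rllib.agents.ppo.ppo_torch_policy import PPOTorchPolicy
from ray.rllib.evaluation.postprocessing import compute_gae_for_sample_batch
from ray.rllib.policy.sample_batch import SampleBatch


def postprocess_with_infos(policy, sample_batch,
                           other_agent_batches=None, episode=None):
    # Accessing the column here signals RLlib to keep providing it.
    infos = sample_batch[SampleBatch.INFOS]
    # During the initial dummy pass "infos" is all 0s (not dicts),
    # hence the isinstance() guard.
    sample_batch["my_metric"] = np.array(
        [info.get("my_metric", 0.0) if isinstance(info, dict) else 0.0
         for info in infos],
        dtype=np.float32,
    )
    # Keep PPO's default GAE postprocessing.
    return compute_gae_for_sample_batch(
        policy, sample_batch, other_agent_batches, episode)


InfoPPOPolicy = PPOTorchPolicy.with_updates(
    name="InfoPPOPolicy",
    postprocess_fn=postprocess_with_infos,
)
```

Once the new column exists on the batch, your loss function can consume it like any other tensor column (e.g. train_batch["my_metric"]).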