Customize train batch data

I want to add additional data to the train batch. For example, I want to add a “cost” term that is exactly the same as the reward term, because I want to use the cost term in my loss function. Any help would be appreciated. Thank you very much!

Hi @Ke_Fan, the sample batch is basically a Python dictionary with some additional features. So, you could add variables by simply adding a new key and assigning an array to it.
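As a minimal sketch of that idea, here is a plain Python dict standing in for the sample batch, so it runs without RLlib installed. The "cost" key and the `add_cost` helper are made-up names for illustration; "rewards" is the key RLlib uses for the reward column, and in a real batch the values would be NumPy arrays rather than lists.

```python
def add_cost(batch):
    """Add a "cost" column that mirrors the "rewards" column.

    `batch` is a plain dict here (a stand-in for an RLlib sample batch).
    """
    # Copy the rewards so later in-place changes to one column
    # do not silently change the other.
    batch["cost"] = list(batch["rewards"])
    return batch


# Toy batch with three timesteps (hypothetical data).
batch = {"obs": [[0.0], [1.0], [2.0]], "rewards": [1.0, 0.0, -1.0]}
batch = add_cost(batch)
print(batch["cost"])
```

With real NumPy arrays, `batch["rewards"].copy()` would play the role of `list(...)`.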

It very much depends on where in your code you want to do this. If you have a MultiAgentBatch (the RLlib default), you also need to take care of the policy IDs: the per-policy batches are keyed by policy ID, and if you have only a single policy, that key is default_policy. Feel free to take a look at the class definition.
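For the multi-agent case, a rough sketch could look like the following. It assumes the per-policy batches are available as a mapping from policy ID to batch (in RLlib this mapping is the `policy_batches` attribute of the multi-agent batch); plain dicts and lists are used as stand-ins so the snippet runs on its own, and `add_cost_to_all` is a hypothetical helper name.

```python
def add_cost_to_all(policy_batches):
    """Add a "cost" column to every per-policy batch.

    `policy_batches` maps policy ID -> batch (plain dicts as stand-ins).
    """
    for policy_id, batch in policy_batches.items():
        # Each policy gets its own copy of its own rewards.
        batch["cost"] = list(batch["rewards"])
    return policy_batches


# With a single policy, the only key is "default_policy".
policy_batches = {"default_policy": {"rewards": [0.5, 1.5]}}
add_cost_to_all(policy_batches)
print(policy_batches["default_policy"]["cost"])
```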