Adding custom data to the training batch while sampling data from the environment

I want to add custom data to the training batch while sampling data from the environment. For example, I would like to add an additional reward term for each state/action pair that has the same shape as the environment reward but comes from another neural network, and then combine it with the original environment reward when computing the final loss.

Any help would be appreciated. Thank you very much.

Hey @dev1dze,

With RLlib you can add an extra reward term for each state or action in your custom environment by following a few steps. This lets you adapt the reward structure to the specifics of your problem domain and potentially improve the performance of your reinforcement learning agent.
:point_down: :point_down:
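For example, at the environment level you could do the shaping in a reward wrapper. This is only a minimal sketch of the idea: `AuxRewardNet`, its architecture, and the `0.1` weight are placeholders I made up, not anything RLlib provides.

```python
import gymnasium as gym
import torch
import torch.nn as nn


class AuxRewardNet(nn.Module):
    """Hypothetical network that scores (obs, action) pairs."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


class ShapedRewardWrapper(gym.Wrapper):
    """Adds a learned reward term to the environment's own reward."""

    def __init__(self, env, aux_net: AuxRewardNet, weight: float = 0.1):
        super().__init__(env)
        self.aux_net = aux_net
        self.weight = weight
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        with torch.no_grad():
            aux_r = self.aux_net(
                torch.as_tensor(self._last_obs, dtype=torch.float32),
                torch.atleast_1d(torch.as_tensor(action, dtype=torch.float32)),
            ).item()
        self._last_obs = obs
        # Keep the original reward around in `info` in case you need it later.
        info["env_reward"] = reward
        return obs, reward + self.weight * aux_r, terminated, truncated, info
```

You would then register the wrapped environment (e.g. via `ray.tune.register_env`) and point your algorithm config's `env` at that registered name, so RLlib only ever sees the combined reward.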

Hi @edison ,

Thanks for your reply. I think I should be looking for the answer in the *ray/rllib/env/single_agent_env_runner.py* file, since that is the place where
env.step(action) is called, and one could maybe push some more metadata into the episode there? Still not entirely sure yet…
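Something along these lines is what I have in mind, except via the callback hooks rather than editing the env runner itself. Just a rough sketch on the classic API stack; the appended `0.0` is a placeholder for whatever my second network would produce:

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class EpisodeDataCallbacks(DefaultCallbacks):
    """Stashes custom per-step data on the episode while it is being sampled."""

    def on_episode_start(self, *, episode, **kwargs):
        # `user_data` is a free-form dict carried by the (old-stack) episode object.
        episode.user_data["aux_values"] = []

    def on_episode_step(self, *, episode, **kwargs):
        # Compute the extra quantity here (e.g. from another network) and
        # append it; it stays attached to this episode during sampling.
        episode.user_data["aux_values"].append(0.0)  # placeholder value
```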

@dev1dze,

You might want to model it on something like this:
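A rough sketch of one way to do it with the classic `DefaultCallbacks.on_postprocess_trajectory` hook, which sees every sampled trajectory before it is concatenated into the train batch. The auxiliary reward net and the `0.1` weight below are placeholders, not part of RLlib:

```python
import numpy as np
import torch

from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.sample_batch import SampleBatch


class AuxRewardCallbacks(DefaultCallbacks):
    """Adds an auxiliary, learned reward term to every sampled trajectory."""

    def __init__(self):
        super().__init__()
        # Placeholder: build / load your auxiliary reward network here.
        self.aux_net = torch.nn.Linear(4, 1)  # e.g. CartPole obs -> scalar
        self.weight = 0.1

    def on_postprocess_trajectory(
        self,
        *,
        worker,
        episode,
        agent_id,
        policy_id,
        policies,
        postprocessed_batch,
        original_batches,
        **kwargs,
    ):
        obs = torch.as_tensor(
            postprocessed_batch[SampleBatch.OBS], dtype=torch.float32
        )
        with torch.no_grad():
            aux_r = self.aux_net(obs).squeeze(-1).numpy()

        # Keep the original env rewards as an extra column in the batch ...
        postprocessed_batch["env_rewards"] = postprocessed_batch[
            SampleBatch.REWARDS
        ].copy()
        # ... and let training see the combined reward.
        postprocessed_batch[SampleBatch.REWARDS] = (
            postprocessed_batch[SampleBatch.REWARDS] + self.weight * aux_r
        ).astype(np.float32)
```

You would register it on your algorithm config, e.g. `config.callbacks(AuxRewardCallbacks)`. One caveat: this hook runs after the policy's own postprocessing, so for algorithms that compute advantages there (e.g. PPO's GAE) the combined reward will not flow into the already-computed advantages; in that case shaping inside the environment, as in the wrapper sketch above, may be the simpler route.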