Implementation example of intrinsic reward in MARL


I am looking for documentation or an example showing how agents can customize their reward, for instance by adding their own intrinsic reward to the environment reward.

The only examples I have found so far “only” show MARL agents using the environment reward (i.e. the reward coming from the environment’s step() function).
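
To make the question concrete, here is a minimal sketch of the kind of thing I have in mind. It is not taken from any library: `ToyMultiAgentEnv`, `IntrinsicRewardWrapper`, the `beta` weight, and the count-based novelty bonus are all hypothetical names and choices made up for illustration, and the per-agent dict layout of `step()` is only an assumption about what a MARL environment returns.

```python
from collections import defaultdict


class ToyMultiAgentEnv:
    """Hypothetical stand-in for a MARL environment that returns per-agent dicts."""

    agents = ("agent_0", "agent_1")

    def reset(self):
        self.t = 0
        return {a: 0 for a in self.agents}

    def step(self, actions):
        self.t += 1
        # Toy dynamics: each agent simply observes its own action,
        # and the environment (extrinsic) reward is a constant 1.0.
        obs = {a: actions[a] for a in self.agents}
        rewards = {a: 1.0 for a in self.agents}
        dones = {a: self.t >= 3 for a in self.agents}
        return obs, rewards, dones, {}


class IntrinsicRewardWrapper:
    """Adds a per-agent intrinsic bonus to the reward returned by step()."""

    def __init__(self, env, beta=0.5):
        self.env = env
        self.beta = beta  # weight of the intrinsic term (assumed hyperparameter)
        self.visit_counts = {a: defaultdict(int) for a in env.agents}

    def reset(self):
        return self.env.reset()

    def intrinsic_reward(self, agent, obs):
        # Count-based novelty bonus: beta / sqrt(number of visits to this observation).
        self.visit_counts[agent][obs] += 1
        return self.beta / self.visit_counts[agent][obs] ** 0.5

    def step(self, actions):
        obs, rewards, dones, infos = self.env.step(actions)
        # Total reward = environment reward + the agent's own intrinsic reward.
        total = {a: rewards[a] + self.intrinsic_reward(a, obs[a]) for a in rewards}
        return obs, total, dones, infos


env = IntrinsicRewardWrapper(ToyMultiAgentEnv(), beta=0.5)
env.reset()
_, first, _, _ = env.step({"agent_0": 0, "agent_1": 1})
_, second, _, _ = env.step({"agent_0": 0, "agent_1": 2})
print(first, second)
```

Here the bonus is computed in a wrapper around the environment, so the training code still sees a single reward per agent. Whether this should live in a wrapper, in the policy, or in some postprocessing step is exactly what I am unsure about.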