Hello,
I am looking for documentation or an example showing how agents can customize their reward, for instance by adding their own intrinsic reward to the environment reward.
The only examples I have found so far show MARL agents using only the environment reward (i.e. the reward returned by the environment's step() function).
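To make the question concrete, here is a minimal, framework-agnostic sketch of what I have in mind. It assumes a dict-based multi-agent env whose step() takes per-agent actions and returns per-agent (obs, rewards, dones, infos) dicts. The wrapper, the toy env, and the count-based bonus are all hypothetical illustrations, not any library's API. What I am really after is the idiomatic place to hook this in for a given framework:

```python
import math
from collections import defaultdict
from typing import Any, Callable, Hashable

# Hypothetical wrapper: adds a per-agent intrinsic bonus to the env reward.
# Assumes step(actions) returns (obs, rewards, dones, infos), each a dict keyed by agent ID.
class IntrinsicRewardWrapper:
    def __init__(self, env, intrinsic_fn: Callable[[Hashable, Any, Any], float], beta: float = 1.0):
        self.env = env
        self.intrinsic_fn = intrinsic_fn  # (agent_id, obs, action) -> bonus
        self.beta = beta                  # weight of the intrinsic term

    def reset(self):
        return self.env.reset()

    def step(self, actions):
        obs, rewards, dones, infos = self.env.step(actions)
        shaped = {}
        for agent_id, r_ext in rewards.items():
            r_int = self.intrinsic_fn(agent_id, obs.get(agent_id), actions.get(agent_id))
            # Total reward = environment reward + beta * intrinsic reward
            shaped[agent_id] = r_ext + self.beta * r_int
            # Keep the raw environment reward around for logging
            infos.setdefault(agent_id, {})["extrinsic_reward"] = r_ext
        return obs, shaped, dones, infos


# Hypothetical count-based novelty bonus: 1 / sqrt(N(agent, obs)).
class CountBonus:
    def __init__(self):
        self.counts = defaultdict(int)

    def __call__(self, agent_id, obs, action):
        self.counts[(agent_id, obs)] += 1
        return 1.0 / math.sqrt(self.counts[(agent_id, obs)])


# Hypothetical toy env with two agents that each get reward 1.0 per step.
class ToyTwoAgentEnv:
    def reset(self):
        self.t = 0
        return {"a0": 0, "a1": 0}

    def step(self, actions):
        self.t += 1
        obs = {aid: self.t for aid in actions}
        rewards = {aid: 1.0 for aid in actions}
        dones = {aid: self.t >= 3 for aid in actions}
        return obs, rewards, dones, {}


env = IntrinsicRewardWrapper(ToyTwoAgentEnv(), CountBonus(), beta=0.5)
env.reset()
obs, rewards, dones, infos = env.step({"a0": 0, "a1": 1})
print(rewards)  # first visit: 1.0 + 0.5 * 1.0 = 1.5 for each agent

env.reset()
_, rewards_again, _, _ = env.step({"a0": 0, "a1": 1})
print(rewards_again)  # second visit to the same obs: bonus shrinks to 1/sqrt(2)
```

Wrapping the env keeps the learner untouched, but the bonus then cannot easily depend on the agent's own networks (e.g. curiosity-style prediction errors), which is why I am asking whether there is a recommended hook for this.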
Thanks.