# PPO Reward Scaling

Hi All,

I have a question about how big rewards should be. I currently use a base reward of 1000, and all per-step punishments and rewards (plus the reward at the very end) are calculated as fractions of that amount.

For example:

```python
reward = 0

if self.tinyPunish:
    self.tinyPunish = False
    reward -= firstPlace * 0.0001
if self.smallPunish:
    self.smallPunish = False
    reward -= firstPlace * 0.001
if self.mediumPunish:
    self.mediumPunish = False
    reward -= firstPlace * 0.01
if self.strongPunish:
    self.strongPunish = False
    reward -= firstPlace * 0.1
```

…a bit further down…

```python
if tieredUp == 10:
    reward += firstPlace * 0.02
elif tieredUp == 11:
    reward += firstPlace * 0.08
```

```python
if self.leveledUp:
    # Don't want to reward for rushing early levels, as I think that's just dumb.
    if (self.level > 4) and ((self.boardUnitCount() + 1) >= self.level):
        """
        Reward for getting to level 5:  12.5
        Reward for getting to level 6:  21.6
        Reward for getting to level 7:  34.3
        Reward for getting to level 8:  51.2
        Reward for getting to level 9:  72.9
        Reward for getting to level 10: 100.0
        """
        award = firstPlace * 0.0001 * (self.level ** 3)
        print(f"Awarded: {award} for leveling up with: {self.boardUnitCount()} heroes!")
        reward += award
    self.leveledUp = False
```
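For what it's worth, the values in that docstring do follow from the cubic formula, assuming `firstPlace` is the base reward of 1000:

```python
firstPlace = 1000  # the base reward mentioned above

# Reproduce the docstring table: reward grows cubically with level.
for level in range(5, 11):
    award = firstPlace * 0.0001 * (level ** 3)
    print(f"Reward for getting to level {level}: {round(award, 1)}")
```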

I was wondering if that is okay, or whether I need to scale everything to between 0 and 1 per step, or make sure that rewards don't exceed 1 per episode?

It is fine for rewards to exceed 1 per episode. For reference, the MuJoCo environments can produce quite large rewards, which are passed into the policy loss function.
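That said, if you ever want the rewards on a tamer scale without hand-tuning every coefficient, a common trick is to divide each reward by a running estimate of the discounted return's standard deviation (the idea behind VecNormalize-style wrappers). A minimal sketch, not RLlib's API:

```python
class RunningRewardScaler:
    """Scale rewards by a running std of the discounted return.

    Minimal sketch of return-based reward normalization; the class name
    and structure here are illustrative, not from any library.
    """

    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma = gamma
        self.eps = eps
        self.ret = 0.0   # running discounted return
        self.n = 0
        self.sum = 0.0
        self.sumsq = 0.0

    def scale(self, reward):
        # Track the discounted return and its sample variance.
        self.ret = self.gamma * self.ret + reward
        self.n += 1
        self.sum += self.ret
        self.sumsq += self.ret ** 2
        if self.n > 1:
            var = (self.sumsq - self.sum ** 2 / self.n) / (self.n - 1)
            std = max(var, 0.0) ** 0.5
            if std > self.eps:
                return reward / std
        return reward  # not enough data yet; pass through unscaled


scaler = RunningRewardScaler()
scaled = [scaler.scale(1000.0) for _ in range(10)]
print(scaled)  # raw 1000-scale rewards shrink to around 1 after a few steps
```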

For the PPO parameters there is this:

```python
# PPO clip parameter.
"clip_param": 0.3,
# Clip param for the value function. Note that this is sensitive to the
# scale of the rewards. If your expected V is large, increase this.
"vf_clip_param": 100000.0,
```
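To see why this parameter is sensitive to reward scale: the per-sample value-function error is clipped at the threshold, so if your returns are in the thousands, a small clip value would saturate the value loss almost everywhere and kill its gradient signal. A rough sketch of the idea (my reading of the config comment, not a copy of RLlib's actual code):

```python
import numpy as np

def clipped_vf_loss(v_pred, v_target, vf_clip_param):
    """Per-sample squared value error, clamped at vf_clip_param, then averaged.

    Illustrative only; function name and exact clipping form are assumptions.
    """
    sq_err = (np.asarray(v_pred) - np.asarray(v_target)) ** 2
    return float(np.mean(np.minimum(sq_err, vf_clip_param)))

# With value targets in the thousands, a small clip saturates every sample:
targets = np.array([1000.0, 2000.0, 3000.0])
preds = np.zeros(3)
print(clipped_vf_loss(preds, targets, 10.0))   # all samples hit the clip
print(clipped_vf_loss(preds, targets, 1e7))    # nothing clipped
```

So "large" is relative to the magnitude of your expected returns: the clip should sit comfortably above the typical squared value error you expect during training.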

What would constitute "large"? And by how much would I need to increase it?