Maximum recommended reward

I use APPO, and I only have the reward plots from Ray. What kind of plots are you talking about?
No, I don’t use ray.tune yet, because I am still struggling to make it learn.
The min and mean reward are going down instead of up, while the max is going up. The episode length never reaches the maximum of 100, because the actor chooses to die instead of learning. If I increase the “game over” penalty, it chooses to do almost nothing in order to maintain a fixed reward.
I need the mean reward to be above 0 to get a good result.
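
To illustrate why an agent might prefer dying early, here is a minimal sketch of the return arithmetic. It assumes the average per-step reward under the current policy is negative, which may not match the real environment. The function name, the reward values (-1.0 per step, -5.0 penalty) and the discount factor are all hypothetical; only the episode cap of 100 comes from the description above.

```python
MAX_LEN = 100  # episode length cap mentioned above


def episode_return(step_reward, game_over_penalty, steps, gamma=0.99):
    """Discounted return of an episode lasting `steps` steps.

    Hypothetical reward model: every step yields `step_reward`, and an
    episode that ends before MAX_LEN also receives `game_over_penalty`
    on its final step.
    """
    ret = sum(step_reward * gamma ** t for t in range(steps))
    if steps < MAX_LEN:
        ret += game_over_penalty * gamma ** (steps - 1)
    return ret


# Negative per-step reward (assumed): dying after 5 steps beats surviving.
die_early = episode_return(step_reward=-1.0, game_over_penalty=-5.0, steps=5)
survive = episode_return(step_reward=-1.0, game_over_penalty=-5.0, steps=MAX_LEN)
print("negative step reward:", die_early, survive)

# Small positive per-step reward (assumed): surviving is now better.
die_early_pos = episode_return(step_reward=0.1, game_over_penalty=-5.0, steps=5)
survive_pos = episode_return(step_reward=0.1, game_over_penalty=-5.0, steps=MAX_LEN)
print("positive step reward:", die_early_pos, survive_pos)
```

Under these assumed numbers, ending the episode early gives a higher return than surviving whenever the accumulated per-step losses outweigh the penalty, so the sign of the typical per-step reward may matter as much as the size of the “game over” penalty.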

I am thinking of using offline data to get it moving in the right direction, which is why I asked this question here.