Ray RLlib PPO does not solve a very simple problem

The solution was to reward the per-step change in distance (how much closer the agent got on that step) instead of using the distance itself as the reward.
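A minimal sketch of that reward change follows. It assumes the environment can report a scalar distance to the goal at every step. `DeltaDistanceReward` is a hypothetical helper and not part of RLlib. The reward is `previous_distance - current_distance`, so it is positive when the agent moves closer and negative when it moves away. Over an episode the rewards telescope to `initial_distance - final_distance`.

```python
class DeltaDistanceReward:
    """Hypothetical helper (not an RLlib API): converts a raw
    distance-to-goal signal into a per-step progress reward."""

    def __init__(self) -> None:
        self.prev_distance = None

    def reset(self, initial_distance: float) -> None:
        # Call at the start of every episode with the starting distance.
        self.prev_distance = initial_distance

    def __call__(self, distance: float) -> float:
        if self.prev_distance is None:
            raise RuntimeError("call reset() at the start of each episode")
        # Positive when the agent got closer this step, negative when it moved away.
        reward = self.prev_distance - distance
        self.prev_distance = distance
        return reward


if __name__ == "__main__":
    shaper = DeltaDistanceReward()
    shaper.reset(5.0)
    for d in [4.0, 4.5, 2.0]:
        print(d, shaper(d))
```

In practice this would be called from the environment's own `step()` (for example, in a custom Gym/Gymnasium environment), returning the shaped value in place of the raw distance. One plausible reason this helps is that the raw distance gives nearly the same reward whatever the agent does. The difference, by contrast, directly signals whether the last action helped. With a potential of minus the distance and no discounting, this matches potential-based reward shaping.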