NaNs in reward fields in `results` dict

I had this happen in a custom environment; there are a few things you might check.

  1. Check that your environment is properly resetting. I had a weird reset bug in mine that, once it was first triggered, caused every subsequent episode to last only one step and never return any reward.
  2. Check that you don’t have any values in your observation vectors that could produce NaNs. Remember, these algorithms are just inputs and outputs - if you have something like an infinite value or a division by zero in your observation space, the algorithm you use will likely spit out NaN. Even if it only spits out a single NaN value, it’ll corrupt the rest of your metric calculations.
  3. If your environment is like mine and provides its primary reward at the end of an episode, but the episode length varies, you may want to set `batch_mode` to `"complete_episodes"`. I found that helped me eliminate a lot of noise in training. Though, notably, I don’t think this will help you with the NaN issue specifically.
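For point 2, a cheap way to catch the problem at its source is to validate each observation before it leaves your environment. Here's a minimal sketch (the `check_obs` helper and the example values are mine, not from any library):

```python
import numpy as np

def check_obs(obs, step):
    """Raise early if an observation contains NaN or infinite values."""
    arr = np.asarray(obs, dtype=np.float64)
    if not np.all(np.isfinite(arr)):
        bad = np.where(~np.isfinite(arr))[0]
        raise ValueError(
            f"Non-finite observation at step {step}: indices {bad.tolist()}"
        )
    return arr

# A clean observation passes through unchanged.
check_obs([0.5, -1.2, 3.0], step=0)

# An observation containing a NaN (e.g. from a 0/0 feature) raises
# immediately, pointing at the offending index instead of letting it
# silently corrupt the training metrics.
try:
    check_obs([0.5, float("nan"), 3.0], step=7)
except ValueError as e:
    print(e)
```

Calling this at the end of your `step()`/`reset()` turns a mysterious NaN in `results` into a stack trace that tells you exactly which feature broke and when.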
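For point 3, assuming you're on RLlib (where `batch_mode` lives in the trainer config), the dict-style setting looks roughly like this - the env name and batch size below are placeholders, and newer versions expose the same option through `AlgorithmConfig` instead:

```python
# Dict-style RLlib trainer config (older API; newer versions set this via
# AlgorithmConfig, e.g. config.rollouts(batch_mode="complete_episodes")).
config = {
    "env": "MyCustomEnv-v0",            # hypothetical custom env name
    "batch_mode": "complete_episodes",  # sample batches hold only whole episodes
    "train_batch_size": 4000,           # example value, tune for your setup
}
```

With `"complete_episodes"`, rollouts are never cut mid-episode, so an end-of-episode reward always lands in the same batch as the steps that earned it.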

Hope these help!
