I had this happen in a custom environment. Here are a few things you might check:
- Check that your environment is resetting properly. I had a weird reset bug in mine that caused every episode after a certain event to last only 1 step and never return any rewards.
- Check that nothing in your observation vectors can produce `nan`s. Remember that these algorithms are just inputs and outputs: if you have something like an infinite value or a value divided by zero in your observation space, the algorithm you use will likely spit out `nan`. Even if it only spits out a single `nan` value, it’ll corrupt the rest of your metric calculations.
- If your environment is like mine and provides its primary reward at the end of an episode but the episode length varies, you may want to set `batch_mode` to `"complete_episodes"`. I found that helped me eliminate a lot of noise in training. Notably, though, I don’t think this will help you with the `nan` issue specifically.
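
If it helps, here is a rough sketch of how you could test the first two points outside of training. It is not from my actual project. It assumes a Gymnasium-style interface, where `reset()` returns `(obs, info)`, `step(action)` returns `(obs, reward, terminated, truncated, info)`, and `env.action_space.sample()` exists. `all_finite`, `check_env`, and `ToyEnv` are made-up names for illustration, and `ToyEnv` is a deliberately broken stand-in for your own environment.

```python
import math


def all_finite(values):
    """Return True if every number in a (possibly nested) sequence is finite."""
    if hasattr(values, "tolist"):  # NumPy arrays and scalars
        values = values.tolist()
    if isinstance(values, (list, tuple)):
        return all(all_finite(v) for v in values)
    return math.isfinite(values)


def check_env(env, episodes=5, max_steps=1000):
    """Roll out random actions; collect non-finite values and episode lengths."""
    problems, lengths = [], []
    for ep in range(episodes):
        obs, _info = env.reset()
        if not all_finite(obs):
            problems.append((ep, 0, "observation"))
        steps = 0
        for steps in range(1, max_steps + 1):
            action = env.action_space.sample()
            obs, reward, terminated, truncated, _info = env.step(action)
            if not all_finite(obs):
                problems.append((ep, steps, "observation"))
            if not math.isfinite(reward):
                problems.append((ep, steps, "reward"))
            if terminated or truncated:
                break
        lengths.append(steps)
    return problems, lengths


class ToyEnv:
    """Hypothetical environment with a reset bug and a nan, for demo only."""

    class action_space:
        @staticmethod
        def sample():
            return 0

    def __init__(self):
        self.t = 0
        self.resets = 0

    def reset(self):
        self.resets += 1
        # Bug: the step counter is only cleared on the very first reset.
        if self.resets == 1:
            self.t = 0
        return [0.0, 1.0], {}

    def step(self, action):
        self.t += 1
        # Simulate a bad computation that yields nan on one particular step.
        ratio = float("nan") if self.t == 2 else 1.0 / self.t
        done = self.t >= 3
        return [float(self.t), ratio], 0.0, done, False, {}


problems, lengths = check_env(ToyEnv(), episodes=3)
print(problems)  # [(0, 2, 'observation')]
print(lengths)   # [3, 1, 1]
```

Episode lengths that collapse to 1 after the first episode are the kind of reset bug I hit, and any entry in `problems` tells you which episode and step produced a non-finite value. Note that this sketch only handles flat or nested numeric observations, not dict observation spaces.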
Hope these help!