NaNs in reward fields in `results` dict

Brendan_A · November 30, 2023, 10:22pm

I had this happen in a custom environment. There are a couple of things you might check.

Check that your environment is resetting properly. I had a weird reset bug in mine that caused every episode after a certain occurrence to last only 1 step and never return any rewards.
Check that your observation vectors don’t contain any values that would produce NaNs. Remember that these algorithms just map inputs to outputs: if you have something like an infinite value or the result of a division by zero in your observations, then the algorithm you use will likely spit out NaN. Even if it only spits out a single NaN value, it’ll corrupt the rest of your metric calculations.
If your environment is like mine and provides its primary reward at the end of an episode, but the episode length varies, you may want to set batch_mode to "complete_episodes". I found that it helped me eliminate a lot of noise in training. Notably, though, I don’t think this will help with the NaN issue specifically.
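For the first two checks, something like the sketch below can be run against the environment on its own, before RLlib is involved. It is not from my actual code. It assumes a Gymnasium-style API (reset() returning (obs, info), step() returning a 5-tuple, and an action_space with sample()), and the names find_non_finite and check_episodes are made up for illustration.

```python
import math
from collections.abc import Mapping


def find_non_finite(value, path="obs"):
    """Return (path, value) pairs for every NaN or infinite float found."""
    # NumPy arrays and scalars expose tolist(); convert to plain Python first.
    if hasattr(value, "tolist"):
        value = value.tolist()
    if isinstance(value, Mapping):
        bad = []
        for key, item in value.items():
            bad.extend(find_non_finite(item, f"{path}[{key!r}]"))
        return bad
    if isinstance(value, (list, tuple)):
        bad = []
        for index, item in enumerate(value):
            bad.extend(find_non_finite(item, f"{path}[{index}]"))
        return bad
    if isinstance(value, float) and not math.isfinite(value):
        return [(path, value)]
    return []


def check_episodes(env, num_episodes=5, max_steps=1000):
    """Roll out random actions; return (episode_lengths, non_finite_problems).

    Assumes a Gymnasium-style env (hypothetical here, substitute your own).
    """
    lengths, problems = [], []
    for episode in range(num_episodes):
        obs, _info = env.reset()
        problems += find_non_finite(obs, f"episode {episode} reset obs")
        steps, done = 0, False
        while not done and steps < max_steps:
            action = env.action_space.sample()
            obs, reward, terminated, truncated, _info = env.step(action)
            steps += 1
            problems += find_non_finite(obs, f"episode {episode} step {steps} obs")
            problems += find_non_finite(reward, f"episode {episode} step {steps} reward")
            done = terminated or truncated
        lengths.append(steps)
    return lengths, problems
```

If the returned lengths collapse to 1 after some episode, that points at a reset bug like the one I had. Any entry in the problems list tells you which observation or reward component went non-finite and at which step.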

Hope these help!
