Good afternoon.
I have a custom gym Env with a termination condition like this:
    self.current_step += 1
    if self.current_step >= len(self.df) - 1:
        self.truncated = True
        self.terminated = True
Here self.df is a dataset stored as a pd.DataFrame.
When I run ray.tune with PPO, following the custom_env.py example from rllib/examples, I get this output:
episode_len_mean: .nan
episode_media: {}
episode_reward_max: .nan
episode_reward_mean: .nan
episode_reward_min: .nan
episodes_this_iter: 0
episodes_total: 0
However, when I change the termination condition to this:
    if self.current_step >= 4000 - 1:
        self.truncated = True
        self.terminated = True
Then the problem goes away: the env still takes actions and the output is no longer .nan. What could be causing this, and how can I solve it? The dataset has more than 100K rows.
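
To make the difference between the two conditions concrete, here is a minimal sketch. It is not the real env: StepCounter and episodes_finished are hypothetical names that keep only the step counter and the end check. STEPS_PER_ITER = 4000 is an assumed number of env steps sampled per training iteration, not a value read from the actual config. Under that assumption it counts how many episodes can finish inside one iteration for each limit.

```python
class StepCounter:
    """Hypothetical stand-in for the env: only the counter and the end check."""

    def __init__(self, limit):
        # limit plays the role of len(self.df) in the first version
        # and of the hard-coded 4000 in the second version.
        self.limit = limit
        self.current_step = 0

    def step(self):
        # Same logic as in the post: increment, then compare with limit - 1.
        self.current_step += 1
        return self.current_step >= self.limit - 1


def episodes_finished(limit, steps_sampled):
    """Count the episodes that end within steps_sampled env steps."""
    env = StepCounter(limit)
    finished = 0
    for _ in range(steps_sampled):
        if env.step():
            finished += 1
            env = StepCounter(limit)  # stands in for env.reset()
    return finished


STEPS_PER_ITER = 4000  # assumption: env steps sampled per training iteration

print(episodes_finished(100_000, STEPS_PER_ITER))  # dataset-sized limit -> 0
print(episodes_finished(4000, STEPS_PER_ITER))     # hard-coded limit -> 1
```

If that assumption holds, an episode tied to a 100K-row dataset cannot finish inside a single iteration, which would match episodes_this_iter: 0 and the .nan episode statistics above, while the 4000-step version finishes one episode per iteration.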